This is an automated email from the ASF dual-hosted git repository.
dspavlov pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite-teamcity-bot.git
The following commit(s) were added to refs/heads/master by this push:
new 072798ed IGNITE-21899 recovery for broken 2022 local DB with
dump-assisted auto-repair (#222)
072798ed is described below
commit 072798edb7976fc4638f0cc819ee744a81ef20a9
Author: ignitetcbot <[email protected]>
AuthorDate: Sat May 9 15:45:24 2026 +0300
IGNITE-21899 recovery for broken 2022 local DB with dump-assisted
auto-repair (#222)
- Improved migration/crash diagnostics and diagnostic folder handling
- Clean-check fixes and migration recovery integration tests
- Limit GridIntList migration to known compatible caches
Codex co-authored-by: Dmitriy Pavlov <[email protected]>
---
README.md | 24 +-
docs/install.md | 87 ++-
.../java/org/apache/ignite/ci/db/DbMigrations.java | 2 +-
.../java/org/apache/ignite/ci/web/Launcher.java | 3 +
jetty-launcher/build.gradle | 79 ++-
migrator/build.gradle | 7 +-
.../apache/ignite/migrate/GridIntListMigrator.java | 644 ++++++++++++++++++++-
.../org/apache/ignite/migrate/MigratorArgs.java | 4 +-
.../GridIntListMigratorIntegrationTest.java | 144 +++++
.../ignite/migrate/GridIntListMigratorTest.java | 302 ++++++++++
.../LegacyPersistentStorageCompatibilityTest.java | 18 +-
.../ignite/tcbot/common/conf/TcBotWorkDir.java | 7 +
12 files changed, 1253 insertions(+), 68 deletions(-)
diff --git a/README.md b/README.md
index babc1ec3..147e709d 100644
--- a/README.md
+++ b/README.md
@@ -111,9 +111,18 @@ The `migrate-GridIntList` database migration updates
persisted TeamCity Bot data
`org.apache.ignite.tcbot.common.util.GridIntList`.
During startup, `DbMigrations` runs `GridIntListMigrator.migrateOnInstance`
once and stores the migration marker only
-after the scan finishes successfully. The migrator iterates over Ignite cache
entries in keep-binary mode, recursively
-checks cache values, nested binary objects, lists, sets, maps, and object
arrays, and rebuilds only values that contain
-the legacy `GridIntList` type.
+after the scan finishes successfully. By default the migrator scans only the
caches whose persisted value graph is known
+to contain compacted TeamCity parameters or statistics backed by `GridIntList`:
+
+| Cache | Persisted GridIntList path |
+| ----- | -------------------------- |
+| `teamcityFatBuild` | `FatBuildCompacted.buildParameters`, `FatBuildCompacted.statistics` |
+| `teamcityFatBuildType` | `BuildTypeCompacted.settings`, `BuildTypeCompacted.parameters`, snapshot dependency properties |
+| `teamcitySuiteHistory` | `SuiteInvocation.suite/tests -> Invocation.parameters` |
+
+Within those caches the migrator iterates over entries in keep-binary mode,
recursively checks cache values, nested
+binary objects, lists, sets, maps, and object arrays, and rebuilds only values
that contain the legacy `GridIntList`
+type. The standalone migrator's `--cache` option is an explicit offline
override for targeted diagnostics.
For each legacy list, the migration preserves the logical list contents, not
the backing array capacity. If normal
deserialization is available, it reads the old object through
`GridIntList.array()`. If binary fallback is needed, it
@@ -121,13 +130,14 @@ reads both persisted fields, `arr` and `idx`, validates
that `idx` is inside the
`arr[0..idx)`. The copied values are then written as the new TC Bot
`GridIntList` type.
The migration is intentionally fail-fast from the database marker point of
view. Per-entry failures are logged with the
-cache name and key, counted, and reported after the scan. If any entry fails,
the migration throws an exception and the
+cache name and key, counted, and reported after the scan. Failed entries are
also written as an Ignite dump plus a small
+manifest under `<ignite-work>/diagnostic/grid-int-list-migration-recovery`. If
any entry still cannot be repaired, the
`migrate-GridIntList` marker is not written to `apache.doneMigrations`, so the
issue can be fixed and the migration can
be retried instead of silently leaving mixed old and new data.
The same migrator can also be run as a standalone tool from the `migrator`
module against an Ignite work directory. The
standalone module uses the same Ignite version as the rest of the project
through the shared `ignVer` Gradle property.
-The heavyweight legacy storage compatibility/perf test is excluded from the
regular `:migrator:test` task. Run it
-explicitly with `./gradlew :migrator:legacyDbCompatPerfTest --no-daemon` when
checking old Ignite 2.14 persistent
-storage compatibility.
+Heavyweight persistent-storage integration tests are excluded from the regular
`test` and `build` tasks. Run them
+explicitly with `./gradlew :migrator:integrationTest --no-daemon` when
checking old Ignite 2.14 persistent storage
+compatibility or migration recovery for corrupted binary metadata.
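
The `arr`/`idx` binary fallback described in the README hunk above can be sketched as follows. This is a minimal illustration, not the code from `GridIntListMigrator`: the class and method names are hypothetical, and because the hunk boundary cuts off the exact bounds rule, the check `0 <= idx <= arr.length` is an assumption, as is the handling of a null array.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the commit: illustrates the arr/idx fallback rule. */
final class GridIntListFallbackSketch {
    private GridIntListFallbackSketch() {
        // Static helper only.
    }

    /**
     * @param arr Persisted backing array; may be longer than the logical list.
     * @param idx Persisted logical size.
     * @return Copy of arr[0..idx), without the spare capacity.
     */
    static int[] logicalContents(int[] arr, int idx) {
        // Assumption: a missing array is only valid for an empty list.
        if (arr == null) {
            if (idx != 0)
                throw new IllegalStateException("idx=" + idx + " but arr is null");

            return new int[0];
        }

        // Assumption: idx must lie in [0, arr.length].
        if (idx < 0 || idx > arr.length)
            throw new IllegalStateException("idx=" + idx + " is outside [0, " + arr.length + "]");

        // Copy only the logical prefix, dropping unused capacity.
        return Arrays.copyOf(arr, idx);
    }

    public static void main(String[] args) {
        int[] backing = {7, 8, 9, 0, 0, 0, 0, 0};

        // prints [7, 8, 9]
        System.out.println(Arrays.toString(logicalContents(backing, 3)));
    }
}
```

Copying only `arr[0..idx)` is what keeps spare backing-array capacity out of the migrated value.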
diff --git a/docs/install.md b/docs/install.md
index 55571128..9b88807d 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -17,6 +17,9 @@ gradlew.bat :jetty-launcher:clean :jetty-launcher:distZip
--no-daemon
The web distribution is
`jetty-launcher/build/distributions/jetty-launcher.zip`. It contains `bin`,
`lib`, and
`war/ignite-tc-helper-web.war`.
+The production launcher creates `work/diagnostic` on startup. JVM heap dumps
and fatal error logs are written there:
+`java_pid<pid>.hprof` for OOME heap dumps and `hs_err_pid<pid>.log` for JVM
crash logs.
+
## Linux service
Unpack the distribution and put production config into `work`:
@@ -41,7 +44,7 @@ User=tc-bot
Group=tc-bot
WorkingDirectory=/opt/ignite-teamcity-bot/current/bin
Environment="JAVA_HOME=/usr/lib/jvm/java-17"
-Environment="JETTY_LAUNCHER_OPTS=-Dteamcity.helper.home=/opt/ignite-teamcity-bot/work"
+Environment="TCBOT_WORK_DIR=/opt/ignite-teamcity-bot/work"
ExecStart=/opt/ignite-teamcity-bot/current/bin/jetty-launcher
Restart=on-failure
@@ -60,15 +63,27 @@ sudo systemctl status tc-bot-service
Use generated `bin/jetty-launcher`; it already contains the Java 17 module
options required by Ignite and Guice.
-## Windows production check
+## Production-like clean checks
-Use a separate clean checkout. Set `PR_REF` only for PR checks:
+Use a separate clean checkout. Set `PR_REF` for PR checks before running
PR-specific tasks. The heavyweight
+persistent-storage integration tests are disabled by default: uncomment
`RUN_INTEGRATION_TESTS=1` when the local PR
+check must also cover legacy Ignite storage and migration recovery. The
integration task is executed only after the
+optional PR/ref checkout, so the checked ref defines whether the task exists.
-```
-set "REPO=C:\Tmp\ignite-teamcity-bot-check"
-set "DIST=C:\Tmp\tc-bot-prod-check"
+<details>
+<summary>Windows</summary>
+
+```bat
+@echo off
+setlocal
+
+set "CHECK_ROOT=%~dp0"
+set "REPO=%CHECK_ROOT%ignite-teamcity-bot-check"
+set "DIST=%CHECK_ROOT%tc-bot-prod-check"
set "PR_REF="
rem set "PR_REF=pull/200/head"
+set "RUN_INTEGRATION_TESTS="
+rem set "RUN_INTEGRATION_TESTS=1"
if not exist "%REPO%\.git" git clone https://github.com/apache/ignite-teamcity-bot.git "%REPO%" || exit /b 1
cd /d "%REPO%" || exit /b 1
@@ -79,11 +94,16 @@ git clean -fdx || exit /b 1
if not "%PR_REF%"=="" (
git branch -D pr-check 2>NUL
- git fetch origin %PR_REF%:refs/heads/pr-check || exit /b 1
+ git fetch origin "%PR_REF%:refs/heads/pr-check" || exit /b 1
git switch pr-check || exit /b 1
)
call gradlew.bat clean build --no-daemon || exit /b 1
+if "%RUN_INTEGRATION_TESTS%"=="1" (
+ call gradlew.bat :migrator:integrationTest --no-daemon || exit /b 1
+) else (
+ echo Skipping migrator integration tests. Uncomment RUN_INTEGRATION_TESTS to enable.
+)
call gradlew.bat :jetty-launcher:clean :jetty-launcher:distZip --no-daemon || exit /b 1
powershell -NoProfile -ExecutionPolicy Bypass -Command "Remove-Item -LiteralPath '%DIST%' -Recurse -Force -ErrorAction SilentlyContinue; Expand-Archive -LiteralPath 'jetty-launcher\build\distributions\jetty-launcher.zip' -DestinationPath '%DIST%' -Force" || exit /b 1
@@ -93,3 +113,56 @@ copy /Y "conf\branches.json"
"%DIST%\jetty-launcher\work\branches.json" || exit
cd /d "%DIST%\jetty-launcher\bin" || exit /b 1
call jetty-launcher.bat
```
+
+</details>
+
+<details>
+<summary>Linux</summary>
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO="${REPO:-$SCRIPT_DIR/ignite-teamcity-bot-check}"
+DIST="${DIST:-$SCRIPT_DIR/tc-bot-prod-check}"
+PR_REF="${PR_REF:-}"
+# PR_REF="pull/200/head"
+RUN_INTEGRATION_TESTS="${RUN_INTEGRATION_TESTS:-}"
+# RUN_INTEGRATION_TESTS=1
+
+if [ ! -d "$REPO/.git" ]; then
+ git clone https://github.com/apache/ignite-teamcity-bot.git "$REPO"
+fi
+
+cd "$REPO"
+git fetch origin master
+git switch master
+git reset --hard origin/master
+git clean -fdx
+
+if [ -n "$PR_REF" ]; then
+ git branch -D pr-check 2>/dev/null || true
+ git fetch origin "$PR_REF:refs/heads/pr-check"
+ git switch pr-check
+fi
+
+./gradlew clean build --no-daemon
+if [ "$RUN_INTEGRATION_TESTS" = "1" ]; then
+ ./gradlew :migrator:integrationTest --no-daemon
+else
+ echo "Skipping migrator integration tests. Set RUN_INTEGRATION_TESTS=1 to enable."
+fi
+./gradlew :jetty-launcher:clean :jetty-launcher:distZip --no-daemon
+
+rm -rf "$DIST"
+mkdir -p "$DIST"
+unzip -oq jetty-launcher/build/distributions/jetty-launcher.zip -d "$DIST"
+mkdir -p "$DIST/jetty-launcher/work"
+cp conf/branches.json "$DIST/jetty-launcher/work/branches.json"
+
+cd "$DIST/jetty-launcher/bin"
+./jetty-launcher
+```
+
+</details>
diff --git
a/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/db/DbMigrations.java
b/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/db/DbMigrations.java
index ac2296e4..9b46cdeb 100644
---
a/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/db/DbMigrations.java
+++
b/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/db/DbMigrations.java
@@ -322,7 +322,7 @@ public class DbMigrations {
String cacheFilter = null;
boolean apply = true;
boolean verbose = false;
- int reportEvery = 500;
+ int reportEvery = 50000;
long updated = GridIntListMigrator.migrateOnInstance(
ignite,
diff --git
a/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/web/Launcher.java
b/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/web/Launcher.java
index b80aa420..d93cf1ab 100644
--- a/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/web/Launcher.java
+++ b/ignite-tc-helper-web/src/main/java/org/apache/ignite/ci/web/Launcher.java
@@ -24,6 +24,7 @@ import java.io.IOException;
import java.io.Reader;
import org.apache.ignite.ci.db.TcHelperDb;
import org.apache.ignite.tcbot.common.conf.TcBotSystemProperties;
+import org.apache.ignite.tcbot.common.conf.TcBotWorkDir;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.ee8.webapp.WebAppContext;
@@ -46,6 +47,8 @@ public class Launcher {
if(dev)
System.setProperty(TcBotSystemProperties.DEV_MODE, "true");
+ TcBotWorkDir.resolveDiagnosticDir();
+
Server srv = new Server();
ServerConnector connector = new ServerConnector(srv);
diff --git a/jetty-launcher/build.gradle b/jetty-launcher/build.gradle
index 748e965c..7ba67562 100644
--- a/jetty-launcher/build.gradle
+++ b/jetty-launcher/build.gradle
@@ -20,22 +20,68 @@ apply plugin: 'application'
application {
mainClass = 'org.apache.ignite.ci.TcHelperJettyLauncher'
- applicationDefaultJvmArgs = igniteJava17JvmArgs +
["-Dteamcity.helper.home=../work",
- "-Dfile.encoding=UTF-8",
-
"-Dteamcity.bot.regionsize=16", // 16g Durable Memory region
-
"-Dhttp.maxConnections=30",
- "-server",
- "-Xmx16g",
- "-XX:+AlwaysPreTouch",
- "-XX:+UseG1GC",
-
"-XX:+ScavengeBeforeFullGC",
-
"-XX:+UseStringDeduplication",
-
"-Djava.rmi.server.hostname=app02",
-
"-Dcom.sun.management.jmxremote",
-
"-Dcom.sun.management.jmxremote.port=9010",
-
"-Dcom.sun.management.jmxremote.local.only=false",
-
"-Dcom.sun.management.jmxremote.authenticate=false",
-
"-Dcom.sun.management.jmxremote.ssl=false"]
+ applicationDefaultJvmArgs = igniteJava17JvmArgs + ["-Dfile.encoding=UTF-8",
+
"-Dteamcity.bot.regionsize=16", // 16g Durable Memory region
+
"-Dhttp.maxConnections=30",
+ "-server",
+ "-Xmx16g",
+ "-XX:+AlwaysPreTouch",
+ "-XX:+UseG1GC",
+
"-XX:+ScavengeBeforeFullGC",
+
"-XX:+UseStringDeduplication",
+
"-Djava.rmi.server.hostname=app02",
+
"-Dcom.sun.management.jmxremote",
+
"-Dcom.sun.management.jmxremote.port=9010",
+
"-Dcom.sun.management.jmxremote.local.only=false",
+
"-Dcom.sun.management.jmxremote.authenticate=false",
+
"-Dcom.sun.management.jmxremote.ssl=false"]
+}
+
+tasks.named('startScripts') {
+ doLast {
+ def unixDiagnosticOpts = '''if [ -z "${TCBOT_WORK_DIR:-}" ]; then
+ for opt in $JETTY_LAUNCHER_OPTS
+ do
+ case "$opt" in
+ -Dteamcity.helper.home=*)
TCBOT_WORK_DIR=${opt#-Dteamcity.helper.home=} ;;
+ esac
+ done
+fi
+TCBOT_WORK_DIR=${TCBOT_WORK_DIR:-"$APP_HOME/work"}
+mkdir -p "$TCBOT_WORK_DIR/diagnostic"
+JAVA_OPTS="\\"-Dteamcity.helper.home=$TCBOT_WORK_DIR\\"
\\"-XX:+HeapDumpOnOutOfMemoryError\\" ''' +
+ '''\\"-XX:HeapDumpPath=$TCBOT_WORK_DIR/diagnostic\\" ''' +
+ '''\\"-XX:ErrorFile=$TCBOT_WORK_DIR/diagnostic/hs_err_pid%p.log\\"
$JAVA_OPTS"
+
+'''
+ def windowsDiagnosticOpts = '''if "%TCBOT_WORK_DIR%"=="" (
+ for %%A in (%JETTY_LAUNCHER_OPTS%) do (
+ set "TCBOT_OPT=%%~A"
+ if "!TCBOT_OPT:~0,23!"=="-Dteamcity.helper.home=" set
"TCBOT_WORK_DIR=!TCBOT_OPT:~23!"
+ )
+)
+if "%TCBOT_WORK_DIR%"=="" set "TCBOT_WORK_DIR=%APP_HOME%\\work"
+if not exist "%TCBOT_WORK_DIR%\\diagnostic" mkdir
"%TCBOT_WORK_DIR%\\diagnostic"
+set DEFAULT_JVM_OPTS=%DEFAULT_JVM_OPTS%
"-Dteamcity.helper.home=%TCBOT_WORK_DIR%" ''' +
+ '''"-XX:+HeapDumpOnOutOfMemoryError"
"-XX:HeapDumpPath=%TCBOT_WORK_DIR%\\diagnostic" ''' +
+ '''"-XX:ErrorFile=%TCBOT_WORK_DIR%\\diagnostic\\hs_err_pid%%p.log"
+
+'''
+
+ unixScript.text = unixScript.text.replace(
+ '# Use "xargs" to parse quoted args.',
+ unixDiagnosticOpts + '# Use "xargs" to parse quoted args.'
+ )
+
+ windowsScript.text = windowsScript.text.replace(
+ '@rem Execute jetty-launcher',
+ windowsDiagnosticOpts + '@rem Execute jetty-launcher'
+ )
+ windowsScript.text = windowsScript.text.replace(
+ 'setlocal EnableExtensions',
+ 'setlocal EnableExtensions EnableDelayedExpansion'
+ )
+ }
}
distributions {
@@ -58,6 +104,7 @@ tasks.register('prepareLocalWarRun') {
into workDir
}
mkdir new File(workDir, 'tcbot_logs')
+ mkdir new File(workDir, 'diagnostic')
}
}
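
The start-script fragments injected above resolve the work directory in a fixed order: `TCBOT_WORK_DIR` if set, otherwise the last `-Dteamcity.helper.home=` option found in `JETTY_LAUNCHER_OPTS`, otherwise `work` under the application home. A minimal Java sketch of that order follows. The class and method names are hypothetical, the environment is passed in as a map so the logic can be exercised without touching the real process environment, and the `/` separator mirrors the Unix script only.

```java
import java.util.Map;

/** Hypothetical helper, not part of the commit: mirrors the launcher's work-dir lookup order. */
final class WorkDirResolutionSketch {
    /** Option prefix the start scripts look for; 23 characters, matching the batch substring. */
    static final String HOME_OPT_PREFIX = "-Dteamcity.helper.home=";

    private WorkDirResolutionSketch() {
        // Static helper only.
    }

    /**
     * @param env Environment variables (a plain map in this sketch).
     * @param appHome Application home directory of the unpacked distribution.
     * @return Resolved work directory.
     */
    static String resolveWorkDir(Map<String, String> env, String appHome) {
        // 1. An explicit TCBOT_WORK_DIR wins.
        String workDir = env.get("TCBOT_WORK_DIR");

        // 2. Otherwise fall back to the legacy system property passed through JETTY_LAUNCHER_OPTS.
        if (workDir == null || workDir.isEmpty()) {
            String opts = env.getOrDefault("JETTY_LAUNCHER_OPTS", "");

            for (String opt : opts.trim().split("\\s+")) {
                // Like the shell loop, a later match overwrites an earlier one.
                if (opt.startsWith(HOME_OPT_PREFIX))
                    workDir = opt.substring(HOME_OPT_PREFIX.length());
            }
        }

        // 3. Otherwise default to <app-home>/work.
        if (workDir == null || workDir.isEmpty())
            workDir = appHome + "/work";

        return workDir;
    }

    public static void main(String[] args) {
        System.out.println(resolveWorkDir(System.getenv(), "."));
    }
}
```

Unlike this sketch, the generated scripts also create `<work-dir>/diagnostic` and point the JVM heap-dump and error-file options at it.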
diff --git a/migrator/build.gradle b/migrator/build.gradle
index 6aa62210..88c807a8 100644
--- a/migrator/build.gradle
+++ b/migrator/build.gradle
@@ -16,6 +16,7 @@ dependencies {
testImplementation project(':tcbot-teamcity-ignited')
testImplementation "org.apache.ignite:ignite-slf4j:$ignVer"
testImplementation "junit:junit:$junitVer"
+ testImplementation "org.mockito:mockito-core:$mockitoVer"
}
application {
@@ -27,6 +28,7 @@ test {
maxHeapSize = '1536m'
systemProperty 'compat.work.dir',
file('src/test/work/ignite-db-compat').absolutePath
exclude '**/LegacyPersistentStorageCompatibilityTest.class'
+ exclude '**/GridIntListMigratorIntegrationTest.class'
failOnNoDiscoveredTests = false
testLogging {
@@ -35,8 +37,8 @@ test {
}
}
-tasks.register('legacyDbCompatPerfTest', Test) {
- description = 'Runs the heavyweight legacy Ignite persistent storage
compatibility/perf test.'
+tasks.register('integrationTest', Test) {
+ description = 'Runs heavyweight migrator integration tests against real
Ignite persistence.'
group = 'verification'
testClassesDirs = sourceSets.test.output.classesDirs
@@ -46,6 +48,7 @@ tasks.register('legacyDbCompatPerfTest', Test) {
maxHeapSize = '1536m'
systemProperty 'compat.work.dir',
file('src/test/work/ignite-db-compat').absolutePath
include '**/LegacyPersistentStorageCompatibilityTest.class'
+ include '**/GridIntListMigratorIntegrationTest.class'
testLogging {
events 'passed', 'failed', 'skipped', 'standardOut', 'standardError'
diff --git
a/migrator/src/main/java/org/apache/ignite/migrate/GridIntListMigrator.java
b/migrator/src/main/java/org/apache/ignite/migrate/GridIntListMigrator.java
index 194e7cf6..e4d90fe2 100644
--- a/migrator/src/main/java/org/apache/ignite/migrate/GridIntListMigrator.java
+++ b/migrator/src/main/java/org/apache/ignite/migrate/GridIntListMigrator.java
@@ -30,21 +30,30 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.cache.Cache;
+import java.io.ByteArrayOutputStream;
import java.io.File;
+import java.io.IOException;
+import java.io.ObjectOutputStream;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.text.SimpleDateFormat;
import java.util.*;
+import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
/**
* Offline migrator for TeamCity Bot Ignite persistence.
* <p>
- * Recursively scans all entries in Ignite caches and replaces any occurrence
of the legacy type
+ * Recursively scans entries in known TeamCity Bot caches that may contain the
legacy type
* org.apache.ignite.internal.util.GridIntList with the new type
* org.apache.ignite.tcbot.common.util.GridIntList, preserving the int[]
payload.
* <p>
* Usage:
* export IGNITE_WORK_DIR=/abs/path/to/work_backup
* ./gradlew -p migrator run --args="--verbose --report 200" # dry run
with verbose report
- * ./gradlew -p migrator run --args="--apply --report 500" # apply to
all caches
+ * ./gradlew -p migrator run --args="--apply --report 50000" # apply to
all caches
*/
public final class GridIntListMigrator {
@@ -58,6 +67,30 @@ public final class GridIntListMigrator {
*/
private static final int DEFAULT_PAGE_SIZE = 256;
+ /**
+ * Failed entries printed to the final diagnostic report for each failed
cache.
+ */
+ private static final int FAILURE_DETAILS_LIMIT_PER_CACHE = 5;
+
+ /**
+ * Production caches whose persisted value graph may contain GridIntList.
+ */
+ private static final List<String> GRID_INT_LIST_CACHE_NAMES =
Collections.unmodifiableList(Arrays.asList(
+ "teamcityFatBuild",
+ "teamcityFatBuildType",
+ "teamcitySuiteHistory"
+ ));
+
+ /**
+ * Time to wait for operator input before automatic repair starts.
+ */
+ private static final long AUTO_REPAIR_WAIT_SECONDS = 60;
+
+ /**
+ * Test/support override for the auto-repair wait time.
+ */
+ static final String AUTO_REPAIR_WAIT_MILLIS_PROPERTY =
"gridintlist.migration.autorepair.wait.millis";
+
/**
* Default constructor.
*/
@@ -147,16 +180,18 @@ public final class GridIntListMigrator {
boolean apply,
boolean verbose,
int reportEvery) {
- Collection<String> cacheNames = new ArrayList<>(ignite.cacheNames());
-
- if (cacheFilter != null && !cacheFilter.isEmpty())
- cacheNames.removeIf(n -> !n.contains(cacheFilter));
+ Collection<String> cacheNames = cacheNamesToScan(ignite.cacheNames(),
cacheFilter);
log.info("GridIntList migration - Caches to scan: {}", cacheNames);
Transformer transformer = new Transformer(verbose);
long totalUpdated = 0;
long totalFailed = 0;
+ long totalScanned = 0;
+ List<MigrationFailure> failureDetails = new ArrayList<>();
+
+ if (reportEvery <= 0)
+ reportEvery = 1;
for (String cacheName : cacheNames) {
IgniteCache<Object, Object> rawCache = ignite.cache(cacheName);
@@ -177,32 +212,48 @@ public final class GridIntListMigrator {
try (QueryCursor<Cache.Entry<Object, Object>> cur = c.query(q)) {
for (Cache.Entry<Object, Object> e : cur) {
+ Object key = null;
+ Object val = null;
+ String valType = "not-read";
+
try {
- Object v = e.getValue();
- TransformResult tr = transformer.transform(v, 0);
+ key = e.getKey();
+ val = e.getValue();
+ valType = typeName(val);
+
+ TransformResult tr = transformer.transform(val, 0);
if (tr.changed) {
if (apply) {
- c.put(e.getKey(), tr.val);
+ c.put(key, tr.val);
updated.incrementAndGet();
}
else if (verbose)
- log.info("DRY-RUN would update key={}",
e.getKey());
+ log.info("DRY-RUN would update key={}", key);
}
-
- long s = scanned.incrementAndGet();
-
- if (s % reportEvery == 0)
- log.info("Scanned={} updated={}", s,
updated.get());
-
}
catch (Throwable t) {
failed.incrementAndGet();
- log.error("Entry migration failed [cache={}, key={}]",
cacheName, e.getKey(), t);
+ MigrationFailure failure = new
MigrationFailure(cacheName, typeName(key), valType,
+ safeToString(key), safeToString(val), key,
failureMessage(t));
+
+ failureDetails.add(failure);
- scanned.incrementAndGet();
+ if (verbose)
+ log.warn("GridIntList entry migration failed: {}",
failure, t);
+ }
+ finally {
+ long s = scanned.incrementAndGet();
+ long globalScanned = totalScanned + s;
+ long globalUpdated = totalUpdated + updated.get();
+ long globalFailed = totalFailed + failed.get();
+
+ if (globalScanned % reportEvery == 0) {
+ logProgress(cacheName, s, updated.get(),
failed.get(), globalScanned, globalUpdated,
+ globalFailed);
+ }
}
}
}
@@ -211,18 +262,571 @@ public final class GridIntListMigrator {
totalUpdated += updated.get();
totalFailed += failed.get();
+ totalScanned += scanned.get();
}
+ logProgress("all caches", totalScanned, totalUpdated, totalFailed,
totalScanned, totalUpdated, totalFailed);
+
if (totalFailed > 0) {
+ String failureSummary = failureSummary(totalFailed,
failureDetails);
+
+ System.err.println(failureSummary);
+ log.error(failureSummary);
+
+ RecoveryDump dump = dumpFailedEntriesSafely(ignite,
failureDetails);
+
+ if (tryAutoRepair(ignite, failureDetails, dump)) {
+ log.info("GridIntList migration auto-repair completed. Deleted
failed entries: {}", totalFailed);
+
+ return totalUpdated;
+ }
+
throw new IllegalStateException("GridIntList migration failed for
" + totalFailed +
- " entries. Migration marker will not be written.");
+ " entries. Migration marker will not be written.\n" +
+ failureSummary);
}
- log.info("GridIntList migration finished. Total updated: {}",
totalUpdated);
+ log.info("GridIntList migration finished. Total scanned: {}. Total
updated: {}", totalScanned, totalUpdated);
return totalUpdated;
}
+ /**
+ * Logs progress to both application log and console. Console output is
intentional because web-app startup may look
+ * stuck while the persistent cache scan is still moving.
+ *
+ * @param cacheName Current cache name.
+ * @param scanned Entries scanned in current cache.
+ * @param updated Entries updated in current cache.
+ * @param failed Entries failed in current cache.
+ * @param totalScanned Entries scanned in all caches.
+ * @param totalUpdated Entries updated in all caches.
+ * @param totalFailed Entries failed in all caches.
+ */
+ private static void logProgress(String cacheName, long scanned, long
updated, long failed, long totalScanned,
+ long totalUpdated, long totalFailed) {
+ String msg = "GridIntList migration progress [cache=" + cacheName
+ + ", scanned=" + scanned
+ + ", updated=" + updated
+ + ", failed=" + failed
+ + ", totalScanned=" + totalScanned
+ + ", totalUpdated=" + totalUpdated
+ + ", totalFailed=" + totalFailed + "]";
+
+ System.err.println(msg);
+ log.info(msg);
+ }
+
+ /**
+ * @param totalFailed Total failed entries.
+ * @param failures Recorded failure samples.
+ * @return Human-readable diagnostic report.
+ */
+ static String failureSummary(long totalFailed, List<MigrationFailure>
failures) {
+ StringBuilder sb = new StringBuilder();
+
+ sb.append("Failed GridIntList migration entries:
").append(totalFailed);
+
+ if (failures.isEmpty())
+ sb.append(System.lineSeparator()).append("No entry details were
captured.");
+ else
+ appendFailuresByCache(sb, failures);
+
+ sb.append(System.lineSeparator()).append("Suggested actions:");
+ sb.append(System.lineSeparator())
+ .append("- If you want to repair these entries manually, stop this
service now.");
+ sb.append(System.lineSeparator())
+ .append("- If the service is not stopped, this migrator will try
to dump and remove all ")
+ .append("failed entries automatically after the timeout.");
+ sb.append(System.lineSeparator())
+ .append("- Failed entries are written to the recovery dump when
dump creation succeeds.");
+ sb.append(System.lineSeparator())
+ .append("- If the data must be preserved, inspect the cache/key
pair manually, ")
+ .append("fix the value that matches the reason above, and rerun
startup.");
+ sb.append(System.lineSeparator()).append("- For deeper diagnostics run
the offline migrator with --cache ")
+ .append("<cache-name> --verbose --report 50000.");
+
+ return sb.toString();
+ }
+
+ /**
+ * @param sb Target string builder.
+ * @param failures Failure details.
+ */
+ private static void appendFailuresByCache(StringBuilder sb,
List<MigrationFailure> failures) {
+ Map<String, List<MigrationFailure>> byCache = new LinkedHashMap<>();
+
+ for (MigrationFailure failure : failures)
+ byCache.computeIfAbsent(failure.cacheName, key -> new
ArrayList<>()).add(failure);
+
+ sb.append(System.lineSeparator()).append("Failed caches:
").append(byCache.size());
+
+ for (Map.Entry<String, List<MigrationFailure>> cacheFailures :
byCache.entrySet()) {
+ List<MigrationFailure> entries = cacheFailures.getValue();
+
+ sb.append(System.lineSeparator()).append("-
cache=").append(cacheFailures.getKey())
+ .append(", failedEntries=").append(entries.size())
+ .append(", shown=").append(Math.min(entries.size(),
FAILURE_DETAILS_LIMIT_PER_CACHE));
+
+ for (int i = 0; i < Math.min(entries.size(),
FAILURE_DETAILS_LIMIT_PER_CACHE); i++) {
+ sb.append(System.lineSeparator()).append(" ").append(i +
1).append(". ")
+ .append(entries.get(i));
+ }
+
+ if (entries.size() > FAILURE_DETAILS_LIMIT_PER_CACHE) {
+ sb.append(System.lineSeparator()).append(" ... ")
+ .append(entries.size() - FAILURE_DETAILS_LIMIT_PER_CACHE)
+ .append(" more failed entries in this cache are omitted
from the console summary.");
+ }
+ }
+ }
+
+ /**
+ * @param allCacheNames Existing cache names.
+ * @param cacheFilter Optional cache-name filter. Non-empty filter is
treated as explicit offline override.
+ * @return Cache names that should be scanned.
+ */
+ private static Collection<String> cacheNamesToScan(Collection<String>
allCacheNames, String cacheFilter) {
+ if (cacheFilter != null && !cacheFilter.isEmpty()) {
+ List<String> filtered = new ArrayList<>();
+
+ for (String cacheName : allCacheNames) {
+ if (cacheName.contains(cacheFilter))
+ filtered.add(cacheName);
+ }
+
+ log.warn("GridIntList migration cache filter [{}] was provided.
Scanning matching caches explicitly: {}",
+ cacheFilter, filtered);
+
+ return filtered;
+ }
+
+ Set<String> existing = new LinkedHashSet<>(allCacheNames);
+ List<String> selected = new ArrayList<>();
+
+ for (String cacheName : GRID_INT_LIST_CACHE_NAMES) {
+ if (existing.contains(cacheName))
+ selected.add(cacheName);
+ }
+
+ Set<String> skipped = new LinkedHashSet<>(existing);
+
+ skipped.removeAll(selected);
+
+ log.info("GridIntList migration selected known caches: {}. Other
caches are skipped: {}", selected, skipped);
+
+ return selected;
+ }
+
+ /**
+ * Dumps failed entries and removes them after a short operator prompt.
+ *
+ * @param ignite Ignite instance.
+ * @param failures Failed entries.
+ * @param dump Recovery dump, if dump succeeded.
+ * @return {@code true} if failed entries were removed.
+ */
+ private static boolean tryAutoRepair(Ignite ignite, List<MigrationFailure>
failures, RecoveryDump dump) {
+ if (failures.isEmpty())
+ return false;
+
+ waitBeforeAutoRepair(dump != null);
+
+ long removed = removeFailedEntries(ignite, failures);
+ String msg = "GridIntList migration auto-repair removed " + removed +
" entries. Recovery dump: " +
+ (dump == null ? "not created; see previous dump error" : dump);
+
+ System.err.println(msg);
+ log.warn(msg);
+
+ return removed == failures.size();
+ }
+
+ /**
+ * @param ignite Ignite instance.
+ * @param failures Failed entries.
+ * @return Recovery dump paths, or {@code null} if dump failed.
+ */
+ private static RecoveryDump dumpFailedEntriesSafely(Ignite ignite,
List<MigrationFailure> failures) {
+ try {
+ RecoveryDump dump = dumpFailedEntries(ignite, failures);
+ String msg = "GridIntList migration dumped failed entries. " +
dump;
+
+ System.err.println(msg);
+ log.warn(msg);
+
+ return dump;
+ }
+ catch (Exception e) {
+ String msg = "GridIntList migration failed to dump failed
entries.";
+
+ System.err.println(msg + " " + e);
+ log.error(msg, e);
+
+ return null;
+ }
+ }
+
+ /**
+ * Waits before automatic repair. System.in is intentionally not consumed
here: in the web launcher it is already
+ * used as the service stop signal.
+ */
+ private static void waitBeforeAutoRepair(boolean dumpAvailable) {
+ String prompt = "GridIntList migration can remove the listed failed
entries. "
+ + (dumpAvailable ? "Recovery dump was created. " : "Recovery dump
was not created. ")
+ + "If you want to repair them manually, stop this service now
within "
+ + autoRepairWaitSecondsForDisplay()
+ + " seconds. If the service keeps running, auto-repair will
start.";
+
+ System.err.println(prompt);
+ log.warn(prompt);
+
+ try {
+ Thread.sleep(autoRepairWaitMillis());
+ }
+ catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ }
+ }
+
+ /**
+ * @return Auto-repair wait timeout in milliseconds.
+ */
+ private static long autoRepairWaitMillis() {
+ return Long.getLong(AUTO_REPAIR_WAIT_MILLIS_PROPERTY,
TimeUnit.SECONDS.toMillis(AUTO_REPAIR_WAIT_SECONDS));
+ }
+
+ /**
+ * @return Auto-repair wait timeout in seconds for operator-facing
messages.
+ */
+ private static long autoRepairWaitSecondsForDisplay() {
+ return Math.max(1, (autoRepairWaitMillis() + 999) / 1000);
+ }
+
+ /**
+ * @param ignite Ignite instance.
+ * @param failures Failed entries.
+ * @return Dump file path.
+ */
+ private static RecoveryDump dumpFailedEntries(Ignite ignite,
List<MigrationFailure> failures) throws IOException {
+ File workDir = igniteWorkDir(ignite);
+ Path dumpDir =
workDir.toPath().resolve("diagnostic").resolve("grid-int-list-migration-recovery");
+
+ Files.createDirectories(dumpDir);
+
+ String ts = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
+ String dumpName = "grid_int_list_migration_failed_" + ts;
+ Path igniteDump = createIgniteDump(ignite, failures, dumpDir,
dumpName);
+ Path manifest = dumpDir.resolve(dumpName + "_manifest.jsonl");
+ List<String> lines = new ArrayList<>();
+
+ for (MigrationFailure failure : failures)
+ lines.add(failure.toJsonLine());
+
+ Files.write(manifest, lines, StandardCharsets.UTF_8);
+
+ return new RecoveryDump(igniteDump, manifest);
+ }
+
+ /**
+ * Creates a built-in Ignite dump for caches with failed entries.
+ *
+ * @param ignite Ignite instance.
+ * @param failures Failed entries.
+ * @param diagnosticDir Diagnostic directory.
+ * @param dumpName Dump name.
+ * @return Ignite dump path, if it was found and moved to diagnostics.
+ */
+ private static Path createIgniteDump(Ignite ignite, List<MigrationFailure>
failures, Path diagnosticDir,
+ String dumpName) throws IOException {
+ Set<String> caches = new LinkedHashSet<>();
+
+ for (MigrationFailure failure : failures)
+ caches.add(failure.cacheName);
+
+ try {
+ ignite.snapshot().createDump(dumpName, caches).get();
+ }
+ catch (Throwable t) {
+ throw new IOException("Unable to create Ignite dump " + dumpName +
" for caches " + caches, t);
+ }
+
+ Path src = findIgniteDump(igniteWorkDir(ignite).toPath(), dumpName);
+
+ if (src == null)
+ return diagnosticDir.resolve(dumpName);
+
+ Path dst = diagnosticDir.resolve(dumpName);
+
+ if (!src.equals(dst)) {
+ if (Files.exists(dst))
+ deleteRecursively(dst);
+
+ Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
+ }
+
+ return dst;
+ }
+
+ /**
+ * @param workDir Ignite work dir.
+ * @param dumpName Dump name.
+ * @return Found dump path or {@code null}.
+ */
+ private static Path findIgniteDump(Path workDir, String dumpName) throws
IOException {
+ try (java.util.stream.Stream<Path> paths = Files.walk(workDir)) {
+ return paths
+ .filter(Files::isDirectory)
+ .filter(path -> dumpName.equals(path.getFileName().toString()))
+ .findFirst()
+ .orElse(null);
+ }
+ }
+
+ /**
+ * @param path Path to delete.
+ */
+ private static void deleteRecursively(Path path) throws IOException {
+ try (java.util.stream.Stream<Path> paths = Files.walk(path)) {
+ Iterator<Path> it = paths.sorted(Comparator.reverseOrder()).iterator();
+
+ while (it.hasNext())
+ Files.delete(it.next());
+ }
+ }
+
+ /**
+ * @param ignite Ignite instance.
+ * @return Ignite work dir.
+ */
+ private static File igniteWorkDir(Ignite ignite) {
+ String workDir = ignite.configuration().getWorkDirectory();
+
+ if (workDir == null || workDir.isEmpty())
+ workDir = ignite.configuration().getIgniteHome();
+
+ return new File(workDir == null || workDir.isEmpty() ? "." : workDir);
+ }
+
+ /**
+ * @param ignite Ignite instance.
+ * @param failures Failed entries.
+ * @return Number of removed entries.
+ */
+ private static long removeFailedEntries(Ignite ignite, List<MigrationFailure> failures) {
+ long removed = 0;
+
+ for (MigrationFailure failure : failures) {
+ IgniteCache<Object, Object> cache = ignite.cache(failure.cacheName);
+
+ if (cache == null) {
+ log.warn("GridIntList migration auto-repair: cache is not available, skip {}", failure);
+
+ continue;
+ }
+
+ if (cache.withKeepBinary().remove(failure.keyObj))
+ removed++;
+ else
+ log.warn("GridIntList migration auto-repair: entry was not removed {}", failure);
+ }
+
+ return removed;
+ }
+
+ /**
+ * @param obj Object.
+ * @return Base64 Java serialization, or short failure text.
+ */
+ private static String serializedBase64(Object obj) {
+ if (obj == null)
+ return "";
+
+ try {
+ ByteArrayOutputStream bytes = new ByteArrayOutputStream();
+
+ try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
+ out.writeObject(obj);
+ }
+
+ return Base64.getEncoder().encodeToString(bytes.toByteArray());
+ }
+ catch (Throwable t) {
+ return "<serialization failed: " + failureMessage(t) + ">";
+ }
+ }
+
+ /**
+ * @param val Value.
+ * @return JSON-escaped value.
+ */
+ private static String json(String val) {
+ if (val == null)
+ return "";
+
+ StringBuilder sb = new StringBuilder();
+
+ for (int i = 0; i < val.length(); i++) {
+ char ch = val.charAt(i);
+
+ switch (ch) {
+ case '\\':
+ sb.append("\\\\");
+ break;
+
+ case '"':
+ sb.append("\\\"");
+ break;
+
+ case '\n':
+ sb.append("\\n");
+ break;
+
+ case '\r':
+ sb.append("\\r");
+ break;
+
+ case '\t':
+ sb.append("\\t");
+ break;
+
+ default:
+ sb.append(ch);
+ }
+ }
+
+ return sb.toString();
+ }
+
+ /**
+ * Recovery dump paths.
+ */
+ private static final class RecoveryDump {
+ /** Built-in Ignite dump directory. */
+ private final Path igniteDump;
+
+ /** Failed entries manifest path. */
+ private final Path manifest;
+
+ /**
+ * @param igniteDump Built-in Ignite dump directory.
+ * @param manifest Failed entries manifest path.
+ */
+ private RecoveryDump(Path igniteDump, Path manifest) {
+ this.igniteDump = igniteDump;
+ this.manifest = manifest;
+ }
+
+ /** {@inheritDoc} */
+ @Override public String toString() {
+ return "igniteDump=" + igniteDump + ", manifest=" + manifest;
+ }
+ }
+
+ /**
+ * @param obj Object.
+ * @return Type name safe for diagnostics.
+ */
+ private static String typeName(Object obj) {
+ return obj == null ? "null" : obj.getClass().getName();
+ }
+
+ /**
+ * @param obj Object.
+ * @return String representation safe for diagnostics.
+ */
+ private static String safeToString(Object obj) {
+ if (obj == null)
+ return "null";
+
+ try {
+ return String.valueOf(obj);
+ }
+ catch (Throwable t) {
+ return "<toString failed: " + failureMessage(t) + ">";
+ }
+ }
+
+ /**
+ * @param t Throwable.
+ * @return Compact failure message.
+ */
+ private static String failureMessage(Throwable t) {
+ String msg = t.getMessage();
+
+ return t.getClass().getName() + (msg == null || msg.isEmpty() ? "" : ": " + msg);
+ }
+
+ /**
+ * Compact failed-entry diagnostics.
+ */
+ static final class MigrationFailure {
+ /** Cache name. */
+ private final String cacheName;
+
+ /** Key type name. */
+ private final String keyType;
+
+ /** Value type name. */
+ private final String valueType;
+
+ /** Safe key text. */
+ private final String key;
+
+ /** Safe value text. */
+ private final String value;
+
+ /** Original key object. */
+ private final Object keyObj;
+
+ /** Failure reason. */
+ private final String reason;
+
+ /**
+ * @param cacheName Cache name.
+ * @param keyType Key type name.
+ * @param valueType Value type name.
+ * @param key Safe key text.
+ * @param value Safe value text.
+ * @param keyObj Original key object.
+ * @param reason Failure reason.
+ */
+ MigrationFailure(String cacheName, String keyType, String valueType, String key, String value, Object keyObj,
+ String reason) {
+ this.cacheName = cacheName;
+ this.keyType = keyType;
+ this.valueType = valueType;
+ this.key = key;
+ this.value = value;
+ this.keyObj = keyObj;
+ this.reason = reason;
+ }
+
+ /**
+ * @return JSON line for recovery dump.
+ */
+ private String toJsonLine() {
+ return "{"
+ + "\"cache\":\"" + json(cacheName) + "\","
+ + "\"key\":\"" + json(key) + "\","
+ + "\"keyType\":\"" + json(keyType) + "\","
+ + "\"keySerializedBase64\":\"" + json(serializedBase64(keyObj)) + "\","
+ + "\"valueType\":\"" + json(valueType) + "\","
+ + "\"value\":\"" + json(value) + "\","
+ + "\"reason\":\"" + json(reason) + "\""
+ + "}";
+ }
+
+ /** {@inheritDoc} */
+ @Override public String toString() {
+ return "cache=" + cacheName
+ + ", key=" + key
+ + ", keyType=" + keyType
+ + ", valueType=" + valueType
+ + ", reason=" + reason;
+ }
+ }
+
/**
* Infers consistentId by reading first subdirectory under work/db.
*
diff --git a/migrator/src/main/java/org/apache/ignite/migrate/MigratorArgs.java b/migrator/src/main/java/org/apache/ignite/migrate/MigratorArgs.java
index 97447636..5df9f25b 100644
--- a/migrator/src/main/java/org/apache/ignite/migrate/MigratorArgs.java
+++ b/migrator/src/main/java/org/apache/ignite/migrate/MigratorArgs.java
@@ -31,7 +31,7 @@ public final class MigratorArgs {
boolean apply = false;
boolean verbose = false;
String cacheFilter = null;
- int reportEvery = 500;
+ int reportEvery = 50000;
String workDir = null;
static MigratorArgs parse(String[] args) {
@@ -65,4 +65,4 @@ public final class MigratorArgs {
return cliArgs;
}
-}
\ No newline at end of file
+}
diff --git a/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorIntegrationTest.java b/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorIntegrationTest.java
new file mode 100644
index 00000000..d66095b2
--- /dev/null
+++ b/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorIntegrationTest.java
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.migrate;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.Ignition;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.DataStorageConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Integration tests for GridIntList migration on real Ignite persistence.
+ */
+public class GridIntListMigratorIntegrationTest {
+ /** Cache from the production failure. */
+ private static final String BUILD_LOG_CHECK_RESULT = "buildLogCheckResult";
+
+ /** Type name whose default Ignite type ID is -526400035. */
+ private static final String MISSING_BINARY_METADATA_TYPE = "ayzjfkj";
+
+ /** Type ID from the production failure. */
+ private static final int MISSING_BINARY_METADATA_TYPE_ID = -526400035;
+
+ /** Key from the production failure. */
+ private static final long FAILED_BUILD_LOG_CHECK_RESULT_KEY = 6062419808021002488L;
+
+ /** */
+ @Rule public TemporaryFolder tmp = new TemporaryFolder();
+
+ /**
+ * Reproduces the production failure on a persistent Ignite cache and checks that default migration does not scan
+ * unrelated caches.
+ */
+ @Test public void migrationSkipsPersistentEntryWithMissingBinaryMetadataInUnrelatedCache() throws Exception {
+ java.io.File workDir = tmp.newFolder("persistent-ignite-work");
+
+ Ignite ignite = Ignition.start(persistentConfiguration(workDir, "missing-binary-metadata-1"));
+
+ try {
+ ignite.cluster().state(ClusterState.ACTIVE);
+
+ IgniteCache<Long, Object> cache = ignite.getOrCreateCache(new CacheConfiguration<Long, Object>(
+ BUILD_LOG_CHECK_RESULT));
+ BinaryObjectBuilder builder = ignite.binary().builder(MISSING_BINARY_METADATA_TYPE);
+
+ builder.setField("field", "value");
+
+ cache.withKeepBinary().put(FAILED_BUILD_LOG_CHECK_RESULT_KEY, builder.build());
+ }
+ finally {
+ ignite.close();
+ }
+
+ deleteBinaryMetadata(workDir.toPath(), MISSING_BINARY_METADATA_TYPE_ID);
+
+ ignite = Ignition.start(persistentConfiguration(workDir, "missing-binary-metadata-2"));
+
+ try {
+ ignite.cluster().state(ClusterState.ACTIVE);
+
+ GridIntListMigrator.migrateOnInstance(ignite, null, true, false, 1);
+
+ IgniteCache<Object, Object> cache = ignite.cache(BUILD_LOG_CHECK_RESULT);
+
+ assertTrue("Broken buildLogCheckResult entry must not be touched by default migration",
+ cache.containsKey(FAILED_BUILD_LOG_CHECK_RESULT_KEY));
+ }
+ finally {
+ ignite.close();
+ }
+ }
+
+ /**
+ * @param workDir Work dir.
+ * @param name Ignite instance name.
+ * @return Persistent single-node configuration.
+ */
+ private IgniteConfiguration persistentConfiguration(java.io.File workDir, String name) {
+ IgniteConfiguration cfg = new IgniteConfiguration();
+ DataStorageConfiguration storage = new DataStorageConfiguration();
+
+ storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
+
+ cfg.setIgniteInstanceName(name);
+ cfg.setConsistentId("missing-binary-metadata-node");
+ cfg.setWorkDirectory(workDir.getAbsolutePath());
+ cfg.setDataStorageConfiguration(storage);
+
+ return cfg;
+ }
+
+ /**
+ * @param workDir Ignite work dir.
+ * @param typeId Binary type ID.
+ */
+ private void deleteBinaryMetadata(Path workDir, int typeId) throws Exception {
+ String typeIdText = String.valueOf(typeId);
+ boolean deleted;
+
+ try (java.util.stream.Stream<Path> paths = Files.walk(workDir)) {
+ deleted = paths
+ .filter(Files::isRegularFile)
+ .filter(path -> path.getFileName().toString().contains(typeIdText))
+ .map(path -> {
+ try {
+ Files.delete(path);
+
+ return true;
+ }
+ catch (java.io.IOException e) {
+ throw new RuntimeException(e);
+ }
+ })
+ .reduce(false, (left, right) -> left || right);
+ }
+
+ assertTrue("Binary metadata file for typeId=" + typeId + " must be deleted", deleted);
+ }
+}
diff --git a/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorTest.java b/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorTest.java
new file mode 100644
index 00000000..db0526d5
--- /dev/null
+++ b/migrator/src/test/java/org/apache/ignite/migrate/GridIntListMigratorTest.java
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.migrate;
+
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.Collections;
+import javax.cache.Cache;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.query.QueryCursor;
+import org.apache.ignite.cache.query.ScanQuery;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.lang.IgniteFuture;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import static org.junit.Assert.assertTrue;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyCollection;
+import static org.mockito.ArgumentMatchers.anyString;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+/**
+ * Tests for GridIntList migration diagnostics.
+ */
+public class GridIntListMigratorTest {
+ /** Production cache that may contain GridIntList. */
+ private static final String FAT_BUILD = "teamcityFatBuild";
+
+ /** Cache from the production failure. */
+ private static final String BUILD_LOG_CHECK_RESULT = "buildLogCheckResult";
+
+ /** Key from the production failure. */
+ private static final long FAILED_BUILD_LOG_CHECK_RESULT_KEY = 6062419808021002488L;
+
+ /** */
+ @Rule public TemporaryFolder tmp = new TemporaryFolder();
+
+ /**
+ * Checks that failed-entry summary contains actionable cache/key diagnostics.
+ */
+ @Test public void failureSummaryContainsEntryDetails() {
+ String summary = GridIntListMigrator.failureSummary(2, Arrays.asList(
+ new GridIntListMigrator.MigrationFailure("cacheA", "java.lang.String",
+ "org.apache.ignite.binary.BinaryObjectImpl", "key-1", "<value>", "key-1", "boom"),
+ new GridIntListMigrator.MigrationFailure("cacheB", "java.lang.Long",
+ "not-read", "42", "<value>", 42L, "read failed")
+ ));
+
+ assertTrue(summary.contains("stop this service now"));
+ assertTrue(summary.contains("dump and remove all failed entries automatically"));
+ assertTrue(summary.contains("cache=cacheA"));
+ assertTrue(summary.contains("key=key-1"));
+ assertTrue(summary.contains("valueType=org.apache.ignite.binary.BinaryObjectImpl"));
+ assertTrue(summary.contains("reason=read failed"));
+ }
+
+ /**
+ * Checks that failures from later caches are not hidden by many earlier failures.
+ */
+ @Test public void failureSummaryContainsExamplesFromEachFailedCache() {
+ java.util.List<GridIntListMigrator.MigrationFailure> failures = new java.util.ArrayList<>();
+
+ for (int i = 0; i < 10; i++) {
+ failures.add(new GridIntListMigrator.MigrationFailure("cacheA", "java.lang.Long",
+ "valueTypeA", "key-a-" + i, "<value>", i, "boom-a-" + i));
+ }
+
+ failures.add(new GridIntListMigrator.MigrationFailure("cacheB", "java.lang.Long",
+ "valueTypeB", "key-b-1", "<value>", 100L, "boom-b"));
+
+ String summary = GridIntListMigrator.failureSummary(failures.size(), failures);
+
+ assertTrue(summary.contains("cache=cacheA, failedEntries=10, shown=5"));
+ assertTrue(summary.contains("cache=cacheB, failedEntries=1, shown=1"));
+ assertTrue(summary.contains("key=key-b-1"));
+ assertTrue(summary.contains("5 more failed entries in this cache"));
+ }
+
+ /**
+ * Checks the recovery path where a migrated cache value cannot resolve binary type metadata.
+ */
+ @Test public void migrationDumpsAndRemovesEntryWithMissingBinaryTypeDetails() throws Exception {
+ System.setProperty(GridIntListMigrator.AUTO_REPAIR_WAIT_MILLIS_PROPERTY, "1");
+
+ try {
+ Long failedKey = 6062419808021002488L;
+ Ignite ignite = mock(Ignite.class);
+ IgniteCache<Object, Object> rawCache = mock(IgniteCache.class);
+ IgniteCache<Object, Object> binCache = mock(IgniteCache.class);
+ QueryCursor<Cache.Entry<Object, Object>> cursor = mock(QueryCursor.class);
+ IgniteSnapshot snapshot = mock(IgniteSnapshot.class);
+ IgniteFuture<Void> dumpFut = mock(IgniteFuture.class);
+ java.io.File workDir = tmp.newFolder("ignite-work");
+
+ when(ignite.cacheNames()).thenReturn(Collections.singleton(FAT_BUILD));
+ when(ignite.cache(FAT_BUILD)).thenReturn(rawCache);
+ when(ignite.configuration()).thenReturn(new IgniteConfiguration()
+ .setWorkDirectory(workDir.getAbsolutePath()));
+ when(ignite.snapshot()).thenReturn(snapshot);
+ when(snapshot.createDump(anyString(), anyCollection())).thenAnswer(invocation -> {
+ String dumpName = invocation.getArgument(0);
+
+ Files.createDirectories(workDir.toPath().resolve("snapshots").resolve(dumpName));
+
+ return dumpFut;
+ });
+ when(rawCache.withKeepBinary()).thenReturn(binCache);
+ when(binCache.query(any(ScanQuery.class))).thenReturn(cursor);
+ when(cursor.iterator()).thenReturn(Collections.<Cache.Entry<Object, Object>>singletonList(
+ new TestEntry(failedKey, new BrokenBinaryObject())).iterator());
+ when(binCache.remove(failedKey)).thenReturn(true);
+
+ GridIntListMigrator.migrateOnInstance(ignite, null, true, false, 1);
+
+ verify(binCache).remove(failedKey);
+
+ Path dumpDir = new java.io.File(ignite.configuration().getWorkDirectory()).toPath()
+ .resolve("diagnostic")
+ .resolve("grid-int-list-migration-recovery");
+ Path dump = Files.list(dumpDir)
+ .filter(path -> path.getFileName().toString().endsWith("_manifest.jsonl"))
+ .findFirst().orElseThrow(() ->
+ new AssertionError("Recovery dump was not written"));
+ String dumpText = new String(Files.readAllBytes(dump), StandardCharsets.UTF_8);
+
+ assertTrue(dumpText.contains(FAT_BUILD));
+ assertTrue(dumpText.contains(String.valueOf(failedKey)));
+ assertTrue(dumpText.contains("Failed to get binary type details [typeId=-526400035]"));
+ }
+ finally {
+ System.clearProperty(GridIntListMigrator.AUTO_REPAIR_WAIT_MILLIS_PROPERTY);
+ }
+ }
+
+ /**
+ * Checks that a diagnostic dump failure does not stop automatic migration recovery.
+ */
+ @Test public void migrationRemovesFailedEntryWhenDumpFails() throws Exception {
+ System.setProperty(GridIntListMigrator.AUTO_REPAIR_WAIT_MILLIS_PROPERTY, "1");
+
+ try {
+ Long failedKey = 6062419808021002488L;
+ Ignite ignite = mock(Ignite.class);
+ IgniteCache<Object, Object> rawCache = mock(IgniteCache.class);
+ IgniteCache<Object, Object> binCache = mock(IgniteCache.class);
+ QueryCursor<Cache.Entry<Object, Object>> cursor = mock(QueryCursor.class);
+ IgniteSnapshot snapshot = mock(IgniteSnapshot.class);
+ java.io.File workDir = tmp.newFolder("ignite-work-dump-fails");
+
+ when(ignite.cacheNames()).thenReturn(Collections.singleton(FAT_BUILD));
+ when(ignite.cache(FAT_BUILD)).thenReturn(rawCache);
+ when(ignite.configuration()).thenReturn(new IgniteConfiguration()
+ .setWorkDirectory(workDir.getAbsolutePath()));
+ when(ignite.snapshot()).thenReturn(snapshot);
+ when(snapshot.createDump(anyString(), anyCollection())).thenThrow(new RuntimeException("dump failed"));
+ when(rawCache.withKeepBinary()).thenReturn(binCache);
+ when(binCache.query(any(ScanQuery.class))).thenReturn(cursor);
+ when(cursor.iterator()).thenReturn(Collections.<Cache.Entry<Object, Object>>singletonList(
+ new TestEntry(failedKey, new BrokenBinaryObject())).iterator());
+ when(binCache.remove(failedKey)).thenReturn(true);
+
+ GridIntListMigrator.migrateOnInstance(ignite, null, true, false, 1);
+
+ verify(binCache).remove(failedKey);
+ }
+ finally {
+ System.clearProperty(GridIntListMigrator.AUTO_REPAIR_WAIT_MILLIS_PROPERTY);
+ }
+ }
+
+ /**
+ * Checks that unrelated caches are skipped by default, including the cache that exposed the production failure.
+ */
+ @Test public void migrationSkipsNonGridIntListCachesByDefault() {
+ Ignite ignite = mock(Ignite.class);
+
+ when(ignite.cacheNames()).thenReturn(Collections.singleton(BUILD_LOG_CHECK_RESULT));
+
+ long updated = GridIntListMigrator.migrateOnInstance(ignite, null, true, false, 1);
+
+ assertTrue("No entries should be updated in skipped caches", updated == 0);
+ verify(ignite, never()).cache(BUILD_LOG_CHECK_RESULT);
+ }
+
+ /**
+ * Test cache entry.
+ */
+ private static class TestEntry implements Cache.Entry<Object, Object> {
+ /** */
+ private final Object key;
+
+ /** */
+ private final Object val;
+
+ /**
+ * @param key Key.
+ * @param val Value.
+ */
+ private TestEntry(Object key, Object val) {
+ this.key = key;
+ this.val = val;
+ }
+
+ /** {@inheritDoc} */
+ @Override public Object getKey() {
+ return key;
+ }
+
+ /** {@inheritDoc} */
+ @Override public Object getValue() {
+ return val;
+ }
+
+ /** {@inheritDoc} */
+ @Override public <T> T unwrap(Class<T> cls) {
+ throw new UnsupportedOperationException();
+ }
+ }
+
+ /**
+ * Binary object that reproduces missing binary metadata failure from a real persistent cache.
+ */
+ private static class BrokenBinaryObject implements BinaryObject {
+ /** {@inheritDoc} */
+ @Override public BinaryType type() {
+ throw new BinaryObjectException("Failed to get binary type details [typeId=-526400035]");
+ }
+
+ /** {@inheritDoc} */
+ @Override public <F> F field(String fieldName) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public boolean hasField(String fieldName) {
+ return false;
+ }
+
+ /** {@inheritDoc} */
+ @Override public <T> T deserialize() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public <T> T deserialize(ClassLoader ldr) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public BinaryObject clone() {
+ return this;
+ }
+
+ /** {@inheritDoc} */
+ @Override public BinaryObjectBuilder toBuilder() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public int enumOrdinal() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public String enumName() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** {@inheritDoc} */
+ @Override public int size() {
+ return 0;
+ }
+ }
+}
diff --git a/migrator/src/test/java/org/apache/ignite/migrate/LegacyPersistentStorageCompatibilityTest.java b/migrator/src/test/java/org/apache/ignite/migrate/LegacyPersistentStorageCompatibilityTest.java
index 7c5cfb58..d45f70af 100644
--- a/migrator/src/test/java/org/apache/ignite/migrate/LegacyPersistentStorageCompatibilityTest.java
+++ b/migrator/src/test/java/org/apache/ignite/migrate/LegacyPersistentStorageCompatibilityTest.java
@@ -55,7 +55,6 @@ import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureHandler;
import org.apache.ignite.logger.slf4j.Slf4jLogger;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
-import org.apache.ignite.tcbot.common.util.GridIntList;
import org.apache.ignite.tcbot.persistence.CacheConfigs;
import org.apache.ignite.tcbot.persistence.IStringCompactor;
import org.apache.ignite.tcignited.ITeamcityIgnited;
@@ -99,9 +98,6 @@ public class LegacyPersistentStorageCompatibilityTest {
/** Branch used by the local perf compatibility test. */
private static final String BRANCH = "refs/heads/perf-test-master";
- /** Cache that intentionally stores Ignite 2.14 internal GridIntList for migrator coverage. */
- private static final String LEGACY_GRID_INT_LIST_CACHE = "legacyGridIntListCache";
-
/** Cache used only to produce enough WAL with old Ignite. */
private static final String LEGACY_WAL_STRESS_CACHE = "legacyWalStressCache";
@@ -175,10 +171,7 @@ public class LegacyPersistentStorageCompatibilityTest {
assertEquals(RUN_ALL_JAVA_8, ref.buildTypeId(compactor));
assertEquals(BRANCH, ref.branchName(compactor));
assertTrue(fatBuild.isFinished(compactor));
-
- Object migrated = ignite.cache(LEGACY_GRID_INT_LIST_CACHE).get("legacy");
-
- assertEquals(GridIntList.asList(1, 2, 3), migrated);
+ assertEquals("enabled", fatBuild.parameters().toParameters(compactor).getParameter("compat.parameter"));
assertAllUserCachesCanBeDeserialized(ignite);
@@ -551,6 +544,7 @@ public class LegacyPersistentStorageCompatibilityTest {
private String legacyGeneratorJava() {
return "package org.apache.ignite.ci.db;\n"
+ "\n"
+ + "import java.util.Arrays;\n"
+ "import java.io.File;\n"
+ "import java.util.Collections;\n"
+ "import java.util.Iterator;\n"
@@ -565,7 +559,6 @@ public class LegacyPersistentStorageCompatibilityTest {
+ "import org.apache.ignite.configuration.CacheConfiguration;\n"
+ "import org.apache.ignite.configuration.DataRegionConfiguration;\n"
+ "import org.apache.ignite.configuration.IgniteConfiguration;\n"
- + "import org.apache.ignite.internal.util.GridIntList;\n"
+ "import org.apache.ignite.logger.slf4j.Slf4jLogger;\n"
+ "import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;\n"
+ "import org.apache.ignite.tcbot.persistence.CacheConfigs;\n"
@@ -574,6 +567,8 @@ public class LegacyPersistentStorageCompatibilityTest {
+ "import org.apache.ignite.tcignited.build.FatBuildDao;\n"
+ "import org.apache.ignite.tcignited.buildref.BuildRefDao;\n"
+ "import org.apache.ignite.tcservice.model.conf.BuildType;\n"
+ + "import org.apache.ignite.tcservice.model.conf.bt.Parameters;\n"
+ + "import org.apache.ignite.tcservice.model.conf.bt.Property;\n"
+ "import org.apache.ignite.tcservice.model.hist.BuildRef;\n"
+ "import org.apache.ignite.tcservice.model.result.Build;\n"
+ "\n"
@@ -581,7 +576,6 @@ public class LegacyPersistentStorageCompatibilityTest {
+ " private static final String SRV_ID = \"" + SRV_ID + "\";\n"
+ " private static final String RUN_ALL_JAVA_8 = \"" + RUN_ALL_JAVA_8 + "\";\n"
+ " private static final String BRANCH = \"" + BRANCH + "\";\n"
- + " private static final String LEGACY_GRID_INT_LIST_CACHE = \"" + LEGACY_GRID_INT_LIST_CACHE + "\";\n"
+ " private static final String LEGACY_WAL_STRESS_CACHE = \"" + LEGACY_WAL_STRESS_CACHE + "\";\n"
+ " private static final long REGION_SIZE = " + REGION_SIZE + "L;\n"
+ " private static final int WAL_STRESS_MB = " + legacyWalStressMb() + ";\n"
@@ -612,9 +606,6 @@ public class LegacyPersistentStorageCompatibilityTest {
+ " fatBuilds.put(key, new FatBuildCompacted(compactor, build));\n"
+ " }\n"
+ "\n"
- + " IgniteCache<String, Object> legacy = ignite.getOrCreateCache(new CacheConfiguration<String, Object>(LEGACY_GRID_INT_LIST_CACHE));\n"
- + " legacy.put(\"legacy\", new GridIntList(new int[] {1, 2, 3}));\n"
- + "\n"
+ " writeWalStressData(ignite);\n"
+ "\n"
+ " String legacyMigrationRes = new DbMigrations(ignite).dataMigration();\n"
@@ -675,6 +666,7 @@ public class LegacyPersistentStorageCompatibilityTest {
+ " build.defaultBranch = Boolean.FALSE;\n"
+ " build.composite = Boolean.TRUE;\n"
+ " build.webUrl = \"http://localhost/perf-test/teamcity/build/\" + buildId;\n"
+ + " build.parameters(new Parameters(Arrays.asList(new Property(\"compat.parameter\", \"enabled\"))));\n"
+ " build.setQueuedDateTs(1700000000000L + buildId);\n"
+ " build.setStartDateTs(1700000010000L + buildId);\n"
+ " build.setFinishDateTs(1700000070000L + buildId);\n"
diff --git a/tcbot-common/src/main/java/org/apache/ignite/tcbot/common/conf/TcBotWorkDir.java b/tcbot-common/src/main/java/org/apache/ignite/tcbot/common/conf/TcBotWorkDir.java
index 1434bd52..23346f1e 100644
--- a/tcbot-common/src/main/java/org/apache/ignite/tcbot/common/conf/TcBotWorkDir.java
+++ b/tcbot-common/src/main/java/org/apache/ignite/tcbot/common/conf/TcBotWorkDir.java
@@ -22,6 +22,9 @@ import static com.google.common.base.Preconditions.checkState;
import static com.google.common.base.Strings.isNullOrEmpty;
public class TcBotWorkDir {
+ /** Directory for JVM and migration diagnostics under TC Bot work dir. */
+ public static final String DIAGNOSTIC_DIR = "diagnostic";
+
public static File ensureDirExist(File workDir) {
if (!workDir.exists())
checkState(workDir.mkdirs(), "Unable to make directory [" + workDir + "]");
@@ -43,4 +46,8 @@ public class TcBotWorkDir {
return ensureDirExist(workDir);
}
+
+ public static File resolveDiagnosticDir() {
+ return ensureDirExist(new File(resolveWorkDir(), DIAGNOSTIC_DIR));
+ }
}
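
Reviewer note on the manifest format: the `json` helper in GridIntListMigrator escapes backslash, double quote, newline, carriage return and tab. JSON also requires every other control character below U+0020 to be escaped, and a key or value text produced by `safeToString` could in principle contain one. The sketch below is not part of this commit; `JsonLineSketch`, `escape` and `toJsonLine` are hypothetical names used only for illustration. It shows one way a manifest line could be built so that each line stays parseable as JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not part of the commit: builds one JSONL manifest line. */
final class JsonLineSketch {
    /** Escapes a value for use inside a JSON string literal; null becomes an empty string. */
    static String escape(String val) {
        if (val == null)
            return "";

        StringBuilder sb = new StringBuilder(val.length() + 16);

        for (int i = 0; i < val.length(); i++) {
            char ch = val.charAt(i);

            switch (ch) {
                case '\\': sb.append("\\\\"); break;
                case '"': sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters get a four-digit hex escape.
                    if (ch < 0x20)
                        sb.append(String.format("\\u%04x", (int)ch));
                    else
                        sb.append(ch);
            }
        }

        return sb.toString();
    }

    /** Renders string fields as a single-line JSON object, keeping insertion order. */
    static String toJsonLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;

        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first)
                sb.append(',');

            sb.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');

            first = false;
        }

        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();

        fields.put("cache", "teamcityFatBuild");
        fields.put("reason", "line1\nline2");

        System.out.println(toJsonLine(fields));
    }
}
```

Hand-rolled escaping keeps the migrator free of an extra JSON dependency, which appears to be the trade-off the commit makes; a JSON library already on the classpath would be the alternative.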