bbotella commented on code in PR #3751:
URL: https://github.com/apache/cassandra/pull/3751#discussion_r1890367935
##########
src/java/org/apache/cassandra/config/CassandraRelevantProperties.java:
##########
@@ -502,6 +502,12 @@ public enum CassandraRelevantProperties
SNAPSHOT_CLEANUP_INITIAL_DELAY_SECONDS("cassandra.snapshot.ttl_cleanup_initial_delay_seconds",
"5"),
/** snapshots ttl cleanup period in seconds */
SNAPSHOT_CLEANUP_PERIOD_SECONDS("cassandra.snapshot.ttl_cleanup_period_seconds",
"60"),
+ /**
+ * When there is a snapshot with old / basic format (basically
pre-CASSANDRA-16789),
+ * it will enrich it with more metadata upon snapshot's loading at startup.
+ * Defaults to true, when set to false, no enriching will be done.
+ * */
+ SNAPSHOT_MANIFEST_ENRICH_ENABLED("cassandra.snapshot.enrich.enabled",
"true"),
Review Comment:
nit: `SNAPSHOT_MANIFEST_ENRICH_OR_CREATE_ENABLED`?
##########
.circleci/config.yml:
##########
Review Comment:
Are the changes in circle config expected?
##########
test/distributed/org/apache/cassandra/distributed/test/SnapshotsTest.java:
##########
@@ -130,6 +137,65 @@ public void testEverySnapshotDirHasManifestAndSchema()
}
}
+ @Test
+ public void testLocalOrReplicatedSystemTablesSnapshotsDoNotHaveSchema()
+ {
+ cluster.get(1)
+ .nodetoolResult("snapshot", "-t",
"snapshot_with_local_or_replicated")
+ .asserts()
+ .success();
+
+ String[] dataDirs = (String[])
cluster.get(1).config().get("data_file_directories");
+ String[] paths = getSnapshotPaths(true);
+
+ for (String dataDir : dataDirs)
+ {
+ for (String path : paths)
+ {
+ Path snapshotDir = Paths.get(dataDir)
+ .resolve(path)
+ .resolve("snapshots")
+
.resolve("snapshot_with_local_or_replicated");
+
+ if (snapshotDir.toFile().exists())
+ assertFalse(new File(snapshotDir, "schema.cql").exists());
+ }
+ }
+ }
+
+ @Test
+ public void testMissingManifestIsCreatedOnStartup()
+ {
+ cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (key int, value
text, PRIMARY KEY (key))"));
+ populate(cluster);
+
+ cluster.get(1)
+ .nodetoolResult("snapshot", "-t",
"snapshot_with_local_or_replicated")
+ .asserts()
+ .success();
+
+ String[] dataDirs = (String[])
cluster.get(1).config().get("data_file_directories");
+ String[] paths = getSnapshotPaths(false);
+
+ assertManifestsPresence(dataDirs, paths, true);
Review Comment:
Can we make the same test but with the config flag
`SNAPSHOT_MANIFEST_ENRICH_ENABLED` set to false to assert that the manifests
are left untouched/uncreated?
##########
src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java:
##########
@@ -490,8 +496,90 @@ private void loadMetadataFromManifest(File manifestFile)
TableSnapshot build()
{
+ maybeCreateOrEnrichManifest();
return new TableSnapshot(keyspaceName, tableName, tableId, tag,
createdAt, expiresAt, snapshotDirs, ephemeral);
}
+
+ private void maybeCreateOrEnrichManifest()
+ {
+ // this is caused by not reading any manifest of a new format or
that snapshot had none upon loading,
+ // that might be the case when upgrading e.g. from 4.0 where basic
manifest is created when taking a snapshot,
+ // so we just go ahead and enrich it in each snapshot dir
+ if (createdAt != null)
+ return;
+
+ if
(!CassandraRelevantProperties.SNAPSHOT_MANIFEST_ENRICH_ENABLED.getBoolean())
+ return;
+
+ long lastModified = -1;
+
+ List<String> allDataFiles = new ArrayList<>();
+ for (File snapshotDir : snapshotDirs)
+ {
+ // we will consider time of the creation the oldest last
modified
+ // timestamp on any snapshot directory
+ long currentLastModified = snapshotDir.lastModified();
+ if (currentLastModified < lastModified || lastModified == -1)
+ lastModified = currentLastModified;
+
+ List<File> dataFiles = new ArrayList<>();
+ try
+ {
+ List<File> indicesDirs = new ArrayList<>();
+ File[] snapshotFiles = snapshotDir.list(file -> {
+ if (file.isDirectory() && file.name().startsWith("."))
+ {
+ indicesDirs.add(file);
+ return false;
+ }
+ else
+ {
+ return file.name().endsWith('-' +
SSTableFormat.Components.DATA.type.repr);
+ }
+ });
+
+ Collections.addAll(dataFiles, snapshotFiles);
+
+ for (File indexDir : indicesDirs)
+ dataFiles.addAll(Arrays.asList(indexDir.list(file ->
file.name().endsWith('-' + SSTableFormat.Components.DATA.type.repr))));
+
+ }
+ catch (IOException ex)
+ {
+ logger.error("Unable to list a directory for data
components: {}", snapshotDir);
+ }
+
+ for (File dataFile : dataFiles)
+ {
+ Descriptor descriptor =
SSTable.tryDescriptorFromFile(dataFile);
+ if (descriptor != null)
+ {
+ String relativeDataFileName =
descriptor.relativeFilenameFor(SSTableFormat.Components.DATA);
+ allDataFiles.add(relativeDataFileName);
+ }
+ }
+ }
+
+ // in any case of not being able to resolve it, set it to current
time
+ if (lastModified <= 0)
+ createdAt =
Instant.ofEpochMilli(Clock.Global.currentTimeMillis());
+ else
+ createdAt = Instant.ofEpochMilli(lastModified);
+
+ for (File snapshotDir : snapshotDirs)
+ {
+ SnapshotManifest snapshotManifest = new
SnapshotManifest(allDataFiles, null, createdAt, ephemeral);
+ File manifestFile = new File(snapshotDir, "manifest.json");
+ try
+ {
+ snapshotManifest.serializeToJsonFile(manifestFile);
+ }
+ catch (IOException ex)
+ {
+ logger.error("Unable to create a manifest.json file in
{}", manifestFile.absolutePath());
+ }
Review Comment:
Nit: Maybe adding logger debug line with `manifest.json file was
updated/created for snapshot` is useful?
##########
src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java:
##########
@@ -490,8 +496,90 @@ private void loadMetadataFromManifest(File manifestFile)
TableSnapshot build()
{
+ maybeCreateOrEnrichManifest();
return new TableSnapshot(keyspaceName, tableName, tableId, tag,
createdAt, expiresAt, snapshotDirs, ephemeral);
}
+
+ private void maybeCreateOrEnrichManifest()
+ {
+ // this is caused by not reading any manifest of a new format or
that snapshot had none upon loading,
+ // that might be the case when upgrading e.g. from 4.0 where basic
manifest is created when taking a snapshot,
+ // so we just go ahead and enrich it in each snapshot dir
+ if (createdAt != null)
Review Comment:
Nit: Is it possible to check what of the two cases it is? Maybe adding a
logger debug line stating it may be useful?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]