bbotella commented on code in PR #3751:
URL: https://github.com/apache/cassandra/pull/3751#discussion_r1890367935


##########
src/java/org/apache/cassandra/config/CassandraRelevantProperties.java:
##########
@@ -502,6 +502,12 @@ public enum CassandraRelevantProperties
     
SNAPSHOT_CLEANUP_INITIAL_DELAY_SECONDS("cassandra.snapshot.ttl_cleanup_initial_delay_seconds",
 "5"),
     /** snapshots ttl cleanup period in seconds */
     
SNAPSHOT_CLEANUP_PERIOD_SECONDS("cassandra.snapshot.ttl_cleanup_period_seconds",
 "60"),
+    /**
+     * When there is a snapshot with old / basic format (basically 
pre-CASSANDRA-16789),
+     * it will enrich it with more metadata upon snapshot's loading at startup.
+     * Defaults to true, when set to false, no enriching will be done.
+     * */
+    SNAPSHOT_MANIFEST_ENRICH_ENABLED("cassandra.snapshot.enrich.enabled", 
"true"),

Review Comment:
   nit: `SNAPSHOT_MANIFEST_ENRICH_OR_CREATE_ENABLED`?



##########
.circleci/config.yml:
##########


Review Comment:
   Are the changes in circle config expected?



##########
test/distributed/org/apache/cassandra/distributed/test/SnapshotsTest.java:
##########
@@ -130,6 +137,65 @@ public void testEverySnapshotDirHasManifestAndSchema()
         }
     }
 
+    @Test
+    public void testLocalOrReplicatedSystemTablesSnapshotsDoNotHaveSchema()
+    {
+        cluster.get(1)
+               .nodetoolResult("snapshot", "-t", 
"snapshot_with_local_or_replicated")
+               .asserts()
+               .success();
+
+        String[] dataDirs = (String[]) 
cluster.get(1).config().get("data_file_directories");
+        String[] paths = getSnapshotPaths(true);
+
+        for (String dataDir : dataDirs)
+        {
+            for (String path : paths)
+            {
+                Path snapshotDir = Paths.get(dataDir)
+                                        .resolve(path)
+                                        .resolve("snapshots")
+                                        
.resolve("snapshot_with_local_or_replicated");
+
+                if (snapshotDir.toFile().exists())
+                    assertFalse(new File(snapshotDir, "schema.cql").exists());
+            }
+        }
+    }
+
+    @Test
+    public void testMissingManifestIsCreatedOnStartup()
+    {
+        cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (key int, value 
text, PRIMARY KEY (key))"));
+        populate(cluster);
+
+        cluster.get(1)
+               .nodetoolResult("snapshot", "-t", 
"snapshot_with_local_or_replicated")
+               .asserts()
+               .success();
+
+        String[] dataDirs = (String[]) 
cluster.get(1).config().get("data_file_directories");
+        String[] paths = getSnapshotPaths(false);
+
+        assertManifestsPresence(dataDirs, paths, true);

Review Comment:
   Can we make the same test but with the config flag 
`SNAPSHOT_MANIFEST_ENRICH_ENABLED` set to false to assert that the manifests 
are left untouched/uncreated?



##########
src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java:
##########
@@ -490,8 +496,90 @@ private void loadMetadataFromManifest(File manifestFile)
 
         TableSnapshot build()
         {
+            maybeCreateOrEnrichManifest();
             return new TableSnapshot(keyspaceName, tableName, tableId, tag, 
createdAt, expiresAt, snapshotDirs, ephemeral);
         }
+
+        private void maybeCreateOrEnrichManifest()
+        {
+            // this is caused by not reading any manifest of a new format or 
that snapshot had none upon loading,
+            // that might be the case when upgrading e.g. from 4.0 where basic 
manifest is created when taking a snapshot,
+            // so we just go ahead and enrich it in each snapshot dir
+            if (createdAt != null)
+                return;
+
+            if 
(!CassandraRelevantProperties.SNAPSHOT_MANIFEST_ENRICH_ENABLED.getBoolean())
+                return;
+
+            long lastModified = -1;
+
+            List<String> allDataFiles = new ArrayList<>();
+            for (File snapshotDir : snapshotDirs)
+            {
+                // we will consider time of the creation the oldest last 
modified
+                // timestamp on any snapshot directory
+                long currentLastModified = snapshotDir.lastModified();
+                if (currentLastModified < lastModified || lastModified == -1)
+                    lastModified = currentLastModified;
+
+                List<File> dataFiles = new ArrayList<>();
+                try
+                {
+                    List<File> indicesDirs = new ArrayList<>();
+                    File[] snapshotFiles = snapshotDir.list(file -> {
+                        if (file.isDirectory() && file.name().startsWith("."))
+                        {
+                            indicesDirs.add(file);
+                            return false;
+                        }
+                        else
+                        {
+                            return file.name().endsWith('-' + 
SSTableFormat.Components.DATA.type.repr);
+                        }
+                    });
+
+                    Collections.addAll(dataFiles, snapshotFiles);
+
+                    for (File indexDir : indicesDirs)
+                        dataFiles.addAll(Arrays.asList(indexDir.list(file -> 
file.name().endsWith('-' + SSTableFormat.Components.DATA.type.repr))));
+
+                }
+                catch (IOException ex)
+                {
+                    logger.error("Unable to list a directory for data 
components: {}", snapshotDir);
+                }
+
+                for (File dataFile : dataFiles)
+                {
+                    Descriptor descriptor = 
SSTable.tryDescriptorFromFile(dataFile);
+                    if (descriptor != null)
+                    {
+                        String relativeDataFileName = 
descriptor.relativeFilenameFor(SSTableFormat.Components.DATA);
+                        allDataFiles.add(relativeDataFileName);
+                    }
+                }
+            }
+
+            // in any case of not being able to resolve it, set it to current 
time
+            if (lastModified <= 0)
+                createdAt = 
Instant.ofEpochMilli(Clock.Global.currentTimeMillis());
+            else
+                createdAt = Instant.ofEpochMilli(lastModified);
+
+            for (File snapshotDir : snapshotDirs)
+            {
+                SnapshotManifest snapshotManifest = new 
SnapshotManifest(allDataFiles, null, createdAt, ephemeral);
+                File manifestFile = new File(snapshotDir, "manifest.json");
+                try
+                {
+                    snapshotManifest.serializeToJsonFile(manifestFile);
+                }
+                catch (IOException ex)
+                {
+                    logger.error("Unable to create a manifest.json file in 
{}", manifestFile.absolutePath());
+                }

Review Comment:
   Nit: Maybe adding logger debug line with `manifest.json file was 
updated/created for snapshot` is useful?



##########
src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java:
##########
@@ -490,8 +496,90 @@ private void loadMetadataFromManifest(File manifestFile)
 
         TableSnapshot build()
         {
+            maybeCreateOrEnrichManifest();
             return new TableSnapshot(keyspaceName, tableName, tableId, tag, 
createdAt, expiresAt, snapshotDirs, ephemeral);
         }
+
+        private void maybeCreateOrEnrichManifest()
+        {
+            // this is caused by not reading any manifest of a new format or 
that snapshot had none upon loading,
+            // that might be the case when upgrading e.g. from 4.0 where basic 
manifest is created when taking a snapshot,
+            // so we just go ahead and enrich it in each snapshot dir
+            if (createdAt != null)

Review Comment:
   Nit: Is it possible to check what of the two cases it is? Maybe adding a 
logger debug line stating it may be useful?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to