smiklosovic commented on code in PR #3751:
URL: https://github.com/apache/cassandra/pull/3751#discussion_r1915518980


##########
src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java:
##########
@@ -490,8 +496,110 @@ private void loadMetadataFromManifest(File manifestFile)
 
         TableSnapshot build()
         {
+            maybeCreateOrEnrichManifest();
             return new TableSnapshot(keyspaceName, tableName, tableId, tag, createdAt, expiresAt, snapshotDirs, ephemeral);
         }
+
+        private void maybeCreateOrEnrichManifest()
+        {
+            boolean oldManifestExists = false;
+
+            if (!CassandraRelevantProperties.SNAPSHOT_MANIFEST_ENRICH_OR_CREATE_ENABLED.getBoolean())
+                return;
+
+            // this is caused by not reading any manifest or that snapshot had a basic manifest just with sstables
+            // enumerated (pre CASSANDRA-16789), so we just go ahead and enrich it in each snapshot dir
+
+            if (createdAt != null)
+                return;
+
+            for (File snapshotDir : snapshotDirs)
+            {
+                File maybeManifest = new File(snapshotDir.toPath().resolve("manifest.json"));
+                if (maybeManifest.exists())
+                {
+                    oldManifestExists = true;
+                    break;
+                }
+            }
+
+            if (oldManifestExists)
+                logger.debug("Manifest in the old format for snapshot {} was detected, going to enrich it.", this);
+            else
+                logger.debug("There is no manifest for {}, going to create it.", this);
+
+            long lastModified = -1;
+
+            List<String> allDataFiles = new ArrayList<>();
+            for (File snapshotDir : snapshotDirs)
+            {
+                // we will consider time of the creation the oldest last modified
+                // timestamp on any snapshot directory
+                long currentLastModified = snapshotDir.lastModified();
+                if (currentLastModified < lastModified || lastModified == -1)
+                    lastModified = currentLastModified;
+
+                List<File> dataFiles = new ArrayList<>();
+                try
+                {
+                    List<File> indicesDirs = new ArrayList<>();
+                    File[] snapshotFiles = snapshotDir.list(file -> {
+                        if (file.isDirectory() && file.name().startsWith("."))
+                        {
+                            indicesDirs.add(file);
+                            return false;
+                        }
+                        else
+                        {
+                            return file.name().endsWith('-' + SSTableFormat.Components.DATA.type.repr);
+                        }
+                    });
+
+                    Collections.addAll(dataFiles, snapshotFiles);
+
+                    for (File indexDir : indicesDirs)
+                        dataFiles.addAll(Arrays.asList(indexDir.list(file -> file.name().endsWith('-' + SSTableFormat.Components.DATA.type.repr))));

Review Comment:
   @jrwest unfortunately it is not that easy; I think I found a bug. Look at
what `Component.parse` does: it eventually calls
`Component.fromRepresentation`, which tests for a match with
`Pattern.matches(type.repr, repr)`.
   
   So it needs two "reprs". For a Data.db file, the DATA type has the repr
"Data.db" (treated as a regular expression), and as the `repr` under test I am
passing the whole file name (e.g. `oa-2-big-Data.db`).
   
   `Pattern.matches("Data.db", "oa-2-big-Data.db")` does not match, because
`Pattern.matches` has to match the entire input. It does match with
`Pattern.matches(".*Data.db", "oa-2-big-Data.db")`.
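
To make the whole-input behaviour concrete, here is a minimal standalone sketch. The class and method names (`ReprMatchDemo`, `matchesWhole`) are made up for illustration and are not part of Cassandra; only `java.util.regex.Pattern.matches` is a real API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not Cassandra code.
class ReprMatchDemo
{
    // Pattern.matches(regex, input) succeeds only if the regex matches the ENTIRE input,
    // not merely a substring or a suffix of it.
    static boolean matchesWhole(String regex, String input)
    {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args)
    {
        String fileName = "oa-2-big-Data.db";

        // The bare repr cannot consume the "oa-2-big-" prefix, so this is false.
        System.out.println(matchesWhole("Data.db", fileName));

        // A leading ".*" swallows the prefix, so this is true.
        System.out.println(matchesWhole(".*Data.db", fileName));

        // The bare repr matches only when the input is exactly the component name.
        System.out.println(matchesWhole("Data.db", "Data.db"));
    }
}
```

Note that the `.` in "Data.db" is a regex wildcard here, so the repr would also match an input such as `DataXdb`.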
   
   If I parsed the suffix out of the file name and then tested that suffix (as
`repr`) against "Data.db", I guess that would work, but parsing the suffix ...
nah. Too much work. I can just as well do it the way I did here and be done
with it ...
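
For comparison, a rough sketch of the two options side by side. Everything here is hypothetical (`SuffixMatchSketch`, `suffixOf`, `matchesBySuffix`, `matchesByEndsWith`), and it assumes the component name is whatever follows the last `-` in the file name, which may not hold for every component type.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Cassandra code.
class SuffixMatchSketch
{
    // Assumption: the component name is the text after the last '-'.
    // Returns the whole name if there is no '-' at all.
    static String suffixOf(String fileName)
    {
        int dash = fileName.lastIndexOf('-');
        return dash < 0 ? fileName : fileName.substring(dash + 1);
    }

    // Option 1: extract the suffix first, then do a whole-input regex match against the repr.
    static boolean matchesBySuffix(String typeRepr, String fileName)
    {
        return Pattern.matches(typeRepr, suffixOf(fileName));
    }

    // Option 2: the approach used in the patch, a plain string suffix check.
    static boolean matchesByEndsWith(String typeRepr, String fileName)
    {
        return fileName.endsWith('-' + typeRepr);
    }

    public static void main(String[] args)
    {
        String repr = "Data.db";
        for (String name : new String[]{ "oa-2-big-Data.db", "oa-2-big-Index.db" })
        {
            System.out.println(name + " suffix=" + matchesBySuffix(repr, name)
                               + " endsWith=" + matchesByEndsWith(repr, name));
        }
    }
}
```

Both options agree on these two file names. The `endsWith` variant treats the repr as a literal string and needs no parsing step.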
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


