Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/315238 )

Change subject: [zimwriterfs] Try to avoid too big cluster.
......................................................................


[zimwriterfs] Try to avoid too big cluster.

We now check whether the cluster would become too big *before* adding the
content. This way, clusters are always closed before reaching the maximum
size, not just after exceeding it.

The only way a cluster can still be too big is if the content of a single
article is larger than the maximum size.
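
The check-before-add rule can be illustrated with a minimal standalone
sketch. This is not the zimlib code: ClusterSketch, packBlobs and the
limit parameter are hypothetical names used only for illustration, and
blobs are reduced to their sizes in bytes. The condition mirrors the one
in the diff (non-empty cluster, and current size plus blob size reaching
the limit).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cluster: tracks only blob count and total bytes.
struct ClusterSketch {
    std::size_t count = 0;
    std::size_t size = 0;
};

// Packs blob sizes into clusters. The current cluster is closed *before*
// adding a blob that would make it reach or exceed `limit` bytes.
// Returns the byte size of every cluster, in the order they were closed.
std::vector<std::size_t> packBlobs(const std::vector<std::size_t>& blobSizes,
                                   std::size_t limit)
{
    std::vector<std::size_t> closed;
    ClusterSketch cluster;

    for (std::size_t blobSize : blobSizes) {
        // An empty cluster is never closed, so an oversized blob still
        // gets a cluster of its own.
        if (cluster.count != 0 && cluster.size + blobSize >= limit) {
            closed.push_back(cluster.size);
            cluster = ClusterSketch{};
        }
        ++cluster.count;
        cluster.size += blobSize;
    }

    // Flush whatever remains once all blobs have been seen.
    if (cluster.count != 0) {
        closed.push_back(cluster.size);
    }
    return closed;
}
```

With a limit of 1024 and blob sizes 400, 400, 400, 1500, 100, the sketch
closes clusters of 800, 400, 1500 and 100 bytes. Only the 1500-byte
cluster exceeds the limit, and it holds a single oversized blob.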

Change-Id: I77a581df46ae87e01a3fe2689570a7c7355d1877
---
M zimlib/src/zimcreator.cpp
1 file changed, 9 insertions(+), 5 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 0f0f6d0..1e4a21c 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -222,12 +222,12 @@
           myDirents = &uncompDirents;
           otherDirents = &compDirents;
         }
-        dirents.back().setCluster(clusterOffsets.size(), cluster->count());
-        cluster->addBlob(blob);
-        myDirents->push_back(dirents.size()-1);
 
-        // If cluster is now large enough, write it to disk.
-        if (cluster->size() >= minChunkSize * 1024)
+        // If cluster will be too large, write it to disk, and open a new
+        // one for the content.
+        if ( cluster->count()
+          && cluster->size()+blob.size() >= minChunkSize * 1024
+           )
         {
           log_info("cluster with " << cluster->count() << " articles, " <<
                    cluster->size() << " bytes; current title \"" <<
@@ -249,6 +249,10 @@
           currentSize += (end - start) +
             sizeof(offset_type) /* for cluster pointer entry */;
         }
+
+        dirents.back().setCluster(clusterOffsets.size(), cluster->count());
+        cluster->addBlob(blob);
+        myDirents->push_back(dirents.size()-1);
       }
 
       // When we've seen all articles, write any remaining clusters.

-- 
To view, visit https://gerrit.wikimedia.org/r/315238

Gerrit-MessageType: merged
Gerrit-Change-Id: I77a581df46ae87e01a3fe2689570a7c7355d1877
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr <mgaut...@kymeria.fr>
Gerrit-Reviewer: Kelson <kel...@kiwix.org>
