Re: [PR] OAK-11232 - indexing-job - Simplify download from Mongo logic by traversing only by _modified instead of (_modified, _id) [jackrabbit-oak]

via GitHub Thu, 31 Oct 2024 11:36:53 -0700


fabriziofortino commented on code in PR #1827:
URL: https://github.com/apache/jackrabbit-oak/pull/1827#discussion_r1824730705



##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/pipelined/PipelinedTransformTask.java:
##########
@@ -163,20 +167,31 @@ public Result call() throws Exception {
                     return new Result(threadId, totalEntryCount);
                 } else {
                     for (RawBsonDocument rawBsonDocument : rawBsonDocumentBatch) {
-                        int sizeEstimate = rawBsonDocument.getByteBuffer().remaining() * 2;
-                        NodeDocument nodeDoc = rawBsonDocument.decode(nodeDocumentCodec);
                         statistics.incrementMongoDocumentsTraversed();
+                        mongoObjectsProcessedSinceLastLog++;
                         mongoObjectsProcessed++;
-                        if (mongoObjectsProcessed % 50_000 == 0) {
-                            LOG.info("Mongo objects: {}, total entries: {}, current batch: {}, Size: {}/{} MB",
+                        ByteBuf byteBuffer = rawBsonDocument.getByteBuffer();
+                        // Mongo documents contain mostly Strings, so we can estimate the size by doubling the byte
+                        // buffer size. This will usually will overestimate the size in memory of the document, but it

Review Comment:
   minor
   ```suggestion
                           // buffer size. This will usually overestimate the size in memory of the document, but it
   ```



