davsclaus opened a new pull request, #23908:
URL: https://github.com/apache/camel/pull/23908
## Backport of #21689

Cherry-pick of #21689 onto `camel-4.18.x`.

**Original PR:** #21689 - CAMEL-23120 - camel-docling - Implement batchSize sub-batch partitioning in batch processing
**Original author:** @oscerd
**Target branch:** `camel-4.18.x`

### Original description

The `batchSize` configuration parameter (default 10) was declared and read from headers in `processBatchConversion()` and `processBatchStructuredData()`, but the value was never actually applied. Both `convertDocumentsBatch()` and `convertStructuredDataBatch()` submitted all documents to the executor at once regardless of `batchSize`, making the parameter a no-op.

This change makes `batchSize` control how many documents are submitted per sub-batch. Documents are partitioned into chunks of `batchSize`, and each sub-batch is processed to completion before the next one starts. Within each sub-batch, up to `batchParallelism` threads run concurrently. The overall `batchTimeout` is tracked across sub-batches, so the remaining time decreases as sub-batches complete, and `failOnFirstError` stops processing across sub-batch boundaries.

This provides back-pressure and bounds memory usage when processing large document sets, preventing the creation of an unbounded number of `CompletableFuture`s.
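A minimal sketch of the sub-batching scheme described above, assuming a plain `ExecutorService` and a `Function<String, String>` standing in for the Docling conversion call. `SubBatchSketch`, `convertInSubBatches` and `Result` are hypothetical names, not the camel-docling API, and the sketch uses plain `Future`s rather than the `CompletableFuture`s the actual change works with. It shows the three properties the description relies on: at most `batchSize` futures exist at a time, one deadline is shared across all sub-batches, and a failure (with `failOnFirstError`) or an exhausted timeout prevents the next sub-batch from starting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class SubBatchSketch {

    /** Per-document outcome (hypothetical type): output on success, error text otherwise. */
    public record Result(String input, String output, String error) {}

    /** Hypothetical helper: converts docs in sub-batches that share one overall timeout. */
    public static List<Result> convertInSubBatches(List<String> docs, Function<String, String> converter,
            int batchSize, int batchParallelism, long batchTimeoutMillis, boolean failOnFirstError)
            throws InterruptedException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Result> results = new ArrayList<>();
        // At most batchParallelism conversions run concurrently.
        ExecutorService executor = Executors.newFixedThreadPool(batchParallelism);
        // One deadline for the whole batch: later sub-batches get only the time that is left.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(batchTimeoutMillis);
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<String> subBatch = docs.subList(start, Math.min(start + batchSize, docs.size()));
                // Only this sub-batch's futures exist at a time, which bounds memory use.
                List<Future<String>> futures = new ArrayList<>();
                for (String doc : subBatch) {
                    futures.add(executor.submit(() -> converter.apply(doc)));
                }
                boolean failed = false;
                boolean timedOut = false;
                // Wait for the whole sub-batch before submitting the next one.
                for (int i = 0; i < futures.size(); i++) {
                    Future<String> future = futures.get(i);
                    String doc = subBatch.get(i);
                    if (failed && failOnFirstError) {
                        future.cancel(true); // abandon the rest of this sub-batch
                        continue;
                    }
                    long remaining = Math.max(0L, deadline - System.nanoTime());
                    try {
                        results.add(new Result(doc, future.get(remaining, TimeUnit.NANOSECONDS), null));
                    } catch (ExecutionException e) {
                        results.add(new Result(doc, null, String.valueOf(e.getCause())));
                        failed = true;
                    } catch (TimeoutException e) {
                        future.cancel(true);
                        results.add(new Result(doc, null, "timed out"));
                        timedOut = true;
                    }
                }
                // failOnFirstError and an exhausted timeout both stop at the sub-batch boundary.
                if (timedOut || (failed && failOnFirstError)) {
                    break;
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = List.of("a", "b", "c", "d", "e");
        List<Result> all = convertInSubBatches(docs, String::toUpperCase, 2, 2, 5_000, false);
        System.out.println(all.size() + " " + all.get(4).output()); // prints "5 E"

        Function<String, String> failOnC = d -> {
            if (d.equals("c")) {
                throw new IllegalStateException("bad " + d);
            }
            return d.toUpperCase();
        };
        // Sub-batches: [a, b], [c, d], [e]. "c" fails, the result for "d" is discarded,
        // and [e] is never submitted.
        List<Result> stopped = convertInSubBatches(docs, failOnC, 2, 2, 5_000, true);
        System.out.println(stopped.size()); // prints "3"
    }
}
```

Waiting on each sub-batch before submitting the next is what provides the back-pressure: the executor's queue never holds more than `batchSize` pending documents, at the cost of some idle threads while the slowest document in a sub-batch finishes.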
