Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/1115#discussion_r86138383

--- Diff: nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/bin/BinFiles.java ---
    @@ -273,25 +262,26 @@ private int binFlowFiles(final ProcessContext context, final ProcessSessionFacto
                 }
                 final ProcessSession session = sessionFactory.createSession();
    -            FlowFile flowFile = session.get();
    -            if (flowFile == null) {
    +            final List<FlowFile> flowFiles = session.get(1000);
    +            if (flowFiles.isEmpty()) {
                     break;
                 }
    -            flowFile = this.preprocessFlowFile(context, session, flowFile);
    -
    -            String groupId = this.getGroupId(context, flowFile);
    -
    -            final boolean binned = binManager.offer(groupId, flowFile, session);
    -
    -            // could not be added to a bin -- probably too large by itself, so create a separate bin for just this guy.
    -            if (!binned) {
    -                Bin bin = new Bin(0, Long.MAX_VALUE, 0, Integer.MAX_VALUE, null);
    -                bin.offer(flowFile, session);
    -                this.readyBins.add(bin);
    +            final Map<String, List<FlowFile>> flowFileGroups = new HashMap<>();
    +            for (FlowFile flowFile : flowFiles) {
    +                flowFile = this.preprocessFlowFile(context, session, flowFile);
    +                final String groupingIdentifier = getGroupId(context, flowFile);
    +                flowFileGroups.computeIfAbsent(groupingIdentifier, id -> new ArrayList<>()).add(flowFile);
                 }
    -            flowFilesBinned++;
    +            for (final Map.Entry<String, List<FlowFile>> entry : flowFileGroups.entrySet()) {
    +                final Set<FlowFile> unbinned = binManager.offer(entry.getKey(), entry.getValue(), session, sessionFactory);
    +                for (final FlowFile flowFile : unbinned) {
    +                    Bin bin = new Bin(session, 0, Long.MAX_VALUE, 0, Integer.MAX_VALUE, null);
    +                    bin.offer(flowFile, session);
    +                    this.readyBins.add(bin);
    +                }
    +            }
--- End diff --

Not exactly. The loop above says "if the bin manager didn't bin it for whatever reason, create our own one-element bin and process it (by adding to this.readyBins) - nothing else will go in this bin."
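BinManager:201, by contrast, says "if the FlowFile didn't fit in any of the available bins, create a new bin and add this FlowFile to it. Subsequent FlowFiles may then go into this bin." So the two cases are not redundant: the bin created in the loop is ready immediately and stays a single-element bin, while the bin created in BinManager stays open for later FlowFiles.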
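
To make the distinction concrete, here is a rough sketch of the two cases. It is not the NiFi code: `SimpleBin`, `SimpleBinManager`, and `binAll` are hypothetical stand-ins, entries are plain `long` sizes rather than FlowFiles, and the only limit modeled is a maximum bin size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Bin and BinManager (assumption: size is the only limit).
public class BinningSketch {

    /** Accepts entries until their total size would exceed maxSize. */
    static final class SimpleBin {
        private final long maxSize;
        private long size = 0;
        final List<Long> entries = new ArrayList<>();

        SimpleBin(final long maxSize) {
            this.maxSize = maxSize;
        }

        boolean offer(final long entrySize) {
            if (size + entrySize > maxSize) {
                return false;
            }
            size += entrySize;
            entries.add(entrySize);
            return true;
        }
    }

    /** Holds the open bins. */
    static final class SimpleBinManager {
        private final long maxBinSize;
        final List<SimpleBin> openBins = new ArrayList<>();

        SimpleBinManager(final long maxBinSize) {
            this.maxBinSize = maxBinSize;
        }

        /** Returns false only when the entry would not fit even in an empty bin. */
        boolean offer(final long entrySize) {
            for (final SimpleBin bin : openBins) {
                if (bin.offer(entrySize)) {
                    return true;
                }
            }
            // The "BinManager" case: no available bin fits, so create a new one.
            // It stays in openBins, so subsequent entries may still go into it.
            final SimpleBin fresh = new SimpleBin(maxBinSize);
            if (fresh.offer(entrySize)) {
                openBins.add(fresh);
                return true;
            }
            return false;
        }
    }

    /** Offers every size to the manager and returns the bins that are ready at once. */
    static List<SimpleBin> binAll(final SimpleBinManager manager, final long[] sizes) {
        final List<SimpleBin> readyBins = new ArrayList<>();
        for (final long entrySize : sizes) {
            if (!manager.offer(entrySize)) {
                // The "loop" case: the manager could not bin it, so it gets its own
                // one-element bin that goes straight to readyBins. Nothing else joins it.
                final SimpleBin single = new SimpleBin(Long.MAX_VALUE);
                single.offer(entrySize);
                readyBins.add(single);
            }
        }
        return readyBins;
    }

    public static void main(final String[] args) {
        final SimpleBinManager manager = new SimpleBinManager(100);
        final List<SimpleBin> ready = binAll(manager, new long[] {60, 60, 30, 500});
        System.out.println("open bins: " + manager.openBins.size());
        System.out.println("ready bins: " + ready.size());
    }
}
```

With a maximum bin size of 100, the two 60s each open a bin inside the manager, the 30 then joins the first of those open bins, and the 500 is rejected by the manager and ends up alone in a ready bin.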