Hi all,

I've been doing some further testing with the SplitPCAP processor, and I've found that with input files larger than around 3GB it tends to error out, reporting that some packet or other in the main PCAP file is invalid. I've been unable to determine precisely why an invalid packet error is returned rather than a framework-generated error, but I have found that if the resultant flowfiles are transferred as soon as they're split from the original, the issue no longer occurs.
To remedy this, I propose that flowfiles should be transferred in configurably-sized batches during the process of splitting the main file rather than being collated and sent after processing is complete. This will also have the effect that the amount of RAM that is dedicated to the task of splitting PCAPs can be determined by the user rather than by how big the original PCAP file is.

There are some issues with this, however:

1. 'Split' processors mark every resultant flowfile with a 'number X of Y' attribute that means the resultant flowfiles can't be sent off until it's known how many there are in total.

2. As the 'split' flowfiles would be transferred as they're created, if there is an invalid packet later in the original PCAP then a situation could arise where flowfiles are transferred both to the 'split' relationship and the 'failure' relationship.

Does anyone have any thoughts on how to address those problems?

Thanks,
Jack Hinton
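
P.S. To make the batching idea a bit more concrete, here is a rough sketch in plain Python. It is not the processor's actual code and uses no framework API: `split_in_batches`, `InvalidPacketError`, `packets_per_split`, `splits_per_batch` and `is_valid` are all hypothetical names made up for illustration, and packets are just list items. It only shows the shape of the proposal: splits are handed off in batches of a configurable size, so at most one batch is held in memory at a time.

```python
from typing import Any, Callable, Iterable, Iterator, List


class InvalidPacketError(Exception):
    """Hypothetical stand-in for the 'invalid packet' failure."""


def split_in_batches(
    packets: Iterable[Any],
    packets_per_split: int,
    splits_per_batch: int,
    is_valid: Callable[[Any], bool] = lambda packet: True,
) -> Iterator[List[List[Any]]]:
    """Yield batches of splits instead of collecting every split first.

    Each yielded batch holds at most `splits_per_batch` splits, and each
    split holds at most `packets_per_split` packets.
    """
    batch: List[List[Any]] = []
    current: List[Any] = []
    for index, packet in enumerate(packets):
        if not is_valid(packet):
            # Batches yielded before this point have already left the function.
            raise InvalidPacketError(f"packet {index} is invalid")
        current.append(packet)
        if len(current) == packets_per_split:
            batch.append(current)
            current = []
            if len(batch) == splits_per_batch:
                yield batch  # hand off a full batch and start a new one
                batch = []
    if current:
        batch.append(current)  # final, possibly short, split
    if batch:
        yield batch  # final, possibly short, batch
```

Both issues show up directly in the sketch. The total number of splits is only known once the generator is exhausted, so a 'number X of Y' style attribute could not be set on the earlier batches. And if `is_valid` rejects a packet partway through, the batches already yielded have been handed off before the error is raised, which is the mixed 'split' and 'failure' situation described in point 2.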