Thanks Mark!

On Wed, Apr 21, 2021 at 8:48 PM Mark Payne <marka...@hotmail.com> wrote:
> Ryan,
>
> It gets a bit more complex than this, because the FlowFiles may not always
> be accessed/read sequentially in exactly the same order that they live on
> disk, there are concurrent threads/disk accesses to consider, etc. But in
> the best-case scenario, yes, that is accurate.
>
> Keep in mind, though, that what you are comparing there is the performance
> of the disk accesses/reads, and that is, of course, not the entire picture.
> Lots more going on under the covers, so if you see a performance
> improvement of 20x in reading the content, that won't mean a 20x
> improvement in overall throughput.
>
> But it sure won't hurt! :)
>
> -Mark
>
> Sent from my iPhone
>
> > On Apr 21, 2021, at 8:34 PM, Ryan Hendrickson <
> > ryan.andrew.hendrick...@gmail.com> wrote:
> >
> > https://issues.apache.org/jira/browse/NIFI-7646 - Improve performance of
> > MergeContent / others that read content of many small FlowFiles
> >
> > Hi,
> > In reference to the ticket above, released in 1.13, the description
> > says "if the FlowFile is small, say 200 bytes, the result is that we
> > perform 2+ disk accesses to read those 200 bytes (even though 4K - 8K is a
> > typical block size and could be read in the same amount of time as those
> > 200 bytes)."
> >
> > To clarify, if the FlowFiles are never more than 1K, and the block size
> > is 4K, does that mean this improvement will read 4 FlowFiles with the
> > resources of 1?
> >
> > This would be a 4:1 improvement. Or in the 200-byte scenario, it would
> > be a 20:1 improvement?
> >
> > Thanks,
> > Ryan
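
For reference, the best-case arithmetic discussed in this thread can be sketched roughly as below. This is only a back-of-the-envelope illustration: the function names are hypothetical (they are not NiFi APIs), the 4096-byte block size is an assumption taken from the "4K" figure above, and the model assumes FlowFiles sit back to back on disk and are read in order. It ignores concurrency and out-of-order access, so real-world gains will likely be smaller.

```python
def flowfiles_per_block(flowfile_size: int, block_size: int = 4096) -> int:
    """Whole FlowFiles of `flowfile_size` bytes that fit in one block.

    Hypothetical helper; block_size defaults to an assumed 4 KiB.
    """
    if flowfile_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    return block_size // flowfile_size


def best_case_block_reads(count: int, flowfile_size: int, block_size: int = 4096) -> int:
    """Block reads needed for `count` FlowFiles stored contiguously and read in order.

    Best case only: uses integer ceiling division of total bytes by block size.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    total_bytes = count * flowfile_size
    return -(-total_bytes // block_size)


if __name__ == "__main__":
    for size in (1024, 200):
        print(size, "bytes:", flowfiles_per_block(size), "FlowFiles per 4 KiB block")
    # Compare with at least one disk access per FlowFile when each is read on its own.
    print(best_case_block_reads(1000, 200), "block reads for 1000 FlowFiles of 200 bytes")
```

Under these assumptions, 1 KiB FlowFiles come out at 4 per block and 200-byte FlowFiles at 20 per block, matching the 4:1 and 20:1 figures in the question. As noted in the reply, this covers content reads only, not overall throughput.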