Andrew Christianson created MINIFICPP-929:
---------------------------------------------

             Summary: Create memory map interface to flow files in 
ProcessSession/ContentRepository
                 Key: MINIFICPP-929
                 URL: https://issues.apache.org/jira/browse/MINIFICPP-929
             Project: Apache NiFi MiNiFi C++
          Issue Type: Improvement
            Reporter: Andrew Christianson
            Assignee: Andrew Christianson


Currently, MiNiFi - C++ only supports stream-oriented I/O to FlowFile payloads.
This can limit performance in cases where in-place access to the payload is
desirable. In cases where data can be accessed randomly and in-place, a
significant speedup can be realized by mapping the payload into the process's
address space. This is natively supported at the kernel level on Linux and
macOS via mmap() on files, and on Windows via the equivalent file-mapping API
(CreateFileMapping()/MapViewOfFile()). Other repositories, such as the
VolatileRepository, already store the entire payload in memory, so it is
natural to pass through this memory block as if it were a memory-mapped file.
While the DatabaseContentRepository does not appear to natively support a
memory map interface, access via an emulated memory-map interface should be
possible with no performance degradation relative to a full read via the
streaming interface.
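
For illustration, a file-backed mapping on POSIX systems could look roughly
like the sketch below. ReadOnlyFileMap and write_file are hypothetical names
used only for this example and are not part of MiNiFi - C++; the sketch maps
the whole file read-only and omits the Windows code path and detailed error
reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper (not MiNiFi code): maps an entire file read-only.
class ReadOnlyFileMap {
 public:
  explicit ReadOnlyFileMap(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      void* p = ::mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                       MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const char*>(p);
        size_ = static_cast<size_t>(st.st_size);
      }
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
  }
  ~ReadOnlyFileMap() {
    if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
  }
  ReadOnlyFileMap(const ReadOnlyFileMap&) = delete;
  ReadOnlyFileMap& operator=(const ReadOnlyFileMap&) = delete;

  bool valid() const { return data_ != nullptr; }
  const char* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  const char* data_ = nullptr;
  size_t size_ = 0;
};

// Hypothetical helper used only to create a sample payload on disk.
bool write_file(const std::string& path, const std::string& contents) {
  FILE* f = std::fopen(path.c_str(), "wb");
  if (f == nullptr) return false;
  size_t n = std::fwrite(contents.data(), 1, contents.size(), f);
  return std::fclose(f) == 0 && n == contents.size();
}
```

Once mapped, any byte of the payload can be read through data() without
further seek() or read() calls.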

Cases where in-place, random access is beneficial include, but are not limited 
to:
 * in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at least 
for strings).
 * access of payload via protocol buffers
 * random access of large files on disk, where it would otherwise require many 
seek() and read() syscalls

The interface should be accessible to processors via a mmap() call on
ProcessSession (adjacent to read() and write()). The processor supplies a
MemoryMapCallback, whose process() method is invoked with an instance of
BaseMemoryMap as its argument. BaseMemoryMap is extended for each type of
content repository that MiNiFi - C++ supports, including: FileSystemRepository,
VolatileRepository, and DatabaseContentRepository.
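
One possible shape for these types is sketched below. Only the names
BaseMemoryMap, MemoryMapCallback, and process() come from this issue; the
method signatures, VolatileMemoryMap, ChecksumCallback, and the free function
mmap_payload() (standing in for ProcessSession::mmap()) are assumptions made
for illustration, not the final API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Assumed interface: exposes a payload as one contiguous block of memory.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual uint8_t* getData() = 0;
  virtual size_t getSize() const = 0;
};

// Assumed callback type: process() receives the mapped payload.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual int64_t process(const std::shared_ptr<BaseMemoryMap>& map) = 0;
};

// Hypothetical volatile-backed map: passes through a buffer already in memory.
class VolatileMemoryMap : public BaseMemoryMap {
 public:
  explicit VolatileMemoryMap(std::vector<uint8_t>& buf) : buf_(buf) {}
  uint8_t* getData() override { return buf_.data(); }
  size_t getSize() const override { return buf_.size(); }

 private:
  std::vector<uint8_t>& buf_;
};

// Example callback: sums the payload bytes in place, without copying them.
class ChecksumCallback : public MemoryMapCallback {
 public:
  int64_t process(const std::shared_ptr<BaseMemoryMap>& map) override {
    assert(map != nullptr);
    const uint8_t* p = map->getData();
    sum = std::accumulate(p, p + map->getSize(), uint64_t{0});
    return static_cast<int64_t>(map->getSize());
  }
  uint64_t sum = 0;
};

// Stand-in for ProcessSession::mmap(): builds the map and invokes the callback.
int64_t mmap_payload(std::vector<uint8_t>& payload, MemoryMapCallback& cb) {
  return cb.process(std::make_shared<VolatileMemoryMap>(payload));
}
```

A file-backed or database-backed implementation would subclass BaseMemoryMap
in the same way, so processors would not need to know which repository holds
the payload.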

As part of the change, in addition to extensive unit test coverage, benchmarks 
should be written such that the performance impact can be empirically measured 
and evaluated.
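
A benchmark harness could start from something like the following sketch. All
names here (Timed, time_it, sum_streamed, sum_in_place) are hypothetical, and
the one-byte memcpy() merely stands in for a seek() plus read() on a stream; a
real benchmark would exercise the actual repositories and payload sizes of
interest.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Result of a timed run: the computed value and the elapsed wall-clock time.
struct Timed {
  uint64_t result;
  double seconds;
};

// Runs f() once and measures it with a monotonic clock.
template <typename F>
Timed time_it(F&& f) {
  auto start = std::chrono::steady_clock::now();
  uint64_t r = f();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {r, elapsed.count()};
}

// Stream-style access: copy each sampled byte into a buffer before using it.
uint64_t sum_streamed(const std::vector<uint8_t>& payload, size_t stride) {
  assert(stride > 0);
  uint64_t sum = 0;
  uint8_t buf[1];
  for (size_t off = 0; off < payload.size(); off += stride) {
    std::memcpy(buf, payload.data() + off, 1);  // stands in for seek()+read()
    sum += buf[0];
  }
  return sum;
}

// In-place access: read each sampled byte directly from the mapped block.
uint64_t sum_in_place(const std::vector<uint8_t>& payload, size_t stride) {
  assert(stride > 0);
  uint64_t sum = 0;
  for (size_t off = 0; off < payload.size(); off += stride) {
    sum += payload[off];
  }
  return sum;
}
```

Both access paths should return the same value, so the benchmark can check
correctness before comparing the measured times.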



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
