Paresh,

OK, understood. Just keep in mind that NiFi does not load the flowfile content into memory and can handle data as large as the content repository will allow. From there you can split the data into individual records, and do so in a way that may allow for rather high performance.
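To illustrate the point about not holding the content in memory, here is a minimal sketch in plain Java of splitting a stream into records one at a time. It assumes newline-delimited records, and `RecordSplitter` and `splitRecords` are hypothetical names, not part of the NiFi API. Inside a processor the `InputStream` would typically be the one handed to you by `session.read(...)`; here it is just a byte array so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of the NiFi API.
final class RecordSplitter {

    // Streams newline-delimited records to the callback one at a time.
    // Only the current line is held in memory, regardless of input size.
    // Returns the number of non-empty records seen.
    static long splitRecords(InputStream in, Consumer<String> onRecord) {
        long count = 0;
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // assumption: blank lines are not records
                }
                onRecord.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "rec1\nrec2\nrec3\n".getBytes(StandardCharsets.UTF_8);
        long n = splitRecords(new ByteArrayInputStream(data),
                              r -> System.out.println("record: " + r));
        System.out.println(n + " records");
    }
}
```

In a real flow the callback would be where each record becomes its own flowfile; the record format and delimiter are things you would adapt to your data.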
If you want to read the content in chunks, it is probably best to write a custom processor that uses one session per chunk or set of chunks (up to you). You'll also want to keep state about how far along you are in that object, in case of a restart. You should definitely be able to use the session over and over again, as described in the nifi-api documentation for ProcessSession: "A process session instance may be used continuously. That is, after each commit or rollback, the session can be used again." What are you seeing in the logs when you try to use the session again?

Thanks
Joe

On Sun, Dec 13, 2015 at 3:09 PM, Paresh Shah <[email protected]> wrote:
> The file is quite large, so reading all of it into the flow file is not an
> option. We want to process the individual records of the file,
> which is why we are creating "n" flow files.
>
> Paresh
>
> On 12/13/15, 11:11 AM, "Joe Witt" <[email protected]> wrote:
>
>>Paresh
>>
>>Is it feasible to read the large object into the flow and then split as
>>needed? It sounds like you are reading from the orig file in place and
>>making flowfiles from it.
>>
>>Perhaps you can share a screenshot of the flow?
>>
>>Thanks
>>Joe
>>On Dec 13, 2015 1:49 PM, "Paresh Shah" <[email protected]> wrote:
>>
>>> We have the following use case:
>>> On a scheduled basis, reading a large number of records from an external
>>> system and moving the records through the NiFi pipeline.
>>>
>>> What we see is that the flowFiles are not moved to the relationship until
>>> the session is committed. And once the session is committed we are not able
>>> to transfer anything else on that session.
>>>
>>> We see that in GetFileTransfer, where the entire file contents are moved
>>> using the "importFrom" API on the session. But since we need to handle the
>>> individual records in the pipeline, it does not work for our use case.
>>>
>>> Is there a different mechanism to do what we want? Any insights will be
>>> appreciated.
>>>
>>> Thanks
>>> Paresh
>>> ________________________________
>>> The information contained in this transmission may contain privileged and
>>> confidential information. It is intended only for the use of the person(s)
>>> named above. If you are not the intended recipient, you are hereby notified
>>> that any review, dissemination, distribution or duplication of this
>>> communication is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender by reply email and destroy all copies
>>> of the original message.
>>> ________________________________
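
Coming back to the chunk-per-session idea at the top of this reply, here is a rough sketch of the control flow in plain Java. It is not NiFi code: `ChunkedProcessor`, `committedRecords`, and the `commitChunk` callback are all hypothetical names. The callback stands in for whatever a real processor would do per chunk (create the flowfiles, transfer them, then call `session.commit()` and keep using the same session), and the counter stands in for the restart state, which you would have to persist somewhere yourself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of commit-per-chunk with resumable progress.
final class ChunkedProcessor {

    // Number of records already committed; assumed to be persisted externally
    // so that processing can resume from here after a restart.
    private long committedRecords;

    ChunkedProcessor(long resumeFrom) {
        this.committedRecords = resumeFrom;
    }

    long committedRecords() {
        return committedRecords;
    }

    // Reads records lazily from the source, groups them into chunks, and
    // invokes commitChunk once per chunk (the stand-in for transfer + commit).
    void run(Iterator<String> source, int chunkSize, Consumer<List<String>> commitChunk) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Skip whatever was committed before a restart.
        for (long i = 0; i < committedRecords && source.hasNext(); i++) {
            source.next();
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize || !source.hasNext()) {
                commitChunk.accept(new ArrayList<>(chunk));
                committedRecords += chunk.size(); // advance only after the commit
                chunk.clear();
            }
        }
    }

    public static void main(String[] args) {
        ChunkedProcessor p = new ChunkedProcessor(0);
        p.run(Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator(), 2,
              chunk -> System.out.println("commit " + chunk));
        System.out.println("committed so far: " + p.committedRecords());
    }
}
```

Advancing the counter only after the commit means a crash mid-chunk replays that chunk rather than losing it; whether replaying is acceptable depends on your downstream handling.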
