Gabor,

I do agree that it’s reasonable to allow the source processor to return no 
output. In fact, that’s probably what it should return the vast majority of the 
time.

As for 1 FlowFile in, many out, I think we should hold off. This will be an
important strategy to support, for sure. It was intentionally not implemented
initially, though. Given that we are still in the “milestone release phase,” I
didn’t want to support a huge number of options. The more we add, the more
complex the API becomes and the more code we have to maintain while we’re still
in a phase where things haven’t been solidified. It will be important to add in
the future, though.

I’m not sure what the best approach will be there yet. We will need a clear,
concise, easy-to-use API. I intentionally avoided introducing a notion of a
ProcessSession into the Python API because of the inherent complexities that it
introduces, both in terms of the API and the implementation. This could be an
update to the existing FlowFileTransform processor, or it could be a new
ForkFlowFile API. Either way, it will likely take some time to iron out. But I
don’t think it makes sense to do any of that yet, personally, for the reasons
noted above.

As for the notion of ‘yielding’, it is probably fine. It adds minimal complexity
to the API and doesn’t have to be used if it’s not needed.
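
To make the "no output" and "yield" ideas concrete, here is a rough sketch of
how a polling source might behave. It is only an illustration: SourceResult,
StubContext, yield_resources and PollingSource are hypothetical stand-ins
written for this example, not the actual NiFi Python API, and the real method
names may well end up different.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceResult:
    """Hypothetical stand-in for the result a source processor returns."""
    relationship: str
    contents: bytes
    attributes: dict = field(default_factory=dict)


class StubContext:
    """Hypothetical stand-in for the processor context; records yield requests."""

    def __init__(self):
        self.yielded = False

    def yield_resources(self):
        # Assumed name: signals that the framework should back off for a while.
        self.yielded = True


class PollingSource:
    """Sketch of a source that polls an in-memory queue on each trigger."""

    def __init__(self, queue: deque):
        self.queue = queue

    def create(self, context) -> Optional[SourceResult]:
        if not self.queue:
            # Nothing new since the last trigger: optionally yield,
            # then return no output at all.
            context.yield_resources()
            return None
        entry = self.queue.popleft()
        return SourceResult("success", entry)
```

The yield call is optional here; a processor that never needs it simply returns
None when there is nothing to emit.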

Thanks
-Mark


> On Aug 6, 2024, at 12:43 PM, Gábor Gyimesi <[email protected]> wrote:
> 
> Hi Team,
> 
> Recently there was a request [1] to support splitting a flow file into
> multiple flow files using the python FlowFileTransform API, which
> would result in multiple outgoing flow files. A valid use case was
> presented for this: "Input is a single flowfile which contains an
> excel file, and output would be multiple flowfiles, where each
> flowfile will contain one sheet from the excel file.".
> 
> As Joe Witt commented on the ticket, the current APIs only support the
> one flowfile in/one flowfile out model, whereas this is a request to
> add Python API support for the single flow file in, several flow files
> out model. I think this is a good idea and I think it could be
> generalized for other types of Python processors as well.
> 
> There was a merged PR [2] to support source python processors, and I
> think we should also support multiple flow file outputs for source
> processors too. There could be use cases like the ListenTCP processor
> or any polling processor that could periodically be checking a queue
> and creating flow files from all the new entries since the last
> trigger. A source processor could be written to return multiple
> records in a single flow file, which is then split with the
> SplitRecord processor, but that is more of a workaround than a
> solution.
> 
> With the previously mentioned polling type of processor there could be
> triggers when no new entries are available at all, so no flow file can
> be generated. Because of this I also suggested a change to the API to
> allow returning no new flow files in a trigger [3]. We may also
> consider adding the option to yield for some time in this case.
> 
> So there are a couple of questions to the community:
> 
> 1. Do you agree to add support for multiple flow file outputs on the
> python API for both transform and source flow files?
> 2. Do you agree to add the support for returning with no flow files
> from source processors?
> 3. Do you think we should add an option to yield in case no output
> files are returned, or would that complicate the API too much for a
> user?
> 
> I also think these changes should be implemented before the NiFi 2.0 release.
> 
> As I talked with Peter Gyori he said he had already started working on
> the "no output" feature and said he would be happy to work on the
> multiple flow file output change as well. I would also be happy to
> help him and port these changes on the MiNiFi C++ side.
> 
> Feel free to comment with any request or requirement on the related API 
> change.
> 
> Regards,
> Gabor
> 
> [1] https://issues.apache.org/jira/browse/NIFI-13402
> [2] https://github.com/apache/nifi/pull/9000
> [3] https://issues.apache.org/jira/browse/NIFI-13604
