Heya NiFi devs,

I'm having trouble working out a valid way to tackle this problem with the
available Processor templates. I'd like to split one input flowfile into N
different flowfiles, with each of them going to its own relationship (N
relationships in total).

A simplistic way of viewing it: I have a very large CSV file with N
columns, and I want to split each column into its own flowfile, then send
each of those flowfiles to its own relationship (or tag it with an
attribute saying which column it belongs to).

The basic premise, for an example with two columns and only two lines:
* Read a line, write first column value to flowfile A, write second column
value to flowfile B
* Read next line, appending first column value to flowfile A, appending
second column value to flowfile B
Followed by one of:
* Send flowfile A to relationship A, and send flowfile B to relationship B
or
* Set attribute "A" on flowfile A, attribute "B" on flowfile B, then send
both A and B to a 'success' relationship.
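
To make the steps above concrete, here is a rough sketch in plain Java with
no NiFi APIs at all. ColumnSplitter and splitColumns are names I made up for
illustration, the in-memory streams just stand in for flowfiles A and B, and
the comma splitting is naive (no quoted fields). It only shows the
single-pass, interleaved writing I'm after, not how to do it in a Processor:

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ColumnSplitter {

    // Reads the input once; appends column i of every line to outputs.get(i).
    // Assumption: fields are plain comma-separated values without quoting.
    public static void splitColumns(BufferedReader in,
                                    List<? extends OutputStream> outputs) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            for (int i = 0; i < outputs.size() && i < cells.length; i++) {
                // Interleaved writes: every output stream stays open at once.
                outputs.get(i).write((cells[i] + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<ByteArrayOutputStream> outputs = new ArrayList<>();
        outputs.add(new ByteArrayOutputStream()); // stands in for flowfile A
        outputs.add(new ByteArrayOutputStream()); // stands in for flowfile B

        splitColumns(new BufferedReader(new StringReader("a1,b1\na2,b2\n")), outputs);

        // Print what ended up in each stand-in flowfile.
        for (ByteArrayOutputStream out : outputs) {
            System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

What I can't figure out is the equivalent of keeping all N of those output
streams open at the same time when they belong to flowfiles.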

Unfortunately, I can't seem to find a way to write to multiple flowfiles at
once, or at least to write to an OutputStream for one flowfile, then write
to another OutputStream for another flowfile, then continue writing to the
first flowfile.

If they weren't such large files, I'd be okay with reading the input file N
times and pulling out a different column each time, but I'd like to read
each line (and by extension, the file) only once.

I've written AbstractProcessors before for simple One-to-One
transformations, Merge processors that extend
AbstractSessionFactoryProcessor to do Many-to-One, and even Split
AbstractProcessors for One-to-Many in serial (splitting at different
places, even with clone(flowfile, start, size)). But I can't work out a way
to do this One-to-Many in parallel.

Any ideas? Am I missing something useful? Do I just have to read the file
multiple times? Even a really simple proof of concept explaining the
design would be enough to get me started.

Kind regards,
Salvatore
