Bryan,
So the best practice when segmenting is to

- build your segments as a list while processing the incoming stream
- then, afterwards, send them all to the relationship

right?


On March 1, 2019 at 09:21:46, Bryan Bende (bbe...@gmail.com) wrote:

Hello,

Flow files are not transferred until the session they came from is
committed. So imagine we committed periodically and some of the splits
were transferred, and then halfway through a failure was encountered:
the entire original flow file would be reprocessed, producing some of
the same splits that were already sent out. The way it is implemented
now, it is either completely successful or not, but never partially
successful with duplicates.
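The commit semantics above can be illustrated with a toy simulation (plain Python, not NiFi code): if splits were committed in batches and a failure forced the original flow file to be reprocessed, the early splits would be delivered twice, whereas the all-or-nothing commit never duplicates.

```python
# Toy model of session-commit semantics: split a "flow file" into records,
# optionally committing in batches, and simulate a crash mid-way that
# requeues the whole original flow file for reprocessing.

def process(records, commit_every=None, fail_after=None):
    delivered = []   # splits downstream has already received (committed)
    pending = []     # splits in the current, uncommitted session
    for i, rec in enumerate(records):
        pending.append(rec)
        if commit_every and len(pending) == commit_every:
            delivered.extend(pending)  # partial commit: splits leave early
            pending = []
        if fail_after is not None and i + 1 == fail_after:
            # Crash: uncommitted work rolls back, but committed splits are
            # already out; the whole original flow file is reprocessed.
            delivered.extend(process(records, commit_every))
            return delivered
    delivered.extend(pending)          # final commit of the session
    return delivered

records = list(range(6))

# All-or-nothing (current behavior): a retry yields no duplicates.
assert process(records, fail_after=4) == records

# Periodic commits: splits committed before the crash are sent twice.
assert process(records, commit_every=2, fail_after=4) == [0, 1, 2, 3] + records
```

The simulation is only meant to show why committing the session once, at the end, makes a retry safe.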

Based on the description of your flow with the three processors you
mentioned, I wouldn't bother using SplitRecord; just have ListenHTTP
-> PublishKafkaRecord. PublishKafkaRecord can be configured with the
same reader and writer you were using in SplitRecord, and it will read
each record and send it to Kafka, without having to produce unnecessary
flow files.
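As a rough illustration of the suggestion above, the PublishKafkaRecord properties might look something like this (property and service names are from memory and should be checked against your NiFi version; the reader/writer services are whatever you had configured on SplitRecord):

```
PublishKafkaRecord
  Kafka Brokers      : broker1:9092
  Topic Name         : my-topic
  Record Reader      : JsonTreeReader      (same reader used in SplitRecord)
  Record Writer      : XMLRecordSetWriter  (same writer used in SplitRecord)
```

With this setup the records are streamed to Kafka one by one as they are read, so no intermediate split flow files are created.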

Thanks,

Bryan

On Fri, Mar 1, 2019 at 3:44 AM Kumara M S, Hemantha (Nokia -
IN/Bangalore) <hemantha.kumara_...@nokia.com> wrote:
>
> Hi All,
>
> We have a use case where we receive a huge JSON file (size may vary from
> 1 GB to 50 GB) via HTTP, convert it to XML (the XML format is not fixed;
> any other format is fine), and send it out using Kafka. The restriction
> here is that the CPU and RAM usage requirement (once fixed, it should
> handle files of all sizes) should not change based on incoming file size.
>
> We used ListenHTTP --> SplitRecord --> PublishKafka, but we have observed
> that SplitRecord sends data to PublishKafka only after the whole FlowFile
> has been processed. Is there a reason it was designed this way? Would it
> not be better to send splits to the next processor after each configured
> number of records, instead of sending all splits in one shot?
>
>
> Regards,
> Hemantha
>
