I looked at EnforceOrder again, and I'm not sure it will actually help
here, since I don't think it works across a cluster. Maybe others know
more.

I think you can only ever have 1 concurrent task for your PublishKafka
processor if you need to preserve ordering. Even if you run everything
on the primary node, with 2 concurrent tasks it is going to take 2 flow
files from the queue and start publishing them to Kafka at the same
time, which will break the ordering.
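
As a rough illustration of why a second concurrent task breaks ordering,
here is a small Python simulation. It is not NiFi code: the function
name publish_order and the per-file latencies are made up for the
example, and it assumes each task simply pulls the next flow file from
the queue as soon as it is free.

```python
import heapq


def publish_order(latencies, concurrent_tasks):
    """Return the order in which queued flow files finish publishing.

    latencies[i] is a hypothetical publish time for flow file i.
    Each task takes the next flow file from the queue once it is free.
    """
    # Heap of (time the task becomes free, task id).
    free_at = [(0.0, task) for task in range(concurrent_tasks)]
    heapq.heapify(free_at)

    finished = []
    for index, latency in enumerate(latencies):
        start, task = heapq.heappop(free_at)  # earliest available task
        end = start + latency
        finished.append((end, index))
        heapq.heappush(free_at, (end, task))

    # Flow files reach Kafka in order of completion time.
    return [index for _, index in sorted(finished)]


# The first flow file is slow to publish, the rest are fast.
print(publish_order([3, 1, 1, 1], concurrent_tasks=1))  # [0, 1, 2, 3]
print(publish_order([3, 1, 1, 1], concurrent_tasks=2))  # [1, 2, 0, 3]
```

With 1 task the files come out in queue order. With 2 tasks the slow
first file is overtaken by the ones behind it.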

One thing you could try is to use ListenHttp and PostHttp instead of
site-to-site, which would let you customize the routing.

You would have a ListenHttp running on each node to receive the
listings, then you would have ListSFTP (primary node only) -> (some
processor that creates your key attribute) -> RouteOnAttribute (routes
on the key) -> 3 PostHttp processors (1 for each node of your
cluster).

This way you always route flow files with the same key to the same node.
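
The routing idea itself can be sketched in a few lines of Python. This
is only an illustration of the key-to-node mapping, not how you would
configure RouteOnAttribute; the node names and the node_for_key helper
are hypothetical.

```python
import zlib

# Hypothetical host names, one per node of a 3-node cluster.
NODES = ["nifi-node-1", "nifi-node-2", "nifi-node-3"]


def node_for_key(key, nodes=NODES):
    """Map a key to one node so that the same key always lands on the same node."""
    # crc32 gives the same value in every process and on every machine,
    # unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[bucket]


# The same key is always routed to the same node.
assert node_for_key("customer-42") == node_for_key("customer-42")
```

Note that with a plain modulo like this, changing the number of nodes
changes where most keys go, so it assumes a fixed cluster size.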

You might have to consider what you would want to do if a node went
down and one of the PostHttp processors couldn't deliver the data: do
you let it queue up until the node comes back, or try to send to one of
the other nodes?


On Thu, May 31, 2018 at 4:18 PM, rey26 <reyaa...@gmail.com> wrote:
> Hello Bryan,
>
> I have read about the EnforceOrder processor. My files contain a timestamp
> that is unique, and the files are created in the order in which the events
> happen.
> I can read the flow file, extract this timestamp, convert it to a Unix
> timestamp, and set it as a NiFi attribute.
> But once the flow files reach the queue before PublishKafka, the actual
> problem/confusion arises.
>
> So if I have 12 files sorted by timestamp, they sit in the queue just
> before PublishKafka.
>
> As I understand it, each PublishKafka processor instance is a unique
> publisher. So if my cluster has 3 nodes with 2 threads each,
> I believe NiFi will spawn 6 publishers. [Correct me if I am wrong.]
>
> Let's say each gets 2 files; there is no way to enforce order, since
> publisher 1 may publish its records
> before publisher 2.
>
> I believe running on the primary node is the only way to handle these
> scenarios so far, until some kind of global ordering concept is introduced in NiFi.
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
