We have a 6-node cluster using external ZooKeeper. It is heavily loaded,
and we are attempting to tune some of the properties to alleviate some
observed issues. By "heavily loaded" I mean the graph is large (approx.
3,000 processors) and there is a lot of data in process (approx. 2M
flowfiles/120GB queued).

One symptom we see is that changes to the graph are not replicated to the
other nodes, and the affected node(s) are subsequently disconnected from the
cluster. In one example, nifi-app.log shows that the node was disconnected
due to "failed to process request PUT
/nifi-api/connection/976a60b5d-3c4e-3bbb-8fbe-4790f3ecb147".

The following properties are set in nifi.properties:

nifi.cluster.node.protocol.threads=30
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=60 sec
nifi.cluster.node.read.timeout=60 sec
nifi.cluster.node.max.concurrent.requests=500
nifi.cluster.node.request.replication.claim.timeout=20 secs

nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
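
To make the timeout values above easier to compare side by side, here is a
minimal sketch that normalizes them to seconds. The helper name
timeouts_in_seconds and the unit table are my own assumptions for
illustration; this is not NiFi's property parser and it only handles the
unit spellings listed.

```python
import re

# Timeout properties copied from the nifi.properties excerpt above.
PROPERTIES = """\
nifi.cluster.node.connection.timeout=60 sec
nifi.cluster.node.read.timeout=60 sec
nifi.cluster.node.request.replication.claim.timeout=20 secs
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
"""

# Assumption: only these unit spellings are recognized (hypothetical table,
# not taken from NiFi itself).
UNIT_SECONDS = {"ms": 0.001, "sec": 1, "secs": 1, "min": 60, "mins": 60}


def timeouts_in_seconds(text):
    """Return {property name: timeout in seconds} for 'name=<number> <unit>' lines."""
    result = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*([\w.]+)\s*=\s*(\d+)\s*([a-z]+)\s*", line)
        # Skip lines that are not simple '<number> <unit>' settings.
        if match and match.group(3) in UNIT_SECONDS:
            name, value, unit = match.groups()
            result[name] = int(value) * UNIT_SECONDS[unit]
    return result


if __name__ == "__main__":
    # Print shortest timeout first so the ordering is visible at a glance.
    ordered = sorted(timeouts_in_seconds(PROPERTIES).items(), key=lambda kv: kv[1])
    for name, seconds in ordered:
        print(f"{seconds:>6} s  {name}")
```

Sorted this way, the request replication claim timeout (20 secs) is the
shortest of the five, below both ZooKeeper timeouts and the node
connection/read timeouts.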

Some of the timeout values are set fairly high because the system is heavily
loaded and we want to allow more time for tasks to complete. Are there
interrelated properties for which a long timeout might actually become
detrimental? Are there other properties we should look at more closely?

Thanks,
Mark
