Re: NiFi data HA in cluster mode
That is a fair point, Brett. I wasn't thinking of that when I answered, but it is a good point. Then again, we should create those connections lazily, so if we don't, I'd call that a bug :)

Ben,

Yes, there is definitely intent to provide distributed data durability across nodes. This is especially important because it is a great way to support elastic clustering behavior. I'm not sure HDFS is the best backing store, and we all have to keep in mind that we must ensure distributed durability of flowfile, content, and provenance data. That might mean application-level replication similar to what Apache Kafka does. It might instead mean distributed durable block storage, combined with deciding which node is responsible for processing a given set of data at a given time. There are a lot of ways to slice this, and they all offer different tradeoffs.

On Mon, Jan 8, 2018 at 11:37 PM, Brett Ryan wrote:
> I had someone from Hortonworks suggest to me that I should also set any
> PutSQL processors to only execute on primary. The reasoning was due to
> flooding of the JDBC pool.
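As a rough illustration of the "create those connections lazily" remark above, here is a minimal sketch of a pool that opens a connection only when a caller actually asks for one. It is a toy model, not NiFi's connection pooling service: the `LazyPool` class, the node names, and the pool size of 8 are all hypothetical, and `sqlite3` merely stands in for a JDBC-backed database.

```python
import sqlite3
from typing import Callable, Dict, List


class LazyPool:
    """Toy pool (hypothetical): connections are opened on demand, never up front."""

    def __init__(self, factory: Callable[[], sqlite3.Connection], max_size: int) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle: List[sqlite3.Connection] = []
        self.opened = 0  # total connections this pool has ever created

    def acquire(self) -> sqlite3.Connection:
        # Reuse an idle connection if one exists; otherwise open a new one.
        if self._idle:
            return self._idle.pop()
        if self.opened >= self._max_size:
            raise RuntimeError("pool exhausted")
        self.opened += 1
        return self._factory()

    def release(self, conn: sqlite3.Connection) -> None:
        self._idle.append(conn)


def open_connection() -> sqlite3.Connection:
    # Stand-in for opening a real database connection.
    return sqlite3.connect(":memory:")


# Every node configures a pool of up to 8 connections (assumed size),
# but nothing is opened until a node actually does work.
pools: Dict[str, LazyPool] = {
    name: LazyPool(open_connection, max_size=8)
    for name in ("node-1", "node-2", "node-3")
}

# Only node-1 runs a statement in this scenario.
conn = pools["node-1"].acquire()
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
pools["node-1"].release(conn)

opened_per_node = {name: pool.opened for name, pool in pools.items()}
print(opened_per_node)
```

In this sketch the idle nodes hold no connections at all, so the total against the database grows with actual use rather than with cluster size times the configured maximum.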
Re: NiFi data HA in cluster mode
I had someone from Hortonworks suggest to me that I should also set any PutSQL processors to execute only on the primary node. The reasoning was to avoid flooding the JDBC pool.

> On 9 Jan 2018, at 17:25, Joe Witt wrote:
>
> I'd avoid setting any processor to primary node only unless it is a
> source processor (something that brings data into the system).
>
> But, yes, I believe your description is accurate as of now.
>
> Thanks
Re: NiFi data HA in cluster mode
Thanks Joe, I will try to avoid setting processors to run on the primary node only. By the way, I've seen a suggestion about data HA posted on NiFi's wiki (HDFSContentRepository). Is there a plan for that feature to be implemented and included in NiFi?

Regards,
Ben

2018-01-09 14:25 GMT+08:00 Joe Witt:
> I'd avoid setting any processor to primary node only unless it is a
> source processor (something that brings data into the system).
>
> But, yes, I believe your description is accurate as of now.
>
> Thanks
Re: NiFi data HA in cluster mode
I'd avoid setting any processor to primary node only unless it is a source processor (something that brings data into the system).

But, yes, I believe your description is accurate as of now.

Thanks

On Mon, Jan 8, 2018 at 11:21 PM, 尹文才 wrote:
> Thanks Joe, so you mean for example, if I set one processor to run only on
> primary node in the cluster and there're 100 FlowFiles in the incoming
> queue of the processor
> waiting to be processed by this processor, and the processor suddenly goes
> down and then another node is elected as the primary node, those 100
> FlowFiles will be kept locally
> in the node that went down and will continue to be processed by the node
> when it goes back online, these FlowFiles will not be available to the new
> primary node and other nodes,
> am I correct?
>
> Regards,
> Ben
Re: NiFi data HA in cluster mode
Thanks Joe. So you mean, for example, that if I set a processor to run only on the primary node in the cluster, and there are 100 FlowFiles in its incoming queue waiting to be processed when the node running it suddenly goes down and another node is elected as the primary node, those 100 FlowFiles will be kept locally on the node that went down and will be processed by that node when it comes back online. They will not be available to the new primary node or to the other nodes. Am I correct?

Regards,
Ben

2018-01-09 14:08 GMT+08:00 Joe Witt:
> Ben,
>
> Data already mid-flow within a node will be kept on the node and
> processed when the node is back on-line. All other data coming into
> the cluster can fail-over to other nodes provided you're sourcing data
> with queuing semantics or automated load balancing or fail-over as-is
> present in the Apache NiFi Site to Site protocol.
>
> Thanks
> Joe
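The scenario in the message above can be sketched as a small toy model. This is only an illustration of the behavior described in this thread, not NiFi code: the `Node` and `Cluster` classes, the node names, and the "first online node wins" election rule are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Dict, List


class Node:
    """Hypothetical cluster node with a purely local FlowFile queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.queue: Deque[int] = deque()


class Cluster:
    """Toy cluster: queues are per node and are never shared or moved."""

    def __init__(self, names: List[str]) -> None:
        self.nodes: Dict[str, Node] = {n: Node(n) for n in names}
        self.primary = names[0]

    def fail(self, name: str) -> None:
        # Mark the node offline; if it was primary, elect the first online node.
        # (Assumed election rule; raises StopIteration if no node is online.)
        self.nodes[name].online = False
        if name == self.primary:
            self.primary = next(n.name for n in self.nodes.values() if n.online)

    def run_primary_only(self) -> int:
        # A primary-only processor sees only the current primary's local queue.
        node = self.nodes[self.primary]
        processed = len(node.queue)
        node.queue.clear()
        return processed


cluster = Cluster(["node-1", "node-2", "node-3"])
cluster.nodes["node-1"].queue.extend(range(100))  # 100 FlowFiles waiting on the primary

cluster.fail("node-1")  # the primary goes down and node-2 takes over
processed_by_new_primary = cluster.run_primary_only()
stranded = len(cluster.nodes["node-1"].queue)

print(cluster.primary, processed_by_new_primary, stranded)
```

In this model the new primary processes nothing, and the 100 FlowFiles stay in the local queue of the node that went down, which matches the description confirmed in the reply.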
Re: NiFi data HA in cluster mode
Ben,

Data already mid-flow within a node will be kept on the node and processed when the node is back online. All other data coming into the cluster can fail over to other nodes, provided you're sourcing data with queuing semantics or with automated load balancing or failover, as is present in the Apache NiFi Site-to-Site protocol.

Thanks
Joe

On Mon, Jan 8, 2018 at 11:05 PM, 尹文才 wrote:
> Hi guys, I have a question about data HA when NiFi is run in clustered
> mode: if one node goes down, will the flowfiles owned by this node be taken
> over and processed by another node?
> Or will the flowfiles be kept locally to that node and will only be
> processed when that node is back online? Thanks.
>
> Regards,
> Ben
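To make the distinction in the reply above concrete, the following minimal sketch separates data that is already queued on a node from data that is still arriving. It is a simplified model under stated assumptions, not the Site-to-Site protocol itself: the `deliver` function, the round-robin rule, and the node names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def deliver(incoming: Iterable[str], queues: Dict[str, List[str]], online: Set[str]) -> None:
    """Round-robin new items over online nodes only (assumed balancing rule).

    Items already sitting in a node's queue are never touched or moved.
    """
    targets = sorted(name for name in queues if name in online)
    if not targets:
        raise RuntimeError("no online node to receive data")
    for i, item in enumerate(incoming):
        queues[targets[i % len(targets)]].append(item)


queues: Dict[str, List[str]] = {"node-1": [], "node-2": [], "node-3": []}
online: Set[str] = {"node-1", "node-2", "node-3"}

# With all nodes up, data is spread across the whole cluster.
deliver(["a", "b", "c"], queues, online)

# node-1 goes down: "a" stays on node-1, new data fails over to the others.
online.discard("node-1")
deliver(["d", "e", "f", "g"], queues, online)

print(queues)
```

The item already queued on the failed node remains there until that node returns, while everything that arrives afterwards is routed to the nodes that are still online.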