What are the debug logs telling you on the NCM and Nodes when you have it
configured as (4)?

I would be curious what the debug logs say on the sending NiFi when
you add an RPG using http://56.72.192.81:8080/nifi.
If it accepts the URL and adds the RPG to the graph, then the communication
with the NCM on 8080 worked. Next, the sending NiFi will try to talk to the
NCM using the nifi.remote.input.socket.host and
nifi.remote.input.socket.port configured on your NCM. The NCM will respond
with the addresses (the nifi.remote.input.socket.host and
nifi.remote.input.socket.port configured on each Node).  The sending NiFi
has to be able to resolve and communicate with all of those addresses.
Finally, the sending NiFi will load-balance data smartly across all Nodes.
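
For reference, a rough sketch of what I mean by setting it up as (4).
The port shown is only an example; use the S2S port you actually
configured on each instance:

  # nifi.properties on the NCM (master, external IP 56.72.192.81)
  nifi.remote.input.socket.host=56.72.192.81
  nifi.remote.input.socket.port=10443

  # nifi.properties on node-1 (each Node uses its own external IP)
  nifi.remote.input.socket.host=56.72.192.123
  nifi.remote.input.socket.port=10443

Whatever is in each Node's nifi.remote.input.socket.host is exactly what
the NCM hands back to the sending NiFi, so it must be an address the
sending NiFi can resolve and reach.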

So the question is: where in that process is it failing to establish a
connection?  The logs, when in debug mode, will tell us.  I would set it up
as (4) and use the http://56.72.192.81:8080/nifi URL to add the RPG.  Make
sure you have enabled debug logging and take a look in your logs for all
org.apache.nifi.remote lines on the sending NiFi and the NCM.
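
To turn that on, add the line I mentioned earlier in the thread to
logback.xml on both the sending NiFi and the NCM:

  <logger name="org.apache.nifi.remote" level="DEBUG"/>

The output goes to the application log (logs/nifi-app.log with the
default logback configuration), so something like this should pull out
the relevant entries:

  grep "org.apache.nifi.remote" logs/nifi-app.log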

Matt

On Mon, Nov 30, 2015 at 10:34 AM, Edgardo Vega <edgardo.v...@gmail.com>
wrote:

> I am going to describe the machines and then what we tried.
>
> We have a cluster of 4 machines
>
> Hostname    internal ip          external ip
> master          176.0.1.128       56.72.192.81
> node-1          176.0.1.130       56.72.192.123
> node-2          176.0.1.131       56.72.192.45
> node-3          176.0.1.132       56.72.121.12
>
> Each machine has a /etc/hosts file that ties the hostname to the internal
> IP address. So in nifi.properties all machines connect using the
> hostname. In the NiFi GUI, when the nodes connect, they show up as node-1,
> node-2, and node-3.
>
>
> Here are all the steps we took:
>
>
>    1. First we tried using http://56.72.192.81:8080/nifi - Did not work
>    2. We changed the config on master to set nifi.remote.input.socket.host
>    to 56.72.192.81 - Did not work
>    3. We changed the config on all the nodes to set
>    nifi.remote.input.socket.host to 56.72.192.81 - Did not work
>    4. We changed the config on all the nodes to set
>    nifi.remote.input.socket.host to each of their external IPs - Did not
>    work
>    5. We updated /etc/hosts on the sending NiFi and put in all the
>    hostnames and external IPs (entries shown just below). Used
>    http://master:8080/nifi - Did work.
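>
> For reference, the step 5 entries in /etc/hosts on the sending NiFi look
> roughly like this (hostnames and external IPs from the table above):
>
>    56.72.192.81     master
>    56.72.192.123    node-1
>    56.72.192.45     node-2
>    56.72.121.12     node-3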
>
> I don't think we should have to do 5, and 4 should have worked, but it
> didn't.
>
> So where did we go wrong?
>
>
>
>
> On Thu, Nov 26, 2015 at 12:35 AM, Matthew Clarke <matt.clarke....@gmail.com>
> wrote:
>
> > The PostHTTP processor has an option to send as a FlowFile to a
> > ListenHTTP processor on another NiFi. This allows you to keep the
> > FlowFile attributes across multiple NiFis just like S2S.
> > On Nov 25, 2015 1:58 PM, "Matthew Gaulin" <mattgau...@gmail.com> wrote:
> >
> > > Ok, that all makes sense.  The main reason we like doing it strictly as
> > > S2S is to maintain the FlowFile attributes, so we would like to avoid
> > > HTTP.  Otherwise we would have to rebuild some of these attributes from
> > > the content, which isn't the end of the world, but still no fun.  We may
> > > consider the idea of the single receive node for distribution to a
> > > cluster, in order to further lock things down from a firewall
> > > standpoint.  I think the main thing we had to wrap our heads around was
> > > that every send node needs to be able to directly connect to every
> > > receiver node.  Thanks again for the very detailed responses!
> > >
> > > On Wed, Nov 25, 2015 at 10:44 AM Matthew Clarke <matt.clarke....@gmail.com>
> > > wrote:
> > >
> > > > I am not following why you set all your Nodes (source and destination)
> > > > to use the same hostname(s).  Each hostname resolves to a single IP,
> > > > so by doing that doesn't all data get sent to a single end-point?
> > > >
> > > > The idea behind spreading out the connections when using S2S is smart
> > > > load balancing.  If all data going to another cluster passed through
> > > > the NCM first, you would lose that load-balancing capability because
> > > > one instance of NiFi (the NCM in this case) would have to receive all
> > > > that network traffic.  It sounds like the approach you want is to send
> > > > source data to a single NiFi point on another network and then have
> > > > that single point redistribute the data internally across multiple
> > > > "processing" nodes in a cluster on that network.
> > > >
> > > > This can be accomplished in several ways:
> > > >
> > > > 1. You could use S2S to send to a single instance of NiFi on the other
> > > > network and then have that instance S2S the data to a cluster on that
> > > > same network.
> > > > 2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
> > > > NiFi) processors to send data to a single Node in the destination
> > > > cluster, and then have that Node use S2S to redistribute the data
> > > > across the entire cluster (a rough config sketch follows below).
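> > > >
> > > > As a minimal sketch of option 2 (property names from memory; double
> > > > check them against the processor usage docs, and treat the port and
> > > > path values as examples only):
> > > >
> > > > ListenHTTP on one destination Node:
> > > >   Listening Port: 8081
> > > >   Base Path: contentListener
> > > >
> > > > PostHTTP on the source NiFi:
> > > >   URL: http://<destination-node>:8081/contentListener
> > > >   Send as FlowFile: true
> > > >
> > > > "Send as FlowFile" is what keeps the FlowFile attributes intact
> > > > between the two instances.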
> > > >
> > > > A more ideal setup to limit the connections needed between networks
> > > > might be:
> > > >
> > > > - Source cluster (numerous low-end servers or VMs) plus a single
> > > > instance running on a beefy server/VM that will handle all data coming
> > > > in and out of this network.  Use S2S to communicate between the
> > > > internal cluster and the single instance on the same network.
> > > > - The destination network would be set up the same way; its cluster
> > > > would look the same.  You can then use S2S or PostHTTP to ListenHTTP
> > > > to send data as NiFi FlowFiles between your networks. That
> > > > network-to-network data transfer should occur between the two beefy
> > > > single instances in each network.
> > > >
> > > > Matt
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin <mattgau...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you for the info.  I was working with Edgardo on this.  We
> > > > > ended up having to set, on each of the source nodes, the SAME
> > > > > hostnames the destination NCM uses for each of its nodes, and of
> > > > > course open up the firewall rules so all source nodes can talk to
> > > > > each destination node.  This seems to jibe with what you explained
> > > > > above.  It is a little annoying that we have to have so much open to
> > > > > get this to work and can't have a single point of entry on the NCM
> > > > > to send all this data from one network to another.  Not a huge deal
> > > > > in the end though.  Thanks again.
> > > > >
> > > > > On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <matt.clarke....@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Let me explain first how S2S works when connecting from one
> > > > > > cluster to another cluster.
> > > > > >
> > > > > > I will start with the source cluster (this would be the cluster
> > > > > > where you are adding the Remote Process Group (RPG) to the
> > > > > > graph).  The NCM has no role in this cluster.  Every Node in a
> > > > > > cluster works independently from the others, so by adding the RPG
> > > > > > to the graph, you have added it to every Node.  So now the
> > > > > > behavior of each Node is the same as it would be if it were a
> > > > > > standalone instance with regards to S2S.  The URL you are
> > > > > > providing in that RPG would be the URL for the NCM of the target
> > > > > > cluster (this is not the S2S port of the NCM, but the same URL you
> > > > > > would use to access the UI of that cluster).  Now each Node in
> > > > > > your "source" cluster is communicating with the NCM of the
> > > > > > destination cluster, unaware at this time that it is talking to an
> > > > > > NCM.  These Nodes want to send their data to the S2S port on that
> > > > > > NCM.  Of course, since the NCM does not process any data, it is
> > > > > > not going to accept any data from those Nodes.  Instead, the
> > > > > > "destination" NCM will respond to each of the "source" Nodes with
> > > > > > the configured nifi.remote.input.socket.host=,
> > > > > > nifi.remote.input.socket.port=, and the status for each of those
> > > > > > "destination" Nodes.  Using that provided information, the source
> > > > > > Nodes can logically distribute the data to the "destination"
> > > > > > Nodes.
> > > > > >
> > > > > > When S2S fails beyond the initial URL connection, there are
> > > > > > typically only a few likely causes:
> > > > > > 1. There is a firewall preventing communication between the source
> > > > > > Nodes and the destination Nodes on the S2S ports.
> > > > > > 2. No value was supplied for nifi.remote.input.socket.host= on
> > > > > > each of the target Nodes.  When no value is provided, whatever the
> > > > > > "hostname" command returns is what is sent.  In many cases this
> > > > > > hostname may end up being "localhost" or some other value that is
> > > > > > not resolvable/reachable by the "source" systems.
> > > > > >
> > > > > > You can change the logging for S2S to DEBUG to see more detail
> > > > > > about the message traffic between the "destination" NCM and the
> > > > > > "source" Nodes by adding the following line to the logback.xml
> > > > > > files.
> > > > > >
> > > > > > <logger name="org.apache.nifi.remote" level="DEBUG"/>
> > > > > >
> > > > > > Watch the logs on one of the source Nodes specifically to see
> > > > > > what hostname and port are being returned for each destination
> > > > > > Node.
> > > > > >
> > > > > > Thanks,
> > > > > > Matt
> > > > > >
> > > > > > On Wed, Nov 25, 2015 at 7:59 AM, Matthew Clarke <matt.clarke....@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega <edgardo.v...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Yeah, the S2S port is set on all nodes.
> > > > > > >>
> > > > > > >> What should the host be set to on each machine? I first set it
> > > > > > >> to the NCM IP on each machine in the cluster. Then I set the
> > > > > > >> host to be the IP of each individual machine, without luck.
> > > > > > >>
> > > > > > >> The S2S port is open to the internet for the entire cluster.
> > > > > > >>
> > > > > > >> On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke <matt.clarke....@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Did you configure the S2S port on all the Nodes in the
> > > > > > >> > cluster you are trying to S2S to?
> > > > > > >> >
> > > > > > >> > In addition to setting the port on those Nodes, you should
> > > > > > >> > also set the S2S hostname.  The hostname entered should be
> > > > > > >> > resolvable and reachable by the systems trying to S2S to
> > > > > > >> > that cluster.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Matt
> > > > > > >> >
> > > > > > >> > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega <edgardo.v...@gmail.com>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Trying to get site-to-site working from one cluster to
> > > > > > >> > > another. It works if the connection goes from a cluster to
> > > > > > >> > > a single node, but not clustered to clustered.
> > > > > > >> > >
> > > > > > >> > > I was looking at Jira and saw this ticket:
> > > > > > >> > > https://issues.apache.org/jira/browse/NIFI-872.
> > > > > > >> > >
> > > > > > >> > > Is this saying I am out of luck, or is there some special
> > > > > > >> > > config that I must do to make this work?
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Cheers,
> > > > > > >> > >
> > > > > > >> > > Edgardo
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Cheers,
> > > > > > >>
> > > > > > >> Edgardo
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Cheers,
>
> Edgardo
>
