Re: Clustered Site-to-Site

2015-11-25 Thread Matthew Clarke
I am not following why you set all your Nodes (source and destination) to
use the same hostname(s).  Each hostname resolves to a single IP and by
doing so doesn't all data get sent to a single end-point?

The idea behind spreading out the connections when using S2S is for smart
load balancing purposes.  If all data going to another cluster passed
through the NCM first, you lose that data load balancing capability because
one instance of NiFi (NCM in this case) has to receive all that network
traffic. It sounds like the approach you want is to send source data to a
single NiFi point on another network and then have that single point
redistribute that data internally to that network across multiple
"processing" nodes in a cluster.

This can be accomplished in several ways:

1. You could use S2S to send to a single instance of NiFi on the other
network and then have that instance S2S that data to a cluster on that same
network.
2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
NiFi) processors to facilitate sending data to a single Node in the
destination cluster, and then have that Node use S2S to redistribute the
data across the entire cluster.

A more ideal setup to limit the connections needed between networks might be:

- Source cluster (consists of numerous low-end servers or VMs) and a single
instance running on a beefy server/VM that will handle all data coming in and
out of this network.  Use S2S to communicate between the internal cluster and
the single instance on the same network.
- The destination network would be set up the same way; its cluster would look
the same. You can then use S2S or PostHTTP-to-ListenHTTP to send data as NiFi
FlowFiles between your networks. That network-to-network data transfer should
occur between the two beefy single instances in each network.
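
For reference, the destination-side settings this thread keeps coming back to
live in nifi.properties on each Node of the receiving cluster. A minimal sketch
(hostname and port values below are placeholders, not a recommendation):

# nifi.properties on each destination Node
# The host must be resolvable and reachable by the source Nodes; the port is
# the S2S port that the firewall between the networks has to allow.
nifi.remote.input.socket.host=node1.dest.example.com
nifi.remote.input.socket.port=10443

The logback.xml DEBUG snippet mentioned further down this thread was stripped
by the archive; the site-to-site classes live under org.apache.nifi.remote, so
an entry along these lines is the usual way to get that detail:

<logger name="org.apache.nifi.remote" level="DEBUG"/>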

Matt




On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin 
wrote:

> Thank you for the info.  I was working with Edgardo on this.  We ended up
> having to set the SAME hostname on each of the source nodes, as the
> destination NCM uses for each of its nodes and of course open up the
> firewall rules so all source nodes can talk to each destination node.  This
> seems to jibe with what you explained above.  It is a little annoying that
> we have to have so much open to get this to work and can't have a single
> point of entry on the NCM to send all this data from one network to
> another.  Not a huge deal in the end though.  Thanks again.
>
> On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke 
> wrote:
>
> > let me explain first how S2S works when connecting from one cluster to
> > another cluster.
> >
> > I will start with the source cluster (this would be the cluster where you
> > are adding the Remote Process Group (RPG) to the graph).  The NCM has no
> > role in this cluster. Every Node in a cluster works independently form
> one
> > another, so by adding the RPG to the graph, you have added it to every
> > Node.  So Now the behavior of each Node is the same as as it would be if
> it
> > were a standalone instance with regards to S2S.  The URL you are
> providing
> > in that RPG would be the URL for the NCM of the target cluster (This URL
> is
> > not to the S2S port of the NCM, but to the same URL you would use to
> access
> > the UI of that cluster).  Now each Node in your "source" cluster is
> > communicating with the NCM of the destination cluster unaware at this
> time
> > that they are communicating with a NCM. These Nodes want to send their
> data
> > to the S2S port on that NCM. Now of course since the NCM does not process
> > any data, it is not going to accept any data from those Nodes.  The
> > "destination" NCM will respond to each of the "source" Nodes with the
> > configured nifi.remote.input.socket.host=,
> nifi.remote.input.socket.port=,
> > and the status for each of those "destination" Nodes.  Using that
> provided
> > information, the source Nodes can logically distribute the data to our
> the
> > "destination' Nodes.
> >
> > When S2S fails beyond the initial URL connection, there are typically on
> a
> > few likely causes:
> > 1. There is a firewall preventing communication between the source Nodes
> > and the destination Nodes on the S2S ports.
> > 2. No value was supplied for nifi.remote.input.socket.host= on each of
> the
> > target Nodes.  When no value is provided whatever the "hostname" command
> > returns is what is sent.  In many cases this hostname may end up being
> > "localhost" or some other value that is not resolvable/reachable by the
> > "source" systems.
> >
> > You can change the logging for S2S to DEBUG to see more detail about the
> > message traffic between the "destination" NCM and the "source" nodes by
> > adding the following lines to the logback.xml files.
> >
> > 
> >
> > Watch the logs on one of the source Nodes specifically to see what
> hostname
> > and port is being returned for each destination Node.
> >
> > Thanks,
> > Matt
> >
> > On Wed, 

[ANNOUNCE] CFP open for ApacheCon North America 2016

2015-11-25 Thread Rich Bowen
Community growth starts by talking with those interested in your
project. ApacheCon North America is coming, are you?

We are delighted to announce that the Call For Presentations (CFP) is
now open for ApacheCon North America. You can submit your proposed
sessions at
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
for big data talks and
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
for all other topics.

ApacheCon North America will be held in Vancouver, Canada, May 9-13th
2016. ApacheCon has been running every year since 2000, and is the place
to build your project communities.

While we will consider individual talks, we prefer to see related
sessions that are likely to draw users and community members. When
submitting your talk, work with your project community and with related
communities to come up with a full program that will walk attendees
through the basics and on into mastery of your project in example use
cases. Content that introduces what's new in your latest release is also
of particular interest, especially when it builds upon existing
well-known application models. The goal should be to showcase your
project in ways that will attract participants and encourage engagement
in your community. Please remember to involve your whole project
community (user and dev lists) when building content. This is your
chance to create a project-specific event within the broader ApacheCon
conference.

Content at ApacheCon North America will be cross-promoted as
mini-conferences, such as ApacheCon Big Data and ApacheCon Mobile, so
be sure to indicate which larger category your proposed sessions fit into.

Finally, please plan to attend ApacheCon, even if you're not proposing a
talk. The biggest value of the event is community building, and we count
on you to make it a place where your project community is likely to
congregate, not just for the technical content in sessions, but for
hackathons, project summits, and good old fashioned face-to-face networking.

-- 
rbo...@apache.org
http://apache.org/


[GitHub] nifi pull request: Nifi 631

2015-11-25 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/113#issuecomment-159769035
  
Closed by commit 226ac64ef95f3d755dfbb3d5288ba98052855473 and 
4c4d62c61f7c828dbcb124090992b91d631cb22e.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request: Nifi 631

2015-11-25 Thread jskora
Github user jskora closed the pull request at:

https://github.com/apache/nifi/pull/113


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request: Fixes NIFI-1220

2015-11-25 Thread gilday
GitHub user gilday opened a pull request:

https://github.com/apache/nifi/pull/133

Fixes NIFI-1220

`MockProcessSession` returns a new FlowFile from its `penalty` method 
instead of mutating then returning the given FlowFile

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gilday/nifi NIFI-1220

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #133


commit bbed296ad5ad828b93ca90765b5e2ac1629803d2
Author: Johnathan Gilday 
Date:   2015-11-25T18:34:06Z

Fixes NIFI-1220: `MockProcessSession` returns a new FlowFile from its 
`penalty` method instead of mutating then returning the given FlowFile




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: PRs

2015-11-25 Thread Tony Kurc
Things that make me feel better: The persistence mechanism is very similar
to that of ListHDFS.

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L417

On Tue, Nov 24, 2015 at 10:56 PM, Joe Witt  wrote:

> Tags are a great place to mark experimental.  We used to plan for this
> concept outright and make it look at scary and such on the ui.  But
> folks just didn't care.  They used it anyway.  Happy to revisit it but
> for now perhaps just adding a tag of experimental is enough.
>
> If the existing code path is largely untouched then that is certainly
> great for moving the ball forward.  In fairness to Joe S or anyone
> that has to persist internal process state until we offer that as part
> of the framework it is much harder than we want it to be for people.
> Will take a look through the state methods but Payne is probably the
> best at playing the wack a mole edge case game for such things.
>
> On Tue, Nov 24, 2015 at 10:52 PM, Tony Kurc  wrote:
> > So, I beat on the the patch for NIFI-1107, and as I suspected, it is
> > awfully low risk for existing flows, but I think I'd need a second
> opinion
> > on how state is kept for resuming uploads. I believe it will work, and it
> > looks like a lot of the edge cases are covered if somehow state is lost
> or
> > corrupted, but I'm not sure if I am comfortable with how it fits
> > architecturally. If someone has cycles, and can peruse the *State methods
> > (getState, persistState, ...) and weigh in, it would accelerate my review
> > significantly.
> >
> > Also, it sure would be great to mark features as experimental!
> >
> >
> >
> > On Tue, Nov 24, 2015 at 10:36 PM, Matt Gilman 
> > wrote:
> >
> >> These tickets [1][2] address the incorrect validation errors we were
> >> seeing for processors that include the Input Required annotation. These
> >> were bugs that slipped through the NIFI-810 the review. Would be good to
> >> include if possible but I understand we need to draw the line somewhere.
> >>
> >> As for NIFI-655, I've been struggling getting an LDAP server stood up
> that
> >> uses 2 way SSL. Hopefully we can get that squared away soon and wrap
> this
> >> one up. :)
> >>
> >> Matt
> >>
> >> [1] https://issues.apache.org/jira/browse/NIFI-1198
> >> [2] https://issues.apache.org/jira/browse/NIFI-1203
> >>
> >> Sent from my iPhone
> >>
> >> > On Nov 24, 2015, at 10:23 PM, Joe Witt  wrote:
> >> >
> >> > Given the testing to NIFI-1192 and review of NIFI-631 done already
> >> > both are lower risk I think.
> >> >
> >> > NIFI-1107 seems very useful and helpful but we do need to be careful
> >> > given that we know this one is already in use and this is a
> >> > substantive change.
> >> >
> >> > If there are folks that can dig into review/testing of NIFI-1107 that
> >> > would be great.  Waiting for word on NIFI-655 readiness then I think
> >> > we should go cold and just focus on testing an RC.
> >> >
> >> > Thanks
> >> > Joe
> >> >
> >> >> On Tue, Nov 24, 2015 at 4:22 PM, Tony Kurc  wrote:
> >> >> Agreed. I know there has already been a good deal of discussion about
> >> >> design on all these.
> >> >>
> >> >>> On Tue, Nov 24, 2015 at 4:14 PM, Aldrin Piri 
> >> wrote:
> >> >>>
> >> >>> No qualms here.  If they look good to go while the work and testing
> >> >>> surrounding NIFI-655 wraps up, they might as well be included. Would
> >> not
> >> >>> want to delay the release should any of these become protracted in
> >> terms of
> >> >>> iterations.
> >> >>>
> >>  On Tue, Nov 24, 2015 at 4:05 PM, Tony Kurc 
> wrote:
> >> 
> >>  All,
> >>  I was reviewing github PRs and wondering whether anyone objected to
> >>  slipping a couple that look like they're very close into 0.4.0.
> >> 
> >>  NIFI-1192 (#131)
> >>  NIFI-631 (#113)
> >>  NIFI-1107 (#192)
> >> 
> >>  I should have some review cycles tonight. Lots of comments on them
> >> all,
> >> >>> and
> >>  have good "momentum".
> >> 
> >>  Tony
> >> >>>
> >>
>


[GitHub] nifi pull request: NIFI-1107 - Integrate Multipart uploads into th...

2015-11-25 Thread trkurc
Github user trkurc commented on a diff in the pull request:

https://github.com/apache/nifi/pull/132#discussion_r45942561
  
--- Diff: 
nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java
 ---
@@ -102,6 +177,94 @@ protected PropertyDescriptor 
getSupportedDynamicPropertyDescriptor(final String
 .build();
 }
 
+protected File getPersistenceFile() {
+return new File(PERSISTENCE_ROOT + getIdentifier());
+}
+
+@Override
+public void onPropertyModified(final PropertyDescriptor descriptor, 
final String oldValue, final String newValue) {
+if (descriptor.equals(KEY)
+|| descriptor.equals(BUCKET)
+|| descriptor.equals(ENDPOINT_OVERRIDE)
+|| descriptor.equals(STORAGE_CLASS)
+|| descriptor.equals(REGION)) {
+destroyState();
+}
+}
+
+protected MultipartState getState(final String s3ObjectKey) throws 
IOException {
--- End diff --

I believe these methods could be problematic if multiple threads are 
monkeying with the persistence file at the same time. 
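
For illustration only, here is one minimal way such access could be serialized:
a single lock guarding load/store of a java.util.Properties file. This is a
hypothetical helper sketch, not the approach taken in this PR:

import java.io.*;
import java.util.Properties;
import java.util.concurrent.locks.ReentrantLock;

class PersistedState {
    private final ReentrantLock lock = new ReentrantLock();
    private final File file;

    PersistedState(final File file) {
        this.file = file;
    }

    // All readers and writers go through the same lock, so concurrent
    // onTrigger threads cannot interleave partial reads/writes of the file.
    Properties load() throws IOException {
        lock.lock();
        try (InputStream in = new FileInputStream(file)) {
            final Properties props = new Properties();
            props.load(in);
            return props;
        } finally {
            lock.unlock();
        }
    }

    void store(final Properties props) throws IOException {
        lock.lock();
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, null);
        } finally {
            lock.unlock();
        }
    }
}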


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Joe Witt
It sounds like ListFile kept logic similar to GetFile, and I can
understand that approach.

However, I do believe it makes more sense to follow the behavior of
ListHDFS where the path would be absolute.

Thanks
Joe

On Wed, Nov 25, 2015 at 1:56 PM, Tony Kurc  wrote:
> All,
> Joe and I commented on NIFI-631 that it didn't "just work" when wiring the
> processors together. ListFile was populating the attributes as
> described in CoreAttributes.java
> [1] (path being relative to the input directory, and absolute being the
> full path). FetchFile was using ${path}/${filename} as the default, which
> wouldn't grab the directory. I'm puzzled as to what the correct behavior
> should be. The description of path said it is relative ... relative to
> what? ListHDFS appears to state path is absolute [2] [3], and I expect we
> should have consistent behavior between ListHDFS and ListFile.
>
> So, I guess I'm not sure what guidance to give on a review of NIFI-631.
> Should the default of FetchFile be changed to ${absolute.path}/${filename}
> (which may be inconsistent with other List/Fetch processor combos), or
> should ListFile be changed to have path be absolute?
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
> [2]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
> [3]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442


Re: Clustered Site-to-Site

2015-11-25 Thread Matthew Gaulin
Ok, that all makes sense.  The main reason we like doing it strictly as
S2S is to maintain the FlowFile attributes, so we would like to avoid
HTTP.  Otherwise we would have to rebuild some of these attributes from the
content, which isn't the end of the world, but still no fun.  We may
consider the idea of the single receive node for distribution to a cluster,
in order to further lock things down from a firewall standpoint.  I think
the main thing we had to wrap our heads around was that every send node
needs to be able to directly connect to every receiver node.  Thanks again
for the very detailed responses!

On Wed, Nov 25, 2015 at 10:44 AM Matthew Clarke 
wrote:

> I am not following why you set all your Nodes (source and destination) to
> use the same hostname(s).  Each hostname resolves to a single IP and by
> doing so doesn't all data get sent to a single end-point?
>
> The idea behind spreading out the connections when using S2S is for smart
> load balancing purposes.  If all data going to another cluster passed
> through the NCM first, you lose that data load balancing capability because
> one instance of NiFi (NCM in this case) has to receive all that network
> traffic. It sounds like the approach you want is to send source data to a
> single NiFi point on another network and then have that single point
> redistribute that data internally to that network across multiple
> "processing" nodes in a cluster.
>
> This can be accomplished in several ways:
>
> 1. You could use S2S to send to a single instance of NiFi on the other
> network and then have that instance S2S that data to a cluster on that same
> network.
> 2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
> NiFi) processors to facilitate sending data to a single Node in the
> destination cluster, and then have that Node use S2S to redistribute the
> data across the entire cluster.
>
> A more ideal setup to limit connections needed between networks, might be:
>
> - Source cluster (consists of numerous low end servers or VMs) and a single
> instance running on a beefy server/VM that will handle all data coming in and
> out of this network.  Use S2S to communicate between internal cluster and
> single instance on same network.
> - The destination network would be set up the same way; its cluster would look the same.
> You can then use S2S or PostHTTP to ListenHTTP to send data as NiFi
> FlowFiles between your networks. That network-to-network data transfer
> should occur between the two beefy single instances in each network.
>
> Matt
>
>
>
>
> On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin 
> wrote:
>
> > Thank you for the info.  I was working with Edgardo on this.  We ended up
> > having to set the SAME hostname on each of the source nodes, as the
> > destination NCM uses for each of its nodes and of course open up the
> > firewall rules so all source nodes can talk to each destination node.
> This
> > seems to jive with that you explained above.  It is a little annoying
> that
> > we have to have so much open to get this to work and can't have a single
> > point of entry on the NCM to send all this data from one network to
> > another.  Not a huge deal in the end though.  Thanks again.
> >
> > On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <
> matt.clarke@gmail.com>
> > wrote:
> >
> > > let me explain first how S2S works when connecting from one cluster to
> > > another cluster.
> > >
> > > I will start with the source cluster (this would be the cluster where
> you
> > > are adding the Remote Process Group (RPG) to the graph).  The NCM has
> no
> > > role in this cluster. Every Node in a cluster works independently form
> > one
> > > another, so by adding the RPG to the graph, you have added it to every
> > > Node.  So Now the behavior of each Node is the same as as it would be
> if
> > it
> > > were a standalone instance with regards to S2S.  The URL you are
> > providing
> > > in that RPG would be the URL for the NCM of the target cluster (This
> URL
> > is
> > > not to the S2S port of the NCM, but to the same URL you would use to
> > access
> > > the UI of that cluster).  Now each Node in your "source" cluster is
> > > communicating with the NCM of the destination cluster unaware at this
> > time
> > > that they are communicating with a NCM. These Nodes want to send their
> > data
> > > to the S2S port on that NCM. Now of course since the NCM does not
> process
> > > any data, it is not going to accept any data from those Nodes.  The
> > > "destination" NCM will respond to each of the "source" Nodes with the
> > > configured nifi.remote.input.socket.host=,
> > nifi.remote.input.socket.port=,
> > > and the status for each of those "destination" Nodes.  Using that
> > provided
> > > information, the source Nodes can logically distribute the data to our
> > the
> > > "destination' Nodes.
> > >
> > > When S2S fails beyond the initial URL connection, there are typically
> on
> > a
> 

Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Tony Kurc
I am reading the ListHDFS code. I can't tell if the description is wrong,
the code is wrong, or I'm missing something.

Description: The path is set to the absolute path of the file's directory
on HDFS. For example, if the Directory property is set to /tmp then files
picked up from /tmp will have the path attribute set to \"./\". If the
Recurse Subdirectories property is set to true and a file is picked up from
/tmp/abc/1/2/3, then the path attribute will be set to \"/tmp/abc/1/2/3\".

Code:
attributes.put(CoreAttributes.PATH.key(),
        getAbsolutePath(status.getPath().getParent()));

private String getAbsolutePath(final Path path) {
    final Path parent = path.getParent();
    final String prefix = (parent == null || parent.getName().equals(""))
            ? "" : getAbsolutePath(parent);
    return prefix + "/" + path.getName();
}

I don't understand how it will return "./"; it looks a lot like the path is
determined independently of the Directory property.
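
To make that concrete, here is a small stand-alone trace of the recursion,
re-implemented with java.nio.file.Path in place of the Hadoop Path (a sketch
for discussion, not the ListHDFS code itself):

import java.nio.file.Path;
import java.nio.file.Paths;

public class PathTrace {
    // Simplified re-implementation of ListHDFS#getAbsolutePath using java.nio,
    // purely to trace its behavior; the real code uses org.apache.hadoop.fs.Path.
    static String getAbsolutePath(final Path path) {
        final Path parent = path.getParent();
        final String prefix = (parent == null || parent.getFileName() == null)
                ? "" : getAbsolutePath(parent);
        return prefix + "/" + path.getFileName();
    }

    public static void main(String[] args) {
        // Directory property = /tmp in both cases below, but it never enters
        // the computation, so the result is always the absolute parent path.
        System.out.println(getAbsolutePath(Paths.get("/tmp/file.txt").getParent()));
        // prints /tmp  (the description says this case should be "./")
        System.out.println(getAbsolutePath(Paths.get("/tmp/abc/1/2/3/file.txt").getParent()));
        // prints /tmp/abc/1/2/3
    }
}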


On Wed, Nov 25, 2015 at 2:01 PM, Mark Payne  wrote:

> I certainly cannot argue with that, either.
>
> > On Nov 25, 2015, at 1:59 PM, Joe Witt  wrote:
> >
> > It sounds like ListFile kept logic similar to GetFile which I can
> > understand that approach.
> >
> > However, I do believe it makes more sense to follow the behavior of
> > ListHDFS where the path would be absolute.
> >
> > Thanks
> > Joe
> >
> > On Wed, Nov 25, 2015 at 1:56 PM, Tony Kurc  wrote:
> >> All,
> >> Joe and I commented on NIFI-631 that it didn't "just work" when wiring
> the
> >> processors together. ListFile was populating the attributes as
> >> described in CoreAttributes.java
> >> [1] (path being relative to the input directory, and absolute being the
> >> full path). FetchFile was using ${path}/${filename} as the default,
> which
> >> wouldn't grab the directory. I'm puzzled as to what the correct behavior
> >> should be. The description of path said it is relative ... relative to
> >> what? ListHDFS appears to state path is absolute [2] [3], and I expect
> we
> >> should have consistent behavior between ListHDFS and ListFile.
> >>
> >> So, I guess I'm not sure what guidance to give on a review of NIFI-631.
> >> Should the default of FetchFile be changed to
> ${absolute.path}/${filename}
> >> (which may be inconsistent with other List/Fetch processor combos), or
> >> should ListFile be changed to have path be absolute?
> >>
> >> [1]
> >>
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
> >> [2]
> >>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
> >> [3]
> >>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442
>
>


[GitHub] nifi pull request: NIFI-1192 added support for dynamic properties ...

2015-11-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/131


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [ANNOUNCE] New Apache NiFi PMC Member (and Committer) Sean Busbey

2015-11-25 Thread Ricky Saltzer
Congrats, Busbey!!
On Nov 25, 2015 1:14 PM, "Tony Kurc"  wrote:

> On behalf of the Apache NiFi PMC, I am very pleased to announce that Sean
> Busbey has accepted the PMC's invitation to become a PMC Member and
> Committer on the Apache NiFi project. We greatly appreciate all of Sean's
> hard work and generous contributions to the project.
>
> In addition to his contributions to NiFi, Sean is what I would describe as
> "prolific" in the Apache community, a PMC member on other projects, notably
> Apache Yetus. We look forward to his continued technical work and the
> interesting perspective someone with a breadth of experience brings to the
> NiFi community.
>
> Welcome and congratulations!
> Tony
>


Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Mark Payne
Tony,

I would recommend that ListFile add both 'path' and 'absolute.path'. The 'path' 
would be relative to the base directory being listed.
For example, if ListFile is configured to list files from /data/nifi/in and 
recurse subdirectories, and it finds a file named: /data/nifi/in/123/myfile.txt
then I would expect the following attributes:

absolute.path = /data/nifi/in/123
path = ./123
filename = myfile.txt

Thanks
-Mark


> On Nov 25, 2015, at 1:56 PM, Tony Kurc  wrote:
> 
> All,
> Joe and I commented on NIFI-631 that it didn't "just work" when wiring the
> processors together. ListFile was populating the attributes as
> described in CoreAttributes.java
> [1] (path being relative to the input directory, and absolute being the
> full path). FetchFile was using ${path}/${filename} as the default, which
> wouldn't grab the directory. I'm puzzled as to what the correct behavior
> should be. The description of path said it is relative ... relative to
> what? ListHDFS appears to state path is absolute [2] [3], and I expect we
> should have consistent behavior between ListHDFS and ListFile.
> 
> So, I guess I'm not sure what guidance to give on a review of NIFI-631.
> Should the default of FetchFile be changed to ${absolute.path}/${filename}
> (which may be inconsistent with other List/Fetch processor combos), or
> should ListFile be changed to have path be absolute?
> 
> [1]
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
> [2]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
> [3]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442



Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Mark Payne
I certainly cannot argue with that, either.

> On Nov 25, 2015, at 1:59 PM, Joe Witt  wrote:
> 
> It sounds like ListFile kept logic similar to GetFile which I can
> understand that approach.
> 
> However, I do believe it makes more sense to follow the behavior of
> ListHDFS where the path would be absolute.
> 
> Thanks
> Joe
> 
> On Wed, Nov 25, 2015 at 1:56 PM, Tony Kurc  wrote:
>> All,
>> Joe and I commented on NIFI-631 that it didn't "just work" when wiring the
>> processors together. ListFile was populating the attributes as
>> described in CoreAttributes.java
>> [1] (path being relative to the input directory, and absolute being the
>> full path). FetchFile was using ${path}/${filename} as the default, which
>> wouldn't grab the directory. I'm puzzled as to what the correct behavior
>> should be. The description of path said it is relative ... relative to
>> what? ListHDFS appears to state path is absolute [2] [3], and I expect we
>> should have consistent behavior between ListHDFS and ListFile.
>> 
>> So, I guess I'm not sure what guidance to give on a review of NIFI-631.
>> Should the default of FetchFile be changed to ${absolute.path}/${filename}
>> (which may be inconsistent with other List/Fetch processor combos), or
>> should ListFile be changed to have path be absolute?
>> 
>> [1]
>> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
>> [2]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
>> [3]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442



Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Tony Kurc
Okay, since we don't have consensus, here is what I propose:
ListFile
1. absolute.path will be absolute, path will be relative to input directory

FetchFile:
change the default property to ${absolute.path}/${filename}. I don't have a
Windows machine at the ready - will / work as a path separator?

Revisit consistency of List/Fetch when we can do a breaking change (1.0)

If someone can confirm that they get the same read on ListHDFS path
description as I do and we fix it before 0.4.0, I'd like that.
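
To make the proposal concrete using Mark's example (quoted below: ListFile
pointed at /data/nifi/in, recursing, picking up /data/nifi/in/123/myfile.txt),
the two candidate FetchFile defaults would resolve as follows, assuming those
attribute values (Windows separator behavior still to be confirmed):

${path}/${filename}            ->  ./123/myfile.txt
${absolute.path}/${filename}   ->  /data/nifi/in/123/myfile.txt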




On Wed, Nov 25, 2015 at 2:12 PM, Tony Kurc  wrote:

> I am 100% in favor of keeping the relative path (I brought up out of band
> the value if the Lister and Fetcher were different machines with different
> mount points). I think it is just a matter of what attribute to fill with what
> value.
>
>
> On Wed, Nov 25, 2015 at 2:09 PM, Joe Skora  wrote:
>
>> Mark,
>>
>> What you described is the behavior of ListFile (in spite of confusing doc
>> info).
>>
>> JoeW,
>>
>> Consistency with ListHDFS makes sense,  and if that is the desired
>> behavior
>> it's easy to change ListFile.  But CoreAttributres state "The flowfile's
>> path indicates the relative directory" and if that's not true, does
>> CoreAttributes need revision too?
>>
>> Thanks,
>> Joe
>>
>> On Wed, Nov 25, 2015 at 2:00 PM, Mark Payne  wrote:
>>
>> > Tony,
>> >
>> > I would recommend that ListFile add both 'path' and 'absolute.path'. The
>> > 'path' would be relative to the base directory being listed.
>> > For example, if ListFile is configured to list files from /data/nifi/in
>> > and recurse subdirectories, and it finds a file named:
>> > /data/nifi/in/123/myfile.txt
>> > then i would expect the following attributes:
>> >
>> > absolute.path = /data/nifi/in/123
>> > path = ./123
>> > filename = myfile.txt
>> >
>> > Thanks
>> > -Mark
>> >
>> >
>> > > On Nov 25, 2015, at 1:56 PM, Tony Kurc  wrote:
>> > >
>> > > All,
>> > > Joe and I commented on NIFI-631 that it didn't "just work" when wiring
>> > the
>> > > processors together. ListFile was populating the attributes as
>> > > described in CoreAttributes.java
>> > > [1] (path being relative to the input directory, and absolute being
>> the
>> > > full path). FetchFile was using ${path}/${filename} as the default,
>> which
>> > > wouldn't grab the directory. I'm puzzled as to what the correct
>> behavior
>> > > should be. The description of path said it is relative ... relative to
>> > > what? ListHDFS appears to state path is absolute [2] [3], and I
>> expect we
>> > > should have consistent behavior between ListHDFS and ListFile.
>> > >
>> > > So, I guess I'm not sure what guidance to give on a review of
>> NIFI-631.
>> > > Should the default of FetchFile be changed to
>> > ${absolute.path}/${filename}
>> > > (which may be inconsistent with other List/Fetch processor combos), or
>> > > should ListFile be changed to have path be absolute?
>> > >
>> > > [1]
>> > >
>> >
>> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
>> > > [2]
>> > >
>> >
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
>> > > [3]
>> > >
>> >
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442
>> >
>> >
>>
>
>


Re: absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Joe Witt
I'm fine with your proposal, which merges Mark's concept but aligns
consistency of Fetch(File/HDFS).

We should fix the docs for CoreAttributes.PATH.  The concept of it
being relative is simply too vague.  We should just be honest that it
is unspecified - subject to the meaning of whichever processor
sets/updates that value.

Mark will have to address the question of ListHDFS path description I think.

Thanks
Joe

On Wed, Nov 25, 2015 at 2:36 PM, Tony Kurc  wrote:
> Okay, since we don't have consensus, here is what I propose:
> ListFile
> 1. absolute.path will be absolute, path will be relative to input directory
>
> FetchFile:
> change default property to ${absolute.path}/${filename}. Don't have a
> windows machine at the ready - will / work as a path separator?
>
> Revisit consistency of List/Fetch when we can do a breaking change (1.0)
>
> If someone can confirm that they get the same read on ListHDFS path
> description as I do and we fix it before 0.4.0, I'd like that.
>
>
>
>
> On Wed, Nov 25, 2015 at 2:12 PM, Tony Kurc  wrote:
>
>> I am 100% in favor of keeping the relative path (I brought up out of band
>> the value if the Lister and Fetcher were different machines with different
>> mount points). I think is just a matter of what attribute to fill with what
>> value.
>>
>>
>> On Wed, Nov 25, 2015 at 2:09 PM, Joe Skora  wrote:
>>
>>> Mark,
>>>
>>> What you described is the behavior of ListFile (in spite of confusing doc
>>> info).
>>>
>>> JoeW,
>>>
>>> Consistency with ListHDFS makes sense,  and if that is the desired
>>> behavior
>>> it's easy to change ListFile.  But CoreAttributres state "The flowfile's
>>> path indicates the relative directory" and if that's not true, does
>>> CoreAttributes need revision too?
>>>
>>> Thanks,
>>> Joe
>>>
>>> On Wed, Nov 25, 2015 at 2:00 PM, Mark Payne  wrote:
>>>
>>> > Tony,
>>> >
>>> > I would recommend that ListFile add both 'path' and 'absolute.path'. The
>>> > 'path' would be relative to the base directory being listed.
>>> > For example, if ListFile is configured to list files from /data/nifi/in
>>> > and recurse subdirectories, and it finds a file named:
>>> > /data/nifi/in/123/myfile.txt
>>> > then i would expect the following attributes:
>>> >
>>> > absolute.path = /data/nifi/in/123
>>> > path = ./123
>>> > filename = myfile.txt
>>> >
>>> > Thanks
>>> > -Mark
>>> >
>>> >
>>> > > On Nov 25, 2015, at 1:56 PM, Tony Kurc  wrote:
>>> > >
>>> > > All,
>>> > > Joe and I commented on NIFI-631 that it didn't "just work" when wiring
>>> > the
>>> > > processors together. ListFile was populating the attributes as
>>> > > described in CoreAttributes.java
>>> > > [1] (path being relative to the input directory, and absolute being
>>> the
>>> > > full path). FetchFile was using ${path}/${filename} as the default,
>>> which
>>> > > wouldn't grab the directory. I'm puzzled as to what the correct
>>> behavior
>>> > > should be. The description of path said it is relative ... relative to
>>> > > what? ListHDFS appears to state path is absolute [2] [3], and I
>>> expect we
>>> > > should have consistent behavior between ListHDFS and ListFile.
>>> > >
>>> > > So, I guess I'm not sure what guidance to give on a review of
>>> NIFI-631.
>>> > > Should the default of FetchFile be changed to
>>> > ${absolute.path}/${filename}
>>> > > (which may be inconsistent with other List/Fetch processor combos), or
>>> > > should ListFile be changed to have path be absolute?
>>> > >
>>> > > [1]
>>> > >
>>> >
>>> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
>>> > > [2]
>>> > >
>>> >
>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
>>> > > [3]
>>> > >
>>> >
>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442
>>> >
>>> >
>>>
>>
>>


[ANNOUNCE] New Apache NiFi PMC Member (and Committer) Sean Busbey

2015-11-25 Thread Tony Kurc
On behalf of the Apache NiFi PMC, I am very pleased to announce that Sean
Busbey has accepted the PMC's invitation to become a PMC Member and
Committer on the Apache NiFi project. We greatly appreciate all of Sean's
hard work and generous contributions to the project.

In addition to his contributions to NiFi, Sean is what I would describe as
"prolific" in the Apache community, a PMC member on other projects, notably
Apache Yetus. We look forward to his continued technical work and the
interesting perspective someone with a breadth of experience brings to the
NiFi community.

Welcome and congratulations!
Tony


absolute.path vs path for FetchFile/ListFile

2015-11-25 Thread Tony Kurc
All,
Joe and I commented on NIFI-631 that it didn't "just work" when wiring the
processors together. ListFile was populating the attributes as
described in CoreAttributes.java
[1] (path being relative to the input directory, and absolute being the
full path). FetchFile was using ${path}/${filename} as the default, which
wouldn't grab the directory. I'm puzzled as to what the correct behavior
should be. The description of path said it is relative ... relative to
what? ListHDFS appears to state path is absolute [2] [3], and I expect we
should have consistent behavior between ListHDFS and ListFile.

So, I guess I'm not sure what guidance to give on a review of NIFI-631.
Should the default of FetchFile be changed to ${absolute.path}/${filename}
(which may be inconsistent with other List/Fetch processor combos), or
should ListFile be changed to have path be absolute?

[1]
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/flowfile/attributes/CoreAttributes.java
[2]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L79
[3]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L442


Re: remote command execution via SSH?

2015-11-25 Thread Sumanth Chinthagunta
I have a first-cut implementation of the ExecuteRemoteProcess processor at:

https://github.com/xmlking/nifi-scripting/releases 


I tried to provide all capabilities offered by groovy-ssh
(https://gradle-ssh-plugin.github.io/docs/) to the ExecuteRemoteProcess user.
It takes three attributes:
1. SSH Config DSL (run once on OnScheduled):

remotes {
    web01 {
        role 'masterNode'
        host = '192.168.1.5'
        user = 'sumo'
        password = 'fake'
        knownHosts = allowAnyHosts
    }
    web02 {
        host = '192.168.1.5'
        user = 'sumo'
        knownHosts = allowAnyHosts
    }
}
2. Run DSL (run on each onTrigger):

ssh.run {
    session(ssh.remotes.web01) {
        result = execute 'uname -a'
    }
}
3. User-supplied arguments, which will be available in the Run DSL.

Anything that is assigned to 'result' in the Run DSL is passed as a FlowFile to
the success relationship.

Any suggestions for improvements are welcome.

-Sumo

> On Nov 24, 2015, at 8:19 PM, Adam Taft  wrote:
> 
> Sumo,
> 
> On Tue, Nov 24, 2015 at 10:27 PM, Sumanth Chinthagunta 
> wrote:
> 
>> I think you guys may have configured password less login for  SSH (keys?)
>> 
> 
> ​Correct.  I'm using SSH key exchange for authentication.  It's usually
> done password-less, true, but it doesn't necessarily have to be (if using
> ssh-agent).
> 
> ​
> 
> 
>> In my case the  edge node is managed by different team and they don’t
>> allow me to add my SSH key.
>> 
> 
> ​Yikes.  Someone should teach them the benefits of ssh keys!  :)​
> 
> 
> 
>> I am thinking we need ExecuteRemoteCommand processor (based on
>> https://github.com/int128/groovy-ssh) that will take care of key or
>> password base SSH login.
>> 
> 
> ​+1  - this would be a pretty nice contribution.  Recommend building the
> processor and then posting here for review. I'm sure this would be a useful
> processor for many people.
> 
> 
> ExecuteRemoteCommand should have configurable attributes and return command
>> output as flowfile
>> 
>> host : Hostname or IP address.
>> port : Port. Defaults to 22.
>> user : User name.
>> password: A password for password authentication.
>> identity : A private key file for public-key authentication.
>> execute - Execute a command.
>> executeBackground - Execute a command in background.
>> executeSudo - Execute a command with sudo support.
>> shell - Execute a shell.
>> 
>> 
> ​As we do for SSL contexts, it might make sense to bury some of these
> properties in an SSH key controller service.  I'm thinking username,
> password, identity might make sense to have configured externally as a
> service so they could be reused by multiple processors.  Unsure though,
> there might not be enough re-usability to really get the benefit.
> 
> Also, I'm thinking that the "background", "sudo" and "shell" options should
> possibly be a multi-valued option of the processor, not separate
> properties, and definitely not separate "commands."  i.e. I'd probably
> recommend property configuration similar to ExecuteCommand, with options
> for specifying the background, sudo, shell preference.
> 
> Good idea, I hope this works out.
> 
> Adam
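
If it helps to picture the multi-valued option Adam describes, a property
descriptor along these lines (names are hypothetical, just a sketch of the
shape) would collapse execute/background/sudo/shell into a single choice:

import org.apache.nifi.components.PropertyDescriptor;

public class ExecuteRemoteProcessProperties {
    // One "mode" property instead of separate background/sudo/shell commands;
    // the allowable values constrain what a user can pick in the UI.
    public static final PropertyDescriptor EXECUTION_MODE = new PropertyDescriptor.Builder()
            .name("Execution Mode")
            .description("How the remote command is run over SSH")
            .allowableValues("execute", "executeBackground", "executeSudo", "shell")
            .defaultValue("execute")
            .required(true)
            .build();
}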



[GitHub] nifi pull request: NIFI-1107 - Integrate Multipart uploads into th...

2015-11-25 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/132#issuecomment-159806672
  
Another major secondary concern is what to do with failed partial
multipart puts and maybe having to do bucket cleanup.
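
For context on what that cleanup involves, a rough sketch with the AWS SDK for
Java v1 -- listing unfinished multipart uploads for a bucket and aborting them
(pagination of truncated listings and error handling omitted; this is only an
illustration, not what the PR does):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
import com.amazonaws.services.s3.model.MultipartUpload;

public class AbortDanglingUploads {
    public static void main(String[] args) {
        final String bucket = args[0];
        // Default credential/region resolution; fine for a throwaway sketch.
        final AmazonS3 s3 = new AmazonS3Client();
        for (MultipartUpload upload : s3.listMultipartUploads(
                new ListMultipartUploadsRequest(bucket)).getMultipartUploads()) {
            // Aborting frees the stored parts so S3 stops charging for them.
            s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                    bucket, upload.getKey(), upload.getUploadId()));
        }
    }
}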


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: PRs

2015-11-25 Thread Joe Witt
Understood, Tony - thanks for digging into the review so thoroughly, and
Joe, thank you.  This is a very non-trivial contrib.

On Thu, Nov 26, 2015 at 12:12 AM, Tony Kurc  wrote:
> I recommend we push NIFI-1107 to next release. We discovered some unfun
> issues the S3 Multipart "API" creates, notably, leaving dangling pieces
> around [1]:
>
> "Once you initiate a multipart upload there is no expiry; you must
> explicitly complete or abort the multipart upload"
>
> And charging while they're there [2]:
>
> "After you initiate multipart upload and upload one or more parts, you must
> either complete or abort multipart upload in order to stop getting charged
> for storage of the uploaded parts. Only after you either complete or abort
> multipart upload, Amazon S3 frees up the parts storage and stops charging
> you for the parts storage."
>
> This, plus grappling with persisting state for the multipart (and lack of
> framework support for persistent state) means I think we have some more
> work to do despite the feature being in a "works" state.
>
> [1] http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
> [2] http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
>
> On Wed, Nov 25, 2015 at 7:16 PM, Tony Kurc  wrote:
>
>> Things that make me feel better: The persistence mechanism is very similar
>> to that of ListHDFS.
>>
>>
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L417
>>
>> On Tue, Nov 24, 2015 at 10:56 PM, Joe Witt  wrote:
>>
>>> Tags are a great place to mark experimental.  We used to plan for this
>>> concept outright and make it look at scary and such on the ui.  But
>>> folks just didn't care.  They used it anyway.  Happy to revisit it but
>>> for now perhaps just adding a tag of experimental is enough.
>>>
>>> If the existing code path is largely untouched then that is certainly
>>> great for moving the ball forward.  In fairness to Joe S or anyone
>>> that has to persist internal process state until we offer that as part
>>> of the framework it is much harder than we want it to be for people.
>>> Will take a look through the state methods but Payne is probably the
>>> best at playing the wack a mole edge case game for such things.
>>>
>>> On Tue, Nov 24, 2015 at 10:52 PM, Tony Kurc  wrote:
>>> > So, I beat on the the patch for NIFI-1107, and as I suspected, it is
>>> > awfully low risk for existing flows, but I think I'd need a second
>>> opinion
>>> > on how state is kept for resuming uploads. I believe it will work, and
>>> it
>>> > looks like a lot of the edge cases are covered if somehow state is lost
>>> or
>>> > corrupted, but I'm not sure if I am comfortable with how it fits
>>> > architecturally. If someone has cycles, and can peruse the *State
>>> methods
>>> > (getState, persistState, ...) and weigh in, it would accelerate my
>>> review
>>> > significantly.
>>> >
>>> > Also, it sure would be great to mark features as experimental!
>>> >
>>> >
>>> >
>>> > On Tue, Nov 24, 2015 at 10:36 PM, Matt Gilman 
>>> > wrote:
>>> >
>>> >> These tickets [1][2] address the incorrect validation errors we were
>>> >> seeing for processors that include the Input Required annotation. These
>>> >> were bugs that slipped through the NIFI-810 the review. Would be good
>>> to
>>> >> include if possible but I understand we need to draw the line
>>> somewhere.
>>> >>
>>> >> As for NIFI-655, I've been struggling getting an LDAP server stood up
>>> that
>>> >> uses 2 way SSL. Hopefully we can get that squared away soon and wrap
>>> this
>>> >> one up. :)
>>> >>
>>> >> Matt
>>> >>
>>> >> [1] https://issues.apache.org/jira/browse/NIFI-1198
>>> >> [2] https://issues.apache.org/jira/browse/NIFI-1203
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >> > On Nov 24, 2015, at 10:23 PM, Joe Witt  wrote:
>>> >> >
>>> >> > Given the testing to NIFI-1192 and review of NIFI-631 done already
>>> >> > both are lower risk I think.
>>> >> >
>>> >> > NIFI-1107 seems very useful and helpful but we do need to be careful
>>> >> > given that we know this one is already in use and this is a
>>> >> > substantive change.
>>> >> >
>>> >> > If there are folks that can dig into review/testing of NIFI-1107 that
>>> >> > would be great.  Waiting for word on NIFI-655 readiness then I think
>>> >> > we should go cold and just focus on testing an RC.
>>> >> >
>>> >> > Thanks
>>> >> > Joe
>>> >> >
>>> >> >> On Tue, Nov 24, 2015 at 4:22 PM, Tony Kurc 
>>> wrote:
>>> >> >> Agreed. I know there has already been a good deal of discussion
>>> about
>>> >> >> design on all these.
>>> >> >>
>>> >> >>> On Tue, Nov 24, 2015 at 4:14 PM, Aldrin Piri >> >
>>> >> wrote:
>>> >> >>>
>>> >> >>> No qualms here.  If they look good to go while the work and testing
>>> 

Re: Clustered Site-to-Site

2015-11-25 Thread Matthew Clarke
The PostHTTP processor has an option to send as a FlowFile to a ListenHTTP
processor on another NiFi. This allows you to keep the FlowFile attributes
across multiple NiFis just like S2S.
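
For what it's worth, the pairing looks roughly like this (property names from
memory; double-check them against the processor documentation, and the host
below is just a placeholder):

PostHTTP (source NiFi)
    URL                  = https://receiver.example.com:8888/contentListener
    Send as FlowFile     = true    <- packages content together with attributes
    SSL Context Service  = as needed for two-way SSL

ListenHTTP (destination NiFi)
    Listening Port       = 8888
    Base Path            = contentListener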
On Nov 25, 2015 1:58 PM, "Matthew Gaulin"  wrote:

> Ok, that all makes sense.  The main reason we like doing it strictly as
> S2S is to maintain the flowfile attributes, so we would like to avoid
> HTTP.  Otherwise we would have to rebuild some of these attributes from the
> content, which isn't the end of the world, but still no fun.  We may
> consider the idea of the single receive node for distribution to a cluster,
> in order to further lock things down from a firewall standpoint.  I think
> the main thing we had to wrap our heads around was that every send node
> needs to be able to directly connect to every receiver node.  Thanks again
> for the very detailed responses!
>
> On Wed, Nov 25, 2015 at 10:44 AM Matthew Clarke  >
> wrote:
>
> > I am not following why you set all your Nodes (source and destination) to
> > use the same hostname(s).  Each hostname resolves to a single IP and by
> > doing so doesn't all data get sent to a single end-point?
> >
> > The idea behind spreading out the connections when using S2S is for smart
> > load balancing purposes.  If all data going to another cluster passed
> > through the NCM first, you lose that data load balancing capability
> because
> > one instance of NiFi (NCM in this case) has to receive all that network
> > traffic. It sound like the approach you want is to send source data to a
> > single NiFi point on another network and then have that single point
> > redistribute that data internally to that network across multiple
> > "processing" nodes in a cluster.
> >
> > This can be accomplished in several ways:
> >
> > 1. You could use S2S to send to a single instance of NiFi on the other
> > network and then have that instance S2S that data to a cluster on that
> same
> > network.
> > 2. You could use the postHTTP (source NiFi) and ListenHTTP (desitination
> > NiFi) processors to facilitate sending data to a single Node in the
> > destination cluster, and then have that Node use S2S to redistribute the
> > data across the entire cluster.
> >
> > A more ideal setup to limit connections needed between networks, might
> be:
> >
> > - Source cluster (consists of numerous low end servers or VMs) and a
> single
> > instance running on a beefy server/VM that will hand all data coming in
> and
> > out of this network.  Use S2S top communicate between internal cluster
> and
> > single instance on same network.
> > - The destination would be setup the same way cluster would look the
> same.
> > You can then use S2S or postHTTP to ListenHTTP to send data as NiFi
> > FlowFIles between your network. That network to network data transfer
> > shoudl occur between the two beefy single instances in each network.
> >
> > Matt
> >
> >
> >
> >
> > On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin 
> > wrote:
> >
> > > Thank you for the info.  I was working with Edgardo on this.  We ended
> up
> > > having to set the SAME hostname on each of the source nodes, as the
> > > destination NCM uses for each of its nodes and of course open up the
> > > firewall rules so all source nodes can talk to each destination node.
> > This
> > > seems to jive with that you explained above.  It is a little annoying
> > that
> > > we have to have so much open to get this to work and can't have a
> single
> > > point of entry on the NCM to send all this data from one network to
> > > another.  Not a huge deal in the end though.  Thanks again.
> > >
> > > On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <
> > matt.clarke@gmail.com>
> > > wrote:
> > >
> > > > let me explain first how S2S works when connecting from one cluster
> to
> > > > another cluster.
> > > >
> > > > I will start with the source cluster (this would be the cluster where
> > you
> > > > are adding the Remote Process Group (RPG) to the graph).  The NCM has
> > no
> > > > role in this cluster. Every Node in a cluster works independently
> form
> > > one
> > > > another, so by adding the RPG to the graph, you have added it to
> every
> > > > Node.  So Now the behavior of each Node is the same as as it would be
> > if
> > > it
> > > > were a standalone instance with regards to S2S.  The URL you are
> > > providing
> > > > in that RPG would be the URL for the NCM of the target cluster (This
> > URL
> > > is
> > > > not to the S2S port of the NCM, but to the same URL you would use to
> > > access
> > > > the UI of that cluster).  Now each Node in your "source" cluster is
> > > > communicating with the NCM of the destination cluster unaware at this
> > > time
> > > > that they are communicating with a NCM. These Nodes want to send
> their
> > > data
> > > > to the S2S port on that NCM. Now of course since the NCM does not
> > process
> > > > any data, it is not going to accept 

Re: Clustered Site-to-Site

2015-11-25 Thread Matthew Clarke
On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega 
wrote:

> Yeah the S2S port is set on all nodes.
>
> What should the host be set to on each machine? I first set it to the NCM
> IP on each machine in the cluster. Then I set the host to be the IP of each
> individual machine without luck.
>
> The S2S port is open to the internet for the entire cluster for those
> ports.
>
> On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke  >
> wrote:
>
> > Did you configure the S2S port on all the Nodes in the cluster you are
> > trying to S2S to?
> >
> > In addition to setting the port on those Nodes, you should also set the
> S2S
> > hostname.  The hostname entered should be resolvable and reachable by the
> > systems trying to S2S to that cluster.
> >
> > Thanks,
> > Matt
> >
> > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega 
> > wrote:
> >
> > > Trying to get site to site working from one cluster to another. It
> works
> > if
> > > the connection goes from cluster to single node but not clustered to
> > > clustered.
> > >
> > > I was looking at jira and saw this ticket
> > > https://issues.apache.org/jira/browse/NIFI-872.
> > >
> > > Is this saying I am out of luck or is there some special config that I
> > must
> > > do to make this work?
> > >
> > >
> > >
> > >
> > > --
> > > Cheers,
> > >
> > > Edgardo
> > >
> >
>
>
>
> --
> Cheers,
>
> Edgardo
>


Re: Clustered Site-to-Site

2015-11-25 Thread Matthew Gaulin
Thank you for the info.  I was working with Edgardo on this.  We ended up
having to set the SAME hostname on each of the source nodes, as the
destination NCM uses for each of its nodes and of course open up the
firewall rules so all source nodes can talk to each destination node.  This
seems to jibe with what you explained above.  It is a little annoying that
we have to have so much open to get this to work and can't have a single
point of entry on the NCM to send all this data from one network to
another.  Not a huge deal in the end though.  Thanks again.

On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke 
wrote:

> let me explain first how S2S works when connecting from one cluster to
> another cluster.
>
> I will start with the source cluster (this would be the cluster where you
> are adding the Remote Process Group (RPG) to the graph).  The NCM has no
> role in this cluster. Every Node in a cluster works independently from one
> another, so by adding the RPG to the graph, you have added it to every
> Node.  So now the behavior of each Node is the same as it would be if it
> were a standalone instance with regards to S2S.  The URL you are providing
> in that RPG would be the URL for the NCM of the target cluster (This URL is
> not to the S2S port of the NCM, but to the same URL you would use to access
> the UI of that cluster).  Now each Node in your "source" cluster is
> communicating with the NCM of the destination cluster unaware at this time
> that they are communicating with a NCM. These Nodes want to send their data
> to the S2S port on that NCM. Now of course since the NCM does not process
> any data, it is not going to accept any data from those Nodes.  The
> "destination" NCM will respond to each of the "source" Nodes with the
> configured nifi.remote.input.socket.host=, nifi.remote.input.socket.port=,
> and the status for each of those "destination" Nodes.  Using that provided
> information, the source Nodes can logically distribute the data out to the
> "destination" Nodes.
>
> When S2S fails beyond the initial URL connection, there are typically only a
> few likely causes:
> 1. There is a firewall preventing communication between the source Nodes
> and the destination Nodes on the S2S ports.
> 2. No value was supplied for nifi.remote.input.socket.host= on each of the
> target Nodes.  When no value is provided whatever the "hostname" command
> returns is what is sent.  In many cases this hostname may end up being
> "localhost" or some other value that is not resolvable/reachable by the
> "source" systems.
>
> You can change the logging for S2S to DEBUG to see more detail about the
> message traffic between the "destination" NCM and the "source" nodes by
> adding the following lines to the logback.xml files.
>
> 
>
> Watch the logs on one of the source Nodes specifically to see what hostname
> and port is being returned for each destination Node.
>
> Thanks,
> Matt
>
> On Wed, Nov 25, 2015 at 7:59 AM, Matthew Clarke  >
> wrote:
>
> >
> >
> > On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega 
> > wrote:
> >
> >> Yeah the S2S port is set on all node.
> >>
> >> What should the host be set to on each machine? I first set it to the
> NCM
> >> ip on each machine in the cluster. Then I set the host to be the ip of
> >> each
> >> individual machine without luck.
> >>
> >> The S2S port is open to the internet for the entire cluster for those
> >> ports.
> >>
> >> On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke <
> >> matt.clarke@gmail.com>
> >> wrote:
> >>
> >> > Did you configure the S2S port on all the Nodes in the cluster you are
> >> > trying to S2S to?
> >> >
> >> > In addition to setting the port on those Nodes, you should also set
> the
> >> S2S
> >> > hostname.  The hostname entered should be resolvable and reachable by
> >> the
> >> > systems trying to S2S to that cluster.
> >> >
> >> > Thanks,
> >> > Matt
> >> >
> >> > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega  >
> >> > wrote:
> >> >
> >> > > Trying to get site to site working from one cluster to another. It
> >> works
> >> > is
> >> > > the connection goes from cluster to single node but not clusted to
> >> > > clustered.
> >> > >
> >> > > I was looking at jira and saw this ticket
> >> > > https://issues.apache.org/jira/browse/NIFI-872.
> >> > >
> >> > > Is this saying I am out of luck or is there some special config
> that I
> >> > must
> >> > > do to make this work?
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Cheers,
> >> > >
> >> > > Edgardo
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Cheers,
> >>
> >> Edgardo
> >>
> >
> >
>