Re: NIFI connecting to Activemq

2016-01-09 Thread Oleg Zhurakousky
Chris

Are you sure you are providing the correct logs? I can’t see a single mention 
of JMS nor any stack traces, which would definitely be there given what you see 
in the UI.
Also, the fact that you see an NPE is definitely a bug that we have to fix (users 
should never see an NPE), so that can be filed. What I am trying to figure out is 
the condition that triggers it. Maybe if you shut down NiFi, delete all the 
logs, and restart, you can get fresh data. . .

Cheers
Oleg

On Jan 9, 2016, at 1:42 AM, Christopher Hamm 
<em...@christopherhamm.com> wrote:

Here are the logs

On Wed, Jan 6, 2016 at 11:32 PM, Joe Witt 
<joe.w...@gmail.com> wrote:
Chris,

Ok.  Can you take a look in the /logs/nifi-app.log and see if there is a 
stack trace for the NullPointerException included?

If not, please add the following to your logback.xml; after 30 seconds or so 
it should start giving you the stack traces.  A stack trace for the NPE would be 
really useful in pinpointing the likely issue.  Also, please share what config 
details of the GetJMSQueue processor you can.
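The logger element pasted here did not survive the list archive; a typical logback.xml addition for this purpose might look like the following (the logger name is an assumption — point it at whichever package you need to debug):

```xml
<logger name="org.apache.nifi.processors.standard" level="DEBUG"/>
```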



Thanks
Joe

On Wed, Jan 6, 2016 at 11:16 PM, Christopher Hamm 
<em...@christopherhamm.com> wrote:
Couldn't copy the text but here it is.



On Wed, Jan 6, 2016 at 5:28 PM, Joe Witt 
<joe.w...@gmail.com> wrote:
Christopher,

Is there any error/feedback showing up in the UI or in the logs?

Thanks
Joe

On Wed, Jan 6, 2016 at 5:19 PM, Christopher Hamm
<em...@christopherhamm.com> wrote:
> What am I doing wrong with hooking up my ActiveMQ JMS get template? I put
> stuff into ActiveMQ and NiFi won't get it. Using 0.4.1.
>
> --
> Sincerely,
> Chris Hamm
> (E) ceham...@gmail.com
> (Twitter) http://twitter.com/webhamm
> (Linkedin) http://www.linkedin.com/in/chrishamm



--
Sincerely,
Chris Hamm
(E) ceham...@gmail.com
(Twitter) http://twitter.com/webhamm
(Linkedin) http://www.linkedin.com/in/chrishamm




--
Sincerely,
Chris Hamm
(E) ceham...@gmail.com
(Twitter) http://twitter.com/webhamm
(Linkedin) http://www.linkedin.com/in/chrishamm




Re: Nifi cluster features - Questions

2016-01-09 Thread Chakrader Dewaragatla
Bryan – Thanks. How do the nodes distribute the load for an input port? As the 
port is open and listening on both nodes, does it copy the same files to both nodes?
I need to try this setup to see the results; appreciate your help.

Thanks,
-Chakri

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, January 8, 2016 at 3:44 PM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hi Chakri,

I believe the DistributeLoad processor is more for load balancing when sending 
to downstream systems. For example, if you had two HTTP endpoints,
you could have the first relationship from DistributeLoad going to a PostHTTP 
that posts to endpoint #1, and the second relationship going to a second 
PostHTTP that goes to endpoint #2.

If you want to distribute the data within the cluster, then you need to use 
site-to-site. The way you do this is the following...

- Add an Input Port connected to your PutFile.
- Add GenerateFlowFile scheduled on primary node only, connected to a Remote 
Process Group. The Remote Process Group should be connected to the Input Port 
from the previous step.

So both nodes have an input port listening for data, but only the primary node 
produces a FlowFile and sends it to the RPG which then re-distributes it back 
to one of the Input Ports.

In order for this to work you need to set nifi.remote.input.socket.port in 
nifi.properties to some available port, and you probably want 
nifi.remote.input.secure=false for testing.
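Concretely, the two settings live in conf/nifi.properties on every node; a minimal sketch for a test cluster might be (the port number is an arbitrary free port):

```
# Socket port used for site-to-site transfers between nodes
nifi.remote.input.socket.port=10000
# Disable TLS on site-to-site while testing
nifi.remote.input.secure=false
```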

-Bryan


On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla 
<chakrader.dewaraga...@lifelock.com> wrote:
Mark – I have set up a two-node cluster and tried the following:
 GenerateFlowFile processor (run on primary node only) —> DistributeLoad 
processor (RoundRobin) —> PutFile

>> The GetFile/PutFile will run on all nodes (unless you schedule it to run on 
>> primary node only).
From your above comment, it should put files on both nodes. It puts files on 
the primary node only. Any thoughts?

Thanks,
-Chakri

From: Mark Payne <marka...@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 11:28 AM

To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Chakri,

Correct - when NiFi instances are clustered, they do not transfer data between 
the nodes. This is very different
than you might expect from something like Storm or Spark, as the key goals and 
design are quite different.
We have discussed providing the ability to allow the user to indicate that they 
want to have the framework
do load balancing for specific connections in the background, but it's still in 
more of a discussion phase.

Site-to-Site is simply the capability that we have developed to transfer data 
between one instance of
NiFi and another instance of NiFi. So currently, if we want to do load 
balancing across the cluster, we would
create a site-to-site connection (by dragging a Remote Process Group onto the 
graph) and give that
site-to-site connection the URL of our cluster. That way, you can push data to 
your own cluster, effectively
providing a load balancing capability.

If you were to just run ListenHTTP without setting it to Primary Node, then 
every node in the cluster will be listening
for incoming HTTP connections. So you could then use a simple load balancer in 
front of NiFi to distribute the load
across your cluster.

Does this help? If you have any more questions we're happy to help!

Thanks
-Mark


On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla 
<chakrader.dewaraga...@lifelock.com> wrote:

Mark - Thanks for the notes.

>> The other option would be to have a ListenHTTP processor run on Primary Node 
>> only and then use Site-to-Site to distribute the data to other nodes.
Let's say I have a 5-node cluster and a ListenHTTP processor on the primary node: 
collected data on the primary node is not transferred to the other nodes by default 
for processing, despite all nodes being part of one cluster?
If the ListenHTTP processor is running as a default (without an explicit setting to 
run on the primary node), how does the data get transferred to the rest of the nodes? 
Does site-to-site come into play when I make one processor run on the primary node?

Thanks,
-Chakri

From: Mark Payne <marka...@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 7:00 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster is acting 
independently and in exactly
the same way. I.e., if you have 5 nodes, all 

Re: Nifi cluster features - Questions

2016-01-09 Thread Bryan Bende
The sending node (where the remote process group is) will distribute the
data evenly across the two nodes, so an individual file will only be sent
to one of the nodes. You could think of it as if a separate NiFi instance
was sending directly to a two node cluster, it would be evenly distributing
the data across the two nodes. In this case it just so happens to all be
within the same cluster.

The most common use case for this scenario is the List and Fetch processors
like HDFS. You can perform the listing on primary node, and then distribute
the results so the fetching takes place on all nodes.


Re: Testing a nifi flow via junit

2016-01-09 Thread Oleg Zhurakousky
This is definitely possible and has been done. What makes it difficult at times is 
having all the required NiFi dependencies in the process space of a given test. 
I've actually proposed a separate module for these types of 'headless' flow 
tests. It actually helped me discover some bugs as well as learn some of 
the NiFi internals. 

Anyway, I'm not near a computer at the moment, but I will follow up with more next 
week. 

Oleg 

Sent from my iPhone

> On Jan 4, 2016, at 12:38, Vincent Russell  wrote:
> 
> All,
> 
> I see that there is a way to test a single processor with the TestRunner 
> (StandardProcessorTestRunner) class, but is there a way to set up an 
> integration test to test a complete flow or a subset of a flow?
> 
> Thank you,
> Vincent
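For the single-processor case Vincent mentions, nifi-mock's TestRunner usage looks roughly like the following sketch (it needs the nifi-mock and nifi-standard-processors artifacts on the test classpath; the choice of ReplaceText and its property display names are illustrative assumptions that may differ by version):

```java
import org.apache.nifi.processors.standard.ReplaceText; // processor chosen purely for illustration
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class SingleProcessorTest {

    @Test
    public void routesReplacedContentToSuccess() {
        // Wrap the processor in the mock framework's runner
        final TestRunner runner = TestRunners.newTestRunner(new ReplaceText());

        // Configure properties by display name, as you would in the UI;
        // with the default whole-text regex this replaces the entire content
        runner.setProperty("Replacement Value", "hello");

        // Enqueue input content and trigger the processor once
        runner.enqueue("original".getBytes());
        runner.run();

        // Verify routing; output FlowFiles can also be fetched and inspected
        runner.assertAllFlowFilesTransferred(ReplaceText.REL_SUCCESS, 1);
        runner.getFlowFilesForRelationship(ReplaceText.REL_SUCCESS)
              .get(0)
              .assertContentEquals("hello");
    }
}
```

Chaining whole flows is harder precisely because of the dependency issue Oleg describes: each processor under test must be on the test's classpath rather than isolated in its NAR.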


Re: NIFI connecting to Activemq

2016-01-09 Thread Joe Witt
Chris,

Thanks for sending the screenshot.  The NullPointerException is
trivially reproduced using your settings.  I didn't even need a JMS
server to cause it.

Have created a JIRA for this: https://issues.apache.org/jira/browse/NIFI-1378

Thanks!
Joe



On Sat, Jan 9, 2016 at 3:21 PM, Oleg Zhurakousky
 wrote:
> Well, your URL is malformed, since it should be of the form protocol://host:port.
> Just look at any AMQ sample out there on the web. The part that worries me
> is the NPE and nothing in the logs. Did you try Joe's suggestion and enable
> DEBUG-level logging?
>
> Sent from my iPhone
>
> On Jan 9, 2016, at 14:06, Christopher Hamm wrote:
>
> Those logs came from a completely new restart with cleared logs.
>
> I have an ActiveMQ topic set up. What should each field have in it if it is
> running locally and using port 61613? I think maybe the NPE is related to a
> bad network connection with my ActiveMQ. Can't find examples out there.
> 
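A note on the ports (these are ActiveMQ's defaults; verify against your broker's activemq.xml): NiFi's JMS processors talk to ActiveMQ over the OpenWire transport, which listens on 61616 by default, while 61613 is normally the STOMP connector, not JMS. So for a local broker the URL field would typically be something like:

```
tcp://localhost:61616
```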


Fwd: How to validate records in Hadoop using NiFi?

2016-01-09 Thread sudeep mishra
Hi,

I am pushing some database records into HDFS using Sqoop.

I want to perform some validations on each record in the HDFS data. Which
NiFi processor can I use to split each record (separated by a new line
character) and perform validations?

For validations I want to verify a particular column value for each record
using a SQL query. I can see an ExecuteQuery processor. How can I
dynamically pass query parameters to it. Also is there a way to execute the
queries in bulk rather for each record.

Kindly suggest.

Appreciate your help.


Thanks & Regards,

Sudeep Shekhar Mishra





-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekh...@gmail.com


Re: How to validate records in Hadoop using NiFi?

2016-01-09 Thread Joe Witt
Hello Sudeep,

"Which NiFi processor can I use to split each record (separated by a
new line character)"

  For this the SplitText processor is rather helpful if you want to
split each line.  I recommend you do two SplitText processors in a
chain where one splits on every 1000 lines for example and then the
next one splits each line.  As long as you have back-pressure set up,
this means you could split arbitrarily large (in terms of number of
lines) source files and have good behavior.

..."and perform validations?"

  Consider if you want to validate each line in a text file and route
valid lines one way and invalid lines another way.  If this is the
case then you may be able to avoid using SplitText and simply use
RouteText instead as it can operate on the original file in a line by
line manner and perform expression based validation.  This would
operate in bulk and be quite efficient.
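As a hedged sketch of that idea (the first two property names follow RouteText's configuration style; the dynamic "valid" property and its regular expression are made-up examples for a two-column CSV whose second field must be numeric):

```
Routing Strategy : Route to each matching Property Name
Matching Strategy: Satisfies Expression
valid            : ${line:matches('^[^,]+,[0-9]+$')}
```

Lines matching the "valid" expression route to a relationship named after that property; everything else goes to the unmatched relationship, giving you the valid/invalid split without ever splitting the file.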

"For validations I want to verify a particular column value for each
record using a SQL query"

  Our ExecuteSQL processor is designed for executing SQL against a
JDBC accessible database.  It is not helpful at this point for
executing queries on line oriented data even if that data were valid
DML or something.  Interesting idea but not something we support at
this time.

I'm interested to understand your case more if you don't mind though.
You mention you're getting data from Sqoop into HDFS.  How is NiFi
involved in that flow - is it after data lands in HDFS you're pulling
it into NiFi?

Thanks
Joe

On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra  wrote: