Nifi cluster features - Questions

2015-10-07 Thread Chakrader Dewaragatla
NiFi Team – I would like to understand the advantages of a NiFi cluster setup.

Questions:

 - How does a workflow run on multiple nodes? Do the nodes share resources 
with each other?
Let's say I need to pull ten 1 GB files from S3; how is the workload 
distributed? If I set concurrent tasks to 5, does it spawn 5 tasks per node?

 - How do I “isolate” a processor to the master node (or to one node)?

- With GetFile/PutFile processors in a cluster setup, does the get/put happen 
on the primary node? How do I force a processor to look at one of the slave 
nodes?

- How can we build a workflow where the input side receives requests (HTTP) 
and the rest of the pipeline runs in parallel on all the nodes?

Thanks,
-Chakro


The information contained in this transmission may contain privileged and 
confidential information. It is intended only for the use of the person(s) 
named above. If you are not the intended recipient, you are hereby notified 
that any review, dissemination, distribution or duplication of this 
communication is strictly prohibited. If you are not the intended recipient, 
please contact the sender by reply email and destroy all copies of the original 
message.



Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)

2015-10-07 Thread Mark Payne
Chakri,

Can you check that your system's date/time is accurate? I have not tried this
with S3, but I know that Twitter sends back a pretty similar response when we
use GetTwitter on a node that has the wrong date/time. It has to do with the
authentication protocol that is used requiring an accurate timestamp.

Please advise if your system date/time is correct.
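
One rough way to check this, sketched in Python: compare the node's clock with
the Date header that any HTTP server returns. The function names and the
15-minute tolerance are illustrative assumptions, not something NiFi or the
AWS SDK exposes, and a Date header that is malformed (rather than skewed) is a
separate failure this sketch does not detect.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumed tolerance for this sketch; adjust to whatever the remote service allows.
MAX_SKEW = timedelta(minutes=15)


def http_date(moment):
    """Format an aware datetime as an RFC 1123 HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def clock_skew_seconds(server_date_header, local_now):
    """Return local time minus server time, in seconds (positive = local clock ahead)."""
    server_time = parsedate_to_datetime(server_date_header)
    return (local_now - server_time).total_seconds()


def clock_is_close_enough(server_date_header, local_now, max_skew=MAX_SKEW):
    """True when the absolute skew is within the assumed tolerance."""
    skew = abs(clock_skew_seconds(server_date_header, local_now))
    return skew <= max_skew.total_seconds()


if __name__ == "__main__":
    # Paste a Date header copied from any HTTP response here.
    header = "Thu, 08 Oct 2015 00:28:30 GMT"
    now = datetime.now(timezone.utc)
    print("local time  :", http_date(now))
    print("skew seconds:", clock_skew_seconds(header, now))
    print("within limit:", clock_is_close_enough(header, now))
```

If the skew is small, the clock is probably not the cause and the problem lies
elsewhere.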

Thanks
-Mark





Nifi 0.3.0 on java-1.8.0 (S3 processor error)

2015-10-07 Thread Chakrader Dewaragatla
Has anyone noticed the NiFi S3 processor breaking with the latest Java version 
(java-1.8.0-openjdk-1.8.0.60-14.b27)?

S3 put error:


2015-10-08 00:28:30,745 ERROR [Timer-Driven Process Thread-6] 
o.a.nifi.processors.aws.s3.PutS3Object 
PutS3Object[id=1c0f0c5c-cbb9-4ed9-891c-0bd48ae69366] Failed to put 
StandardFlowFileRecord[uuid=4a2455b1-d47f-4d6e-9847-a93a5e66d7a5,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1444264110525-192, container=default, 
section=192], offset=0, length=1000],offset=0,name=c2.txt,size=1000] to 
Amazon S3 due to com.amazonaws.services.s3.model.AmazonS3Exception: AWS 
authentication requires a valid Date or x-amz-date header (Service: Amazon S3; 
Status Code: 403; Error Code: AccessDenied; Request ID: 254510F6EE5E85B5), S3 
Extended Request ID: 
+Q5lfeKce1HMGUjw9BCp8lg8wmIy/eIHKH62xsWteJzKmqbsWRSeGLo4p8h6R1BLd5KIFXjKOO4=: 
com.amazonaws.services.s3.model.AmazonS3Exception: AWS authentication requires 
a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 403; Error 
Code: AccessDenied; Request ID: 254510F6EE5E85B5), S3 Extended Request ID: 
+Q5lfeKce1HMGUjw9BCp8lg8wmIy/eIHKH62xsWteJzKmqbsWRSeGLo4p8h6R1BLd5KIFXjKOO4=



It works on another node with Java version java-1.8.0-openjdk-1.8.0.45-40.b14.


Thanks,

-Chakri









Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)

2015-10-07 Thread Chakrader Dewaragatla
I downgraded to Java 7u80 and the S3 processor works.

Thanks,
-Chakri

From: Chakrader Dewaragatla
Date: Wednesday, October 7, 2015 at 5:46 PM
To: "users@nifi.apache.org"
Subject: Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)

Mark – Yes, the date/time is correct; an NTP client is configured.

Thanks,
-Chakri




Re: Nifi cluster features - Questions

2015-10-07 Thread Mark Payne
Chakri,

Correct - when NiFi instances are clustered, they do not transfer data between
the nodes. This is very different from what you might expect from something
like Storm or Spark, as the key goals and design are quite different. We have
discussed letting the user indicate that they want the framework to do load
balancing for specific connections in the background, but it's still in more
of a discussion phase.

Site-to-Site is simply the capability that we have developed to transfer data
between one instance of NiFi and another instance of NiFi. So currently, if we
want to do load balancing across the cluster, we would create a site-to-site
connection (by dragging a Remote Process Group onto the graph) and give that
site-to-site connection the URL of our cluster. That way, you can push data to
your own cluster, effectively providing a load balancing capability.

If you were to just run ListenHTTP without setting it to Primary Node, then
every node in the cluster will be listening for incoming HTTP connections. So
you could then use a simple load balancer in front of NiFi to distribute the
load across your cluster.
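
As a rough illustration of that last option, here is a minimal Python sketch
of a client that spreads requests across the nodes round-robin, which is what
a simple load balancer would do for you. The node addresses and port are
made-up placeholders, and the base path is assumed to be whatever Base Path
your ListenHTTP processor is configured with.

```python
from itertools import cycle
from urllib import request

# Hypothetical node addresses; replace with your own hosts and ListenHTTP port.
NODES = [
    "http://nifi-node1.example.com:8081",
    "http://nifi-node2.example.com:8081",
    "http://nifi-node3.example.com:8081",
]

# Assumed to match the Base Path property set on the ListenHTTP processor.
BASE_PATH = "contentListener"


def make_round_robin(nodes, base_path=BASE_PATH):
    """Return a function that yields the next node URL on every call."""
    ring = cycle(nodes)

    def next_url():
        return f"{next(ring)}/{base_path}"

    return next_url


def post(url, payload, timeout=10):
    """POST raw bytes to one node and return the HTTP status code."""
    req = request.Request(url, data=payload, method="POST")
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    next_url = make_round_robin(NODES)
    # Show which node each of six requests would go to, without sending anything.
    for i in range(6):
        print(f"request {i} -> {next_url()}")
```

A real load balancer also handles health checks and failover, which this
sketch leaves out.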

Does this help? If you have any more questions we're happy to help!

Thanks
-Mark



Re: Nifi cluster features - Questions

2015-10-07 Thread Mark Payne
Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster is
acting independently and in exactly the same way. I.e., if you have 5 nodes,
all 5 nodes will run exactly the same flow. However, they will be pulling in
different data and therefore operating on different data.

So if you pull in 10 1-gig files from S3, each of those files will be
processed on the node that pulled the data in. NiFi does not currently shuffle
data around between nodes in the cluster (you can use site-to-site to do this
if you want to, but it won't happen automatically). If you set the number of
Concurrent Tasks to 5, then you will have up to 5 threads running for that
processor on each node.
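
To make the arithmetic concrete, a tiny Python sketch (the function name is
just for illustration): the Concurrent Tasks setting is an upper bound per
node, so the cluster-wide upper bound is that value times the number of nodes.

```python
def max_processor_threads(node_count, concurrent_tasks):
    """Upper bound on threads one processor can use, per node and cluster-wide."""
    if node_count < 1 or concurrent_tasks < 1:
        raise ValueError("node_count and concurrent_tasks must be at least 1")
    return {
        "per_node": concurrent_tasks,
        "cluster_wide": node_count * concurrent_tasks,
    }


if __name__ == "__main__":
    # 5 nodes, Concurrent Tasks = 5: up to 5 threads per node, up to 25 in total.
    print(max_processor_threads(node_count=5, concurrent_tasks=5))
```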

The only exception to this is the Primary Node. You can schedule a Processor
to run only on the Primary Node by right-clicking on the Processor and going
to the Configure menu. In the Scheduling tab, you can change the Scheduling
Strategy to Primary Node Only. In this case, that Processor will only be
triggered to run on whichever node is elected the Primary Node (this can be
changed in the Cluster management screen by clicking the appropriate icon in
the top-right corner of the UI).

The GetFile/PutFile will run on all nodes (unless you schedule it to run on
primary node only).

If you are attempting to have a single input running HTTP and then push that
out across the entire cluster to process the data, you would have a few
options. First, you could just use an HTTP Load Balancer in front of NiFi. The
other option would be to have a ListenHTTP processor run on Primary Node only
and then use Site-to-Site to distribute the data to other nodes.

For more info on site-to-site, you can see the Site-to-Site section of the
User Guide at
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site


If you have any more questions, let us know!

Thanks
-Mark




Re: Need help in nifi- flume processor

2015-10-07 Thread Bryan Bende
Hello,

The NiFi Flume processors are for running Flume sources and sinks within
NiFi. They don't communicate with an external Flume process.

In your example you would need an ExecuteFlumeSource configured to run the
netcat source, connected to an ExecuteFlumeSink configured with the logger.
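
Once that flow is running, you could send a test event the same way you did
with telnet. A minimal Python sketch, assuming the netcat source inside NiFi
is bound to a host and port you pass in (send_event is a made-up helper name,
not part of NiFi or Flume):

```python
import socket


def send_event(host, port, text, timeout=5.0):
    """Send one newline-terminated line to a netcat-style source and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        # The telnet session in your mail shows the source answering "OK" per line.
        return conn.recv(1024).decode("utf-8").strip()


if __name__ == "__main__":
    import sys

    # Usage: python send_event.py <host> <port>
    host, port = sys.argv[1], int(sys.argv[2])
    print(send_event(host, port, "Hello world!"))
```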

-Bryan

On Wednesday, October 7, 2015, Parul Agrawal 
wrote:

> Hi,
>
> I was trying to run the NiFi Flume processor with the details below,
> but could not bring it up.
>
> I already started flume with the sample configuration file
> =
> # example.conf: A single-node Flume configuration
>
> # Name the components on this agent
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
>
> # Describe/configure the source
> a1.sources.r1.type = netcat
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 4
>
> # Describe the sink
> a1.sinks.k1.type = logger
>
> # Use a channel which buffers events in memory
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000
> a1.channels.c1.transactionCapacity = 100
>
> # Bind the source and sink to the channel
> a1.sources.r1.channels = c1
> a1.sinks.k1.channel = c1
> =
>
> Command used to start flume : $ bin/flume-ng agent --conf conf
> --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
>
> In the NiFi UI, ExecuteFlumeSink was configured as follows:
> Property     Value
> Sink Type    logger
> Agent Name   a1
> Sink Name    k1
>
> Event is sent to the flume using:
> $ telnet localhost 4
> Trying 127.0.0.1...
> Connected to localhost.localdomain (127.0.0.1).
> Escape character is '^]'.
> Hello world! 
> OK
>
> But I could not get any data in the NiFi Flume processor. I request your
> help with this.
> Do I need to change Flume's example.conf file so that the NiFi Flume
> sink gets the data?
>
> Thanks and Regards,
> Parul
>




Re: Nifi cluster features - Questions

2015-10-07 Thread Chakrader Dewaragatla
Mark - Thanks for the notes.

>> The other option would be to have a ListenHTTP processor run on Primary Node 
>> only and then use Site-to-Site to distribute the data to other nodes.
Let's say I have a 5-node cluster with a ListenHTTP processor on the primary 
node. Is the data collected on the primary node not transferred to the other 
nodes for processing by default, even though all nodes are part of one cluster?
If the ListenHTTP processor runs with the default setting (without explicitly 
setting it to run on the primary node), how is the data transferred to the 
rest of the nodes? Does site-to-site come into play when I set a processor to 
run on the primary node?

Thanks,
-Chakri
