Trigger a processor if all files in a folder are processed

2015-12-03 Thread Manish Gupta 8
Hi,

I have a scenario where I want to trigger / execute one processor once GetFile 
has pulled all the files from a folder and the last processor has finished its 
execution. How can I implement this in NiFi?

Basically what I am trying to do is:
({ExecuteProcess to call a phantomJS script to download a few files into a 
directory}): runs every 1 hour
({GetFile (xml)} --> {Validate with XSD} --> {PutHDFS}): checks for files 
continuously

Now, after this flow is complete, i.e. all files are available in HDFS, I want to 
submit my XML-to-Avro conversion MR job via the Oozie REST API. How can I make sure 
that my InvokeHTTP processor executes only once, and only after all files 
have successfully landed in HDFS?

Thanks,
Manish



RE: Trigger a processor if all files in a folder are processed

2015-12-04 Thread Manish Gupta 8
Can someone please suggest a workaround for this scenario?

Thanks,
Manish





Default termination for relationships

2016-04-24 Thread Manish Gupta 8
Hi,

Does it make sense to keep all outgoing relationships auto-terminated by 
default when a new processor is dragged onto the canvas? When the user connects 
the processor and specifies a relationship, only the selected one would become 
non-terminating.

I think this would be good from a usability point of view.

Thanks,
Manish




NiFi - Configuration option

2016-04-28 Thread Manish Gupta 8
Hi,

What is the best option for storing root/process-group-level configurations? 
From the "Expression Language Guide", I know one can use an environment variable 
or a JVM system property, or specify the value for one flow using an 
UpdateAttribute processor near the top of the flow.

But is there a way I can have a single properties/XML file that holds all my 
configurations, with each property available in NiFi as a variable?

Thanks,
Manish


RE: NiFi on Windows

2016-05-03 Thread Manish Gupta 8
I am also running on Windows, and for the status and stop scripts I just created 
copies of the run file and changed the BOOTSTRAP_ACTION in each. It works fine.

Regards,
Manish


From: Tom Jerry [mailto:toejam20...@gmail.com]
Sent: Tuesday, May 03, 2016 7:25 PM
To: users@nifi.apache.org
Subject: Re: NiFi on Windows

Thanks for the reply.

There also used to be run-nifi.bat and stop-nifi.bat scripts, but they are no 
longer in /bin.

The two files are still listed in the Admin Guide.


RE: Windows Nifi ExecuteSql querying Azure Sql Server using JDBC failed to load database driver.

2016-05-04 Thread Manish Gupta 8
I had a similar issue and what worked for me is:

Database Connection URL: jdbc:sqlserver://**.*.com;instanceName=dev;databaseName=*
Database Driver Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Jar Url: file:///d:/nifi/driver/sqljdbc_4.0/sqljdbc4.jar
Database User: 
Password: *
Max Wait Time: 1000 millis
Max Total Connections: 1


Regards,
Manish


From: Keith Lim [mailto:keith@ds-iq.com]
Sent: Thursday, May 05, 2016 5:22 AM
To: users@nifi.apache.org
Subject: Windows Nifi ExecuteSql querying Azure Sql Server using JDBC failed to 
load database driver.

I have a simple ExecuteSql processor that uses jdbc to connect to Microsoft 
Azure Sql Server.  My DBCPConnectionPool is set as below:

Database Connection URL: jdbc:sqlserver://dsiq-dev.database.windows.net:1433;database=CDBStaging
  (I have also tried the full connection string as extracted from the Azure Portal:
  jdbc:sqlserver://dsiq-dev.database.windows.net:1433;database=CDBStaging;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;)
Database Driver Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Jar Url: file:///C:/drivers/sqljdbc/sqljdbc_6.0/enu/sqljdbc41.jar
  (I tried sqljdbc4.jar and sqljdbc42.jar as well.)

I tried every permutation of the above that I can think of, and always get something 
like the error below: Can’t load Database Driver.
My NiFi server is running on Windows 10, and the driver was downloaded from here: 
https://www.microsoft.com/en-us/download/details.aspx?id=11774

Thanks,
Keith

2016-05-04 16:07:57,545 ERROR [StandardProcessScheduler Thread-3] 
o.a.n.c.s.StandardControllerServiceNode 
DBCPConnectionPool[id=63a0ccaf-fed7-4223-8c28-6d3b9ba6607e] Failed to invoke 
@OnEnabled method due to org.apache.nifi.reporting.InitializationException: 
Can't load Database Driver
2016-05-04 16:07:57,547 ERROR [StandardProcessScheduler Thread-3] 
o.a.n.c.s.StandardControllerServiceNode
org.apache.nifi.reporting.InitializationException: Can't load Database Driver
at 
org.apache.nifi.dbcp.DBCPConnectionPool.getDriverClassLoader(DBCPConnectionPool.java:199)
 ~[na:na]
at 
org.apache.nifi.dbcp.DBCPConnectionPool.onConfigured(DBCPConnectionPool.java:162)
 ~[na:na]
at sun.reflect.GeneratedMethodAccessor228.invoke(Unknown 
Source) ~[na:na]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) 
~[na:1.8.0_77]
at 
org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
 ~[na:na]
at 
org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
 ~[na:na]
at 
org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
 ~[na:na]
at 
org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
 ~[na:na]
at 
org.apache.nifi.controller.service.StandardControllerServiceNode$1.run(StandardControllerServiceNode.java:285)
 ~[na:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_77]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_77]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_77]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 [na:1.8.0_77]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_77]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.lang.ClassNotFoundException: 
com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
~[na:1.8.0_77]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
~[na:1.8.0_77]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
~[na:1.8.0_77]
at java.lang.Class.forName0(Native Method) ~[na:1.8.0_77]
at java.lang.Class.forName(Class.java:348) ~[na:1.8.0_77]
at 
org.apache.nifi.dbcp.DBCPConnectionPool.getDriverClassLoader(DBCPConnectionPool.java:188)
 ~[na:na]
... 16 common frames omitted
2016-05-04 16:07:57,547 ERROR [StandardProcessScheduler Thread-3] 
o.a.n.c.s.StandardControllerServiceNode Failed to invoke @OnEnabled meth

Re: MySQL/Oracle to CSV

2016-07-19 Thread Manish Gupta 8
The way we have implemented this in our project (MS SQL Server to Hadoop) is:
1. A SQL Agent job runs every 5 minutes and generates a new tab-separated file 
with the incremental changes in a network share.
2. NiFi picks up the files, merges content (into sizes suitable for HDFS), 
converts to Avro, and loads them into Hive external tables.

This is running for around 15 tables across 3 databases.

Thanks,
Manish



On Tue, Jul 19, 2016 at 10:39 PM +0530, "Ravi Papisetti (rpapiset)" 
mailto:rpapi...@cisco.com>> wrote:

Hi,

We are trying to find right workflow for ingesting data from a relational 
database to csv files on hadoop.

There are a few options we could think of:

  *   Write a sqoop script and wrap it under ExecuteProcess
  *   Stitch a workflow using "QueryDatabaseTable" -> ConvertAvroToJSON -> 
ReplaceTextWithMapping (looks complex to me for a simple flow)

Appreciate if you can share any experiences and best practices of using apache 
NiFi for this use case.



Thanks,

Ravi Papisetti

Technical Leader

Services Technology Incubation 
Center

rpapi...@cisco.com

Phone: +1 512 340 3377




Processors in cluster mode

2016-08-08 Thread Manish Gupta 8
Hi,

I am running a multi-node NiFi (0.7.0) cluster and trying to implement a 
streaming ingestion pipeline (~200 MB/s at peak and around 30 MB/s at non-peak 
hours) with routing to different destinations (Kafka, Azure Storage, HDFS). The 
dataflow will expose a TCP port for incoming data and will also be 
ingesting files from folders, database records, etc.

It would be great if someone could provide a link/doc that explains how 
processors can be expected to behave in a multi-node environment.
My doubts are about how some of the processors work in clustered mode, and 
the meaning of concurrent tasks.

For example:

* ListenTCP:

o   When this processor is scheduled to run on the cluster (and not on the 
primary node only), does that mean I need to send data to all the individual 
nodes manually, i.e. specify each node's host name separately? If I don't 
specify hosts individually and only provide, say, the primary node's host name 
from the producer, will all the other nodes remain idle? Or does NiFi try to 
distribute the data to the other nodes using some routing strategy? I am trying 
to increase the throughput and am thinking of something like this as the data 
producer strategy:

[image: producer strategy diagram]

And the consumer will simply be as follows:

[image: consumer flow diagram]

o   When I increase the number of concurrent tasks, does it make multiple 
copies of the buffer/channel reader etc.? Or is it only the processing that gets 
multiplied?

* Get / Fetch File:

o   Can we assume that when this processor is running on multiple nodes and 
threads, the same file will never get pulled multiple times as a flow file?

* DistributeLoad processor:

o   When this processor is running on multiple nodes, will all the incoming 
flow files go to each instance on every running node? And this question applies 
to any processor that runs on a cluster and has to consume an incoming flow 
file. What's the general routing strategy in NiFi when a processor is running 
on multiple nodes?

* ExecuteSQL:

o   Will all the running instances on all the nodes be hitting the RDBMS to 
fetch the data for the same query, leading to duplicates and heavy load on the 
database?

Thanks,
Manish



RE: Processors in cluster mode

2016-08-08 Thread Manish Gupta 8
Thanks Bryan. This is very helpful.

Regards,
Manish


From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Tuesday, August 09, 2016 12:50 AM
To: users@nifi.apache.org
Subject: Re: Processors in cluster mode

Hi Manish,

This post [1] has an overview of how to distribute data across your NiFi 
cluster.

In general though, NiFi runs the same flow on each node and the data needs to 
be divided across the nodes appropriately depending on the situation.
The only exception to running the same flow on every node is when a processor 
is scheduled to run Primary Node only.

Concurrent Tasks is the number of threads that will concurrently call a given 
instance of a processor. So if you have processor "Foo" and a three node 
cluster, and set concurrent tasks to 2, there will be three instances of Foo 
and each will have two threads calling the onTrigger method.

For some of your specific cases...

ListenTCP - You would have an instance of this processor on all three nodes and 
need the producer to send to all of them, or have a load balancer that supports 
TCP sitting in front of the nodes and have the producer send to the load 
balancer.
Get/Fetch File - These pick up files from the local filesystem so it would be 
up to the data producer to send/write files on each node of the cluster for 
each instance of this processor to pick up.
Distribute Load Processor - There will be a Distribute Load processor running 
on each node and operating on only the flow files on that node.
ExecuteSQL - Typically you would run this on the primary node only, or, in an 
upcoming release, there are going to be some more options with a 
ListDatabaseTables processor that can produce instructions that can be 
distributed across the cluster to your ExecuteSQL processors.

Hope that helps.

Thanks,

Bryan

[1] 
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html





Processor to enrich attribute from external service

2016-09-02 Thread Manish Gupta 8
Hello Everyone,

Is there a processor that we can use for updating/adding an attribute of an 
incoming flow file from some external service (say MongoDB, Couchbase, or any 
RDBMS)? The processor would use an attribute of the incoming flow file, query the 
external service, and simply modify/add an attribute on the flow file 
(without touching the flow file content).

If we have to achieve this kind of "lookup" operation (but only to update an 
attribute and not the content), what are the options in NiFi?
Should we create a custom processor (maybe by taking the GetMongo processor and 
modifying its code to update an attribute with the query result)?

Thanks,
Manish



RE: Processor to enrich attribute from external service

2016-09-02 Thread Manish Gupta 8
Thanks for the reply Joe. Just a thought – do you think it would be a good idea 
for every Get processor (GetMongo, GetHBase etc.) to have 2 additional 
properties like:

1.   Result in Content or Result in Attribute

2.   Result Attribute Name (only applicable when “Result in Attribute” is 
selected).
But then all such processors should be able to accept an incoming flow file 
(which they don’t as of now – being a “Get”).

Maybe ExecuteSQL and FetchDistributedMapCache could be enhanced that way, i.e. 
have an option to specify the destination – content or attribute?

Regards,
Manish

From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Friday, September 02, 2016 5:58 PM
To: users@nifi.apache.org
Subject: Re: Processor to enrich attribute from external service


You would need to make a custom processor for now.  I think we should have a nice 
controller service to generalize JDBC lookups, which supports caching, and then 
a processor which leverages it.

This comes up fairly often and is pretty straightforward from a design POV.  
Anyone want to take a stab at this?
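
A rough sketch of what such a custom processor could look like (class, attribute, 
and relationship names are illustrative, and the external lookup is stubbed out; 
this is not an existing NiFi processor):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class LookupAttributeFromStore extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("FlowFiles enriched with the lookup result").build();
    static final Relationship REL_FAILURE = new Relationship.Builder()
            .name("failure").description("FlowFiles for which the lookup failed").build();

    @Override
    public Set<Relationship> getRelationships() {
        final Set<Relationship> rels = new HashSet<>();
        rels.add(REL_SUCCESS);
        rels.add(REL_FAILURE);
        return Collections.unmodifiableSet(rels);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return; // nothing queued for this task
        }
        final String key = flowFile.getAttribute("lookup.key");
        try {
            // Query MongoDB / Couchbase / an RDBMS here; stubbed out for the sketch.
            final String value = lookupExternally(key);
            // Only an attribute is added; the FlowFile content is left untouched.
            flowFile = session.putAttribute(flowFile, "lookup.value", value);
            session.transfer(flowFile, REL_SUCCESS);
        } catch (final Exception e) {
            getLogger().error("Lookup failed for {}", new Object[]{flowFile}, e);
            session.transfer(flowFile, REL_FAILURE);
        }
    }

    private String lookupExternally(final String key) {
        return "enriched-" + key; // placeholder for the real client call
    }
}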




RE: Processor to enrich attribute from external service

2016-09-02 Thread Manish Gupta 8
I think the lookup processor should return data in a format that can be 
efficiently parsed/processed by the NiFi Expression Language – for example, JSON. 
This would avoid using an additional “Extract”-type processor. All the downstream 
processors can simply use “jsonPath” for further lookups inside the 
attribute.

Regards,
Manish

From: Matt Burgess [mailto:mattyb...@gmail.com]
Sent: Friday, September 02, 2016 6:37 PM
To: users@nifi.apache.org
Subject: Re: Processor to enrich attribute from external service

Manish,

Some of the queries in those processors could bring back lots of data, and 
putting them into an attribute could cause memory issues. Another concern is 
when the result is binary data, such as ExecuteSQL returning an Avro file. And 
since the return of these is a collection of records, these processors are 
often followed by a Split processor to perform operations on individual records.

Having said that, if the return value is text and you'd like to transfer it to 
an attribute, you can use ExtractText to put the content into an attribute. For 
small content (which is the appropriate use case), this should be pretty fast, 
and keeps the logic in a single processor instead of duplicated (either 
logically or physically) across processors.

By the way I'm very interested in an RDBMS lookup processor, but not sure I'd 
have time in the short run to write it up. If someone takes a crack at it, I 
recommend properties to pre-cache the table with a refresh interval. This way 
if the lookup table doesn't change much and is not too big, it could be read 
into the processor's memory for super-fast lookups. Alternatively, a property 
could be a cache size, which would build a subset of the table in memory as 
values are looked up. This is probably more robust as it is bounded and if the 
size is set high enough for a small table, it would be read in its entirety. 
Still would want the cache refresh property though.
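
A minimal sketch of that bounded, refreshing cache in plain Java (sizes, interval, 
and the loader are illustrative; this is not tied to any existing NiFi controller 
service):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded, access-ordered lookup cache that is cleared after a refresh interval. */
public class RefreshingLookupCache<K, V> {

    private final int maxSize;
    private final long refreshIntervalMillis;
    private final Function<K, V> loader;   // e.g. a JDBC "SELECT value FROM dim WHERE key = ?"
    private final Map<K, V> cache;
    private long lastRefresh = System.currentTimeMillis();

    public RefreshingLookupCache(final int maxSize, final long refreshIntervalMillis, final Function<K, V> loader) {
        this.maxSize = maxSize;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.loader = loader;
        // Access-ordered LinkedHashMap drops the least-recently-used entry once maxSize is exceeded.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
                return size() > RefreshingLookupCache.this.maxSize;
            }
        };
    }

    public synchronized V get(final K key) {
        final long now = System.currentTimeMillis();
        if (now - lastRefresh > refreshIntervalMillis) {
            cache.clear();   // crude refresh: drop everything and re-load lazily
            lastRefresh = now;
        }
        return cache.computeIfAbsent(key, loader);
    }
}

A lookup processor could hold one instance of this per table; if the cache size is 
set larger than a small table, the whole table ends up cached after the first pass, 
which matches the "read in its entirety" case above.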

Cheers,
Matt




RE: Nifi compatibility with Hadoop Version

2016-09-10 Thread Manish Gupta 8
I generally prefer to check GitHub for such information. Look for the project 
properties in pom.xml for the Hadoop version.

NiFi 1.0: https://github.com/apache/nifi/blob/rel/nifi-1.0.0/pom.xml
NiFi 0.7: https://github.com/apache/nifi/blob/rel/nifi-0.7.0/pom.xml
NiFi 0.6: https://github.com/apache/nifi/blob/rel/nifi-0.6.1/pom.xml

Regards,
Manish

From: Shashi Vishwakarma [mailto:shashi.vish...@gmail.com]
Sent: Saturday, September 10, 2016 3:20 PM
To: users@nifi.apache.org
Subject: Re: Nifi compatibility with Hadoop Version


Thanks a lot. Does the Hadoop client version change with the NiFi version? Where 
can I get more information about which version of NiFi is packaged with which 
version of Hadoop?

Thanks Shashi

On 11 Sep 2016 12:20 am, "Bryan Bende" 
mailto:bbe...@gmail.com>> wrote:
Shashi,

Apache NiFi is currently built with the Apache Hadoop 2.6.2 client, so 
generally it will work with any versions of Hadoop that this client is 
compatible with.

NiFi is not using any libraries or anything from the target cluster, except for 
the config files for locations of services, and the client itself is bundled 
with the NiFi build.

There have been some efforts recently to provide build profiles for NiFi for 
those who want to build a version of NiFi that uses vendor specific libraries 
(i.e. MapR, CDH, HDP, etc.), but I can't fully speak to the current state of 
that effort.

Thanks,

Bryan

On Sat, Sep 10, 2016 at 9:53 AM, Shashi Vishwakarma 
mailto:shashi.vish...@gmail.com>> wrote:
Hi All

I just have a very basic question about NiFi. I see that NiFi has default 
PutHDFS and GetHDFS processors.

Does NiFi depend on the Hadoop version present on the cluster?

For example, is NiFi 0.6 compatible with Hadoop 2.7, and so on?

Do we have such a matrix, or does it purely depend on the Hadoop configuration 
that we provide?

Thanks
Shashi



Best Practice for backing up NiFi Flows

2016-09-13 Thread Manish Gupta 8
Hello Everyone,

Is there a best practice for keeping a backup of all the data flows we are 
developing in NiFi?

Currently we take a copy of flow.xml.gz every hour and keep it in a backup folder 
(also in our source control). Also, we keep a copy of all config files in 
source control.


* We are assuming that using flow.xml.gz and the config files, we will be 
able to restore NiFi in case of any failure or if someone makes a 
mistake. Is this assumption correct? Is there a better way to deal with this?

* When we move to production (or some other environment), will it be as 
simple as dropping flow.xml.gz into a new NiFi installation on the NCM along with 
making some environment-related changes? Or should we use templates on Dev 
and import them on Prod?

Thanks,
Manish



OnTrigger - FlowFile is Null

2016-09-14 Thread Manish Gupta 8
Hi,

In one of our custom processors, I forgot to include the following check, and 
the error started showing up randomly.

if (flowFile == null) {
    return;
}

I am just curious to know: in what situations would onTrigger get called if 
there is no FlowFile?

Regards,
Manish



RE: OnTrigger - FlowFile is Null

2016-09-14 Thread Manish Gupta 8
Thanks Matt. This is very helpful. In my case also, it's the 3rd scenario. I 
remember, this error started showing up only when I had increased the 
concurrent tasks to greater than 1 sometime earlier today.

Regards,
Manish
Cell: +1 646-744-7606
Email: mgupt...@sapient.com<mailto:mgupt...@sapient.com>

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, September 14, 2016 2:50 PM
To: users@nifi.apache.org
Subject: Re: OnTrigger - FlowFile is Null

Manish,

This happens for a few reasons:

* Processor has no incoming connections (is a source processor)
* Processor has the @TriggerWhenEmpty annotation
* Processor has more than 1 concurrent task

The main reason is the third one above. If you have multiple concurrent tasks, 
Thread 1 can determine that there is 1 FlowFile queued. Thread 2 then determines 
that there is 1 FlowFile queued. Both threads call onTrigger(). Thread 1 gets 
the FlowFile, and Thread 2 gets null.
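
A minimal sketch of the guard this implies at the top of onTrigger (the surrounding 
processor class is omitted and the relationship name is illustrative):

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            // Another concurrent task already claimed the queued FlowFile; just yield this run.
            return;
        }
        // ... normal per-FlowFile processing ...
        session.transfer(flowFile, REL_SUCCESS);
    }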

Does this help?

Thanks
-Mark




RE: OnTrigger - FlowFile is Null

2016-09-14 Thread Manish Gupta 8
Apologies for the typo. I meant *Mark.


Regards,
Manish
Cell: +1 646-744-7606
Email: mgupt...@sapient.com<mailto:mgupt...@sapient.com>




RE: Best Practice for backing up NiFi Flows

2016-09-14 Thread Manish Gupta 8
Thanks James. Using templates makes more sense. So what we are going to do is:

· Take a backup of the conf folder in source control (scheduled).

· Create a template for the top-level process group(s) periodically 
(will try to automate this using the NiFi REST API; see the sketch below).
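
A rough sketch of that automation, assuming a NiFi 1.x instance reachable without 
authentication and the GET /nifi-api/templates/{id}/download endpoint (check the 
REST API docs for your version; the URL, template ID, and output path are 
illustrative):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Downloads one NiFi template as XML so it can be committed to source control. */
public class TemplateBackup {
    public static void main(final String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: TemplateBackup <nifiBaseUrl> <templateId> [outFile]");
            return;
        }
        final String nifiUrl = args[0];                     // e.g. http://localhost:8080
        final String templateId = args[1];                  // ID shown in the Templates dialog / REST API
        final String outFile = args.length > 2 ? args[2] : templateId + ".xml";

        final URL url = new URL(nifiUrl + "/nifi-api/templates/" + templateId + "/download");
        final HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("Template download failed: HTTP " + conn.getResponseCode());
        }
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get(outFile), StandardCopyOption.REPLACE_EXISTING);
        }
        System.out.println("Saved template " + templateId + " to " + outFile);
    }
}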

Regards,
Manish
Cell: +1 646-744-7606
Email: mgupt...@sapient.com<mailto:mgupt...@sapient.com>

From: James Wing [mailto:jvw...@gmail.com]
Sent: Wednesday, September 14, 2016 3:02 PM
To: users@nifi.apache.org
Subject: Re: Best Practice for backing up NiFi Flows

Manish, you are absolutely right to back up your flow.xml.gz and conf files.  
But I would carefully distinguish between using these backups to recreate an 
equivalent new NiFi, versus attempting to reset the state of your existing 
NiFi.  The difference is the live data in your flow, in the provenance 
repository, in state variables, etc.  Restoring a flow definition that no 
longer matches your content and provenance data may have unexpected results for 
you, and for systems connecting with NiFi.  NiFi does try hard to handle these 
changes smoothly, but it isn't a magic time machine.

Deploying flow.xml.gz can work, especially when deployed with conf files that 
reference IDs in the flow (like authorizations.xml), or the 
nifi.sensitive.props.key setting, etc.  But if you overwrite a running flow, 
you still have the data migration problem.

Templates are the current recommended best practice for deployment.  As I 
understand it, templates provide:

1.) Concise packaging for deployment
2.) Separation between site-specific configuration like authorizations from the 
flow logic
3.) Workflow that allows, encourages, forces the administrator to address 
migration from the existing flow to incorporate the new template

Personally, I think it centers on acceptance or rejection of the 
command-and-control model, which is controversial and different from most other 
systems.  Templates fit within command-and-control, overwriting flow.xml.gz 
suggests a different model.

I know there are many other opinions on this.

Thanks,

James




RE: logging all transformed flowfiles

2016-09-27 Thread Manish Gupta 8
Hi Phil,

We are also doing a similar thing, but not keeping all the content after each 
transformation externally. What we do is send only the flow file attributes to 
external storage (a file / Event Hub / database / NoSQL) using the 
AttributesToJSON processor, and then send it for logging after every logical 
step where we want to log (after adding a couple of additional details like 
step name, number of rows in the file, hashcode, etc.).

For your scenario, I think you can simply clone the output relationship from 
each of your processors and send it to one or more logging/sink 
processors. For keeping the lineage, you have a couple of options:
1. Use a different sink/folder/table for each step (with a corresponding name)
2. Keep the file name consistent to track the lineage
3. Modify the flow file content to make sure you can track the lineage from the 
metadata content.


Regards,
Manish

-Original Message-
From: philippe.gib...@orange.com [mailto:philippe.gib...@orange.com] 
Sent: Tuesday, September 27, 2016 7:33 PM
To: users@nifi.apache.org
Subject: logging all transformed flowfiles

Hello,
My SW context: standalone NiFi 1.0.0

My problem: I would like to log all the different transformations applied to 
an initial file (input) up to exiting the dataflow (output).
Imagine this simple dataflow:
File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2 (out)
I would like to store File1, flow1, flow2, and File2 outside of NiFi (in my own 
external DB).
Is there some simple REST API to help accomplish this? (I looked at Data 
Provenance and SiteToSiteProvenanceReportingTask but have not clearly found the 
right way to implement this.)
Any idea?

Phil
Best regards 


-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: mercredi 30 mars 2016 04:58
To: users@nifi.apache.org
Subject: Re: Developing dataflows in the canvas editor

Dmitry these are great questions and Chris that was in my opinion a pretty 
excellent response - 'noob' or not.

The only thing I'd add Dmitry is that some of what you're saying regarding 
templates themselves is very true.  We can do better and so much more than we 
are.  We have a feature proposal/discussion framing here [1,2] and please by 
all means help us shape how this evolves.

[1] 
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
[2] https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry

Thanks
Joe

On Tue, Mar 29, 2016 at 1:59 PM, McDermott, Chris Kevin (MSDU -
STaTS/StorefrontRemote)  wrote:
> Dimitri,
>
> From one noob to another, welcome.
>
> All modifications to the canvas are automatically saved.  If you want to 
> organize multiple flow instances look to process groups.  Drag a process 
> group onto the canvas. Double click the process group to open it. Then drag a 
> template onto the canvas.  Use the breadcrumbs to navigate back to the root 
> process group (root of the canvas).  Create a second process group.  Wash and 
> repeat.  Process groups can be nested to your hearts content.  Process groups 
> themselves can be saved as templates.  You can also copy then paste in 
> process groups.  And you can drag processors and process groups into other 
> process groups, although I am not sure that you can do this with 
> multi-select.  They are great for creating a high-level abstraction for a 
> complex flow.
>
> I find it's best to use the zoom controls.  For what it's worth, Google Maps 
> uses the same paradigm for zooming.   I’m not sure these web-apps can really 
> understand “gestures”, it's just that the browser translates the gesture into 
> scroll events which NiFi uses for zooming.
>
> Good luck,
>
> Chris
>
> Date: Tuesday, March 29, 2016 at 1:27 PM
> To: "users@nifi.apache.org" 
> mailto:users@nifi.apache.org>>
> Subject: Developing dataflows in the canvas editor
> From: Dmitry Goldenberg 
> mailto:dgoldenb...@hexastax.com>>
> Reply-To: "users@nifi.apache.org" 
> mailto:users@nifi.apache.org>>
>
> Hi,
>
> These may be just 'noob' impressions from someone who hasn't learned enough 
> NiFi yet (I may be missing something obvious).
> My first confusion is about dataflows vs. templates.  I've developed a couple 
> of templates.  Now I want to drop a template into the canvas and treat that 
> as a dataflow or an instance of a template.  But I don't see a way to save 
> this instance into any persistent store, or any way to manage its lifecycle 
> (edit, delete etc).  Is there something I'm missing or are there features in 
> progress related to this?
>
> I mean, where does the dataflow go if I kill the browser? It seems to 
> persist... but what happens when I want to create a slightly different 
> rendition of the same flow?  Is there a namespaced persistence for dataflows 
> with CRUD operations supported?  I keep looking for a File -> New, File -> 
> Open, File -> Save type of metaphor.
>
> My second item is the m

NiFi Cluster mode

2016-09-27 Thread Manish Gupta 8
Hello Everyone,

In one of our projects, we only have one large box available for running NiFi in 
production (to start with). This is due to some data-center-related issues 
(availability of space, etc.). Later, if required, we will move to a multi-node 
cluster.

We are wondering whether we should set up NiFi to run in clustered mode from the 
beginning, i.e. NCM + slave node on a single machine, or whether we should set it 
up as a single-node deployment. Some of the considerations are:

* How difficult is it to move from a single-node deployment (without 
running an NCM) to a multi-node cluster? I mean, can we simply make some 
configuration changes, add an NCM to the existing installation, and add multiple 
nodes to it? Should we worry about any potential data/configuration loss while 
doing such an upgrade?

* Is the NCM resource intensive?

* Any other factors we should consider?

Thanks,
Manish



RE: NiFi Cluster mode

2016-09-27 Thread Manish Gupta 8
Thanks Andy. Sorry, I forgot to mention – we are on 0.7 now and will be moving 
to 1.0 after some time. Also, we have a 128 GB machine.
So, we will go with the multi-node setup.

Regards,
Manish

From: Andy LoPresto [mailto:alopre...@apache.org]
Sent: Wednesday, September 28, 2016 12:06 AM
To: users@nifi.apache.org
Subject: Re: NiFi Cluster mode

Hi Manish,

With NiFi 1.0.0, the NCM model has been replaced by “zero master clustering” or 
“ZMC” [1]. This means an arbitrary number of nodes can run in the cluster, and 
there is no longer a SPOF. With ZMC, all connected nodes provide the UI and 
replicate changes to the flow to all other nodes. If a node is lost, a new 
coordinating node is elected via ZooKeeper.

In your case, if the machine available really is “large”, I would definitely 
recommend setting up a cluster on the single node and migrating or expanding it 
when possible. Migrating from a single node to a multi-node cluster is just a 
matter of configuration changes, and you will not lose any data. However, if 
you configure the application as a cluster from the beginning, you can 
seamlessly introduce new nodes to the cluster as they become provisioned with 
minimal changes to the running instances. Just ensure that you have the RAM 
available to run decent Java heaps for both instances, and follow the 
configuration best practices in the NiFi Admin Guide [2] for open file handles, 
etc.

[1] 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering
[2] 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices


Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69




RE: UI: feedback on the processor 'color' in NiFi 1.0

2016-09-27 Thread Manish Gupta 8
I think one of the things that would really help with complex data flows from a UI 
perspective is “colored icons” on each processor. Not sure if this is already part 
of 1.0, but in my experience, icons definitely help a lot in quickly 
understanding complex flows. Those icons could be fixed (embedded within the NAR) 
or dynamic (a user-defined icon file for different processors) – just a 
suggestion.

Regards,
Manish

From: Andrew Grande [mailto:apere...@gmail.com]
Sent: Tuesday, September 20, 2016 10:40 PM
To: users@nifi.apache.org
Subject: Re: UI: feedback on the processor 'color' in NiFi 1.0


No need to go wild, changing processor colors should be enough, IMO. PG and RPG 
are possible candidates, but they are different enough already, I guess.

What I heard quite often was to differentiate between regular processors, 
incoming sources of data and out only (data producers?). Maybe even with a 
shape?

Andrew

On Tue, Sep 20, 2016, 12:35 PM Rob Moran 
mailto:rmo...@gmail.com>> wrote:
Good points. I was thinking a label would be tied to the group of components to 
which it was applied, but that could also introduce problems as things move and 
are added to a flow.

So would you all expect to be able to change the color of every component type, 
or just processors?

Andrew - your comment about coloring terminators red is interesting as well. 
What are some other parts of a flow you might use color to identify? Along with 
backpressure, we could explore other ways to call these things out so users do 
not come up with their own methods. Perhaps there are layer options, like on a 
map (e.g., "show terrain" or "show traffic").

Rob

On Tue, Sep 20, 2016 at 11:23 AM, Andrew Grande 
mailto:apere...@gmail.com>> wrote:

I agree. Labels are great for grouping, beyond PGs. Processor colors 
individually add value. E.g. flow terminator colored in red was a very common 
pattern I used. Besides, labels are not grouped with components, so moving 
things and re-arranging is a pain.

Andrew

On Tue, Sep 20, 2016, 11:21 AM Joe Skora 
mailto:jsk...@gmail.com>> wrote:
Rob,
The labelling functionality you described sounds very useful in general.  But, 
I miss the processor color too.

I think labels are really useful for identifying groups of components and areas 
in the flow, but I worry that needing to use them in volume for processor 
coloring will increase the API and browser canvas load for elements that don't 
actually affect the flow.

On Tue, Sep 20, 2016 at 10:40 AM, Rob Moran 
mailto:rmo...@gmail.com>> wrote:
What if we promote the use of Labels as a way to highlight things. We could add 
functionality to expand their usefulness as a way to highlight things on the 
canvas. I believe that is their intended use.

Today you can create a label and change its color to highlight single or 
multiple components. Even better you can do it for any component (not just 
processors).

To expand on functionality, I'm imagining a context menu and palette action to 
"Label" a selected component or components. This would prompt a user to pick a 
background and add text which would place a label around everything once it's 
applied.

Rob

On Mon, Sep 19, 2016 at 6:42 PM, Jeff 
mailto:jtsw...@gmail.com>> wrote:
I was thinking, in addition to changing the color of the icon on the processor, 
that the color of the drop shadow could be changed as well.  That would provide 
more contrast, but preserve readability, in my opinion.

On Mon, Sep 19, 2016 at 6:39 PM Andrew Grande 
mailto:apere...@gmail.com>> wrote:
Hi All,

Rolling with UI feedback threads. This time I'd like to discuss how NiFi 'lost' 
its ability to change processor boxes color. I.e. as you can see from a 
screenshot attached, it does change color for the processor in the flow 
overview panel, but the processor itself only changes the icon in the top-left 
of the box. I came across a few users who definitely miss the old way. I 
personally think changing the icon color for the processor doesn't go far 
enough, especially when one is dealing with a flow of several dozen processors, 
zooms in and out often. The overview helps, but it's not the same.

Proposal - can we restore how color selection for the processor changed the 
actual background of the processor box on the canvas? Let the user go wild with 
colors and deal with readability, but at least it's easy to spot 'important' 
things this way. And with multi-tenant authorization it becomes a poor-man's 
doc between teams, to an extent.

Thanks for any feedback,
Andrew





PutHiveQL and Hive Connection Pool with HDInsight

2016-09-29 Thread Manish Gupta 8
Hi,

I am not able to use PutHiveQL when accessing Hive on HDInsight. I am using 
NiFi 0.7.


* Tried specifying the URL in a couple of different ways. If I follow the 
Azure Documentation 
(https://azure.microsoft.com/en-in/documentation/articles/hdinsight-connect-hive-jdbc-driver/)
 and specify the URL as jdbc:hive2:// 
somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2,
 then I get a "failed to process session due to java.lang.NoSuchFieldError: 
INSTANCE: java.lang.NoSuchFieldError: INSTANCE".

* I tried using hive-jdbc jars from my cluster (dropping them into 
lib), but then NiFi didn't start (some javax.xml.parsers conflicts).

* When I use 
"jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname", then I get 
following error.

Is this issue because of https://issues.apache.org/jira/browse/NIFI-2575, or are my 
connection settings incorrect? Any workaround? Any reference 
settings/examples for HDI?
All I need to do is call an Alter Table Add Partition command in Hive from NiFi 
(once a day). Should I use HWI/Custom processor?

2016-09-29 08:18:48,194 INFO [StandardProcessScheduler Thread-1] 
o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] to run with 1 threads
2016-09-29 08:18:48,194 INFO [Timer-Driven Process Thread-6] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=4d7f766a-1177-4f1d-a376-6ba5b84bf856] Simple 
Authentication
2016-09-29 08:18:48,262 INFO [Timer-Driven Process Thread-6] 
org.apache.hive.jdbc.Utils Supplied authorities: 
somehdiclustername.azurehdinsight.net:443
2016-09-29 08:18:48,263 INFO [Timer-Driven Process Thread-6] 
org.apache.hive.jdbc.Utils Resolved authority: 
somehdiclustername.azurehdinsight.net:443
2016-09-29 08:18:48,468 INFO [Timer-Driven Process Thread-6] 
org.apache.hive.jdbc.HiveConnection Transport Used for JDBC connection: null
2016-09-29 08:18:48,468 ERROR [Timer-Driven Process Thread-6] 
o.a.nifi.dbcp.hive.HiveConnectionPool 
HiveConnectionPool[id=4d7f766a-1177-4f1d-a376-6ba5b84bf856] Error getting Hive 
connection
2016-09-29 08:18:48,484 ERROR [Timer-Driven Process Thread-6] 
o.a.nifi.dbcp.hive.HiveConnectionPool
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true: 
Invalid status 72)
at 
org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1549)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1388)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
 ~[commons-dbcp-1.4.jar:1.4]
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:289)
 ~[nifi-hive-processors-0.7.0.jar:0.7.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.8.0_102]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_102]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_102]
at java.lang.reflect.Method.invoke(Method.java:498) 
~[na:1.8.0_102]
at 
org.apache.nifi.controller.service.StandardControllerServiceProvider$1.invoke(StandardControllerServiceProvider.java:166)
 [nifi-framework-core-0.7.0.jar:0.7.0]
at com.sun.proxy.$Proxy89.getConnection(Unknown Source) [na:na]
at 
org.apache.nifi.processors.hive.PutHiveQL.onTrigger(PutHiveQL.java:152) 
[nifi-hive-processors-0.7.0.jar:0.7.0]
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 [nifi-api-0.7.0.jar:0.7.0]
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054)
 [nifi-framework-core-0.7.0.jar:0.7.0]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
 [nifi-framework-core-0.7.0.jar:0.7.0]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
 [nifi-framework-core-0.7.0.jar:0.7.0]
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
 [nifi-framework-core-0.7.0.jar:0.7.0]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_102]
at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_102]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_1

RE: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-29 Thread Manish Gupta 8
Thank you Matt. I did try hive.server2.transport.mode=http, like this: 
jdbc:hive2:// 
somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2.
But I was getting java.lang.NoSuchFieldError: INSTANCE: 
java.lang.NoSuchFieldError: INSTANCE.

I will try again with transportMode=http and/or httpPath=cliservice.

But, as per Hive's documentation, the right syntax should be 
hive.server2.transport.mode (https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2).

Regards,
Manish

From: Matt Burgess [mailto:mattyb...@apache.org]
Sent: Thursday, September 29, 2016 7:28 PM
To: users@nifi.apache.org
Subject: Re: PutHiveQL and Hive Connection Pool with HDInsight

Manish,

According to [1], status 72 means a bad URL, perhaps you need a transportMode 
and/or httpPath parameter in the URL (as described in the post)?

Regards,
Matt

[1] 
https://community.hortonworks.com/questions/23864/hive-http-transport-mode-problem.html



RE: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-29 Thread Manish Gupta 8
Tried different combinations, but couldn’t succeed. It’s an HDI 3.4 cluster 
with default Hive settings.

Some of the errors I received:


1.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/default;hive.server2.transport.mode=http;hive.server2.thrift.http.path=/:
 java.net.SocketException: Connection reset); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/default;hive.server2.transport.mode=http;hive.server2.thrift.http.path=/:
 java.net.SocketException: Connection reset).



2.   failed to process session due to java.lang.NoSuchFieldError: INSTANCE: 
java.lang.NoSuchFieldError: INSTANCE



3.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;hive.server2.thrift.http.path=/:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;hive.server2.thrift.http.path=/:
 Invalid status 72)



4.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;httpPath=/:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;httpPath=/:
 Invalid status 72)



5.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http:
 Invalid status 72)



Regards,
Manish

From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Friday, September 30, 2016 12:44 AM
To: users@nifi.apache.org
Subject: RE: PutHiveQL and Hive Connection Pool with HDInsight

Thank you Matt. I did try with hive.server2.transport.mode=http like this 
jdbc:hive2:// 
somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2<http://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2>.
But, I was getting java.lang.NoSuchFieldError: INSTANCE: 
java.lang.NoSuchFieldError: INSTANCE.

I will try again with transportMode=http and/or httpPath=cliservice.

But, as per Hive’s documentation, the right syntax should be 
hive.server2.transport.mode (https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2).

Regards,
Manish

From: Matt Burgess [mailto:mattyb...@apache.org]
Sent: Thursday, September 29, 2016 7:28 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: PutHiveQL and Hive Connection Pool with HDInsight

Manish,

According to [1], status 72 means a bad URL, perhaps you need a transportMode 
and/or httpPath parameter in the URL (as described in the post)?

Regards,
Matt

[1] 
https://community.hortonworks.com/questions/23864/hive-http-transport-mode-problem.html


On Thu, Sep 29, 2016 at 9:06 AM, Ma

RE: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-30 Thread Manish Gupta 8
Tried a couple more connection options, but always got an error while setting 
up the Hive Connection Pool. What’s strange is that the DBCP Connection Pool works 
fine with the same Hive JDBC connection string.

Now I am writing a custom “PutSQL”-like processor that uses the standard DBCP 
Controller Service and allows running DDL commands on Hive (since the standard 
PutSQL does not allow DDL statements - only insert and update work). 
Basically, I’ll be writing a custom PutHiveQL that can work with the standard DBCP.

Regards,
Manish

From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Friday, September 30, 2016 3:09 AM
To: users@nifi.apache.org
Subject: RE: PutHiveQL and Hive Connection Pool with HDInsight

Tried with different combinations, but couldn’t succeed. It’s a HDI 3.4 Cluster 
with default hive settings.

Some of the errors I received:


1.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/default;hive.server2.transport.mode=http;hive.server2.thrift.http.path=/:
 java.net.SocketException: Connection reset); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/default;hive.server2.transport.mode=http;hive.server2.thrift.http.path=/:
 java.net.SocketException: Connection reset).



2.   failed to process session due to java.lang.NoSuchFieldError: INSTANCE: 
java.lang.NoSuchFieldError: INSTANCE



3.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;hive.server2.thrift.http.path=/:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;hive.server2.thrift.http.path=/:
 Invalid status 72)



4.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;httpPath=/:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http;httpPath=/:
 Invalid status 72)



5.   PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] 
PutHiveQL[id=05505d0c-eee1-48bc-8a99-b53302118933] failed to process due to 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http:
 Invalid status 72); rolling back session: 
org.apache.nifi.processor.exception.ProcessException: 
org.apache.commons.dbcp.SQLNestedException: Cannot create 
PoolableConnectionFactory (Could not open client transport with JDBC Uri: 
jdbc:hive2://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?transportMode=http:
 Invalid status 72)



Regards,
Manish

From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Friday, September 30, 2016 12:44 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: RE: PutHiveQL and Hive Connection Pool with HDInsight

Thank you Matt. I did tried with hive.server2.transport.mode=http like this 
jdbc:hive2:// 
somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2<http://somehdiclustername.azurehdinsight.net:443/somedbname;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2>.
But, I was getting java.lang.NoSuchFieldError: INSTANCE: 
java.lang.NoSuchFieldError: INSTANCE.

I will try again with transportM

RE: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-30 Thread Manish Gupta 8
Hi Matt,

Thanks for the detailed response. Really appreciate it.

A few comments –

· “if loading a DBCPConnectionPool for a HiveQL processor happens to work” – the HiveQL processors do not allow picking a DBCPConnectionPool; only the Hive pool can be selected.

· “The non-HiveQL processors make calls to the JDBC API that are not supported by the Hive JDBC driver” – Yes. Even though the connection gets created, in PutSQL only insert and update commands work. As soon as stmt.addBatch() is called for a Hive DDL statement, a SQLException is thrown.

· Support for Kerberos is definitely a big factor. I don’t know how to deal with this if I use DBCP.

· “Hive version mismatch between NiFi and HDI 3.4” – I am using NiFi 0.7 (which uses hive-jdbc-2.0.0), and the Hive standalone jar from the HDI cluster is hive-jdbc-1.2.1000.2.4.2.4-6-standalone. Not sure if this should be an issue.

Regarding a possible solution – I am thinking about doing exactly what you 
mentioned (as a quick fix) – taking PutHiveQL and changing it to use 
DBCPConnectionPool. But before starting on that, I will definitely try to dig 
deeper into why the Hive connection pool is not working for HDI 3.4.

As of now, I don’t intend to query or load data in Hive from NiFi at all. The only 
thing I am trying to do is call certain DDL commands from NiFi. For example, 
I have modified the PutHDFS processor to return a flag if a new directory is 
created by an incoming flow file. If yes, we just want to call ALTER TABLE ... ADD 
PARTITION to refresh the Hive metadata with the newly created partition, along the 
lines of the sketch below.
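
A rough, untested sketch of what I have in mind (PutHiveDDL and the "DDL Statement" 
property are just placeholder names; the important parts are the plain DBCPService 
and stmt.execute() instead of addBatch()/executeBatch()):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

public class PutHiveDDL extends AbstractProcessor {

    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
            .name("Database Connection Pooling Service")
            .description("Standard DBCPConnectionPool pointing at the Hive JDBC URL")
            .identifiesControllerService(DBCPService.class)
            .required(true)
            .build();

    static final PropertyDescriptor DDL = new PropertyDescriptor.Builder()
            .name("DDL Statement")
            .description("DDL to run, e.g. ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (dt='${dt}')")
            .required(true)
            .expressionLanguageSupported(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();
    static final Relationship REL_FAILURE = new Relationship.Builder().name("failure").build();

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Arrays.asList(DBCP_SERVICE, DDL);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(REL_SUCCESS, REL_FAILURE)));
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        // Standard DBCP controller service instead of HiveDBCPService
        final DBCPService dbcp = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
        // DDL built from flow file attributes (e.g. set by the modified PutHDFS)
        final String ddl = context.getProperty(DDL).evaluateAttributeExpressions(flowFile).getValue();

        try (final Connection conn = dbcp.getConnection();
             final Statement stmt = conn.createStatement()) {
            // execute() rather than addBatch()/executeBatch(), which the Hive driver rejects for DDL
            stmt.execute(ddl);
            session.transfer(flowFile, REL_SUCCESS);
        } catch (final SQLException e) {
            getLogger().error("Failed to execute {} for {}", new Object[]{ddl, flowFile, e});
            flowFile = session.penalize(flowFile);
            session.transfer(flowFile, REL_FAILURE);
        }
    }
}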


Regards,
Manish

From: Matt Burgess [mailto:mattyb...@apache.org]
Sent: Friday, September 30, 2016 7:13 PM
To: users@nifi.apache.org
Subject: Re: PutHiveQL and Hive Connection Pool with HDInsight

Manish,

Sorry to hear you're having issues connecting.  I should mention though that 
the fact that DBCPConnectionPool works with the Hive JDBC connection string 
doesn't imply that the processors that use DBCPConnectionPool will work for 
Hive. Notably there are three differences:

1) The Hive JDBC driver has many JARs, so to use DBCPConnectionPool 
successfully (in NiFi 1.0.0 due to NIFI-2604 [1]) with a processor not in the 
Hive bundle, you'd need to add all the JARs for the Hive driver. This is not 
possible in NiFi 0.x, instead you'd need the fat/standalone JAR for the Hive 
driver. The Hive bundle includes the driver, so if loading a DBCPConnectionPool 
for a HiveQL processor happens to work, I suspect it's because all the Hive 
JARs are in the classpath of the classloader (from the Hive processor(s)) used 
to instantiate the connection.

2) The non-HiveQL processors make calls to the JDBC API that are not supported 
by the Hive JDBC driver. The HiveQL processors specifically avoid those methods 
that are not supported, but ExecuteSQL and PutSQL (for example) do not.

3) DBCPConnectionPool does not support Kerberos, and if there are settings for 
the Hive driver that are not supported on the URL, then they must be in a 
config file (hive-site.xml, e.g.) and DBCPConnectionPool doesn't support that 
either. Perhaps for your use case this is not an issue.

I'd like to find the root of your problem if possible (maybe a Hive version 
mismatch between NiFi and HDI 3.4?), rather than have the need for a custom 
processor.  Even so, you shouldn't need to code a full custom processor for 
this, instead you could copy PutHiveQL and replace the references to 
HiveDBCPService with DBCPService, and change the call to getConnectionURL() to 
whatever URL you want to be recorded for provenance (not sure the connection 
string is available via java.sql.Connection, which is why the additional 
interface method was added).

If you do go down the custom processor path and get something working, I 
encourage you to share your findings with the community, in that case I'd 
imagine there are improvement(s) that can be made to the existing processor(s) 
so as to avoid the need for a custom one.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-2604

On Fri, Sep 30, 2016 at 7:59 AM, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Tried couple of more connection options, but always got an error while setting 
up Hive Connection Pool. What’s strange is DBCP Connection Pool works fine with 
the same Hive JDBC connection string.

Now I am writing a custom “PutSQL” like processor that uses standard DBCP 
Controller service and allows to run DDL commands on Hive (since standard 
PutSQL does not allow DDL statements - only insert and update works). 
Basically, I’ll be writing a custom PutHiveQL that can work on standard DBCP.

Regards,
Manish

From: Manish Gupta 8 [mailto:mgupt...@sapient.com<mailto:mgupt...@sapient.com>]
Sent: Friday, September 30, 2016 3:09 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: RE: PutHiveQL and Hive Connection Pool with HDInsight

Tried with different combinations, 

Configure Logging - Rolling

2016-10-04 Thread Manish Gupta 8
Hi,

Is there any documentation/example for setting up rolling app and bootstrap log 
files? I want NiFi to create a new file every hour. By default, everything 
goes into a single file.
I couldn't find much about logback.xml in the admin/dev guide.

Thanks,
Manish



RE: Configure Logging - Rolling

2016-10-04 Thread Manish Gupta 8
Hi Bryan,

My app-log settings look as follows, but a new file is definitely not getting 
created every hour. Everything is going into a single file only. And this is 
the default configuration.
I am on 0.7.


<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
            <maxFileSize>100MB</maxFileSize>
        </timeBasedFileNamingAndTriggeringPolicy>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
        <immediateFlush>true</immediateFlush>
    </encoder>
</appender>







Regards,
Manish

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Tuesday, October 04, 2016 7:35 PM
To: users@nifi.apache.org
Subject: Re: Configure Logging - Rolling

Hello,

I believe nifi-app.log is already doing hourly rollover with a max of 100mb per 
file.

The logback.xml contains some comments that explain how to change it for 
different rollovers:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/logback.xml#L24-L30

The nifi-bootstrap.log looks like it is set to daily because it only has %d
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/logback.xml#L71

Thanks,

Bryan


On Tue, Oct 4, 2016 at 9:57 AM, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Hi,

Is there any documentation/example for setting rolling app and bootstrap log 
file. I want NiFi to create a new file every one hour. By default, everything 
goes into a single file.
I couldn’t find much about logback.xml in admin/dev guide.

Thanks,
Manish




RE: Configure Logging - Rolling

2016-10-04 Thread Manish Gupta 8
Hi Bryan,

I mean none of these files are getting rolled over. For APP_FILE, no 
nifi-app_2016-10-04_01.log is getting created (on an hourly basis), and everything 
goes into a single nifi-app.log file only. I tried changing it to the minute level; 
it still didn’t work. Also, I tried setting the log directory to a hardcoded 
“logs” like it used to be in 0.3. Still no success.

I am wondering if this could be because I am running NiFi on Windows.

Thanks,
Manish

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Wednesday, October 05, 2016 1:37 AM
To: users@nifi.apache.org
Subject: Re: Configure Logging - Rolling

Manish,

When you say everything is going into a single file, are you saying that 
nifi-app.log, nifi-user.log, and nifi-bootstrap.log all never roll?

I just tested the nifi-app.log by running NiFi for over an hour and it rolled 
exactly at 4pm and ended up with:

nifi-app.log
nifi-app_2016-10-04_15.0.log

The current hour will always be in nifi-app.log.

The bootstrap and user logs are set to roll per day, so you would have to edit 
their fileNamePattern to include %d{yyyy-MM-dd_HH} rather than just %d if you 
wanted them to roll per hour.
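For the bootstrap log, for example, that would mean a fileNamePattern along the lines of 
${org.apache.nifi.bootstrap.config.log.dir}/nifi-bootstrap_%d{yyyy-MM-dd_HH}.log 
instead of the default .../nifi-bootstrap_%d.log.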

-Bryan


On Tue, Oct 4, 2016 at 10:12 AM, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Hi Bryan,

My app-log settings look as following, but the file is definitely not getting 
created every hour. Everything is going into a single file only. And, this is 
the default configuration.
I am on 0.7.


<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
            <maxFileSize>100MB</maxFileSize>
        </timeBasedFileNamingAndTriggeringPolicy>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
        <immediateFlush>true</immediateFlush>
    </encoder>
</appender>







Regards,
Manish

From: Bryan Bende [mailto:bbe...@gmail.com<mailto:bbe...@gmail.com>]
Sent: Tuesday, October 04, 2016 7:35 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Configure Logging - Rolling

Hello,

I believe nifi-app.log is already doing hourly rollover with a max of 100mb per 
file.

The logback.xml contains some comments that explain how to change it for 
different rollovers:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/logback.xml#L24-L30

The nifi-bootstrap.log looks like it is set to daily because it only has %d
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/logback.xml#L71

Thanks,

Bryan


On Tue, Oct 4, 2016 at 9:57 AM, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Hi,

Is there any documentation/example for setting rolling app and bootstrap log 
file. I want NiFi to create a new file every one hour. By default, everything 
goes into a single file.
I couldn’t find much about logback.xml in admin/dev guide.

Thanks,
Manish





Download Multiple Files from Queue

2016-10-10 Thread Manish Gupta 8
Hello Everyone,

Is there a way I can download multiple flow files from a queue at once (from 
the UI) or maybe programmatically? If it's not possible now, I think this feature 
should definitely be added :)

Regards,
Manish



RE: Download Multiple Files from Queue

2016-10-10 Thread Manish Gupta 8
Thank You Matt. I will try doing that.

Regards,
Manish

-Original Message-
From: Matt Burgess [mailto:mattyb...@apache.org] 
Sent: Monday, October 10, 2016 6:40 PM
To: users@nifi.apache.org
Subject: Re: Download Multiple Files from Queue

Manish,

This is possible via the REST API [1]:

1) Identify the UUID of the connection you're interested in. This can
be done manually using the UI (right-click on the connection, choose
Configure, then on the Settings tab there is an "Id" field). You can
also use the flow API
(nifi-api/flow/process-groups/<process-group-id>) to get all the
components for the Process Group and find the connections in the JSON
response.

2) Request the queue details with a POST to
nifi-api/flowfile-queues/<connection-id>/listing-requests. This is an
asynchronous operation, you'll get a response with (among other
things) an ID for the request, a URI and a "finished" property. Once
finished is true, you can do a GET on the URI from the response, which
looks like:
 nifi-api/flowfile-queues/<connection-id>/listing-requests/<listing-request-id>

3) The response from the listing request (with ID) contains a
flowFileSummaries array. Each element is an object that contains
(among other things) the filename, the file size, and the URI to use
for access to the file. To get the contents, do a GET on the URI plus
/content on the end:
nifi-api/flowfile-queues/<connection-id>/flowfiles/<flowfile-uuid>/content
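
Roughly, a quick standalone Java sketch of steps 2 and 3 (assuming an unsecured,
non-clustered NiFi on localhost, and naive regex matching instead of a real JSON
parser, just to keep it short) could look like:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueueDownloader {

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/nifi-api";
        String connectionId = args[0]; // UUID of the connection (step 1)

        // Step 2: start the asynchronous listing request, then poll its URI until finished
        String listing = call("POST", base + "/flowfile-queues/" + connectionId + "/listing-requests");
        String requestUri = first(listing, "\"uri\"\\s*:\\s*\"([^\"]+)\"");
        while (!listing.contains("\"finished\":true")) { // naive check against the compact JSON
            Thread.sleep(500);
            listing = call("GET", requestUri);
        }

        // Step 3: each flowFileSummary has a URI; GET <uri>/content to fetch the payload
        Matcher m = Pattern.compile("\"uri\"\\s*:\\s*\"([^\"]+/flowfiles/[^\"]+)\"").matcher(listing);
        int i = 0;
        while (m.find()) {
            try (InputStream in = new URL(m.group(1) + "/content").openStream()) {
                Files.copy(in, Paths.get("flowfile-" + (i++) + ".bin"), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    private static String call(String method, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    private static String first(String s, String regex) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(1) : "";
    }
}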

A "Download All" button on the Queue Listing dialog (with
corresponding REST API endpoint) could be pretty helpful, perhaps by
providing a ZIP with the flow files (with some naming convention like
filename_uuid or something). Please feel free to write a Jira [2] to
suggest this feature.

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
[2] https://issues.apache.org/jira/browse/NIFI/

On Mon, Oct 10, 2016 at 7:42 AM, Manish Gupta 8  wrote:
> Hello Everyone,
>
>
>
> Is there a way I can download multiple flow files from a queue at once (from
> UI) or may be programmatically? If it’s not possible now, I think this
> feature should definitely be added :)
>
>
>
> Regards,
>
> Manish
>
>


Penalize Flow File on Failure

2016-10-14 Thread Manish Gupta 8
Hello Everyone,

In some processors I have seen that flow files routed to failure are not being 
penalized - for example, Kite processors like ConvertJSONToAvro. Is there some 
specific reason why some processors behave differently?

I think every processor should penalize flow files routed to every non-success 
relationship. That way, we can hold those failed files (using a self-referencing 
loop and a decent penalty duration) in the queue itself and debug and resolve them later.

Regards,
Manish



RE: Penalize Flow File on Failure

2016-10-14 Thread Manish Gupta 8
Thanks Matt. As a workaround, is there a processor that does not modify the 
flow file (content or attributes) at all, which I could use to delay the 
self-referencing flow files from hitting the main processor again immediately?

Regards,
Manish

-Original Message-
From: Matt Burgess [mailto:mattyb...@apache.org] 
Sent: Friday, October 14, 2016 10:25 PM
To: users@nifi.apache.org
Subject: Re: Penalize Flow File on Failure

Manish,

The use of penalize(), yield(), etc. is not enforced by the framework,
so processors can have different behavior, sometimes on purpose, and
sometimes inadvertently.  The Developer's Guide has guidance on when
to use such methods [1], and reviewers often check the submissions to
see if they exhibit such behavior, but it is possible for processors
to handle these cases differently.
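
For reference, the pattern the guide recommends on failure is just a couple of
lines in a processor's onTrigger (rough sketch, using the usual session / flow
file / relationship names):

    // Penalize before routing to failure so a self-looped failure
    // connection does not retry the flow file immediately
    flowFile = session.penalize(flowFile);
    session.transfer(flowFile, REL_FAILURE);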

Please feel free to log bugs/improvements for such processors, I think
consistent behavior in this vein (when prudent) is a good idea.

Regards,
Matt

[1] 
https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#penalization-vs-yielding

On Fri, Oct 14, 2016 at 12:38 PM, Manish Gupta 8  wrote:
> Hello Everyone,
>
>
>
> In some of the processors I have seen that flow files on failure are not
> being penalized. For example – Kite processors like ConvertJsonToAvro. Is
> there some specific reason why some processors have different behavior?
>
>
>
> I think every processor should penalize every non-success relationship. This
> way, we can hold those failed files (using a self-referencing loop and some
> decent penalty duration) in the queue itself and debug and resolve later.
>
>
>
> Regards,
>
> Manish
>
>


NiFi and MSMQ

2017-10-05 Thread Manish Gupta 8
Hi,

Has anyone tried integrating MSMQ with Apache NiFi (sender/receiver)? Is it 
even possible?

As per MS documentation – “Message Queuing applications can be developed using 
C++ APIs or COM objects. Applications can be built in any of the popular 
development environments: for example, Microsoft® Visual Basic®, Visual Basic® 
Scripting Edition, Visual C++®, Visual Studio® .NET, Borland Delphi, and 
Powersoft Powerbuilder.”. What are the options?

Thanks,
Manish





Wait for N Files before starting to process

2017-11-17 Thread Manish Gupta 8
Hi,

I am working on building a flow in NiFi that involves cloning a file to 2 or 
more processing flows, then waiting (on some processor) for all the parallel 
flows to finish, and then executing another flow.

Is there any processor in NiFi which can do this out of the box, i.e. wait for N 
files in its input queue before starting to process, and time out if the expected N 
files didn't arrive within some time T? If not, what's a good way of achieving this?

I have such a flow implemented in Akka, and want to migrate it to NiFi. I am 
using NiFi 1.3



Thanks,

Manish Gupta
Senior Specialist Platform | AI & Data Engineering Practice


"Oxygen", Tower C, Ground - 3rd floor,
Plot No. 7, Sector 144 Expressway, Noida, UP, India

Mobile: +91 981 059 1361
Office: +91 (120) 479 5000  Ext : 75398
Email: mgupt...@sapient.com
sapientconsulting.com





RE: Wait for N Files before starting to process

2017-11-18 Thread Manish Gupta 8
Thanks Andy. I will search for that thread.

From: Andy Loughran [mailto:andylock...@gmail.com]
Sent: Saturday, November 18, 2017 2:53 PM
To: users@nifi.apache.org
Subject: Re: Wait for N Files before starting to process

You can chain together the Wait and Notify processors - there’s a message on this list 
from about 4 months ago on how to do that - I asked the same question :)

Andy


Sent from my iPhone

On 18 Nov 2017, at 06:32, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Hi,

I am working on building a flow in NiFi that involve cloning a file to 2 or 
more processing flows, and then wait (on some processor) for all the parallel 
flows to finish (in parallel) and execute a another flow.

Is there any processor in Nifi which can do this out of the box i.e. wait for N 
file in input queue and then start its processing and timeout if expected N 
files didn’t arrive in some time T? If not, what’s a good way of achieving this.

I have such a flow implemented in Akka, and want to migrate it to NiFi. I am 
using NiFi 1.3



Thanks,

Manish Gupta
Senior Specialist Platform | AI & Data Engineering Practice


“Oxygen”, Tower C, Ground - 3rd floor,
Plot No. 7, Sector 144 Expressway, Noida, UP, India

Mobile: +91 981 059 1361
Office: +91 (120) 479 5000  Ext : 75398
Email: mgupt...@sapient.com<mailto:mgupt...@sapient.com>
sapientconsulting.com<https://www.sapientconsulting.com/>





RE: Wait for N Files before starting to process

2017-11-20 Thread Manish Gupta 8
Thank Jeff. This is exactly what I was searching for. Thank You.


From: Jeff [mailto:jtsw...@gmail.com]
Sent: Monday, November 20, 2017 7:41 PM
To: users@nifi.apache.org
Subject: Re: Wait for N Files before starting to process

Hello Manish,

I was answering a dev-list question and provided a URL to Koji's blog about 
Wait/Notify processors [1]. Please take a look and feel free to ask questions 
about how you can integrate the Wait/Notify processors into your flow.

[1] 
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify

On Sat, Nov 18, 2017 at 8:28 AM Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Thanks Andy. I will search for that thread.

From: Andy Loughran [mailto:andylock...@gmail.com<mailto:andylock...@gmail.com>]
Sent: Saturday, November 18, 2017 2:53 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Wait for N Files before starting to process

Yo can chain together the wait and notify - there’s a message on this list for 
how to do that from about 4 months ago. - I asked the same question :)

Andy


Sent from my iPhone

On 18 Nov 2017, at 06:32, Manish Gupta 8 
mailto:mgupt...@sapient.com>> wrote:
Hi,

I am working on building a flow in NiFi that involve cloning a file to 2 or 
more processing flows, and then wait (on some processor) for all the parallel 
flows to finish (in parallel) and execute a another flow.

Is there any processor in Nifi which can do this out of the box i.e. wait for N 
file in input queue and then start its processing and timeout if expected N 
files didn’t arrive in some time T? If not, what’s a good way of achieving this.

I have such a flow implemented in Akka, and want to migrate it to NiFi. I am 
using NiFi 1.3



Thanks,

Manish Gupta
Senior Specialist Platform | AI & Data Engineering Practice


“Oxygen”, Tower C, Ground - 3rd floor,
Plot No. 7, Sector 144 Expressway, Noida, UP, India

Mobile: +91 981 059 1361
Office: +91 (120) 479 5000  Ext : 75398
Email: mgupt...@sapient.com<mailto:mgupt...@sapient.com>
sapientconsulting.com<https://www.sapientconsulting.com/>