Re: Kafka proxy with NiFi

2017-06-08 Thread Andre
Hi,

I assume that by endpoint you mean exposing the broker side of the Kafka
API? As far as I know that is not possible.

However, you could use the site-to-site API or other supported reliable
protocols (HTTP, Beats, RELP, etc.) to feed data to NiFi and from there feed
Kafka.
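
For example, a minimal bridge of that kind (assuming a plain HTTP feed; both
processors ship with NiFi) could be as simple as:

ListenHTTP -> PublishKafka_0_10

with the publisher pointed at the downstream brokers.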

How this fits into your environment will largely depend on data volumes and
kafka acknowledgement strategy.

On 9 Jun 2017 07:15, "Laurens Vets"  wrote:

Hi List,

Is it possible to build a Kafka proxy with NiFi such that NiFi will expose
a Kafka endpoint and proxy all Kafka messages to another Kafka endpoint?


Re: NiFi 1.2.0 REST API problem

2017-06-08 Thread Raymond Rogers
I have not been able to find anything in the logs that is useful, but I'm
not sure I enabled all of the logging that I need to.

But I did find some more information: if I restart the NiFi service, the
InvokeHTTP call starts working. We also have two other flow patterns that
have the same issue, and in each case it is always the second InvokeHTTP
processor that fails.

This is how I have the logging set currently (the logback.xml entries were
stripped by the mailing-list archive):
All of the other logging is set to its default value.

On Thu, Jun 8, 2017 at 12:20 PM, Matt Gilman 
wrote:

> Raymond,
>
> If you enable debug level logging, I believe that InvokeHTTP will log the
> request and response. It may be helpful in diagnosing this issue. I think
> you could just set the bulletin level to DEBUG to see these messages as
> bulletins. Additionally, you can update your conf/logback.xml to enable
> DEBUG messages for org.apache.nifi.processors.standard.InvokeHTTP to see
> these messages in your logs/nifi-app.log.
>
> Thanks
>
> Matt
>
> On Thu, Jun 8, 2017 at 1:01 PM, Raymond Rogers 
> wrote:
>
>> No bulletins on any of the processors.  All of the output flow-files have
>> 0 bytes and the error 401 in the attributes.
>> All of the properties look correct and I can copy the values from the
>> non-working to the manually created processor and it works fine.
>> When you export the SSL context service and re-import it you have to
>> reset the password on the trust store and that is the only change I am
>> making.
>>
>> I will need to dig into the nifi logs to check for any errors there.
>>
>> On Thu, Jun 8, 2017 at 11:24 AM, Matt Gilman 
>> wrote:
>>
>>> Raymond,
>>>
>>> When it's in a state that is not working are there any bulletins on the
>>> second processor? When it's in that state and you view the configuration
>>> details for that processor, do the properties look correct and the same as
>>> when you manually re-add the processor through the UI? Specifically, I'm
>>> wondering about the SSL Context Service since you mentioned fixing that
>>> after an export/import process resolves the issue.
>>>
>>> Any other issues in the logs/nifi-app.log or the logs/nifi-user.log?
>>>
>>> Thanks
>>>
>>> Matt
>>>
>>> On Thu, Jun 8, 2017 at 11:59 AM, Raymond Rogers <
>>> raymond.rog...@gmail.com> wrote:
>>>
 We have a node.js service that automatically creates & manages NiFi
 groups using the REST API which works great in NiFi 1.1.1.  We are
 upgrading our NiFi instances to 1.2.0 and I have found that some of the
 processors are exhibiting odd behavior.

 We have a flow that connects to the Outlook 365 OWA service, generates an
 access token, and then uses that token in two different InvokeHTTP
 processors.  The first processor always works and the second always returns
 an HTTP error 401.

 If I delete and manually re-add the InvokeHTTP processor with the same
 configuration it always works.

 If I export this flow from the NiFi web interface and then re-import
 it, only fixing the SSL context service, it works every time.

 Using our node.js service to create the exact same flow in NiFi 1.1.1
 it always works.

 Thanks,
 Raymond

>>>
>>>
>>
>


Re: Bulk inserting into HBase with NiFi

2017-06-08 Thread Bryan Bende
Mike,

Just out of curiosity, what would the original data for your example
look like that produced that JSON?

Is it a CSV with two lines, like:

ABC, XYZ
DEF, LMN

and then ExecuteScript is turning that into the JSON array?


As far as reading the JSON, I created a simple flow of GenerateFlowFile
-> ConvertRecord -> LogAttribute, where ConvertRecord uses the
JsonPathReader with $.value

https://gist.github.com/bbende/3789a6907a9af09aa7c32413040e7e2b

LogAttribute ends up logging:

[ {
  "value" : "XYZ"
}, {
  "value" : "LMN"
} ]

Which seems correct given that it's reading in the JSON with a schema
that only has the field "value" in it.

Let me know if that is not what you are looking for.
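
If the "key" field also needs to come through the reader (for example to drive
the HBase row id later), the schema registered in the AvroSchemaRegistry would
presumably need both fields, roughly:

{
  "type": "record",
  "name": "GenomeRecord",
  "fields": [
    { "name": "key", "type": "string" },
    { "name": "value", "type": "string" }
  ]
}

with a matching JsonPath dynamic property for each field ($.key and $.value).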



On Thu, Jun 8, 2017 at 4:13 PM, Mike Thomsen  wrote:
> Bryan,
>
> I have the processor somewhat operational now, but I'm running into a
> problem with the record readers. What I've done is basically this:
>
> Ex. JSON:
>
> [
>{
>"key": "ABC", "value": "XYZ"
>},
>{
>"key": "DEF", "value": "LMN"
>}
> ]
>
> Avro schema:
>
> {
> "type": "record",
> "name": "GenomeRecord",
> "fields": [{
> "name": "value",
> "type": "string"
> }]
> }
>
> 1. ExecuteScript iterates over a line and builds a JSON array as mentioned
> above.
> 2. PutHBaseRecord is wired to use a JsonPathReader that uses an
> AvroSchemaRegistry.
>   - I put a lot of logging in and can verify it is identifying the schema
> based on the attribute on the flowfile and looking at the appropriate field
> while looping over the Record to turn it into a serializable form for a Put.
>   - All I get are nulls.
> 3. My JsonPath has been variously $.value and $[*].value. It just does not
> seem to want to parse that JSON.
>
> The strategy I was going for is to use the "key" attribute in each JSON
> object to set the row key for the Put.
>
> Any ideas would be great.
>
> Thanks,
>
> Mike
>
> On Wed, Jun 7, 2017 at 4:40 PM, Bryan Bende  wrote:
>>
>> Mike,
>>
>> Glad to hear that the record API looks promising for what you are trying
>> to do!
>>
>> Here are a couple of thoughts, and please correct me if I am not
>> understanding your flow correctly...
>>
>> We should be able to make a generic PutHBaseRecord processor that uses
>> any record reader to read the incoming flow file and then converts
>> each record directly into a PutFlowFile (more on this in a minute).
>>
>> Once we have PutHBaseRecord, then there may be no need for you to
>> convert your data from CSV to JSON (unless there is another reason I
>> am missing) because you can send your CSV data directly into
>> PutHBaseRecord configured with a CSVRecordReader.
>>
>> If you are doing other processing/enrichment while going from CSV to
>> JSON, then you may be able to achieve some of the same things with
>> processors like UpdateRecord, PartitionRecord, and LookupRecord.
>> Essentially keeping the initial CSV intact and treating it like
>> records through the entire flow.
>>
>> Now back to PutHBaseRecord and the question of how to go from a Record
>> to a PutFlowFile...
>>
>> We basically need to know the rowId, column family, and then a list of
>> column-qualifier/value pairs. I haven't fully thought this through yet,
>> but...
>>
>> For the row id, we could have a similar strategy as PutHBaseJson,
>> where the value comes from a "Row Id" property in the processor or
>> from a "Row Id Record Path" which would evaluate the record path
>> against the record and use that value for the row id.
>>
>> For column family, we could probably do the same as above, where it
>> could be from a property or a record path.
>>
>> For the list of column-qualifier/value pairs, we can loop over all
>> fields in the record (skipping the row id and family if using record
>> fields) and then convert each one into a PutColumn. The bulk of the
>> work here is going to be taking the value of a field and turning it
>> into an appropriate byte[], so you'll likely want to use the type of
>> the field to cast into an appropriate Java type and then figure out
>> how to represent that as bytes.
>>
>> I know this was a lot of information, but I hope this helps, and let
>> me know if anything is not making sense.
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Wed, Jun 7, 2017 at 3:56 PM, Mike Thomsen 
>> wrote:
>> > Yeah, it's really getting hammered by the small files. I took a look at
>> > the new record APIs and that looked really promising. In fact, I'm taking
>> > a shot at creating a variant of PutHBaseJSON that uses the record API.
>> > Looks fairly straightforward so far. My strategy is roughly like this:
>> >
>> > GetFile -> SplitText -> ExecuteScript -> RouteOnAttribute ->
>> > PutHBaseJSONRecord
>> >
>> > ExecuteScript generates a larger flowfile that contains a structure like
>> > this now:
>> >
>> > [
>> >   { "key": "XYZ", "value": "ABC" }
>> > ]
>> >
>> >
>> > My intention is to have a JsonPathReader take that bigger flowfile 

Re: Bulk inserting into HBase with NiFi

2017-06-08 Thread Mike Thomsen
Bryan,

I have the processor somewhat operational now, but I'm running into a
problem with the record readers. What I've done is basically this:

Ex. JSON:

[
   {
   "key": "ABC", "value": "XYZ"
   },
   {
   "key": "DEF", "value": "LMN"
   }
]

Avro schema:

{
"type": "record",
"name": "GenomeRecord",
"fields": [{
"name": "value",
"type": "string"
}]
}

1. ExecuteScript iterates over a line and builds a JSON array as mentioned
above.
2. PutHBaseRecord is wired to use a JsonPathReader that uses an
AvroSchemaRegistry.
  - I put a lot of logging in and can verify it is identifying the schema
based on the attribute on the flowfile and looking at the appropriate field
while looping over the Record to turn it into a serializable form for a Put.
  - All I get are nulls.
3. My JsonPath has been variously $.value and $[*].value. It just does not
seem to want to parse that JSON.

The strategy I was going for is to use the "key" attribute in each JSON
object to set the row key for the Put.

Any ideas would be great.

Thanks,

Mike

On Wed, Jun 7, 2017 at 4:40 PM, Bryan Bende  wrote:

> Mike,
>
> Glad to hear that the record API looks promising for what you are trying
> to do!
>
> Here are a couple of thoughts, and please correct me if I am not
> understanding your flow correctly...
>
> We should be able to make a generic PutHBaseRecord processor that uses
> any record reader to read the incoming flow file and then converts
> each record directly into a PutFlowFile (more on this in a minute).
>
> Once we have PutHBaseRecord, then there may be no need for you to
> convert your data from CSV to JSON (unless there is another reason I
> am missing) because you can send your CSV data directly into
> PutHBaseRecord configured with a CSVRecordReader.
>
> If you are doing other processing/enrichment while going from CSV to
> JSON, then you may be able to achieve some of the same things with
> processors like UpdateRecord, PartitionRecord, and LookupRecord.
> Essentially keeping the initial CSV intact and treating it like
> records through the entire flow.
>
> Now back to PutHBaseRecord and the question of how to go from a Record
> to a PutFlowFile...
>
> We basically need to know the rowId, column family, and then a list of
> column-qualifier/value pairs. I haven't fully thought this through yet,
> but...
>
> For the row id, we could have a similar strategy as PutHBaseJson,
> where the value comes from a "Row Id" property in the processor or
> from a "Row Id Record Path" which would evaluate the record path
> against the record and use that value for the row id.
>
> For column family, we could probably do the same as above, where it
> could be from a property or a record path.
>
> For the list of column-qualifier/value pairs, we can loop over all
> fields in the record (skipping the row id and family if using record
> fields) and then convert each one into a PutColumn. The bulk of the
> work here is going to be taking the value of a field and turning it
> into an appropriate byte[], so you'll likely want to use the type of
> the field to cast into an appropriate Java type and then figure out
> how to represent that as bytes.
>
> I know this was a lot of information, but I hope this helps, and let
> me know if anything is not making sense.
>
> Thanks,
>
> Bryan
>
>
> On Wed, Jun 7, 2017 at 3:56 PM, Mike Thomsen 
> wrote:
> > Yeah, it's really getting hammered by the small files. I took a look at
> > the new record APIs and that looked really promising. In fact, I'm taking
> > a shot at creating a variant of PutHBaseJSON that uses the record API.
> > Looks fairly straightforward so far. My strategy is roughly like this:
> >
> > GetFile -> SplitText -> ExecuteScript -> RouteOnAttribute ->
> > PutHBaseJSONRecord
> >
> > ExecuteScript generates a larger flowfile that contains a structure like
> > this now:
> >
> > [
> >   { "key": "XYZ", "value": "ABC" }
> > ]
> >
> >
> > My intention is to have a JsonPathReader take that bigger flowfile which
> is
> > a JSON array and iterate over it as a bunch of records to turn into Puts
> > with the new HBase processor. I'm borrowing some code for wiring in the
> > reader from the QueryRecord processor.
> >
> > So my only question now is, what is the best way to serialize the Record
> > objects to JSON? The PutHBaseJson processor already has a Jackson setup
> > internally. Any suggestions on doing this in a way that doesn't tie me at
> > the hip to a particular reader implementation?
> >
> > Thanks,
> >
> > Mike
> >
> >
> > On Wed, Jun 7, 2017 at 6:12 PM, Bryan Bende  wrote:
> >>
> >> Mike,
> >>
> >> Just following up on this...
> >>
> >> I created this JIRA to track the idea of record-based HBase processors:
> >> https://issues.apache.org/jira/browse/NIFI-4034
> >>
> >> Also wanted to mention that with the existing processors, the main way
> >> to scale up would be to increase the concurrent tasks on PutHBaseJson
> 
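
As a rough sketch of the field-to-byte[] conversion described above (only an
illustration, assuming NiFi's Record API and HBase's Bytes utility are on the
classpath; the class and method names below are made up, and the exact Record
API calls may differ slightly):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.nifi.serialization.record.Record;
import org.apache.nifi.serialization.record.RecordField;

class RecordToColumnsSketch {

    // Convert one record into column-qualifier -> value byte[] pairs, skipping
    // the field that supplies the row id. The caller would wrap each entry,
    // together with the column family, into the HBase Put.
    static Map<String, byte[]> toColumns(final Record record, final String rowIdField) {
        final Map<String, byte[]> columns = new LinkedHashMap<>();
        for (final RecordField field : record.getSchema().getFields()) {
            final String name = field.getFieldName();
            if (name.equals(rowIdField)) {
                continue; // the row id is not written as a column
            }
            final Object value = record.getValue(name);
            if (value == null) {
                continue; // nothing to store for this field
            }
            final byte[] bytes;
            if (value instanceof Long) {
                bytes = Bytes.toBytes((Long) value);
            } else if (value instanceof Integer) {
                bytes = Bytes.toBytes((Integer) value);
            } else if (value instanceof Double) {
                bytes = Bytes.toBytes((Double) value);
            } else if (value instanceof Boolean) {
                bytes = Bytes.toBytes((Boolean) value);
            } else {
                bytes = Bytes.toBytes(value.toString()); // fall back to the string form
            }
            columns.put(name, bytes);
        }
        return columns;
    }
}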

Re: Saving controller services w/ templates?

2017-06-08 Thread Mike Thomsen
Yeah, I just screwed up and didn't reference one.

On Thu, Jun 8, 2017 at 1:26 PM, Mike Thomsen  wrote:

> I'll have to look again, but I scanned through the XML and didn't see
> either my avro schema registry or the jsonpath reader.
>
> Thanks,
>
> Mike
>
> On Thu, Jun 8, 2017 at 1:10 PM, Matt Gilman 
> wrote:
>
>> Mike,
>>
>> Currently, the services are saved if they are referenced by processors in
>> your data flow. There is an existing JIRA [1] to always include them.
>>
>> Thanks
>>
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-2895
>>
>> On Thu, Jun 8, 2017 at 12:59 PM, Mike Thomsen 
>> wrote:
>>
>>> Is it possible to save the controller services w/ a template?
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>
>>
>


Keytab Configuration for Nifi processor

2017-06-08 Thread Shashi Vishwakarma
Hi

I have a 3-node NiFi cluster (installed via Hortonworks Data Flow - HDF) in a
Kerberized environment. As part of the installation, Ambari has created a nifi
service keytab.

Can I use this nifi.service.keytab for configuring processors like PutHDFS
that talk to Hadoop services?

The nifi.service.keytab is machine specific and always expects principal
names that include the machine, e.g. nifi/HOSTNAME@REALM.

If I configure my processor with nifi/NODE1_Hostname@REALM, then I see a
Kerberos authentication exception on the other two nodes.

How do I dynamically resolve the hostname so that each node can use the nifi
service keytab?

Thanks
Shashi


Re: Saving controller services w/ templates?

2017-06-08 Thread Mike Thomsen
I'll have to look again, but I scanned through the XML and didn't see
either my avro schema registry or the jsonpath reader.

Thanks,

Mike

On Thu, Jun 8, 2017 at 1:10 PM, Matt Gilman  wrote:

> Mike,
>
> Currently, the services are saved if they are referenced by processors in
> your data flow. There is an existing JIRA [1] to always include them.
>
> Thanks
>
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-2895
>
> On Thu, Jun 8, 2017 at 12:59 PM, Mike Thomsen 
> wrote:
>
>> Is it possible to save the controller services w/ a template?
>>
>> Thanks,
>>
>> Mike
>>
>
>


Re: NiFi 1.2.0 REST API problem

2017-06-08 Thread Matt Gilman
Raymond,

If you enable debug level logging, I believe that InvokeHTTP will log the
request and response. It may be helpful in diagnosing this issue. I think
you could just set the bulletin level to DEBUG to see these messages as
bulletins. Additionally, you can update your conf/logback.xml to enable
DEBUG messages for org.apache.nifi.processors.standard.InvokeHTTP to see
these messages in your logs/nifi-app.log.
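
For reference, the corresponding entry in conf/logback.xml would presumably be
a single logger element along these lines:

<logger name="org.apache.nifi.processors.standard.InvokeHTTP" level="DEBUG"/>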

Thanks

Matt

On Thu, Jun 8, 2017 at 1:01 PM, Raymond Rogers 
wrote:

> No bulletins on any of the processors.  All of the output flow-files have
> 0 bytes and the error 401 in the attributes.
> All of the properties look correct and I can copy the values from the
> non-working to the manually created processor and it works fine.
> When you export the SSL context service and re-import it you have to reset
> the password on the trust store and that is the only change I am making.
>
> I will need to dig into the nifi logs to check for any errors there.
>
> On Thu, Jun 8, 2017 at 11:24 AM, Matt Gilman 
> wrote:
>
>> Raymond,
>>
>> When it's in a state that is not working are there any bulletins on the
>> second processor? When it's in that state and you view the configuration
>> details for that processor, do the properties look correct and the same as
>> when you manually re-add the processor through the UI? Specifically, I'm
>> wondering about the SSL Context Service since you mentioned fixing that
>> after an export/import process resolves the issue.
>>
>> Any other issues in the logs/nifi-app.log or the logs/nifi-user.log?
>>
>> Thanks
>>
>> Matt
>>
>> On Thu, Jun 8, 2017 at 11:59 AM, Raymond Rogers > > wrote:
>>
>>> We have a node.js service that automatically creates & manages NiFi
>>> groups using the REST API which works great in NiFi 1.1.1.  We are
>>> upgrading our NiFi instances to 1.2.0 and I have found that some of the
>>> processors are exhibiting odd behavior.
>>>
>>> We have a flow that connects to the Outlook 365 OWA service, generates an
>>> access token, and then uses that token in two different InvokeHTTP
>>> processors.  The first processor always works and the second always returns
>>> an HTTP error 401.
>>>
>>> If I delete and manually re-add the InvokeHTTP processor with the same
>>> configuration it always works.
>>>
>>> If I export this flow from the NiFi web interface and then re-import it,
>>> only fixing the SSL context service, it works every time.
>>>
>>> Using our node.js service to create the exact same flow in NiFi 1.1.1 it
>>> always works.
>>>
>>> Thanks,
>>> Raymond
>>>
>>
>>
>


Re: Saving controller services w/ templates?

2017-06-08 Thread Matt Gilman
Mike,

Currently, the services are saved if they are referenced by processors in
your data flow. There is an existing JIRA [1] to always include them.

Thanks

Matt

[1] https://issues.apache.org/jira/browse/NIFI-2895

On Thu, Jun 8, 2017 at 12:59 PM, Mike Thomsen 
wrote:

> Is it possible to save the controller services w/ a template?
>
> Thanks,
>
> Mike
>


Re: Saving controller services w/ templates?

2017-06-08 Thread James Wing
Mike,

I believe templates include controller services by default, as long as one
or more of the processors in the template references the controller
service.  Did that not happen for you?

Thanks,

James


On Thu, Jun 8, 2017 at 9:59 AM, Mike Thomsen  wrote:

> Is it possible to save the controller services w/ a template?
>
> Thanks,
>
> Mike
>


Re: NiFi 1.2.0 REST API problem

2017-06-08 Thread Raymond Rogers
No bulletins on any of the processors.  All of the output flow-files have 0
bytes and the error 401 in the attributes.
All of the properties look correct and I can copy the values from the
non-working to the manually created processor and it works fine.
When you export the SSL context service and re-import it you have to reset
the password on the trust store and that is the only change I am making.

I will need to dig into the nifi logs to check for any errors there.

On Thu, Jun 8, 2017 at 11:24 AM, Matt Gilman 
wrote:

> Raymond,
>
> When it's in a state that is not working are there any bulletins on the
> second processor? When it's in that state and you view the configuration
> details for that processor, do the properties look correct and the same as
> when you manually re-add the processor through the UI? Specifically, I'm
> wondering about the SSL Context Service since you mentioned fixing that
> after an export/import process resolves the issue.
>
> Any other issues in the logs/nifi-app.log or the logs/nifi-user.log?
>
> Thanks
>
> Matt
>
> On Thu, Jun 8, 2017 at 11:59 AM, Raymond Rogers 
> wrote:
>
>> We have a node.js service that automatically creates & manages NiFi
>> groups using the REST API which works great in NiFi 1.1.1.  We are
>> upgrading our NiFi instances to 1.2.0 and I have found that some of the
>> processors are exhibiting odd behavior.
>>
>> We have a flow that connects to the Outlook 365 OWA service, generates an
>> access token, and then uses that token in two different InvokeHTTP
>> processors.  The first processor always works and the second always returns
>> an HTTP error 401.
>>
>> If I delete and manually re-add the InvokeHTTP processor with the same
>> configuration it always works.
>>
>> If I export this flow from the NiFi web interface and then re-import it,
>> only fixing the SSL context service, it works every time.
>>
>> Using our node.js service to create the exact same flow in NiFi 1.1.1 it
>> always works.
>>
>> Thanks,
>> Raymond
>>
>
>


Saving controller services w/ templates?

2017-06-08 Thread Mike Thomsen
Is it possible to save the controller services w/ a template?

Thanks,

Mike


Re: How to perform bulk insert into SQLServer from one machine to another?

2017-06-08 Thread Matt Burgess
You won't need/want NiFi for that part; instead you would need to
log in to the machine running SQL Server, install an FTP daemon (such
as ftpd), and then point the PutFTP processor in NiFi at that FTP
server using the Hostname, Port, Username, Password, etc. properties.

On Thu, Jun 8, 2017 at 12:18 PM, prabhu Mahendran
 wrote:
> Matt,
>
> Thanks for your wonderful response.
>
> I think creating an FTP server is the best way for me to move the input file
> onto the SQL machine and run a query.
>
> Can you please suggest a way to create an FTP server on the machine where
> SQL Server is installed, using NiFi?
>
> Many thanks,
> Prabhu
>
> On 08-Jun-2017 6:27 PM, "Matt Burgess"  wrote:
>
> Prabhu,
>
> From [1], the data file "must specify a valid path from the server on
> which SQL Server is running. If data_file is a remote file, specify
> the Universal Naming Convention (UNC) name. A UNC name has the form
> \\Systemname\ShareName\Path\FileName. For example,
> \\SystemX\DiskZ\Sales\update.txt".  Can you expose the CSV file via a
> network drive/location?  If not, can you place the file on the SQL
> Server using NiFi?  For example, if there were an FTP server running
> on the SQL Server instance, you could use the PutFTP processor, then
> PutSQL after that to issue your BULK INSERT statement.
>
> Regards,
> Matt
>
> [1]
> https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql
>
> On Thu, Jun 8, 2017 at 8:11 AM, prabhu Mahendran
>  wrote:
>> I have a NiFi instance running on one machine and SQL Server on another
>> machine.
>>
>> I am trying to perform a bulk insert with a BULK INSERT query in SQL
>> Server, but I cannot insert data from one machine and move it into the
>> SQL Server on the other machine.
>>
>> If I run NiFi and SQL Server on the same machine, I can perform the bulk
>> insert easily.
>>
>> I have configured GetFile -> ReplaceText (BulkInsertQuery) -> PutSQL
>> processors.
>>
>> When both NiFi and SQL Server are on a single machine the bulk insert
>> works, but it does not work when the two instances are on different machines.
>>
>> I need to read the data on one machine and run a query that moves that
>> data into the SQL Server running on the other machine.
>>
>> The query below works when NiFi and SQL Server are on the same machine:
>>
>> BULK INSERT BI FROM 'C:\Directory\input.csv' WITH (FIRSTROW = 1,
>> ROWTERMINATOR = '\n', FIELDTERMINATOR = ',', ROWS_PER_BATCH = 1)
>>
>> If I run that query from the other machine it fails with "FileNotFoundError",
>> because "input.csv" is on the Host1 machine while the query runs on the SQL
>> Server machine (host2).
>>
>> Can anyone give me a suggestion on how to do this?


Re: Detect whether a flowfile has a particular attribute

2017-06-08 Thread Juan Sequeiros
Jim,

This might be related and coincidentally today we were talking with a
coworker about the "advanced" button of UpdateAttribute and its ability to
set attributes based on conditions.
It's pretty powerful. [1]
It might come in useful for your efforts.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-update-attribute-nar/1.2.0/org.apache.nifi.processors.attributes.UpdateAttribute/index.html

On Thu, Jun 8, 2017 at 9:54 AM James McMahon  wrote:

> I do understand now. Thank you very much Mark. -Jim
>
> On Thu, Jun 8, 2017 at 9:34 AM, Mark Payne  wrote:
>
>> Jim,
>>
>> The first expression will return false. None of the expressions below
>> will ever throw an Exception.
>>
>> You could even chain them together like
>> ${myAttribute:toLower():length():gt(4)} and if myAttribute does not
>> exist, it will return false, rather than throwing an Exception.
>>
>> Thanks
>> -Mark
>>
>>
>> On Jun 8, 2017, at 9:32 AM, James McMahon  wrote:
>>
>> So then if myAttribute does not even exist in a particular flowFile, the
>> first expression will return a null value rather than throw an error. Thank
>> you very much Mark. -Jim
>>
>> On Thu, Jun 8, 2017 at 8:44 AM, Mark Payne  wrote:
>>
>>> Jim,
>>>
>>> You can use the expression:
>>>
>>> ${myAttribute:isNull()}
>>>
>>> Or, alternatively, depending on how you want to setup the route:
>>>
>>> ${myAttribute:notNull()}
>>>
>>> If you want to check if the attribute contains 'True' somewhere within
>>> its value,
>>> then you can use:
>>>
>>> ${myAttribute:contains('True')}
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> > On Jun 8, 2017, at 8:19 AM, James McMahon 
>>> wrote:
>>> >
>>> > Good morning. I receive HTTP POSTs of various types of files. Some
>>> have a particular attribute myAttribute, some do not. I want to route the
>>> flowfiles to different workflow paths depending on the presence of this
>>> attribute. Can I use RouteAttribute and the expression language to do that,
>>> something like this:
>>> >
>>> > hasTheAttributeOfInterest
>>>  ${anyAttribute("myAttribute":contains('True')}
>>> >
>>> > I ask because the expression guide did not say whether a False is
>>> returned or the processor throws an error if the attribute does not exist
>>> in the flowfile. I may have missed that. I wanted to see if anyone in the
>>> group has experience in this regard?
>>> >
>>> > Thanks in advance for your insights. -Jim
>>>
>>>
>>
>>
>


Re: NiFi 1.2.0 REST API problem

2017-06-08 Thread Matt Gilman
Raymond,

When it's in a state that is not working are there any bulletins on the
second processor? When it's in that state and you view the configuration
details for that processor, do the properties look correct and the same as
when you manually re-add the processor through the UI? Specifically, I'm
wondering about the SSL Context Service since you mentioned fixing that
after an export/import process resolves the issue.

Any other issues in the logs/nifi-app.log or the logs/nifi-user.log?

Thanks

Matt

On Thu, Jun 8, 2017 at 11:59 AM, Raymond Rogers 
wrote:

> We have a node.js service that automatically creates & manages NiFi groups
> using the REST API which works great in NiFi 1.1.1.  We are upgrading our
> NiFi instances to 1.2.0 and I have found that some of the processors are
> exhibiting odd behavior.
>
> We have a flow that connects to the Outlook 365 OWA service, generates an
> access token, and then uses that token in two different InvokeHTTP
> processors.  The first processor always works and the second always returns
> an HTTP error 401.
>
> If I delete and manually re-add the InvokeHTTP processor with the same
> configuration it always works.
>
> If I export this flow from the NiFi web interface and then re-import it,
> only fixing the SSL context service, it works every time.
>
> Using our node.js service to create the exact same flow in NiFi 1.1.1 it
> always works.
>
> Thanks,
> Raymond
>


Re: How to perform bulk insert into SQLServer from one machine to another?

2017-06-08 Thread prabhu Mahendran
Matt,

Thanks for your wonderful response.

I think creating an FTP server is the best way for me to move the input file
onto the SQL machine and run a query.

Can you please suggest a way to create an FTP server on the machine where SQL
Server is installed, using NiFi?

Many thanks,
Prabhu
On 08-Jun-2017 6:27 PM, "Matt Burgess"  wrote:

Prabhu,

From [1], the data file "must specify a valid path from the server on
which SQL Server is running. If data_file is a remote file, specify
the Universal Naming Convention (UNC) name. A UNC name has the form
\\Systemname\ShareName\Path\FileName. For example,
\\SystemX\DiskZ\Sales\update.txt".  Can you expose the CSV file via a
network drive/location?  If not, can you place the file on the SQL
Server using NiFi?  For example, if there were an FTP server running
on the SQL Server instance, you could use the PutFTP processor, then
PutSQL after that to issue your BULK INSERT statement.
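
For the network-share option, the statement would presumably end up looking
something like the following (the host and share names here are only
placeholders):

BULK INSERT BI FROM '\\Host1\SharedData\input.csv' WITH (FIRSTROW = 1,
ROWTERMINATOR = '\n', FIELDTERMINATOR = ',', ROWS_PER_BATCH = 1)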

Regards,
Matt

[1] https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql

On Thu, Jun 8, 2017 at 8:11 AM, prabhu Mahendran
 wrote:
> I have a NiFi instance running on one machine and SQL Server on another
> machine.
>
> I am trying to perform a bulk insert with a BULK INSERT query in SQL
> Server, but I cannot insert data from one machine and move it into the
> SQL Server on the other machine.
>
> If I run NiFi and SQL Server on the same machine, I can perform the bulk
> insert easily.
>
> I have configured GetFile -> ReplaceText (BulkInsertQuery) -> PutSQL
> processors.
>
> When both NiFi and SQL Server are on a single machine the bulk insert
> works, but it does not work when the two instances are on different machines.
>
> I need to read the data on one machine and run a query that moves that
> data into the SQL Server running on the other machine.
>
> The query below works when NiFi and SQL Server are on the same machine:
>
> BULK INSERT BI FROM 'C:\Directory\input.csv' WITH (FIRSTROW = 1,
> ROWTERMINATOR = '\n', FIELDTERMINATOR = ',', ROWS_PER_BATCH = 1)
>
> If I run that query from the other machine it fails with "FileNotFoundError",
> because "input.csv" is on the Host1 machine while the query runs on the SQL
> Server machine (host2).
>
> Can anyone give me a suggestion on how to do this?


Re: integration with Zeppelin

2017-06-08 Thread James Wing
I do not believe NiFi has any specific features for Zeppelin yet, but it is
possible to write custom Zeppelin code paragraphs that communicate with the
NiFi API to pull data or inspect flow status.  For an example, I recommend
Pierre Villard's US Presidential Election: tweet analysis using HDF/NiFi,
Spark, Hive and Zeppelin (
https://community.hortonworks.com/articles/30213/us-presidential-election-tweet-analysis-using-hdfn.html
).

The most typical usage is to have NiFi write to a durable store like HDFS,
HBase, SQL, etc., and then have Zeppelin read from the data store.  Would
that not work for you?

Thanks,

James

On Thu, Jun 8, 2017 at 7:09 AM, Wojciech Indyk 
wrote:

> Hi All!
> I am new here. I find NiFi as a great project for data routing. However,
> extendability of NiFi give us a great potential to expand application of
> NiFi routing. Recently I looked at Kylo- a software for management of data
> lake (based on NiFi and its templates). In this composition of NiFi and
> Kylo I feel lack of a component able to do custom data-analytics. I asked
> the Kylo community on possible integration of Kylo with Zeppelin here:
> https://groups.google.com/forum/#!topic/kylo-community/e6JdzneAnV0
>
> The Kylo developers advised me to asked about possible integration here. I
> haven't found any NiFi processor for Zeppelin, that enables to work on the
> input from NiFi and handle the output from Zeppelin. The idea of such
> integration comes from functionality I've seen in Dataiku solution. There
> is a processor, that is able to run arbitrary code in the middle of
> processing by specific input and output variables. Having such opportunity
> the data governance in area of collecting data done by NiFi can be extended
> on data governance regarding data processing, like training of Machine
> Learning models, custom visualizations, etc.
>
> What do you think of this idea? Is any NiFi processor, that supports such
> integration? Is it in line with a NiFi roadmap?
>
> --
> Kind regards/ Pozdrawiam,
> Wojciech Indyk
>


NiFi 1.2.0 REST API problem

2017-06-08 Thread Raymond Rogers
We have a node.js service that automatically creates & manages NiFi groups
using the REST API which works great in NiFi 1.1.1.  We are upgrading our
NiFi instances to 1.2.0 and I have found that some of the processors are
exhibiting odd behavior.

We have a flow that connects to the Outlook 365 OWA service, generates an
access token, and then uses that token in two different InvokeHTTP
processors.  The first processor always works and the second always returns
an HTTP error 401.

If I delete and manually re-add the InvokeHTTP processor with the same
configuration it always works.

If I export this flow from the NiFi web interface and then re-import it,
only fixing the SSL context service, it works every time.

Using our node.js service to create the exact same flow in NiFi 1.1.1 it
always works.

Thanks,
Raymond


integration with Zeppelin

2017-06-08 Thread Wojciech Indyk
Hi All!
I am new here. I find NiFi to be a great project for data routing. However,
the extensibility of NiFi gives us great potential to expand what NiFi routing
can be applied to. Recently I looked at Kylo, a piece of software for managing
a data lake (based on NiFi and its templates). In this combination of NiFi and
Kylo I feel the lack of a component able to do custom data analytics. I asked
the Kylo community about a possible integration of Kylo with Zeppelin here:
https://groups.google.com/forum/#!topic/kylo-community/e6JdzneAnV0

The Kylo developers advised me to ask about a possible integration here. I
haven't found any NiFi processor for Zeppelin that makes it possible to work on
input from NiFi and handle the output from Zeppelin. The idea for such an
integration comes from functionality I've seen in the Dataiku solution: there
is a processor that can run arbitrary code in the middle of processing, with
specific input and output variables. With such a capability, the data
governance NiFi provides around data collection could be extended to data
governance around data processing, such as training machine-learning models,
custom visualizations, etc.

What do you think of this idea? Is there any NiFi processor that supports such
an integration? Is it in line with the NiFi roadmap?

--
Kind regards/ Pozdrawiam,
Wojciech Indyk


Re: Detect whether a flowfile has a particular attribute

2017-06-08 Thread James McMahon
I do understand now. Thank you very much Mark. -Jim

On Thu, Jun 8, 2017 at 9:34 AM, Mark Payne  wrote:

> Jim,
>
> The first expression will return false. None of the expressions below will
> ever throw an Exception.
>
> You could even chain them together like 
> ${myAttribute:toLower():length():gt(4)}
> and if myAttribute does not
> exist, it will return false, rather than throwing an Exception.
>
> Thanks
> -Mark
>
>
> On Jun 8, 2017, at 9:32 AM, James McMahon  wrote:
>
> So then if myAttribute does not even exist in a particular flowFile, the
> first expression will return a null value rather than throw an error. Thank
> you very much Mark. -Jim
>
> On Thu, Jun 8, 2017 at 8:44 AM, Mark Payne  wrote:
>
>> Jim,
>>
>> You can use the expression:
>>
>> ${myAttribute:isNull()}
>>
>> Or, alternatively, depending on how you want to setup the route:
>>
>> ${myAttribute:notNull()}
>>
>> If you want to check if the attribute contains 'True' somewhere within
>> its value,
>> then you can use:
>>
>> ${myAttribute:contains('True')}
>>
>> Thanks
>> -Mark
>>
>>
>> > On Jun 8, 2017, at 8:19 AM, James McMahon  wrote:
>> >
>> > Good morning. I receive HTTP POSTs of various types of files. Some have
>> a particular attribute myAttribute, some do not. I want to route the
>> flowfiles to different workflow paths depending on the presence of this
>> attribute. Can I use RouteAttribute and the expression language to do that,
>> something like this:
>> >
>> > hasTheAttributeOfInterest   ${anyAttribute("myAttribute":
>> contains('True')}
>> >
>> > I ask because the expression guide did not say whether a False is
>> returned or the processor throws an error if the attribute does not exist
>> in the flowfile. I may have missed that. I wanted to see if anyone in the
>> group has experience in this regard?
>> >
>> > Thanks in advance for your insights. -Jim
>>
>>
>
>


Re: Detect whether a flowfile has a particular attribute

2017-06-08 Thread Mark Payne
Jim,

The first expression will return false. None of the expressions below will ever 
throw an Exception.

You could even chain them together like ${myAttribute:toLower():length():gt(4)} 
and if myAttribute does not
exist, it will return false, rather than throwing an Exception.

Thanks
-Mark


On Jun 8, 2017, at 9:32 AM, James McMahon 
> wrote:

So then if myAttribute does not even exist in a particular flowFile, the first 
expression will return a null value rather than throw an error. Thank you very 
much Mark. -Jim

On Thu, Jun 8, 2017 at 8:44 AM, Mark Payne 
> wrote:
Jim,

You can use the expression:

${myAttribute:isNull()}

Or, alternatively, depending on how you want to setup the route:

${myAttribute:notNull()}

If you want to check if the attribute contains 'True' somewhere within its 
value,
then you can use:

${myAttribute:contains('True')}

Thanks
-Mark


> On Jun 8, 2017, at 8:19 AM, James McMahon 
> > wrote:
>
> Good morning. I receive HTTP POSTs of various types of files. Some have a 
> particular attribute myAttribute, some do not. I want to route the flowfiles 
> to different workflow paths depending on the presence of this attribute. Can 
> I use RouteAttribute and the expression language to do that, something like 
> this:
>
> hasTheAttributeOfInterest   
> ${anyAttribute("myAttribute":contains('True')}
>
> I ask because the expression guide did not say whether a False is returned or 
> the processor throws an error if the attribute does not exist in the 
> flowfile. I may have missed that. I wanted to see if anyone in the group has 
> experience in this regard?
>
> Thanks in advance for your insights. -Jim





Re: Detect whether a flowfile has a particular attribute

2017-06-08 Thread Mark Payne
Jim,

You can use the expression:

${myAttribute:isNull()}

Or, alternatively, depending on how you want to setup the route:

${myAttribute:notNull()}

If you want to check if the attribute contains 'True' somewhere within its 
value,
then you can use:

${myAttribute:contains('True')}
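
So, as a concrete sketch, a single RouteOnAttribute dynamic property such as

hasTheAttributeOfInterest    ${myAttribute:contains('True')}

should be enough; flow files without the attribute simply go to the
'unmatched' relationship rather than causing an error.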

Thanks
-Mark


> On Jun 8, 2017, at 8:19 AM, James McMahon  wrote:
> 
> Good morning. I receive HTTP POSTs of various types of files. Some have a 
> particular attribute myAttribute, some do not. I want to route the flowfiles 
> to different workflow paths depending on the presence of this attribute. Can 
> I use RouteAttribute and the expression language to do that, something like 
> this:
> 
> hasTheAttributeOfInterest   
> ${anyAttribute("myAttribute":contains('True')}
> 
> I ask because the expression guide did not say whether a False is returned or 
> the processor throws an error if the attribute does not exist in the 
> flowfile. I may have missed that. I wanted to see if anyone in the group has 
> experience in this regard? 
> 
> Thanks in advance for your insights. -Jim



Re: Set priority to files based on date time value stored on attribute

2017-06-08 Thread Manojkumar Ravichandran
Hi Pierre,


After converting that date-time format into an integer (using the expression
language), I was able to process the files as required by setting those
integer values on the priority attribute and processing the files based on
that priority.
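
For anyone else doing this, the UpdateAttribute property can be roughly of
this shape (the attribute name and date format are only examples and must
match the actual data):

priority    ${datetime.attribute:toDate('yyyy/MM/dd HH:mm:ss'):toNumber()}

toNumber() on a date gives epoch milliseconds, which the
PriorityAttributePrioritizer on the downstream connection can then sort on.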

Thanks for your guidance


Regards,
Manoj kumar R


On Thu, Jun 8, 2017 at 8:50 AM, Pierre Villard 
wrote:

> Hi Manoj,
>
> You may want to have a look at the EnforceOrder processor [1] or simply the
> prioritizers [2] of the connections (it depends on how your workflow is
> working). The idea would be to extract the date as an attribute of your
> flow file, convert into an integer (using expression language) and use it
> to ensure order.
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.
> apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.
> EnforceOrder/index.html
> [2] https://nifi.apache.org/docs/nifi-docs/html/user-guide.
> html#prioritization
>
> Hope this helps.
>
>
> 2017-06-08 8:43 GMT+02:00 Manojkumar Ravichandran 
> :
>
>> Hi All,
>>
>> I need to process the files based on the date time value stored on the
>> attribute
>>
>> *For example:*
>>
>> If the incoming files contains the following date time attribute values
>>
>> *2017/06/07  16:57:02*
>> *2017/06/06  12:49:49*
>> *2017/06/06  11:09:28*
>> *2017/06/06  06:37:45*
>>
>> I need to process the files based on the order of time that is oldest one
>> from the current time
>>
>> First I want to access the file that contains below date time attribute
>> which is the oldest one among them from the current time
>> *i.e 2017/06/06  06:37:45*
>> and then below one,
>> *2017/06/06  11:09:28*
>> and then this
>> *2017/06/06  12:49:49*
>> so on.
>>
>> How can I achieve the above mentioned scenario ?
>>
>> Regards,
>> Manoj kumar R
>>
>
>


Detect whether a flowfile has a particular attribute

2017-06-08 Thread James McMahon
Good morning. I receive HTTP POSTs of various types of files. Some have a
particular attribute myAttribute, some do not. I want to route the
flowfiles to different workflow paths depending on the presence of this
attribute. Can I use RouteOnAttribute and the expression language to do that,
something like this:

hasTheAttributeOfInterest
${anyAttribute("myAttribute":contains('True')}

I ask because the expression guide did not say whether a False is returned
or the processor throws an error if the attribute does not exist in the
flowfile. I may have missed that. I wanted to see if anyone in the group
has experience in this regard?

Thanks in advance for your insights. -Jim


How to perform bulk insert into SQLServer from one machine to another?

2017-06-08 Thread prabhu Mahendran
I have a NiFi instance running on one machine and SQL Server on another
machine.

I am trying to perform a bulk insert with a BULK INSERT query in SQL Server,
but I cannot insert data from one machine and move it into the SQL Server on
the other machine.

If I run NiFi and SQL Server on the same machine, I can perform the bulk
insert easily.

I have configured GetFile -> ReplaceText (BulkInsertQuery) -> PutSQL processors.

When both NiFi and SQL Server are on a single machine the bulk insert works,
but it does not work when the two instances are on different machines.

I need to read the data on one machine and run a query that moves that data
into the SQL Server running on the other machine.

The query below works when NiFi and SQL Server are on the same machine:

BULK INSERT BI FROM 'C:\Directory\input.csv' WITH (FIRSTROW = 1,
ROWTERMINATOR = '\n', FIELDTERMINATOR = ',', ROWS_PER_BATCH = 1)

If I run that query from the other machine it fails with "FileNotFoundError",
because "input.csv" is on the Host1 machine while the query runs on the SQL
Server machine (host2).

Can anyone give me a suggestion on how to do this?


Re: Set priority to files based on date time value stored on attribute

2017-06-08 Thread Andre
Koji,

One could convert date to epoch format which is incremental in nature.
Would that help?

On 8 Jun 2017 19:33, "Koji Kawamura"  wrote:

> Hi Manoj,
>
> I think EnforceOrder would not be useful in your case, as it expects
> the order to increase one by one (without skipping).
> As Pierre suggested, I'd suggest using PriorityAttributePrioritizer.
>
> Thanks,
> Koji
>
> On Thu, Jun 8, 2017 at 3:50 PM, Pierre Villard
>  wrote:
> > Hi Manoj,
> >
> > You may want to have a look at the EnforceOrder processor [1] or simply the
> > prioritizers [2] of the connections (it depends on how your workflow is
> > working). The idea would be to extract the date as an attribute of your
> flow
> > file, convert into an integer (using expression language) and use it to
> > ensure order.
> >
> > [1]
> > https://nifi.apache.org/docs/nifi-docs/components/org.
> apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.
> EnforceOrder/index.html
> > [2]
> > https://nifi.apache.org/docs/nifi-docs/html/user-guide.
> html#prioritization
> >
> > Hope this helps.
> >
> >
> > 2017-06-08 8:43 GMT+02:00 Manojkumar Ravichandran <
> sendmailt...@gmail.com>:
> >>
> >> Hi All,
> >>
> >> I need to process the files based on the date time value stored on the
> >> attribute
> >>
> >> For example:
> >>
> >> If the incoming files contains the following date time attribute values
> >>
> >> 2017/06/07  16:57:02
> >> 2017/06/06  12:49:49
> >> 2017/06/06  11:09:28
> >> 2017/06/06  06:37:45
> >>
> >> I need to process the files based on the order of time that is oldest
> one
> >> from the current time
> >>
> >> First I want to access the file that contains below date time attribute
> >> which is the oldest one among them from the current time
> >> i.e 2017/06/06  06:37:45
> >> and then below one,
> >> 2017/06/06  11:09:28
> >> and then this
> >> 2017/06/06  12:49:49
> >> so on 
> >>
> >> How can I achieve the above mentioned scenario ?
> >>
> >> Regards,
> >> Manoj kumar R
> >
> >
>


Re: Set priority to files based on date time value stored on attribute

2017-06-08 Thread Koji Kawamura
Hi Manoj,

I think EnforceOrder would not be useful in your case, as it expects
the order to increase one by one (without skipping).
As Pierre suggested, I'd suggest using PriorityAttributePrioritizer.

Thanks,
Koji

On Thu, Jun 8, 2017 at 3:50 PM, Pierre Villard
 wrote:
> Hi Manoj,
>
> You may want to have a look at the EnforceOrder processor [1] or simply the
> prioritizers [2] of the connections (it depends on how your workflow is
> working). The idea would be to extract the date as an attribute of your flow
> file, convert into an integer (using expression language) and use it to
> ensure order.
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
> [2]
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization
>
> Hope this helps.
>
>
> 2017-06-08 8:43 GMT+02:00 Manojkumar Ravichandran :
>>
>> Hi All,
>>
>> I need to process the files based on the date time value stored on the
>> attribute
>>
>> For example:
>>
>> If the incoming files contains the following date time attribute values
>>
>> 2017/06/07  16:57:02
>> 2017/06/06  12:49:49
>> 2017/06/06  11:09:28
>> 2017/06/06  06:37:45
>>
>> I need to process the files based on the order of time that is oldest one
>> from the current time
>>
>> First I want to access the file that contains below date time attribute
>> which is the oldest one among them from the current time
>> i.e 2017/06/06  06:37:45
>> and then below one,
>> 2017/06/06  11:09:28
>> and then this
>> 2017/06/06  12:49:49
>> so on 
>>
>> How can I achieve the above mentioned scenario ?
>>
>> Regards,
>> Manoj kumar R
>
>


Re: Set priority to files based on date time value stored on attribute

2017-06-08 Thread Pierre Villard
Hi Manoj,

You may want to have a look at the EnforceOrder processor [1] or simply the
prioritizers [2] of the connections (it depends on how your workflow is
working). The idea would be to extract the date as an attribute of your
flow file, convert into an integer (using expression language) and use it
to ensure order.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.2.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
[2]
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization

Hope this helps.


2017-06-08 8:43 GMT+02:00 Manojkumar Ravichandran :

> Hi All,
>
> I need to process the files based on the date time value stored on the
> attribute
>
> *For example:*
>
> If the incoming files contains the following date time attribute values
>
> *2017/06/07  16:57:02*
> *2017/06/06  12:49:49*
> *2017/06/06  11:09:28*
> *2017/06/06  06:37:45*
>
> I need to process the files based on the order of time that is oldest one
> from the current time
>
> First I want to access the file that contains below date time attribute
> which is the oldest one among them from the current time
> *i.e 2017/06/06  06:37:45*
> and then below one,
> *2017/06/06  11:09:28*
> and then this
> *2017/06/06  12:49:49*
> so on.
>
> How can I achieve the above mentioned scenario ?
>
> Regards,
> Manoj kumar R
>


Set priority to files based on date time value stored on attribute

2017-06-08 Thread Manojkumar Ravichandran
Hi All,

I need to process the files based on the date time value stored on the
attribute

*For example:*

If the incoming files contains the following date time attribute values

*2017/06/07  16:57:02*
*2017/06/06  12:49:49*
*2017/06/06  11:09:28*
*2017/06/06  06:37:45*

I need to process the files in time order, starting with the one that is
oldest relative to the current time

First I want to access the file that contains below date time attribute
which is the oldest one among them from the current time
*i.e 2017/06/06  06:37:45*
and then below one,
*2017/06/06  11:09:28*
and then this
*2017/06/06  12:49:49*
so on.

How can I achieve the above mentioned scenario ?

Regards,
Manoj kumar R