[jira] [Commented] (NIFI-5522) HandleHttpRequest enters in fault state and does not recover
[ https://issues.apache.org/jira/browse/NIFI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584494#comment-16584494 ] Diego Queiroz commented on NIFI-5522:
-

This is bad news for sure. I am still trying to build a scenario that reliably reproduces the behavior I am describing. I am trying to deploy a new server to isolate this problem, so please keep this issue open for a while. I tried to generate the dumps as requested, but they were not useful because several other flows were running at the same time and I can't stop them.

> HandleHttpRequest enters in fault state and does not recover
>
> Key: NIFI-5522
> URL: https://issues.apache.org/jira/browse/NIFI-5522
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Diego Queiroz
> Priority: Critical
> Labels: security
> Attachments: HandleHttpRequest_Error_Template.xml, image-2018-08-15-21-10-27-926.png, image-2018-08-15-21-10-33-515.png, image-2018-08-15-21-11-57-818.png, image-2018-08-15-21-15-35-364.png, image-2018-08-15-21-19-34-431.png, image-2018-08-15-21-20-31-819.png, test_http_req_resp.xml
>
> HandleHttpRequest randomly enters a fault state and does not recover until I restart the node. I suspect the problem is triggered when some exception occurs (e.g. a broken request, connection issues, etc.), but I am usually able to reproduce this behavior by stressing the node with many simultaneous requests:
>
> {code}
> # example script to stress the server
> for i in `seq 1 1`; do
>   wget -T10 -t10 -qO- 'http://127.0.0.1:64080/' >/dev/null &
> done
> {code}
>
> When this happens, HandleHttpRequest starts to return "HTTP ERROR 503 - Service Unavailable" and does not recover from this state:
> !image-2018-08-15-21-10-33-515.png!
> If I try to stop the HandleHttpRequest processor, the running threads do not terminate:
> !image-2018-08-15-21-11-57-818.png!
> If I force them to terminate, the listen port remains bound by NiFi:
> !image-2018-08-15-21-15-35-364.png!
> If I try to connect again, I get an HTTP ERROR 500:
> !image-2018-08-15-21-19-34-431.png!
>
> If I try to start the HandleHttpRequest processor again, it fails to start with the message:
>
> {code}
> ERROR [Timer-Driven Process Thread-11] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=9bae326b-5ac3-3e9f-2dac-c0399d8f2ddb] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server: org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server
> org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server
>     at org.apache.nifi.processors.standard.HandleHttpRequest.onTrigger(HandleHttpRequest.java:501)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>     at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind0(Native Method)
>     at sun.nio.ch.Net.bind(Net.java:433)
>     at sun.nio.ch.Net.bind(Net.java:425)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:298)
>     at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
>     at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.server.Server.doStart(Server.java:431)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.apache.nifi.processors.standard.HandleHttpRequest.initializeServer(HandleHttpRequest.java:430)
> {code}
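For anyone trying to confirm the {{java.net.BindException}} symptom above outside NiFi, a small probe like the following can tell whether a port is still held by some socket. This is an illustrative sketch only; the class name is made up, and the port to check would be the one HandleHttpRequest listens on (64080 in the report).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: attempts to bind a loopback listener on the given
// port. A failure here corresponds to the "Address already in use"
// BindException seen in the stack trace above.
public final class PortProbe {
    public static boolean isBindable(final int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress("127.0.0.1", port));
            return true; // bind succeeded; nothing else holds the port
        } catch (IOException e) {
            return false; // port still bound by another socket (e.g. Jetty)
        }
    }
}
```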
[jira] [Commented] (NIFI-5522) HandleHttpRequest enters in fault state and does not recover
[ https://issues.apache.org/jira/browse/NIFI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584415#comment-16584415 ] Otto Fowler commented on NIFI-5522:
---

I tried to verify with 1.7.1 and still could not reproduce. Sorry.

> HandleHttpRequest enters in fault state and does not recover
>
> Key: NIFI-5522
> URL: https://issues.apache.org/jira/browse/NIFI-5522
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Diego Queiroz
> Priority: Critical
[jira] [Updated] (NIFI-5534) Create a Nifi Processor using Boilerpipe Article Extractor
[ https://issues.apache.org/jira/browse/NIFI-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Vidal updated NIFI-5534:
-

Description:
Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.
Link to my implementation: https://github.com/paulvid/nifi/tree/NIFI-5534/nifi-nar-bundles/nifi-html-bundle/nifi-html-processors/

was:
Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.

> Create a Nifi Processor using Boilerpipe Article Extractor
>
> Key: NIFI-5534
> URL: https://issues.apache.org/jira/browse/NIFI-5534
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Paul Vidal
> Priority: Minor
> Labels: github-import
> Original Estimate: 24h
> Remaining Estimate: 24h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5534) Create a Nifi Processor using Boilerpipe Article Extractor
Paul Vidal created NIFI-5534:
Summary: Create a Nifi Processor using Boilerpipe Article Extractor
Key: NIFI-5534
URL: https://issues.apache.org/jira/browse/NIFI-5534
Project: Apache NiFi
Issue Type: New Feature
Reporter: Paul Vidal

Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-4731) BigQuery processors
[ https://issues.apache.org/jira/browse/NIFI-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584232#comment-16584232 ] ASF GitHub Bot commented on NIFI-4731:
--

Github user danieljimenez commented on the issue:

https://github.com/apache/nifi/pull/2682

Is there anything I can do to ensure this makes it to 1.8? Thanks!

> BigQuery processors
>
> Key: NIFI-4731
> URL: https://issues.apache.org/jira/browse/NIFI-4731
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Mikhail Sosonkin
> Priority: Major
>
> NiFi should have processors for putting data into BigQuery (Streaming and Batch).
> Initial working processors can be found in this repository:
> https://github.com/nologic/nifi/tree/NIFI-4731/nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/bigquery
> I'd like to get them into NiFi proper.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi issue #2682: NIFI-4731: BQ Processors and GCP library update.
Github user danieljimenez commented on the issue: https://github.com/apache/nifi/pull/2682 Is there anything I can do to ensure this makes it to 1.8? Thanks! ---
[jira] [Commented] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
[ https://issues.apache.org/jira/browse/NIFI-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584226#comment-16584226 ] Mark Payne commented on NIFI-5533:
--

StandardRepositoryRecord also keeps a {{Map updatedAttributes}}. For a Repository Record where the type = CREATE, we know that all attributes are 'updated attributes', so we can avoid storing the duplicate map and instead just return {{getCurrent().getAttributes()}} when {{getUpdatedAttributes()}} is called.

> Improve efficiency of FlowFiles' heap usage
>
> Key: NIFI-5533
> URL: https://issues.apache.org/jira/browse/NIFI-5533
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> Looking at the code, I see several places where we can reduce the heap that NiFi uses for FlowFiles:
> * When StandardPreparedQuery is used (any time Expression Language is evaluated), it creates a StringBuilder and iterates over all Expressions, evaluating them and concatenating the results together. If there is only a single Expression, though, we can avoid this and just return the value obtained from the Expression. While this will reduce the amount of garbage collected, it plays a more important role: it avoids creating a new String object for the FlowFile's attribute map. Currently, if 1 million FlowFiles go through UpdateAttribute to copy the 'abc' attribute to 'xyz', we have 1 million copies of that String on the heap. If we just returned the result of evaluating the Expression, we would instead have 1 copy of that String.
> * Similarly, it may make sense in UpdateAttribute to cache N entries, so that when an expression like $(unknown).txt is evaluated, even though a new String is generated by StandardPreparedQuery, we can resolve it to the same String object when storing it as a FlowFile attribute. This would work like {{String.intern()}} but not use {{String.intern()}}, because we don't want to store an unbounded number of these values in the {{String.intern()}} cache - we want to cap the number of entries in case the values aren't always reused.
> * Every FlowFile that is created by StandardProcessSession has a 'filename' attribute added. The value is obtained by calling {{String.valueOf(System.nanoTime());}} This comes with a few downsides. Firstly, the system call is a bit expensive (though not bad). Secondly, the filename is not very unique - with many dataflows and concurrent tasks running, it is common to have several FlowFiles with 'naming collisions'. Most of all, though, it means that we are keeping that String on the heap. A simple test shows that using the UUID as the default filename instead allowed 20% more FlowFiles to be generated on the same heap before running out of memory.
> * {{AbstractComponentNode.getProperties()}} creates a copy of its HashMap for every call. If we instead created a copy once when the StandardProcessContext was created, we could just return that one Map every time, since it can't change over the lifetime of the ProcessContext. This is more about garbage collection and general processor performance than about heap utilization, but it is still in the same realm.
>
> I am sure that there are far more of these nuances, but these are certainly worth tackling.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
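The capped "intern"-style cache described in the second bullet could be sketched roughly as follows. This is illustrative only, not NiFi code: the class name and eviction policy are assumptions, and an access-ordered {{LinkedHashMap}} is just one simple way to bound the cache with LRU eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: like String.intern(), but capped so values that
// are never reused cannot grow the cache without bound.
public class BoundedInternCache {
    private final Map<String, String> cache;

    public BoundedInternCache(final int maxEntries) {
        // accessOrder=true makes the LinkedHashMap track least-recently-used
        // entries; removeEldestEntry evicts once the cap is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns a canonical instance for equal strings, so repeated attribute
    // values share one String object on the heap instead of one per FlowFile.
    public synchronized String intern(final String value) {
        final String existing = cache.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}
```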
[jira] [Commented] (NIFIREG-193) Upgrade superagent
[ https://issues.apache.org/jira/browse/NIFIREG-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584224#comment-16584224 ] ASF GitHub Bot commented on NIFIREG-193: Github user scottyaslan commented on the issue: https://github.com/apache/nifi-registry/pull/135 FYI this PR is still open even though it has been merged > Upgrade superagent > -- > > Key: NIFIREG-193 > URL: https://issues.apache.org/jira/browse/NIFIREG-193 > Project: NiFi Registry > Issue Type: Bug >Reporter: Scott Aslan >Assignee: Scott Aslan >Priority: Major > Fix For: 0.3.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi-registry issue #135: [NIFIREG-193] upgrade superagent
Github user scottyaslan commented on the issue: https://github.com/apache/nifi-registry/pull/135 FYI this PR is still open even though it has been merged ---
[jira] [Commented] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
[ https://issues.apache.org/jira/browse/NIFIREG-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584218#comment-16584218 ] ASF GitHub Bot commented on NIFIREG-196:

GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/138

[NIFIREG-196] update client deps

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/138.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #138

commit 225f5482d865ab46c311aa5f7e785d0e46b95ce5
Author: Scott Aslan
Date: 2018-08-17T17:52:37Z

[NIFIREG-196] update client deps

> Upgrade lodash, parsejson, https-proxy-agent
>
> Key: NIFIREG-196
> URL: https://issues.apache.org/jira/browse/NIFIREG-196
> Project: NiFi Registry
> Issue Type: Bug
> Reporter: Scott Aslan
> Assignee: Scott Aslan
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi-registry pull request #138: [NIFIREG-196] update client deps
GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/138

[NIFIREG-196] update client deps

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/138.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #138

commit 225f5482d865ab46c311aa5f7e785d0e46b95ce5
Author: Scott Aslan
Date: 2018-08-17T17:52:37Z

[NIFIREG-196] update client deps

---
[jira] [Created] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
Scott Aslan created NIFIREG-196: --- Summary: Upgrade lodash, parsejson, https-proxy-agent Key: NIFIREG-196 URL: https://issues.apache.org/jira/browse/NIFIREG-196 Project: NiFi Registry Issue Type: Bug Reporter: Scott Aslan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
[ https://issues.apache.org/jira/browse/NIFIREG-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Aslan reassigned NIFIREG-196: --- Assignee: Scott Aslan > Upgrade lodash, parsejson, https-proxy-agent > > > Key: NIFIREG-196 > URL: https://issues.apache.org/jira/browse/NIFIREG-196 > Project: NiFi Registry > Issue Type: Bug >Reporter: Scott Aslan >Assignee: Scott Aslan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
[ https://issues.apache.org/jira/browse/NIFI-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584139#comment-16584139 ] Mark Payne commented on NIFI-5533:
--

Additionally, when we commit a session, we serialize the update to the FlowFileRepository into a ByteArrayOutputStream, then write it to disk in a single call to {{FileOutputStream.write(byte[])}}. This works well for small updates, but for a very large update, such as when we have a long Run Duration or a Split/Merge case, this can consume a lot of heap. It would make sense, instead, for a session commit that has, say, 10,000 updates (or 1,000 updates) to write to a separate file in the FlowFile Repository's directory, then write to the FlowFile repo some sort of "external reference" with the name of the file. To ensure the integrity of the FlowFile Repository, we will need to perform an fsync() on that file before updating the main 'journal'. We would also need to ensure that the file is deleted when we perform a checkpoint.

> Improve efficiency of FlowFiles' heap usage
>
> Key: NIFI-5533
> URL: https://issues.apache.org/jira/browse/NIFI-5533
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
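The spill-to-side-file idea in the comment above can be sketched as follows. All names here are hypothetical, not NiFi's actual repository API; the one essential detail is the fsync ({{FileChannel.force(true)}}) before the main journal is allowed to reference the side file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: write a large serialized session update to a side
// file in the repository directory and force it to disk, so the journal can
// safely store just an "external reference" (the file name) afterward.
public final class ExternalUpdateWriter {
    public static Path writeAndSync(final Path repoDirectory, final String fileName,
                                    final byte[] serializedUpdate) throws IOException {
        final Path sideFile = repoDirectory.resolve(fileName);
        try (FileChannel channel = FileChannel.open(sideFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(serializedUpdate));
            // fsync: the data must be durable before the journal points at it,
            // otherwise a crash could leave a dangling reference.
            channel.force(true);
        }
        return sideFile;
    }
}
```

The side file would then be deleted at the next repository checkpoint, as the comment notes.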
[jira] [Created] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
Mark Payne created NIFI-5533:
Summary: Improve efficiency of FlowFiles' heap usage
Key: NIFI-5533
URL: https://issues.apache.org/jira/browse/NIFI-5533
Project: Apache NiFi
Issue Type: Improvement
Reporter: Mark Payne
Assignee: Mark Payne

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5532) CRON Scheduling with timezone
Julian Gimbel created NIFI-5532:
---
Summary: CRON Scheduling with timezone
Key: NIFI-5532
URL: https://issues.apache.org/jira/browse/NIFI-5532
Project: Apache NiFi
Issue Type: Improvement
Reporter: Julian Gimbel

Currently it is possible to schedule processors based on a CRON trigger. This is fine as long as we only work in one timezone, or in timezones that do not shift between summer and winter time. If processes should run at the same wall-clock time, the processors need to be rescheduled whenever timezones switch from summer to winter time. As pointed out in [https://stackoverflow.com/questions/40212696/apache-nifi-how-to-pass-the-timezone-into-the-crontab-string], NiFi uses the Quartz scheduler under the hood, which already supports timezones that could be used here.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
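The summer/winter shift the report describes is easy to see with {{java.time}}. The sketch below (illustrative only; class and method names are made up) shows that the same wall-clock hour in Berlin sits at UTC+1 in January but UTC+2 in July, which is exactly why a CRON trigger fixed in one zone "moves" for users in another.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo: compute the UTC offset, in hours, of 06:00 local time
// on the 1st of a given month in a given timezone. A DST-observing zone
// returns different offsets for winter and summer months.
public final class DstOffsetDemo {
    public static int offsetHours(final ZoneId zone, final int month) {
        final ZonedDateTime t = ZonedDateTime.of(2018, month, 1, 6, 0, 0, 0, zone);
        return t.getOffset().getTotalSeconds() / 3600;
    }
}
```

A timezone-aware scheduler compensates for this shift automatically, which is what the report asks NiFi to expose.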
[jira] [Commented] (NIFI-515) DeleteSQS and PutSQS should offer batch processing
[ https://issues.apache.org/jira/browse/NIFI-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583870#comment-16583870 ] ASF GitHub Bot commented on NIFI-515:
-

Github user jzonthemtn commented on the issue:

https://github.com/apache/nifi/pull/1977

@pvillard31 Wondering if there's still community interest in this one? It would be useful for me at least. I can help with the merge conflicts if there is.

> DeleteSQS and PutSQS should offer batch processing
>
> Key: NIFI-515
> URL: https://issues.apache.org/jira/browse/NIFI-515
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Pierre Villard
> Priority: Minor

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi issue #1977: NIFI-515 - DeleteSQS and PutSQS should offer batch process...
Github user jzonthemtn commented on the issue: https://github.com/apache/nifi/pull/1977 @pvillard31 Wondering if there's still community interest in this one? It would be useful for me at least. I can help with the merge conflicts if there is. ---
[GitHub] nifi pull request #2942: NIFI-5500: Array support in QueryElasticseachHttp
Github user jzonthemtn commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2942#discussion_r210897472

--- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/QueryElasticsearchHttp.java ---
@@ -432,7 +432,17 @@ private int getPage(final Response getResponse, final URL url, final ProcessCont
     Map attributes = new HashMap<>();
     for(Iterator> it = source.fields(); it.hasNext(); ) {
         Entry entry = it.next();
-        attributes.put(ATTRIBUTE_PREFIX + entry.getKey(), entry.getValue().asText());
+        String text_value = "";
+        String separator = "";
--- End diff --

I think it might be better if the value of `separator` didn't change. What do you think about adding each item's text to an array and then doing something like a `StringUtils.join()` on it?

---
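The reviewer's suggestion, collecting each element's text and joining once rather than mutating a separator variable inside the loop, might look roughly like this. This is a simplified sketch: the real processor iterates Jackson JsonNode values, and commons-lang's `StringUtils.join()` would work equally well in place of `String.join()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the collect-then-join pattern: the separator
// is applied exactly once by the join, so no per-iteration state is needed.
public final class ArrayFieldJoiner {
    public static String joinValues(final String[] elements, final String separator) {
        final List<String> texts = new ArrayList<>();
        for (final String element : elements) {
            texts.add(element); // in the real code: entry.getValue().asText() per array item
        }
        return String.join(separator, texts);
    }
}
```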
[jira] [Commented] (NIFI-5500) Add array support to QueryElasticsearchHttp
[ https://issues.apache.org/jira/browse/NIFI-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583861#comment-16583861 ] ASF GitHub Bot commented on NIFI-5500:
--

Github user jzonthemtn commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2942#discussion_r210897472

--- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/QueryElasticsearchHttp.java ---
@@ -432,7 +432,17 @@ private int getPage(final Response getResponse, final URL url, final ProcessCont
     Map attributes = new HashMap<>();
     for(Iterator> it = source.fields(); it.hasNext(); ) {
         Entry entry = it.next();
-        attributes.put(ATTRIBUTE_PREFIX + entry.getKey(), entry.getValue().asText());
+        String text_value = "";
+        String separator = "";
--- End diff --

I think it might be better if the value of `separator` didn't change. What do you think about adding each item's text to an array and then doing something like a `StringUtils.join()` on it?

> Add array support to QueryElasticsearchHttp
>
> Key: NIFI-5500
> URL: https://issues.apache.org/jira/browse/NIFI-5500
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.7.1
> Reporter: Wietze B
> Priority: Minor
>
> When using the QueryElasticsearchHttp processor (with output=Attributes) to query a document that contains a field with an Array rather than a String or Integer, the resulting attribute for that field will be an empty string.
> This is due to the fact that QueryElasticsearchHttp doesn't handle array fields correctly.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583827#comment-16583827 ] Otto Fowler commented on NIFI-5529:
---
I tested against that pull request and it now works as described in your original statement. [^test_escapes.xml]
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
> Attachments: test_escapes.xml
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
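The extra backslash reported above is characteristic of layered string processing: each parsing layer consumes one level of escaping. Plain Java shows the same effect in `String.replaceAll()`, whose replacement string treats a backslash as an escape character. This is only an analogy for the Expression Language behavior, not NiFi code:

```java
public class EscapeLayersDemo {
    public static void main(String[] args) {
        // The source literal "\n" is a single newline character, so the
        // replacement inserts a real line break.
        String oneLayer = "a,b".replaceAll(",", "\n");   // "a" + newline + "b"

        // The source literal "\\n" is the two characters '\' and 'n'. The
        // regex replacement engine consumes the backslash as an escape,
        // leaving a literal 'n' -- each layer "eats" one backslash.
        String twoLayers = "a,b".replaceAll(",", "\\n"); // "anb"

        System.out.println(oneLayer.contains("\n")); // true
        System.out.println(twoLayers);               // anb
    }
}
```

With one extra escaping layer in the pipeline, a user must double every backslash to get the same final result, which matches the v1.4.0 vs v1.7.1 difference described in the issue.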
[jira] [Updated] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otto Fowler updated NIFI-5529:
--
Attachment: test_escapes.xml
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
> Attachments: test_escapes.xml
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5531) TCPListen issue
Charles Lassiter created NIFI-5531:
--
Summary: TCPListen issue
Key: NIFI-5531
URL: https://issues.apache.org/jira/browse/NIFI-5531
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 0.7.4
Environment: RHEL6 server to RHEL6 server over internal network
Reporter: Charles Lassiter

When utilizing postTCP and ListenTCP, I'm receiving chunked files when sending one text file. As an example, if I send one file via TCP, ListenTCP receives the file and chunks it into 8+ files. The delimiter is set the same on both ends, so I don't understand why ListenTCP is chunking the files. Any suggestions would help.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
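A likely explanation for the behavior reported above (an assumption, since the flow is not attached): TCP delivers a byte stream rather than discrete files, and ListenTCP frames that stream by splitting on its configured message delimiter, so a single file that itself contains the delimiter (e.g. newlines) surfaces as one FlowFile per delimited segment. A minimal sketch of that framing logic, with hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of delimiter-based framing over a TCP-style byte stream:
// bytes are accumulated and a new "message" is emitted at each delimiter.
public class DelimiterFramer {

    public static List<String> frame(InputStream in) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {              // configured delimiter
                messages.add(current.toString());
                current.setLength(0);
            } else {
                current.append((char) b);
            }
        }
        if (current.length() > 0) {       // trailing bytes with no delimiter
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // One "file" sent over the socket, containing the delimiter twice:
        byte[] oneFile = "line1\nline2\nline3".getBytes(StandardCharsets.UTF_8);
        List<String> flowFiles = frame(new ByteArrayInputStream(oneFile));
        System.out.println(flowFiles.size()); // prints: 3
    }
}
```

Under this reading, the 8+ output files would simply correspond to the delimiter occurrences inside the sent file, not to a bug in the receiver.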
[jira] [Comment Edited] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583782#comment-16583782 ] Otto Fowler edited comment on NIFI-5529 at 8/17/18 11:28 AM:
---
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression.

was (Author: ottobackwards):
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression. It reverts the change/approach that most likely introduced this issue.

> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583782#comment-16583782 ] Otto Fowler commented on NIFI-5529:
---
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression. It reverts the change/approach that most likely introduced this issue.
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5327) NetFlow Processors
[ https://issues.apache.org/jira/browse/NIFI-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583745#comment-16583745 ] ASF GitHub Bot commented on NIFI-5327:
--
Github user PrashanthVenkatesan commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2820#discussion_r210868364
--- Diff: nifi-nar-bundles/nifi-network-bundle/nifi-network-processors/src/main/java/org/apache/nifi/processors/network/ParseNetflowv5.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.network;
+
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getHeaderFields;
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getRecordFields;
+
+import java.io.BufferedOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.OptionalInt;
+import java.util.Set;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.ReadsAttributes;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.network.parser.Netflowv5Parser;
+import org.apache.nifi.stream.io.StreamUtils;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({ "network", "netflow", "attributes", "datagram", "v5", "packet", "byte" })
+@CapabilityDescription("Parses netflowv5 byte ingest and add to NiFi flowfile as attributes or JSON content.")
+@ReadsAttributes({ @ReadsAttribute(attribute = "udp.port", description = "Optionally read if packets are received from UDP datagrams.") })
+@WritesAttributes({ @WritesAttribute(attribute = "netflowv5.header.*", description = "The key and value generated by the parsing of the header fields."),
+@WritesAttribute(attribute = "netflowv5.record.*", description = "The key and value generated by the parsing of the record fields.") })
+
+public class ParseNetflowv5 extends AbstractProcessor {
+private String destination;
+// Add mapper
+private static final ObjectMapper mapper = new ObjectMapper();
+
+public static final String DESTINATION_CONTENT = "flowfile-content";
+public static final String DESTINATION_ATTRIBUTES = "flowfile-attribute";
+public static final PropertyDescriptor FIELDS_DESTINATION
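The diff above declares a FIELDS_DESTINATION property selecting between flowfile-attribute and flowfile-content output. A hedged sketch of what those two modes typically produce; the class and method names are hypothetical, and a hand-built JSON string stands in for the Jackson ObjectMapper the real processor imports:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the processor's two output modes: parsed fields
// either become prefixed FlowFile attributes or a single JSON content blob.
public class FieldsDestinationSketch {
    static final String ATTRIBUTE_PREFIX = "netflowv5.record.";

    // flowfile-attribute mode: one prefixed attribute per parsed field
    public static Map<String, String> toAttributes(Map<String, String> fields) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            attributes.put(ATTRIBUTE_PREFIX + e.getKey(), e.getValue());
        }
        return attributes;
    }

    // flowfile-content mode: all fields serialized into one JSON object
    // (real code would use Jackson's ObjectMapper/ObjectNode instead)
    public static String toJsonContent(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("srcaddr", "10.0.0.1");
        fields.put("dstaddr", "10.0.0.2");
        System.out.println(toJsonContent(fields));
        // prints: {"srcaddr":"10.0.0.1","dstaddr":"10.0.0.2"}
    }
}
```

The attribute mode suits flows that route on individual fields, while the content mode suits flows that ship whole records downstream as JSON.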
[GitHub] nifi pull request #2820: NIFI-5327 Adding Netflowv5 protocol parser
Github user PrashanthVenkatesan commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2820#discussion_r210868364
--- Diff: nifi-nar-bundles/nifi-network-bundle/nifi-network-processors/src/main/java/org/apache/nifi/processors/network/ParseNetflowv5.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.network;
+
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getHeaderFields;
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getRecordFields;
+
+import java.io.BufferedOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.OptionalInt;
+import java.util.Set;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.ReadsAttributes;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.network.parser.Netflowv5Parser;
+import org.apache.nifi.stream.io.StreamUtils;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({ "network", "netflow", "attributes", "datagram", "v5", "packet", "byte" })
+@CapabilityDescription("Parses netflowv5 byte ingest and add to NiFi flowfile as attributes or JSON content.")
+@ReadsAttributes({ @ReadsAttribute(attribute = "udp.port", description = "Optionally read if packets are received from UDP datagrams.") })
+@WritesAttributes({ @WritesAttribute(attribute = "netflowv5.header.*", description = "The key and value generated by the parsing of the header fields."),
+@WritesAttribute(attribute = "netflowv5.record.*", description = "The key and value generated by the parsing of the record fields.") })
+
+public class ParseNetflowv5 extends AbstractProcessor {
+private String destination;
+// Add mapper
+private static final ObjectMapper mapper = new ObjectMapper();
+
+public static final String DESTINATION_CONTENT = "flowfile-content";
+public static final String DESTINATION_ATTRIBUTES = "flowfile-attribute";
+public static final PropertyDescriptor FIELDS_DESTINATION = new PropertyDescriptor.Builder().name("FIELDS_DESTINATION").displayName("Parsed fields destination")
+.description("Indicates whether the results of the parser are written " + "to the FlowFile content or a FlowFile attribute; if usin