[jira] [Commented] (NIFI-5522) HandleHttpRequest enters in fault state and does not recover
[ https://issues.apache.org/jira/browse/NIFI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584494#comment-16584494 ] Diego Queiroz commented on NIFI-5522:
-

This is bad news for sure. I am still trying to build a scenario that reliably reproduces the behavior I am describing. I am trying to deploy a new server to isolate this problem, so please keep this issue open for a while. I tried to generate the dumps as requested, but they were not useful because several other flows were running at the same time and I can't stop them.

> HandleHttpRequest enters in fault state and does not recover
>
> Key: NIFI-5522
> URL: https://issues.apache.org/jira/browse/NIFI-5522
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Diego Queiroz
> Priority: Critical
> Labels: security
> Attachments: HandleHttpRequest_Error_Template.xml, image-2018-08-15-21-10-27-926.png, image-2018-08-15-21-10-33-515.png, image-2018-08-15-21-11-57-818.png, image-2018-08-15-21-15-35-364.png, image-2018-08-15-21-19-34-431.png, image-2018-08-15-21-20-31-819.png, test_http_req_resp.xml
>
> HandleHttpRequest randomly enters a fault state and does not recover until I restart the node. I suspect the problem is triggered when some exception occurs (e.g. a broken request, connection issues, etc.), but I am usually able to reproduce this behavior by stressing the node with many simultaneous requests:
>
> {code}
> # example script to stress the server
> for i in `seq 1 1`; do
>   wget -T10 -t10 -qO- 'http://127.0.0.1:64080/' >/dev/null &
> done
> {code}
>
> When this happens, HandleHttpRequest starts to return "HTTP ERROR 503 - Service Unavailable" and does not recover from this state:
> !image-2018-08-15-21-10-33-515.png!
> If I try to stop the HandleHttpRequest processor, the running threads do not terminate:
> !image-2018-08-15-21-11-57-818.png!
> If I force them to terminate, the listen port remains bound by NiFi:
> !image-2018-08-15-21-15-35-364.png!
> If I try to connect again, I get an HTTP ERROR 500:
> !image-2018-08-15-21-19-34-431.png!
>
> If I try to start the HandleHttpRequest processor again, it fails to start with the message:
>
> {code}
> ERROR [Timer-Driven Process Thread-11] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=9bae326b-5ac3-3e9f-2dac-c0399d8f2ddb] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server: org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server
> org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server
>     at org.apache.nifi.processors.standard.HandleHttpRequest.onTrigger(HandleHttpRequest.java:501)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>     at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind0(Native Method)
>     at sun.nio.ch.Net.bind(Net.java:433)
>     at sun.nio.ch.Net.bind(Net.java:425)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:298)
>     at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
>     at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.server.Server.doStart(Server.java:431)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.apache.nifi.processors.standard.HandleHttpRequest.initializeServer(HandleHttpRequest.java:430)
> {code}
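For anyone trying to confirm the {{java.net.BindException}} symptom above outside NiFi, a small probe like the following can tell whether a port is still held by some socket. This is an illustrative sketch only; the class name is made up, and the port to check would be the one HandleHttpRequest listens on (64080 in the report).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: attempts to bind a loopback listener on the given
// port. A failure here corresponds to the "Address already in use"
// BindException seen in the stack trace above.
public final class PortProbe {
    public static boolean isBindable(final int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress("127.0.0.1", port));
            return true; // bind succeeded; nothing else holds the port
        } catch (IOException e) {
            return false; // port still bound by another socket (e.g. Jetty)
        }
    }
}
```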
[jira] [Commented] (NIFI-5522) HandleHttpRequest enters in fault state and does not recover
[ https://issues.apache.org/jira/browse/NIFI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584415#comment-16584415 ] Otto Fowler commented on NIFI-5522:
---

I tried to verify with 1.7.1 and still could not reproduce. Sorry.

> HandleHttpRequest enters in fault state and does not recover
>
> Key: NIFI-5522
> URL: https://issues.apache.org/jira/browse/NIFI-5522
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Diego Queiroz
> Priority: Critical
[jira] [Updated] (NIFI-5534) Create a Nifi Processor using Boilerpipe Article Extractor
[ https://issues.apache.org/jira/browse/NIFI-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Vidal updated NIFI-5534:
-

Description:
Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.
Link to my implementation: https://github.com/paulvid/nifi/tree/NIFI-5534/nifi-nar-bundles/nifi-html-bundle/nifi-html-processors/

was:
Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.

> Create a Nifi Processor using Boilerpipe Article Extractor
>
> Key: NIFI-5534
> URL: https://issues.apache.org/jira/browse/NIFI-5534
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Paul Vidal
> Priority: Minor
> Labels: github-import
> Original Estimate: 24h
> Remaining Estimate: 24h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5534) Create a Nifi Processor using Boilerpipe Article Extractor
Paul Vidal created NIFI-5534:
Summary: Create a Nifi Processor using Boilerpipe Article Extractor
Key: NIFI-5534
URL: https://issues.apache.org/jira/browse/NIFI-5534
Project: Apache NiFi
Issue Type: New Feature
Reporter: Paul Vidal

Using the boilerpipe library ([https://code.google.com/archive/p/boilerpipe/]), I created a simple processor that reads the content of a URL and extracts its text into a flowfile. I think it is a good complement to the HTML nar bundle.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-4731) BigQuery processors
[ https://issues.apache.org/jira/browse/NIFI-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584232#comment-16584232 ] ASF GitHub Bot commented on NIFI-4731:
--

Github user danieljimenez commented on the issue:

https://github.com/apache/nifi/pull/2682

Is there anything I can do to ensure this makes it to 1.8? Thanks!

> BigQuery processors
>
> Key: NIFI-4731
> URL: https://issues.apache.org/jira/browse/NIFI-4731
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Mikhail Sosonkin
> Priority: Major
>
> NiFi should have processors for putting data into BigQuery (Streaming and Batch).
> Initial working processors can be found in this repository:
> https://github.com/nologic/nifi/tree/NIFI-4731/nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/bigquery
> I'd like to get them into NiFi proper.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi issue #2682: NIFI-4731: BQ Processors and GCP library update.
Github user danieljimenez commented on the issue: https://github.com/apache/nifi/pull/2682 Is there anything I can do to ensure this makes it to 1.8? Thanks! ---
[jira] [Commented] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
[ https://issues.apache.org/jira/browse/NIFI-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584226#comment-16584226 ] Mark Payne commented on NIFI-5533:
--

StandardRepositoryRecord also keeps a {{Map updatedAttributes}}. For a Repository Record where the type = CREATE, we know that all attributes are 'updated attributes', so we can avoid storing the duplicate map and instead just return {{getCurrent().getAttributes()}} when {{getUpdatedAttributes()}} is called.

> Improve efficiency of FlowFiles' heap usage
>
> Key: NIFI-5533
> URL: https://issues.apache.org/jira/browse/NIFI-5533
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> Looking at the code, I see several places where we can reduce the heap that NiFi uses for FlowFiles:
> * When StandardPreparedQuery is used (any time Expression Language is evaluated), it creates a StringBuilder and iterates over all Expressions, evaluating them and concatenating the results together. If there is only a single Expression, though, we can avoid this and just return the value obtained from the Expression. While this will reduce the amount of garbage collected, it plays a more important role: it avoids creating a new String object for the FlowFile's attribute map. Currently, if 1 million FlowFiles go through UpdateAttribute to copy the 'abc' attribute to 'xyz', we have 1 million copies of that String on the heap. If we just returned the result of evaluating the Expression, we would instead have 1 copy of that String.
> * Similarly, it may make sense in UpdateAttribute to cache N entries, so that when an expression like $(unknown).txt is evaluated, even though a new String is generated by StandardPreparedQuery, we can resolve it to the same String object when storing it as a FlowFile attribute. This would work like {{String.intern()}} but not use {{String.intern()}}, because we don't want to store an unbounded number of these values in the {{String.intern()}} cache - we want to cap the number of entries in case the values aren't always reused.
> * Every FlowFile that is created by StandardProcessSession has a 'filename' attribute added. The value is obtained by calling {{String.valueOf(System.nanoTime());}} This comes with a few downsides. Firstly, the system call is a bit expensive (though not bad). Secondly, the filename is not very unique - with many dataflows and concurrent tasks running, it is common to have several FlowFiles with 'naming collisions'. Most of all, though, it means that we are keeping that String on the heap. A simple test shows that using the UUID as the default filename instead allowed 20% more FlowFiles to be generated on the same heap before running out of memory.
> * {{AbstractComponentNode.getProperties()}} creates a copy of its HashMap for every call. If we instead created a copy once when the StandardProcessContext was created, we could just return that one Map every time, since it can't change over the lifetime of the ProcessContext. This is more about garbage collection and general processor performance than about heap utilization, but it is still in the same realm.
>
> I am sure that there are far more of these nuances, but these are certainly worth tackling.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
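The capped "intern"-style cache described in the second bullet could be sketched roughly as follows. This is illustrative only, not NiFi code: the class name and eviction policy are assumptions, and an access-ordered {{LinkedHashMap}} is just one simple way to bound the cache with LRU eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: like String.intern(), but capped so values that
// are never reused cannot grow the cache without bound.
public class BoundedInternCache {
    private final Map<String, String> cache;

    public BoundedInternCache(final int maxEntries) {
        // accessOrder=true makes the LinkedHashMap track least-recently-used
        // entries; removeEldestEntry evicts once the cap is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns a canonical instance for equal strings, so repeated attribute
    // values share one String object on the heap instead of one per FlowFile.
    public synchronized String intern(final String value) {
        final String existing = cache.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}
```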
[jira] [Commented] (NIFIREG-193) Upgrade superagent
[ https://issues.apache.org/jira/browse/NIFIREG-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584224#comment-16584224 ] ASF GitHub Bot commented on NIFIREG-193: Github user scottyaslan commented on the issue: https://github.com/apache/nifi-registry/pull/135 FYI this PR is still open even though it has been merged > Upgrade superagent > -- > > Key: NIFIREG-193 > URL: https://issues.apache.org/jira/browse/NIFIREG-193 > Project: NiFi Registry > Issue Type: Bug >Reporter: Scott Aslan >Assignee: Scott Aslan >Priority: Major > Fix For: 0.3.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi-registry issue #135: [NIFIREG-193] upgrade superagent
Github user scottyaslan commented on the issue: https://github.com/apache/nifi-registry/pull/135 FYI this PR is still open even though it has been merged ---
[jira] [Commented] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
[ https://issues.apache.org/jira/browse/NIFIREG-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584218#comment-16584218 ] ASF GitHub Bot commented on NIFIREG-196:

GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/138

[NIFIREG-196] update client deps

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/138.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #138

commit 225f5482d865ab46c311aa5f7e785d0e46b95ce5
Author: Scott Aslan
Date: 2018-08-17T17:52:37Z

[NIFIREG-196] update client deps

> Upgrade lodash, parsejson, https-proxy-agent
>
> Key: NIFIREG-196
> URL: https://issues.apache.org/jira/browse/NIFIREG-196
> Project: NiFi Registry
> Issue Type: Bug
> Reporter: Scott Aslan
> Assignee: Scott Aslan
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi-registry pull request #138: [NIFIREG-196] update client deps
GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/138

[NIFIREG-196] update client deps

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/138.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #138

commit 225f5482d865ab46c311aa5f7e785d0e46b95ce5
Author: Scott Aslan
Date: 2018-08-17T17:52:37Z

[NIFIREG-196] update client deps

---
[jira] [Created] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
Scott Aslan created NIFIREG-196: --- Summary: Upgrade lodash, parsejson, https-proxy-agent Key: NIFIREG-196 URL: https://issues.apache.org/jira/browse/NIFIREG-196 Project: NiFi Registry Issue Type: Bug Reporter: Scott Aslan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NIFIREG-196) Upgrade lodash, parsejson, https-proxy-agent
[ https://issues.apache.org/jira/browse/NIFIREG-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Aslan reassigned NIFIREG-196: --- Assignee: Scott Aslan > Upgrade lodash, parsejson, https-proxy-agent > > > Key: NIFIREG-196 > URL: https://issues.apache.org/jira/browse/NIFIREG-196 > Project: NiFi Registry > Issue Type: Bug >Reporter: Scott Aslan >Assignee: Scott Aslan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
[ https://issues.apache.org/jira/browse/NIFI-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584139#comment-16584139 ] Mark Payne commented on NIFI-5533:
--

Additionally, when we commit a session, we serialize the update to the FlowFileRepository into a ByteArrayOutputStream, then write it to disk in a single call to {{FileOutputStream.write(byte[])}}. This works well for small updates, but for a very large update, such as when we have a long Run Duration or a Split/Merge case, this can consume a lot of heap. It would make sense, instead, for a session commit that has, say, 10,000 updates (or 1,000 updates) to write to a separate file in the FlowFile Repository's directory, then write to the FlowFile repo some sort of "external reference" with the name of the file. To ensure the integrity of the FlowFile Repository, we will need to perform an fsync() on that file before updating the main 'journal'. We would also need to ensure that the file is deleted when we perform a checkpoint.

> Improve efficiency of FlowFiles' heap usage
>
> Key: NIFI-5533
> URL: https://issues.apache.org/jira/browse/NIFI-5533
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
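The spill-to-side-file idea in the comment above can be sketched as follows. All names here are hypothetical, not NiFi's actual repository API; the one essential detail is the fsync ({{FileChannel.force(true)}}) before the main journal is allowed to reference the side file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: write a large serialized session update to a side
// file in the repository directory and force it to disk, so the journal can
// safely store just an "external reference" (the file name) afterward.
public final class ExternalUpdateWriter {
    public static Path writeAndSync(final Path repoDirectory, final String fileName,
                                    final byte[] serializedUpdate) throws IOException {
        final Path sideFile = repoDirectory.resolve(fileName);
        try (FileChannel channel = FileChannel.open(sideFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(serializedUpdate));
            // fsync: the data must be durable before the journal points at it,
            // otherwise a crash could leave a dangling reference.
            channel.force(true);
        }
        return sideFile;
    }
}
```

The side file would then be deleted at the next repository checkpoint, as the comment notes.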
[jira] [Created] (NIFI-5533) Improve efficiency of FlowFiles' heap usage
Mark Payne created NIFI-5533:
Summary: Improve efficiency of FlowFiles' heap usage
Key: NIFI-5533
URL: https://issues.apache.org/jira/browse/NIFI-5533
Project: Apache NiFi
Issue Type: Improvement
Reporter: Mark Payne
Assignee: Mark Payne

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5532) CRON Scheduling with timezone
Julian Gimbel created NIFI-5532:
---
Summary: CRON Scheduling with timezone
Key: NIFI-5532
URL: https://issues.apache.org/jira/browse/NIFI-5532
Project: Apache NiFi
Issue Type: Improvement
Reporter: Julian Gimbel

Currently it is possible to schedule processors based on a CRON trigger. This is fine as long as we only work in one timezone, or in timezones that do not shift between summer and winter time. If processes should run at the same wall-clock time, the processors need to be rescheduled whenever timezones switch from summer to winter time. As pointed out in [https://stackoverflow.com/questions/40212696/apache-nifi-how-to-pass-the-timezone-into-the-crontab-string], NiFi uses the Quartz scheduler under the hood, which already supports timezones that could be used here.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
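The summer/winter shift the report describes is easy to see with {{java.time}}. The sketch below (illustrative only; class and method names are made up) shows that the same wall-clock hour in Berlin sits at UTC+1 in January but UTC+2 in July, which is exactly why a CRON trigger fixed in one zone "moves" for users in another.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo: compute the UTC offset, in hours, of 06:00 local time
// on the 1st of a given month in a given timezone. A DST-observing zone
// returns different offsets for winter and summer months.
public final class DstOffsetDemo {
    public static int offsetHours(final ZoneId zone, final int month) {
        final ZonedDateTime t = ZonedDateTime.of(2018, month, 1, 6, 0, 0, 0, zone);
        return t.getOffset().getTotalSeconds() / 3600;
    }
}
```

A timezone-aware scheduler compensates for this shift automatically, which is what the report asks NiFi to expose.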
[jira] [Commented] (NIFI-515) DeleteSQS and PutSQS should offer batch processing
[ https://issues.apache.org/jira/browse/NIFI-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583870#comment-16583870 ] ASF GitHub Bot commented on NIFI-515:
-

Github user jzonthemtn commented on the issue:

https://github.com/apache/nifi/pull/1977

@pvillard31 Wondering if there's still community interest in this one? It would be useful for me at least. I can help with the merge conflicts if there is.

> DeleteSQS and PutSQS should offer batch processing
>
> Key: NIFI-515
> URL: https://issues.apache.org/jira/browse/NIFI-515
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Pierre Villard
> Priority: Minor

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] nifi issue #1977: NIFI-515 - DeleteSQS and PutSQS should offer batch process...
Github user jzonthemtn commented on the issue: https://github.com/apache/nifi/pull/1977 @pvillard31 Wondering if there's still community interest in this one? It would be useful for me at least. I can help with the merge conflicts if there is. ---
[GitHub] nifi pull request #2942: NIFI-5500: Array support in QueryElasticseachHttp
Github user jzonthemtn commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2942#discussion_r210897472

--- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/QueryElasticsearchHttp.java ---
@@ -432,7 +432,17 @@ private int getPage(final Response getResponse, final URL url, final ProcessCont
     Map attributes = new HashMap<>();
     for(Iterator> it = source.fields(); it.hasNext(); ) {
         Entry entry = it.next();
-        attributes.put(ATTRIBUTE_PREFIX + entry.getKey(), entry.getValue().asText());
+        String text_value = "";
+        String separator = "";
--- End diff --

I think it might be better if the value of `separator` didn't change. What do you think about adding each item's text to an array and then doing something like a `StringUtils.join()` on it?

---
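The reviewer's suggestion, collecting each element's text and joining once rather than mutating a separator variable inside the loop, might look roughly like this. This is a simplified sketch: the real processor iterates Jackson JsonNode values, and commons-lang's `StringUtils.join()` would work equally well in place of `String.join()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the collect-then-join pattern: the separator
// is applied exactly once by the join, so no per-iteration state is needed.
public final class ArrayFieldJoiner {
    public static String joinValues(final String[] elements, final String separator) {
        final List<String> texts = new ArrayList<>();
        for (final String element : elements) {
            texts.add(element); // in the real code: entry.getValue().asText() per array item
        }
        return String.join(separator, texts);
    }
}
```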
[jira] [Commented] (NIFI-5500) Add array support to QueryElasticsearchHttp
[ https://issues.apache.org/jira/browse/NIFI-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583861#comment-16583861 ] ASF GitHub Bot commented on NIFI-5500:
--

Github user jzonthemtn commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2942#discussion_r210897472

--- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/QueryElasticsearchHttp.java ---
@@ -432,7 +432,17 @@ private int getPage(final Response getResponse, final URL url, final ProcessCont
     Map attributes = new HashMap<>();
     for(Iterator> it = source.fields(); it.hasNext(); ) {
         Entry entry = it.next();
-        attributes.put(ATTRIBUTE_PREFIX + entry.getKey(), entry.getValue().asText());
+        String text_value = "";
+        String separator = "";
--- End diff --

I think it might be better if the value of `separator` didn't change. What do you think about adding each item's text to an array and then doing something like a `StringUtils.join()` on it?

> Add array support to QueryElasticsearchHttp
>
> Key: NIFI-5500
> URL: https://issues.apache.org/jira/browse/NIFI-5500
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.7.1
> Reporter: Wietze B
> Priority: Minor
>
> When using the QueryElasticsearchHttp processor (with output=Attributes) to query a document that contains a field with an Array rather than a String or Integer, the resulting attribute for that field will be an empty string.
> This is due to the fact that QueryElasticsearchHttp doesn't handle array fields correctly.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583827#comment-16583827 ] Otto Fowler commented on NIFI-5529:
---
I tested against that pull request and it now works as described in your original statement. [^test_escapes.xml]
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
> Attachments: test_escapes.xml
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
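The extra backslash reported above is characteristic of layered string processing: each parsing layer consumes one level of escaping. Plain Java shows the same effect in `String.replaceAll()`, whose replacement string treats a backslash as an escape character. This is only an analogy for the Expression Language behavior, not NiFi code:

```java
public class EscapeLayersDemo {
    public static void main(String[] args) {
        // The source literal "\n" is a single newline character, so the
        // replacement inserts a real line break.
        String oneLayer = "a,b".replaceAll(",", "\n");   // "a" + newline + "b"

        // The source literal "\\n" is the two characters '\' and 'n'. The
        // regex replacement engine consumes the backslash as an escape,
        // leaving a literal 'n' -- each layer "eats" one backslash.
        String twoLayers = "a,b".replaceAll(",", "\\n"); // "anb"

        System.out.println(oneLayer.contains("\n")); // true
        System.out.println(twoLayers);               // anb
    }
}
```

With one extra escaping layer in the pipeline, a user must double every backslash to get the same final result, which matches the v1.4.0 vs v1.7.1 difference described in the issue.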
[jira] [Updated] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otto Fowler updated NIFI-5529:
--
Attachment: test_escapes.xml
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
> Attachments: test_escapes.xml
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5531) TCPListen issue
Charles Lassiter created NIFI-5531:
--
Summary: TCPListen issue
Key: NIFI-5531
URL: https://issues.apache.org/jira/browse/NIFI-5531
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 0.7.4
Environment: RHEL6 server to RHEL6 server over internal network
Reporter: Charles Lassiter

When utilizing postTCP and ListenTCP, I'm receiving chunked files when sending one text file. As an example, if I send one file via TCP, ListenTCP receives the file and chunks it into 8+ files. The delimiter is set the same on both ends, so I don't understand why ListenTCP is chunking the files. Any suggestions would help.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
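A likely explanation for the behavior reported above (an assumption, since the flow is not attached): TCP delivers a byte stream rather than discrete files, and ListenTCP frames that stream by splitting on its configured message delimiter, so a single file that itself contains the delimiter (e.g. newlines) surfaces as one FlowFile per delimited segment. A minimal sketch of that framing logic, with hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of delimiter-based framing over a TCP-style byte stream:
// bytes are accumulated and a new "message" is emitted at each delimiter.
public class DelimiterFramer {

    public static List<String> frame(InputStream in) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {              // configured delimiter
                messages.add(current.toString());
                current.setLength(0);
            } else {
                current.append((char) b);
            }
        }
        if (current.length() > 0) {       // trailing bytes with no delimiter
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // One "file" sent over the socket, containing the delimiter twice:
        byte[] oneFile = "line1\nline2\nline3".getBytes(StandardCharsets.UTF_8);
        List<String> flowFiles = frame(new ByteArrayInputStream(oneFile));
        System.out.println(flowFiles.size()); // prints: 3
    }
}
```

Under this reading, the 8+ output files would simply correspond to the delimiter occurrences inside the sent file, not to a bug in the receiver.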
[jira] [Comment Edited] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583782#comment-16583782 ] Otto Fowler edited comment on NIFI-5529 at 8/17/18 11:28 AM:
---
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression.

was (Author: ottobackwards):
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression. It reverts the change/approach that most likely introduced this issue.

> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5529) ReplaceText Now Requires Escapes in some cases
[ https://issues.apache.org/jira/browse/NIFI-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583782#comment-16583782 ] Otto Fowler commented on NIFI-5529:
---
[~Mike Boonstra] can you try this with [https://github.com/apache/nifi/pull/2951?|https://github.com/apache/nifi/pull/2951] It addresses another regression. It reverts the change/approach that most likely introduced this issue.
> ReplaceText Now Requires Escapes in some cases
> --
>
> Key: NIFI-5529
> URL: https://issues.apache.org/jira/browse/NIFI-5529
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Mike Boonstra
> Priority: Minor
>
> In some cases in v1.7.1 you need to escape backslashes in property ReplacementValue; like when "Replacement Strategy"="Regex Replace". Whereas in v1.4.0 you didn't. However, when "Replacement Strategy"="Always Replace" it works as it did in v1.4.0.
> So for example if before you had this in v1.4.0:
> *
> {code:java}
> "ReplacementValue"=${csvRow:replaceAll(',','\n')}{code}
> In v1.7.1 you would need (note the extra backslash):
> *
> {code:java}
> "ReplacementValue"=$\{csvRow:replaceAll(',','\\n')}{code}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5327) NetFlow Processors
[ https://issues.apache.org/jira/browse/NIFI-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583745#comment-16583745 ] ASF GitHub Bot commented on NIFI-5327:
--
Github user PrashanthVenkatesan commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2820#discussion_r210868364
--- Diff: nifi-nar-bundles/nifi-network-bundle/nifi-network-processors/src/main/java/org/apache/nifi/processors/network/ParseNetflowv5.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.network;
+
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getHeaderFields;
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getRecordFields;
+
+import java.io.BufferedOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.OptionalInt;
+import java.util.Set;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.ReadsAttributes;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.network.parser.Netflowv5Parser;
+import org.apache.nifi.stream.io.StreamUtils;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({ "network", "netflow", "attributes", "datagram", "v5", "packet", "byte" })
+@CapabilityDescription("Parses netflowv5 byte ingest and add to NiFi flowfile as attributes or JSON content.")
+@ReadsAttributes({ @ReadsAttribute(attribute = "udp.port", description = "Optionally read if packets are received from UDP datagrams.") })
+@WritesAttributes({ @WritesAttribute(attribute = "netflowv5.header.*", description = "The key and value generated by the parsing of the header fields."),
+@WritesAttribute(attribute = "netflowv5.record.*", description = "The key and value generated by the parsing of the record fields.") })
+
+public class ParseNetflowv5 extends AbstractProcessor {
+private String destination;
+// Add mapper
+private static final ObjectMapper mapper = new ObjectMapper();
+
+public static final String DESTINATION_CONTENT = "flowfile-content";
+public static final String DESTINATION_ATTRIBUTES = "flowfile-attribute";
+public static final PropertyDescriptor FIELDS_DESTINATION
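The diff above declares a FIELDS_DESTINATION property selecting between flowfile-attribute and flowfile-content output. A hedged sketch of what those two modes typically produce; the class and method names are hypothetical, and a hand-built JSON string stands in for the Jackson ObjectMapper the real processor imports:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the processor's two output modes: parsed fields
// either become prefixed FlowFile attributes or a single JSON content blob.
public class FieldsDestinationSketch {
    static final String ATTRIBUTE_PREFIX = "netflowv5.record.";

    // flowfile-attribute mode: one prefixed attribute per parsed field
    public static Map<String, String> toAttributes(Map<String, String> fields) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            attributes.put(ATTRIBUTE_PREFIX + e.getKey(), e.getValue());
        }
        return attributes;
    }

    // flowfile-content mode: all fields serialized into one JSON object
    // (real code would use Jackson's ObjectMapper/ObjectNode instead)
    public static String toJsonContent(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("srcaddr", "10.0.0.1");
        fields.put("dstaddr", "10.0.0.2");
        System.out.println(toJsonContent(fields));
        // prints: {"srcaddr":"10.0.0.1","dstaddr":"10.0.0.2"}
    }
}
```

The attribute mode suits flows that route on individual fields, while the content mode suits flows that ship whole records downstream as JSON.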
[GitHub] nifi pull request #2820: NIFI-5327 Adding Netflowv5 protocol parser
Github user PrashanthVenkatesan commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2820#discussion_r210868364
--- Diff: nifi-nar-bundles/nifi-network-bundle/nifi-network-processors/src/main/java/org/apache/nifi/processors/network/ParseNetflowv5.java ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.network;
+
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getHeaderFields;
+import static org.apache.nifi.processors.network.parser.Netflowv5Parser.getRecordFields;
+
+import java.io.BufferedOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.OptionalInt;
+import java.util.Set;
+
+import org.apache.nifi.annotation.behavior.EventDriven;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.ReadsAttributes;
+import org.apache.nifi.annotation.behavior.SideEffectFree;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.network.parser.Netflowv5Parser;
+import org.apache.nifi.stream.io.StreamUtils;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+@EventDriven
+@SideEffectFree
+@SupportsBatching
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({ "network", "netflow", "attributes", "datagram", "v5", "packet", "byte" })
+@CapabilityDescription("Parses netflowv5 byte ingest and add to NiFi flowfile as attributes or JSON content.")
+@ReadsAttributes({ @ReadsAttribute(attribute = "udp.port", description = "Optionally read if packets are received from UDP datagrams.") })
+@WritesAttributes({ @WritesAttribute(attribute = "netflowv5.header.*", description = "The key and value generated by the parsing of the header fields."),
+@WritesAttribute(attribute = "netflowv5.record.*", description = "The key and value generated by the parsing of the record fields.") })
+
+public class ParseNetflowv5 extends AbstractProcessor {
+private String destination;
+// Add mapper
+private static final ObjectMapper mapper = new ObjectMapper();
+
+public static final String DESTINATION_CONTENT = "flowfile-content";
+public static final String DESTINATION_ATTRIBUTES = "flowfile-attribute";
+public static final PropertyDescriptor FIELDS_DESTINATION = new PropertyDescriptor.Builder().name("FIELDS_DESTINATION").displayName("Parsed fields destination")
+.description("Indicates whether the results of the parser are written " + "to the FlowFile content or a FlowFile attribute; if usin