[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site
Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/497 @markap14 I'm not 100% sure the change will solve the problem, since I couldn't reproduce the issue myself. However, I've added additional buffer-draining code that runs after receiving EOF from the channel. In addition, I added failure-detection code that checks whether the amount of data sent matches what was expected; when it falls short, a RuntimeException is thrown so that we can investigate further. Also, I've changed the default nifi.remote.input.secure to `false`. Could you give it a try once again? If it still fails, please share your flow template. Thank you. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
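The draining and failure-detection logic described above might look roughly like the following. This is a minimal sketch, not the actual PR code: the `drainAndVerify` method name and the use of a plain InputStream (rather than NiFi's channel abstraction) are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainExample {
    /**
     * Reads and discards any bytes remaining on the stream after the main
     * transfer, returning the number of bytes drained. Throws a
     * RuntimeException if the total byte count falls short of what the
     * sender reported, so the failure can be investigated further.
     */
    public static long drainAndVerify(InputStream in, long alreadyRead, long expected) throws IOException {
        byte[] scratch = new byte[4096];
        long drained = 0;
        int n;
        while ((n = in.read(scratch)) != -1) { // keep reading until EOF
            drained += n;
        }
        long total = alreadyRead + drained;
        if (total < expected) {
            throw new RuntimeException("Expected " + expected + " bytes but only received " + total);
        }
        return drained;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5});
        long drained = drainAndVerify(in, 0, 5);
        System.out.println("drained=" + drained);
    }
}
```

The point of draining after EOF is that some bytes may still be buffered on the channel even after the logical end of the transfer; discarding them keeps the connection in a clean state for reuse.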
[GitHub] nifi pull request #503: NIFI-1978: Restore RPG yield duration.
Github user ijokarumawak closed the pull request at: https://github.com/apache/nifi/pull/503
[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.
Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/503 @bbende Thanks for reviewing and merging!
[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files
Github user brosander commented on a diff in the pull request: https://github.com/apache/nifi/pull/492#discussion_r66348434 --- Diff: nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-processors/src/test/java/org/apache/nifi/processors/evtx/ParseEvtxTest.java --- @@ -0,0 +1,481 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.nifi.processors.evtx; + +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.logging.ComponentLog; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.io.OutputStreamCallback; +import org.apache.nifi.processors.evtx.parser.ChunkHeader; +import org.apache.nifi.processors.evtx.parser.FileHeader; +import org.apache.nifi.processors.evtx.parser.FileHeaderFactory; +import org.apache.nifi.processors.evtx.parser.MalformedChunkException; +import org.apache.nifi.processors.evtx.parser.Record; +import org.apache.nifi.processors.evtx.parser.bxml.RootNode; +import org.apache.nifi.util.MockFlowFile; +import org.apache.nifi.util.TestRunner; +import org.apache.nifi.util.TestRunners; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.mockito.runners.MockitoJUnitRunner; +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.w3c.dom.Node; +import org.w3c.dom.NodeList; +import org.xml.sax.SAXException; + +import javax.xml.parsers.DocumentBuilderFactory; +import javax.xml.parsers.ParserConfigurationException; +import javax.xml.stream.XMLStreamException; +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Arrays; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.atomic.AtomicReference; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; +import static org.mockito.Mockito.any; +import static org.mockito.Mockito.anyString; +import static org.mockito.Mockito.eq; +import static org.mockito.Mockito.isA; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; 
+import static org.mockito.Mockito.when; + +@RunWith(MockitoJUnitRunner.class) +public class ParseEvtxTest { +public static final DocumentBuilderFactory DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance(); +public static final String USER_DATA = "UserData"; +public static final String EVENT_DATA = "EventData"; +public static final Set DATA_TAGS = new HashSet<>(Arrays.asList(EVENT_DATA, USER_DATA)); + +@Mock +FileHeaderFactory fileHeaderFactory; + +@Mock +MalformedChunkHandler malformedChunkHandler; + +@Mock +RootNodeHandlerFactory rootNodeHandlerFactory; + +@Mock +ResultProcessor resultProcessor; + +@Mock +ComponentLog componentLog; + +@Mock +InputStream in; + +@Mock +OutputStream out; + +@Mock +FileHeader fileHeader; + +ParseEvtx parseEvtx; + +@Before +public void setup() throws XMLStreamException, IOException { +parseEvtx = new ParseEvtx(fileHeaderFactory, malformedChunkHandler, rootNodeHandlerFactory, resultProcessor); +when(fileHeaderFactory.create(in, componentLog)).thenReturn(fileHeader); +} + +@Test +public void testGetNameFile() { +String basename = "basename"; +assertEquals(basename + ".xml", parseEvtx.getName(basename, null, null, ParseEvtx.XML_EXTENSION)); +} + +@Test +public void testGetNameFileChunk() { +String
[GitHub] nifi issue #397: NIFI-1815
Github user olegz commented on the issue: https://github.com/apache/nifi/pull/397 I stepped away from it as I am trying to finish something else that is very involved. Will get back to it once I am finished, but I am committed to getting it into both 0.7.0 and 1.0.
Re: Limiting a queue
Thanks for your responses. As far as possibly contributing these back to the community: I need a good way for them to connect to any database and generate that database's specific flavor of SQL, so when I get that functionality built out I would be glad to. As far as the memory goes, step two sends an insert query per flow file (for migration), and each query is designed to pull out 1,000 records, if that makes more sense. But it is good to know that NiFi can handle a lot of flow files; with back-pressure configured, it should wait for the queue ahead to clear out before starting the next table. I also forgot to mention that this is interacting with two live databases but is going through my VM; in other words, it can actually be faster if placed on the machine the target database is running on. Fun facts: I'm running benchmarking now, and the speeds I'm seeing are due to the concurrent-processing functionality of NiFi. 562,000 records inserted from 1 source table into 1 target table in 8 minutes 14 seconds (I'm being throttled by the source database). At that speed, it is approximately 1,137 records per second. Step two is running 1 thread, step three is running 60 threads, and step four is running 30 threads. On Wed, Jun 8, 2016 at 1:23 PM, Mark Payne wrote: > Shaine, > > This is a really cool set of functionality! Any chance you would be > interested in contributing > these processors back to the NiFi community? > > Regardless, one thing to consider here is that with NiFi, because of the > way that the repositories > are structured, the way that we think about heap utilization is a little > different than with most projects. > As Bryan pointed out, you will want to stream the content directly to the > FlowFile, rather than buffering > in memory. The framework will handle the rest. Where we will be more > concerned about heap utilization > is actually in the number of FlowFiles that are held in memory at any one > time, not the size of those FlowFiles. 
> So you will be better off keeping a smaller number of FlowFiles, each > having larger content. So I would > recommend making the number of records per FlowFile configurable, perhaps > with a default value of > 25,000. This would also result in far fewer JDBC calls, which should be > beneficial performance-wise. > NiFi will handle swapping FlowFiles to disk when they are queued up, so > you can certainly queue up > millions of FlowFiles in a single queue without exhausting your heap > space. However, if you are buffering > up all of those FlowFiles in your processor, you may run into problems, so > using a smaller number of > FlowFiles, each with many thousand records will likely provide the best > heap utilization. > > Does this help? > > Thanks > -Mark > > > > On Jun 8, 2016, at 2:05 PM, Bryan Bende wrote: > > > > Thank you for the detailed explanation! It sounds like you have built > > something very cool here. > > > > I'm still digesting the different steps and thinking of what can be done, > > but something that initially jumped out at me was > > when you mentioned considering how much memory NiFi has and not wanting > to > > go over 1000 records per chunk... > > > > You should be able to read and write the chunks in a streaming fashion > and > > never have the entire chunk in memory. For example, > > when creating the chunk you would be looping over a ResultSet from the > > database and writing each record to the OutputStream of the > > FlowFile, never having all 1000 records in memory. On the down stream > > processor you would read the record from the InputStream of the > > FlowFile, sending each one to the destination database, again not having > > all 1000 records in memory. If you can operate like this then having > > 1000 records per chunk, or 100,000 records per chunk, shouldn't change > the > > memory requirement for NiFi. 
> > > > An example of what we do for ExecuteSQL and QueryDatabaseTable is in the > > JdbcCommon util where it converts the ResultSet to Avro records by > writing > > to the OutputStream: > > > https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80 > > > > Another point is that it is not necessarily a bad thing to have say > 10,000 > > Flow Files in a queue. The queue is not actually holding the content of > > those FlowFiles, it is only holding pointers to where the content is, and > > only when > > the next processor does a session.read(flowFile, ...) does it then read > in > > the content as a stream. In general NiFi should be able to handle 10s of > > thousands, or even 100s of thousands of Flow Files sitting in a queue. > > > > With your current approach have you seen a specific issue, such as out of > > memory exceptions? or were you just concerned by the number of flow files > > in the queue continuing to grow? > >
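The streaming pattern described in this thread can be sketched with plain Java I/O. This is an illustrative stand-in, not NiFi code: an `Iterator<String>` plays the role of the JDBC ResultSet, and the OutputStream stands in for the FlowFile's content stream, so that only one record is ever in memory at a time regardless of chunk size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

public class StreamingChunkWriter {
    /**
     * Writes up to chunkSize records to the output stream, one line per
     * record, without ever holding the whole chunk in memory. Returns the
     * number of records actually written.
     */
    public static int writeChunk(Iterator<String> records, OutputStream out, int chunkSize) throws IOException {
        int written = 0;
        while (written < chunkSize && records.hasNext()) {
            String record = records.next(); // one record in memory at a time
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Iterator<String> records = List.of("a,1", "b,2", "c,3").iterator();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = writeChunk(records, out, 2);
        System.out.println(n + " records written");
    }
}
```

Because memory usage is bounded by a single record rather than the chunk, the chunk size can grow from 1,000 to 100,000 records without changing the heap requirement, which is the core of Bryan's point above.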
Re: Consuming web services through NiFi
Hi Matt, Thank you for the reply. When I tried to use the example "Working_With_CSV.xml", I got: InvokeHTTP[id=a3aab33d-76dd-4169-9a29-fd0aeae219f3] Yielding processor due to exception encountered as a source processor: java.net.SocketTimeoutException: connect timed out. I can reach the URL when I try it from my browser directly, but not through NiFi. I tried other sites through the GetHTTP processor and get the same error. Any ideas? -- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Consuming-web-services-through-NiFi-tp11190p11249.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
[GitHub] nifi issue #439: NIFI-1866 ProcessException handling in StandardProcessSessi...
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/439 @pvillard31 I got this merged into both master and 0.x baselines. Thanks for knocking this out!!
[GitHub] nifi pull request #439: NIFI-1866 ProcessException handling in StandardProce...
Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/439
[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.
Github user bbende commented on the issue: https://github.com/apache/nifi/pull/503 @ijokarumawak I merged to 0.x but GitHub doesn't auto-close from that branch, can you close this PR when you have a chance? Thanks.
[GitHub] nifi issue #397: NIFI-1815
Github user jdye64 commented on the issue: https://github.com/apache/nifi/pull/397 Olegz any luck getting your local install of Tesseract to work?
[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.
Github user bbende commented on the issue: https://github.com/apache/nifi/pull/503 +1 good find, verified the yield duration is maintained across restarts, will merge to 0.x
[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.
Github user bbende commented on the issue: https://github.com/apache/nifi/pull/503 Reviewing...
[GitHub] nifi pull request #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks...
Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/499
Re: Limiting a queue
Shaine, This is a really cool set of functionality! Any chance you would be interested in contributing these processors back to the NiFi community? Regardless, one thing to consider here is that with NiFi, because of the way that the repositories are structured, the way that we think about heap utilization is a little different than with most projects. As Bryan pointed out, you will want to stream the content directly to the FlowFile, rather than buffering in memory. The framework will handle the rest. Where we will be more concerned about heap utilization is actually in the number of FlowFiles that are held in memory at any one time, not the size of those FlowFiles. So you will be better off keeping a smaller number of FlowFiles, each having larger content. So I would recommend making the number of records per FlowFile configurable, perhaps with a default value of 25,000. This would also result in far fewer JDBC calls, which should be beneficial performance-wise. NiFi will handle swapping FlowFiles to disk when they are queued up, so you can certainly queue up millions of FlowFiles in a single queue without exhausting your heap space. However, if you are buffering up all of those FlowFiles in your processor, you may run into problems, so using a smaller number of FlowFiles, each with many thousand records will likely provide the best heap utilization. Does this help? Thanks -Mark > On Jun 8, 2016, at 2:05 PM, Bryan Bende wrote: > > Thank you for the detailed explanation! It sounds like you have built > something very cool here. > > I'm still digesting the different steps and thinking of what can be done, > but something that initially jumped out at me was > when you mentioned considering how much memory NiFi has and not wanting to > go over 1000 records per chunk... > > You should be able to read and write the chunks in a streaming fashion and > never have the entire chunk in memory. 
For example, > when creating the chunk you would be looping over a ResultSet from the > database and writing each record to the OutputStream of the > FlowFile, never having all 1000 records in memory. On the down stream > processor you would read the record from the InputStream of the > FlowFile, sending each one to the destination database, again not having > all 1000 records in memory. If you can operate like this then having > 1000 records per chunk, or 100,000 records per chunk, shouldn't change the > memory requirement for NiFi. > > An example of what we do for ExecuteSQL and QueryDatabaseTable is in the > JdbcCommon util where it converts the ResultSet to Avro records by writing > to the OutputStream: > https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80 > > Another point is that it is not necessarily a bad thing to have say 10,000 > Flow Files in a queue. The queue is not actually holding the content of > those FlowFiles, it is only holding pointers to where the content is, and > only when > the next processor does a session.read(flowFile, ...) does it then read in > the content as a stream. In general NiFi should be able to handle 10s of > thousands, or even 100s of thousands of Flow Files sitting in a queue. > > With your current approach have you seen a specific issue, such as out of > memory exceptions? or were you just concerned by the number of flow files > in the queue continuing to grow? > > I'll continue to think about this more, and maybe someone else on the list > has additional idea/thoughts. > > -Bryan > > > > On Wed, Jun 8, 2016 at 12:29 PM, Shaine Berube < > shaine.ber...@perfectsearchcorp.com> wrote: > >> Perhaps I need to explain a little more about the data flow as a whole. >> But yes, Processor A is a custom built processor. 
>> >> In explanation: >> The data flow that I've built out is basically a 4 to 6 step process (6 >> steps because the first and last processors are optional). In the four >> step process, step one gathers information from the source and target >> databases in preparation for the migration of data from source to target, >> this includes table names, primary keys, and record counts, step one then >> produces a flow file per table which in this case is 24 flow files. >> >> Step two of the process would be the equivalent of processor A, in step two >> I'm taking in a flow file and generating the SQL queries that are going to >> be run. The reason the back pressure doesn't work therefore is because the >> processor is working on one file, which corresponds to a table, which said >> table will be split into 1000 record chunks with the SQL query splitting. >> A fair few of these tables however, are over 10 million records, which >> means that on a single execution, this processor will generate over 10,000 >> flow files (1000 record chunks). As far as it goes, I cannot
[GitHub] nifi issue #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks, Contr...
Github user bbende commented on the issue: https://github.com/apache/nifi/pull/499 +1 code looks good, build passes, tested a missing processor, controller service, and reporting task and the app still starts up, awesome stuff! Will merge into master shortly.
[GitHub] nifi pull request #511: NIFI-1850 - JSON-to-JSON Schema Converter Editor
GitHub user YolandaMDavis opened a pull request: https://github.com/apache/nifi/pull/511 NIFI-1850 - JSON-to-JSON Schema Converter Editor This is a merge from 0.7 of the Json-to-Json (Jolt) Editor with refactoring for masterless clustering and bower dependency support. You can merge this pull request into a Git repository by running: $ git pull https://github.com/YolandaMDavis/nifi NIFI-1850-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/511.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #511 commit 8d6b9a454b5fee312e843aea3fcefcc794087ecf Author: Yolanda M. Davis Date: 2016-06-08T04:53:38Z NIFI-1850 - Initial Commit for JSON-to-JSON Schema Converter Editor (merge from 0.7.0 - refactor for masterless cluster)
[GitHub] nifi issue #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks, Contr...
Github user bbende commented on the issue: https://github.com/apache/nifi/pull/499 Reviewing...
[GitHub] nifi issue #239: Nifi 1540 - AWS Kinesis Get and Put Processors
Github user jvwing commented on the issue: https://github.com/apache/nifi/pull/239 @joewitt, would you please help us with the licensing/notice requirements for using the Kinesis Client Library and Kinesis Producer Library? The Kinesis libraries are licensed under the [Amazon Software License](https://aws.amazon.com/asl/). This does not appear on the published [list of Apache-compatible licenses](http://www.apache.org/legal/resolved.html#category-a). The Apache Spark project includes comparable use of the Kinesis library, although they have chosen to [present their Kinesis integration as an optional add-on](http://spark.apache.org/docs/latest/streaming-kinesis-integration.html). Comparable code is in fact [checked into the Spark repo](https://github.com/apache/spark/tree/master/external), but I did not find mention of the ASL in a NOTICE file. I was really hoping to copy and paste. I found a [JIRA issue raised by the Spark team for the license discussion](https://issues.apache.org/jira/browse/LEGAL-198) which discusses the add-on nature of the component, but not specific referencing language. Is this OK? How can we determine what needs to be added to the NOTICE file in nifi-aws-nar?
Re: Limiting a queue
Thank you for the detailed explanation! It sounds like you have built something very cool here. I'm still digesting the different steps and thinking of what can be done, but something that initially jumped out at me was when you mentioned considering how much memory NiFi has and not wanting to go over 1000 records per chunk... You should be able to read and write the chunks in a streaming fashion and never have the entire chunk in memory. For example, when creating the chunk you would be looping over a ResultSet from the database and writing each record to the OutputStream of the FlowFile, never having all 1000 records in memory. On the downstream processor you would read the record from the InputStream of the FlowFile, sending each one to the destination database, again not having all 1000 records in memory. If you can operate like this then having 1000 records per chunk, or 100,000 records per chunk, shouldn't change the memory requirement for NiFi. An example of what we do for ExecuteSQL and QueryDatabaseTable is in the JdbcCommon util where it converts the ResultSet to Avro records by writing to the OutputStream: https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80 Another point is that it is not necessarily a bad thing to have say 10,000 Flow Files in a queue. The queue is not actually holding the content of those FlowFiles, it is only holding pointers to where the content is, and only when the next processor does a session.read(flowFile, ...) does it then read in the content as a stream. In general NiFi should be able to handle 10s of thousands, or even 100s of thousands of Flow Files sitting in a queue. With your current approach have you seen a specific issue, such as out of memory exceptions? or were you just concerned by the number of flow files in the queue continuing to grow? 
I'll continue to think about this more, and maybe someone else on the list has additional idea/thoughts. -Bryan On Wed, Jun 8, 2016 at 12:29 PM, Shaine Berube < shaine.ber...@perfectsearchcorp.com> wrote: > Perhaps I need to explain a little more about the data flow as a whole. > But yes, Processor A is a custom built processor. > > In explanation: > The data flow that I've built out is basically a 4 to 6 step process (6 > steps because the first and last processors are optional). In the four > step process, step one gathers information from the source and target > databases in preparation for the migration of data from source to target, > this includes table names, primary keys, and record counts, step one then > produces a flow file per table which in this case is 24 flow files. > > Step two of the process would be the equivalent of processor A, in step two > I'm taking in a flow file and generating the SQL queries that are going to > be run. The reason the back pressure doesn't work therefore is because the > processor is working on one file, which corresponds to a table, which said > table will be split into 1000 record chunks with the SQL query splitting. > A fair few of these tables however, are over 10 million records, which > means that on a single execution, this processor will generate over 10,000 > flow files (1000 record chunks). As far as it goes, I cannot save this > information directly to the VM or server that I'm running the data flow on, > because the information can contain extremely sensitive and secure data. > That being said, I need to consider how much memory the Nifi process has to > run, so I don't want to go over 1000 records in a chunk. > > Step three of the process takes each individual flow file from the queue, > pulls the SQL query out of the flow file contents, runs it against source, > and then puts the results in either a CSV or an XML format into the > contents of a flow file and sends it to the next queue. 
> > Step four of the process takes the results out of the flow file contents, > sticks them into an SQL query and runs it against target. > > Keep in mind: this data flow has been built to handle migration, but also > is attempting to keep up to date (incrementor/listener), with the source > database. Given that we don't have full access to the source database, I'm > basically limited to running select queries against it and gathering the > information I need to put into target. But this data flow is configured to > handle INSERT and UPDATE SQL queries, with DELETE queries coming some time > in the future. The data flow is configured so that step one can either be > the migrator (full data dump), or the incrementor (incremental data dump, > use incrementor after migrator has been run). > > Now, the six step process adds a step before step one that allows step one > to be multi-threaded, and it adds a step after step four that runs the > queries (basically step four turns into the step that generates queries
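The query generation in step two, combined with Mark's suggestion of a configurable records-per-FlowFile count, could be sketched as follows. This is a hypothetical illustration, not the actual processor code: the table name, `id` ordering column, and LIMIT/OFFSET paging style are assumptions (real paging would key off the table's actual primary key, and the SQL flavor varies by database).

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkQueryGenerator {
    /**
     * Generates one SELECT per chunk, covering totalRecords rows in
     * chunkSize-sized pages. A larger chunkSize means fewer FlowFiles
     * in the queue and fewer JDBC round-trips downstream.
     */
    public static List<String> chunkQueries(String table, long totalRecords, long chunkSize) {
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < totalRecords; offset += chunkSize) {
            queries.add("SELECT * FROM " + table
                    + " ORDER BY id LIMIT " + chunkSize + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        // A 10-million-row table at 25,000 records per chunk yields 400
        // FlowFiles, versus 10,000 FlowFiles at 1,000 records per chunk.
        List<String> queries = chunkQueries("source_table", 10_000_000L, 25_000L);
        System.out.println(queries.size() + " chunks; first: " + queries.get(0));
    }
}
```

Making `chunkSize` a processor property is what turns Mark's 25,000-record recommendation into a tuning knob rather than a code change.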
[GitHub] nifi pull request #400: Fix for NIFI-1838 & NIFI-1152 & Code modification fo...
Github user PuspenduBanerjee closed the pull request at: https://github.com/apache/nifi/pull/400
[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files
Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/492#discussion_r66292230 --- Diff: nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-processors/src/test/java/org/apache/nifi/processors/evtx/ParseEvtxTest.java --- @@ -0,0 +1,318 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.nifi.processors.evtx; + +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.logging.ComponentLog; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.io.OutputStreamCallback; +import org.apache.nifi.processors.evtx.parser.ChunkHeader; +import org.apache.nifi.processors.evtx.parser.FileHeader; +import org.apache.nifi.processors.evtx.parser.FileHeaderFactory; +import org.apache.nifi.processors.evtx.parser.MalformedChunkException; +import org.apache.nifi.processors.evtx.parser.Record; +import org.apache.nifi.processors.evtx.parser.bxml.RootNode; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.mockito.Mock; +import org.mockito.runners.MockitoJUnitRunner; + +import javax.xml.stream.XMLStreamException; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.concurrent.atomic.AtomicReference; + +import static org.junit.Assert.assertEquals; +import static org.mockito.Mockito.any; +import static org.mockito.Mockito.anyString; +import static org.mockito.Mockito.eq; +import static org.mockito.Mockito.isA; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.verifyNoMoreInteractions; +import static org.mockito.Mockito.when; + +@RunWith(MockitoJUnitRunner.class) +public class ParseEvtxTest { --- End diff -- This class tests the individual methods in the ParseEvtx processor, but not the processor lifecycle (like onTrigger). Can you add some more tests that exercise the processor? An example of using the nifi-mock framework can be found in TestEvaluateXPath, it has the TestRunner stuff with flowfiles, relationships, asserts, etc. 
You will likely want a test file or two to be used as input, although if line endings/whitespace are important in the format you may just need the data directly in the Java code.
Re: Limiting a queue
Ok, I didn't realize you had already tried setting the back-pressure settings. Can you describe the processors a little more, are they custom processors? I am guessing that ProcessorA is producing all 5k flow files from a single execution of onTrigger, which would explain why back-pressure didn't solve the problem: back-pressure would stop the processor from executing again, but by then it's already too late because the first execution already went over the limit.

Without knowing too much about what ProcessorA is doing, I'm wondering if there is a way to put some indirection between the two processors. What if ProcessorA sent its output to a PutFile processor that wrote all the chunks out to a directory, and a separate GetFile processor concurrently picked up the chunks from that directory and sent them to ProcessorB? Then the back-pressure between GetFile and ProcessorB would work, because once the queue reached 2000, GetFile wouldn't pick up any more files. The downside is you would need enough disk space on your NiFi node to possibly store your whole database table, which may not be an option.

Another idea might be to have two levels of chunks. For example, with the SplitText processor, if we want to split a file with 1 million lines in it, rather than do one split producing 1 million flow files, we usually do a split to 10k chunks, then another split down to 1 line. Maybe ProcessorA could produce much larger chunks, say 10k or 100k records each, and the next processor further splits those before going to ProcessorB. This would also allow back-pressure to work a little better between the second split processor and ProcessorB.

If anyone else has ideas here, feel free to chime in.
Thanks, Bryan On Wed, Jun 8, 2016 at 10:51 AM, Shaine Berube < shaine.ber...@perfectsearchcorp.com> wrote: > I do need more information, because I tried using that option, but the > processor just continued filling the queue anyway, I told it to only allow > 2000 before back pressure kicks in, but it kept going and I ended up with > 5k files in the queue before I restarted Nifi to get the processor to stop. > > On Wed, Jun 8, 2016 at 8:45 AM, Bryan Bende wrote: > > > Hello, > > > > Take a look at the options available when right-clicking on a queue... > > What you described is what NiFi calls back-pressure. You can configured a > > queue to have an object threshold (# of flow files) or data size > threshold > > (total size of all flow files). > > When one of these thresholds is reached, NiFi will no longer let the > source > > processor run until the condition goes back under the threshold. > > > > Let us know if you need any more info on this. > > > > Thanks, > > > > Bryan > > > > On Wed, Jun 8, 2016 at 10:40 AM, Shaine Berube < > > shaine.ber...@perfectsearchcorp.com> wrote: > > > > > Hello all, > > > > > > I'm kind of new to developing Nifi, though I've been doing some pretty > in > > > depth stuff and some advanced database queries. My question is in > > > regarding the queues between processor, I want to limit a queue to... > say > > > 2000, how would I go about doing that? Or better yet, how would I tell > > the > > > processor generating the queue to only put a max of 2000 files into the > > > queue? > > > > > > Allow me to explain with a scenario: > > > We are doing data migration from one database to another. > > > -Processor A is generating a queue consumed by Processor B > > > -Processor A is taking configuration and generating SQL queries in 1000 > > > record chunks so that Processor B can insert them into a new database. > > > Given the size of the source database, Processor A can potentially > > generate > > > hundreds of thousands of files. 
> > > > > > Is there a way for Processor A to check it's down stream queue for the > > > queue size? How would I get Processor A to only put 2000 files into > the > > > queue at any given time, so that Processor A can continue running but > > wait > > > for room in the queue? > > > > > > Thank you in advance. > > > > > > -- > > > *Shaine Berube* > > > > > > > > > -- > *Shaine Berube* >
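The behavior Bryan describes above, back-pressure gating the next scheduling of the source processor rather than capping a single execution, can be illustrated with a small self-contained simulation. This uses plain Java collections only, not the NiFi API; the names (threshold, batch size) are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simulation of why an object-threshold of 2000 did not cap the queue:
// the threshold is only checked before scheduling an execution, so a single
// execution that emits 5000 items still lands all 5000 in the queue.
// Emitting bounded batches per execution limits the overshoot to one batch.
public class BackPressureSim {
    static final int THRESHOLD = 2000;

    // One execution that emits everything at once: queue overshoots the threshold.
    static int emitAllAtOnce(Queue<Integer> queue, int total) {
        if (queue.size() < THRESHOLD) {      // scheduled because queue is under threshold
            for (int i = 0; i < total; i++) {
                queue.add(i);                // no per-item check during the execution
            }
        }
        return queue.size();
    }

    // Bounded batches, threshold re-checked before each "execution".
    static int emitInBatches(Queue<Integer> queue, int total, int batch) {
        int emitted = 0;
        while (emitted < total && queue.size() < THRESHOLD) {
            for (int i = 0; i < batch && emitted < total; i++, emitted++) {
                queue.add(emitted);
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println(emitAllAtOnce(new ArrayDeque<>(), 5000));      // 5000
        System.out.println(emitInBatches(new ArrayDeque<>(), 5000, 100)); // 2000
    }
}
```

The takeaway matches the suggestions in the thread: either bound what a single execution can emit, or add an intermediate split so back-pressure has smaller units to act on.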
[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files
Github user joewitt commented on a diff in the pull request: https://github.com/apache/nifi/pull/492#discussion_r66277217 --- Diff: nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-nar/src/main/resources/META-INF/NOTICE --- @@ -0,0 +1,36 @@ +nifi-evtx-nar +Copyright 2016 The Apache Software Foundation + +This includes derived works from the Apache Software License V2 library python-evtx (https://github.com/williballenthin/python-evtx) +Copyright 2012, 2013 Willi Ballenthin william.ballent...@mandiant.com +while at Mandiant http://www.mandiant.com +The derived work is adapted from Evtx/Evtx.py, Evtx/BinaryParser.py, Evtx/Nodes.py, Evtx/Views.py and can be found in the org.apache.nifi.processors.evtx.parser package. + --- End diff -- I am hardly authoritative, but I did review this case and believe it to be correct. Some would argue it is more than necessary, I'm sure, but let's err on the side of doing more than we must.
[GitHub] nifi issue #492: NIFI-1975 - Processor for parsing evtx files
Github user brosander commented on the issue: https://github.com/apache/nifi/pull/492 @mattyb149 updated those poms, verified that the nar is in the assembly
Re: Limiting a queue
I do need more information, because I tried using that option, but the processor just continued filling the queue anyway. I told it to only allow 2000 before back-pressure kicks in, but it kept going and I ended up with 5k files in the queue before I restarted NiFi to get the processor to stop.

On Wed, Jun 8, 2016 at 8:45 AM, Bryan Bende wrote: > Hello, > > Take a look at the options available when right-clicking on a queue... > What you described is what NiFi calls back-pressure. You can configured a > queue to have an object threshold (# of flow files) or data size threshold > (total size of all flow files). > When one of these thresholds is reached, NiFi will no longer let the source > processor run until the condition goes back under the threshold. > > Let us know if you need any more info on this. > > Thanks, > > Bryan > > On Wed, Jun 8, 2016 at 10:40 AM, Shaine Berube < > shaine.ber...@perfectsearchcorp.com> wrote: > > > Hello all, > > > > I'm kind of new to developing Nifi, though I've been doing some pretty in > > depth stuff and some advanced database queries. My question is in > > regarding the queues between processor, I want to limit a queue to... say > > 2000, how would I go about doing that? Or better yet, how would I tell > the > > processor generating the queue to only put a max of 2000 files into the > > queue? > > > > Allow me to explain with a scenario: > > We are doing data migration from one database to another. > > -Processor A is generating a queue consumed by Processor B > > -Processor A is taking configuration and generating SQL queries in 1000 > > record chunks so that Processor B can insert them into a new database. > > Given the size of the source database, Processor A can potentially > generate > > hundreds of thousands of files. > > > > Is there a way for Processor A to check it's down stream queue for the > > queue size? 
How would I get Processor A to only put 2000 files into the > > queue at any given time, so that Processor A can continue running but > wait > > for room in the queue? > > > > Thank you in advance. > > > > -- > > *Shaine Berube* > > > -- *Shaine Berube*
[GitHub] nifi issue #502: Nifi-1972 Apache Ignite Put Cache Processor
Github user mans2singh commented on the issue: https://github.com/apache/nifi/pull/502 Hey Folks: Please let me know your thoughts/suggestions on this Nifi Ignite Put Processor. Thanks
Re: [DISCUSS] - Markdown option for documentation artifacts
+1 with template On Wed, Jun 8, 2016 at 10:39 AM, dan bress wrote: > +1 > > On Wed, Jun 8, 2016 at 7:05 AM Andre wrote: >> +1 on this + a template that matches existing additional.html >> On 8 Jun 2016 04:28, "Bryan Rosander" wrote: >> >> > Hey all, >> > >> > When writing documentation (e.g. the additionalDetails.html for a >> > processor) it would be nice to have the option to use Markdown instead of >> > html. >> > >> > I think Markdown is easier to read and write than raw HTML and for simple >> > cases does the job pretty well. It also has the advantage of being able >> to >> > be translated into other document types easily and it would be rendered >> by >> > default in Github when the file is clicked. >> > >> > There is an MIT-licensed Markdown maven plugin ( >> > https://github.com/walokra/markdown-page-generator-plugin) that seems >> like >> > it might work for translating additionalDetails.md (and others) into an >> > equivalent html page. >> > >> > Thanks, >> > Bryan Rosander >> > >>
[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site
Github user markap14 commented on the issue: https://github.com/apache/nifi/pull/497 @ijokarumawak i checked out the new PR and tried the test again. Updated nifi.properties only to set secure = false for site-to-site (I think this needs to be the default because as-is, nifi doesn't startup out of the box). I did get further this time, and saw the receiving side trying to receive data but still got Exceptions and no data coming through. The logs show: ``` 2016-06-08 10:39:04,660 ERROR [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=Log,target=http://localhost:8080/nifi] failed to communicate with remote NiFi instance due to java.io.IOException: Failed to confirm transaction with Peer[url=http://127.0.0.1:8080/nifi-api] due to java.io.IOException: Unexpected response code: 500 errCode:Abort errMessage:Server encountered an exception. 2016-06-08 10:39:05,668 ERROR [NiFi Web Server-24] o.apache.nifi.web.api.SiteToSiteResource Unexpected exception occurred. 
clientId=a252a4c6-5a5f-42f3-8270-322923b8c118, portId=30638675-e655-4719-b684-905ad0d49eac
2016-06-08 10:39:05,671 ERROR [NiFi Web Server-24] o.apache.nifi.web.api.SiteToSiteResource Exception detail:
org.apache.nifi.processor.exception.ProcessException: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.nifi.stream.io.MinimumLengthInputStream@75b15a7e for StandardFlowFileRecord[uuid=d8046125-80be-4475-bb43-b15cf6dec4d8,claim=,offset=0,name=531446421738053,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.EOFException
    at org.apache.nifi.remote.StandardRootGroupPort.receiveFlowFiles(StandardRootGroupPort.java:503) ~[nifi-site-to-site-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.apache.nifi.web.api.SiteToSiteResource.receiveFlowFiles(SiteToSiteResource.java:418) ~[classes/:na]
    at sun.reflect.GeneratedMethodAccessor348.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_60]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_60]
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.jar:1.19]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.jar:1.19]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) [jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689) [jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:51) [jetty-servlets-9.3.9.v20160517.jar:9.3.9.v20160517]
    at
Limiting a queue
Hello all,

I'm kind of new to developing NiFi, though I've been doing some pretty in-depth stuff and some advanced database queries. My question is regarding the queues between processors: I want to limit a queue to... say 2000, how would I go about doing that? Or better yet, how would I tell the processor generating the queue to only put a max of 2000 files into the queue?

Allow me to explain with a scenario: We are doing data migration from one database to another.
-Processor A is generating a queue consumed by Processor B
-Processor A is taking configuration and generating SQL queries in 1000 record chunks so that Processor B can insert them into a new database.
Given the size of the source database, Processor A can potentially generate hundreds of thousands of files.

Is there a way for Processor A to check its downstream queue for the queue size? How would I get Processor A to only put 2000 files into the queue at any given time, so that Processor A can continue running but wait for room in the queue?

Thank you in advance. -- *Shaine Berube*
Re: [DISCUSS] - Markdown option for documentation artifacts
+1 On Wed, Jun 8, 2016 at 7:05 AM Andre wrote: > +1 on this + a template that matches existing additional.html > On 8 Jun 2016 04:28, "Bryan Rosander" wrote: > > Hey all, > > When writing documentation (e.g. the additionalDetails.html for a > > processor) it would be nice to have the option to use Markdown instead of > > html. > > > > I think Markdown is easier to read and write than raw HTML and for simple > > cases does the job pretty well. It also has the advantage of being able > to > > be translated into other document types easily and it would be rendered > by > > default in Github when the file is clicked. > > > > There is an MIT-licensed Markdown maven plugin ( > > https://github.com/walokra/markdown-page-generator-plugin) that seems > like > > it might work for translating additionalDetails.md (and others) into an > > equivalent html page. > > > > Thanks, > > Bryan Rosander > > >
[GitHub] nifi issue #510: NIFI-1984: Ensure that locks are always cleaned up by Naive...
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/510 +1 LGTM, built and ran tests, also started NiFi and tried various delete operations including an attempt to delete a processor that had an incoming connection. All behavior was as expected
Re: [DISCUSS] - Markdown option for documentation artifacts
+1 on this + a template that matches existing additional.html On 8 Jun 2016 04:28, "Bryan Rosander" wrote: > Hey all, > > When writing documentation (e.g. the additionalDetails.html for a > processor) it would be nice to have the option to use Markdown instead of > html. > > I think Markdown is easier to read and write than raw HTML and for simple > cases does the job pretty well. It also has the advantage of being able to > be translated into other document types easily and it would be rendered by > default in Github when the file is clicked. > > There is an MIT-licensed Markdown maven plugin ( > https://github.com/walokra/markdown-page-generator-plugin) that seems like > it might work for translating additionalDetails.md (and others) into an > equivalent html page. > > Thanks, > Bryan Rosander >
[GitHub] nifi issue #492: NIFI-1975 - Processor for parsing evtx files
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/492 The nifi-evtx-nar needs to be added to the top-level POM and the nifi-assembly POM, otherwise it will not be included in the distro.
[GitHub] nifi pull request #510: NIFI-1984: Ensure that locks are always cleaned up b...
GitHub user markap14 opened a pull request: https://github.com/apache/nifi/pull/510 NIFI-1984: Ensure that locks are always cleaned up by NaiveRevisionManager Ensure that if an Exception is thrown by the 'Deletion Task' when calling NaiveRevisionManager.deleteRevision() that the locking is appropriately cleaned up. Looking through the code, there do not appear to be any other places where we invoke callbacks without handling appropriately with a try/finally or try/catch block. You can merge this pull request into a Git repository by running: $ git pull https://github.com/markap14/nifi NIFI-1984 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/510.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #510 commit 1d46b5431bf2eca0298d2f2fdc9854ef3f9fedfa Author: Mark Payne Date: 2016-06-08T12:57:37Z NIFI-1984: Ensure that if an Exception is thrown by the 'Deletion Task' when calling NaiveRevisionManager.deleteRevision() that the locking is appropriately cleaned up
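The fix described in this PR, guaranteeing lock release when a caller-supplied task throws, is the classic try/finally idiom. A generic, self-contained sketch of the shape (illustrative only, not the actual NaiveRevisionManager code; all names here are made up):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Run a caller-supplied task while holding a lock, and guarantee the lock is
// released even when the task throws (the failure mode NIFI-1984 addresses).
public class LockedTask {
    final ReentrantLock lock = new ReentrantLock();

    <T> T runWithLock(Supplier<T> task) {
        lock.lock();
        try {
            return task.get();   // may throw, e.g. a failing "deletion task"
        } finally {
            lock.unlock();       // always cleaned up, success or failure
        }
    }

    public static void main(String[] args) {
        LockedTask lt = new LockedTask();
        try {
            lt.runWithLock(() -> { throw new RuntimeException("task failed"); });
        } catch (RuntimeException e) {
            System.out.println(lt.lock.isLocked()); // false: released despite the exception
        }
    }
}
```

Without the finally block, an exception in the task would leave the lock held forever and block every later caller, which is exactly the kind of hang the PR guards against.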
Re: NIFI ListenTCP Processor
Venkat,

You are correct that right now the incoming message delimiter is hard-coded as "\n". The property you see in the UI for message delimiter is actually for the output of the processor: it is used when you increase the batch size > 1 and the processor writes multiple messages to a single flow file, in which case the value of that property separates them.

We definitely want to expose the incoming delimiter as something that can be set through a property on the processor. I think one reason it hasn't been done yet is that there are several strategies we'd like to support:

- exact match - this would be your case, where you specify "$"
- pattern match - this would be reading until a pattern is seen, to help capture multi-line log messages that start with date patterns
- size match - this would be reading until a specified number of bytes have been read

That being said, we shouldn't need to do all of these at once, so I created this JIRA for your scenario: https://issues.apache.org/jira/browse/NIFI-1985

-Bryan

On Wed, Jun 8, 2016 at 2:25 AM, Venkatesh Nandigam < venkat.nandi...@bridgera.com> wrote: > Hi Team, > > We started using NIFI data flow in our current project by replacing node js > tcp listeners. we are using nifi listenTCP processor. > > our use case: we have some devices they will send message to tcp port, from > nifi we have to get message and then place data into kafka. > > Processors we are using: > > ListenTCP processor: to receive data from port and send to kafka topic > > PutKafka: place data into kafka topic. > > Problem we are facing: > > from device side we have message delimiter as "$" but nifi listenTCP > processor is accepting only "/n". we changed the delimeter in nifi admin > its not reflecting.. by looking at nifi code we saw TCP_DELIMETER field as > final. then we changed that class with our delimiter its working fine.. 
> > Path: > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SocketChannelHandler.java > > Is this expected behavior? or we missing something?.. if it is excepted > behavior is they any chance to change that modifiable field?. > > Thanks, > Venkat >
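For what it's worth, the "exact match" strategy from Bryan's list is straightforward to sketch in plain Java. This is illustrative only, not NiFi's actual SocketChannelHandler: it frames an input stream into messages by cutting at a configurable delimiter byte instead of a hard-coded newline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Frame a byte stream into messages on a configurable delimiter byte
// (the "exact match" strategy: e.g. '$' instead of '\n').
public class DelimiterFraming {
    static List<String> frame(InputStream in, byte delimiter) {
        List<String> messages = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if ((byte) b == delimiter) {
                    messages.add(current.toString("UTF-8")); // message complete
                    current.reset();
                } else {
                    current.write(b); // still accumulating the current message
                }
            }
            if (current.size() > 0) {
                messages.add(current.toString("UTF-8")); // trailing partial message
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("msg1$msg2$msg3".getBytes(StandardCharsets.UTF_8));
        System.out.println(frame(in, (byte) '$')); // [msg1, msg2, msg3]
    }
}
```

The pattern-match and size-match strategies from the list would slot into the same loop: replace the single-byte comparison with a regex check against the accumulated buffer, or with a byte counter.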
[GitHub] nifi pull request #509: NIFI-1982: Use Compressed check box value.
GitHub user ijokarumawak opened a pull request: https://github.com/apache/nifi/pull/509 NIFI-1982: Use Compressed check box value. - The Compressed check box UI input was not used - This commit enables Site-to-Site compression configuration from UI You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijokarumawak/nifi nifi-1982 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/509.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #509 commit 70bd97a2ca564b14d16353af55647373eb259253 Author: Koji Kawamura Date: 2016-06-08T11:54:40Z NIFI-1982: Use Compressed check box value. - The Compressed check box UI input was not used - This commit enables Site-to-Site compression configuration from UI
Re: Nifi behind AWS ELB
Mark,

I reread your reply again and realized I missed this statement: "These do not send the attributes, though, so you would need to precede this with a MergeContent with Merge Format of "FlowFile Stream, v3". Then, on the receiving side, you could use UnpackContent to unpack these FlowFile Packages back into their 'native' form." I will try to shave some time off and try this out. Also created a ticket for the improvement to the PostHttp processor ( https://issues.apache.org/jira/browse/NIFI-1983)

Cheers, Edgardo

On Tue, Jun 7, 2016 at 4:20 PM, Edgardo Vega wrote: > Well that blows. Should I create a jira ticket to disable two phased > commit? > > > On Tuesday, June 7, 2016, Mark Payne wrote: >> Edgardo, >> >> You'd run into a lot of problems trying to use that solution, as many >> attributes contain >> characters that are not valid in HTTP headers, and HTTP Headers are >> delineated with >> new-lines, so if you have an attribute with new-lines you'll get really >> weird results. >> >> -Mark >> >> >> > On Jun 7, 2016, at 3:52 PM, Edgardo Vega >> wrote: >> > >> > Mark, >> > >> > Amazon only supports sticky session via cookies. >> > >> > Disabling the two-phase commit would be really nice >> > >> > What if you do a invokehttp with send all the attributes as Http headers >> > and on the receive side on listenhttp do a .* to turn all the headers >> back >> > into attribute? Would that work? >> > >> > Cheers, >> > >> > Edgardo >> > >> > On Tue, Jun 7, 2016 at 3:19 PM, Mark Payne >> wrote: >> > >> >> The idea behind the DELETE mechanism is that in some environments there >> >> were timeouts >> >> that would occur quite frequently between PostHTTP / ListenHTTP and >> this >> >> resulted in quite >> >> a lot of data duplication. 
By adding in the two-phase commit, we were >> able >> >> to drastically reduce >> >> the amount of data duplication, as a timeout anywhere in the first >> >> (typically MUCH longer) phase >> >> would result in the data on the receiving side being dropped because >> the >> >> receiving side would >> >> not delete the hold that it placed on the FlowFiles. >> >> >> >> It would be reasonable to add an option for PostHTTP so that it >> requests >> >> not to perform a two-phase >> >> commit. Alternatively, you could use either PostHTTP with 'Send as >> >> FlowFile' set to 'false' or you >> >> could use InvokeHTTP. These do not send the attributes, though, so you >> >> would need to precede this >> >> with a MergeContent with Merge Format of "FlowFile Stream, v3". >> >> Then, on the receiving side, you could use UnpackContent to unpack >> these >> >> FlowFile Packages back >> >> into their 'native' form. >> >> >> >> Or, a simpler option, if Amazon's ELB supports it, is to configure the >> ELB >> >> such that HTTP Requests that >> >> contain the same value for the "x-nifi-transaction-id" header will go >> to >> >> the same node. This >> >> header was added specifically to allow for this functionality through >> Load >> >> Balancers, >> >> but I don't know if ELB specifically supports this or not. >> >> >> >> Thanks >> >> -Mark >> >> >> >> >> >>> On Jun 7, 2016, at 2:16 PM, Aldrin Piri wrote: >> >>> >> >>> InvokeHTTP may be the better option if the user is not interested in >> >>> transmitting content _packaged as_ FlowFiles. Someone with a bit more >> >>> history than myself can provide some additional context if I have >> strayed >> >>> off the path, but PostHTTP and ListenHTTP were precursors to Site to >> >> Site. >> >>> While they can transmit arbitrary content, were created for this >> >>> inter-instance communication to aid in the guaranteed delivery >> semantics. 
>> >>> The listed hold, in this case, is part of that transaction occurring >> >> where >> >>> a response is returned to acknowledge receipt via ListenHTTP [1] and >> the >> >>> ContentAcknowledgementServlet [2]. >> >>> >> >>> [1] >> >>> >> >> >> https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListenHTTP.java >> >>> [2] >> >>> >> >> >> https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/servlets/ContentAcknowledgmentServlet.java >> >>> >> >>> On Tue, Jun 7, 2016 at 2:10 PM, Bryan Bende wrote: >> >>> >> Looks like PostHttp interprets the response, and based on a series of >> conditions can intentionally issue a delete. >> >> I can't fully understand what is happening, but the code is here: >> >> >> >> >>
[GitHub] nifi issue #493: NIFI-1037 Created processor that handles HDFS' inotify even...
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/493 I have played with it and it works great. One remark: I am wondering if the 'HDFS_PATH_TO_WATCH' property should be improved:
- accept the expression language to handle time-stamped directory?
- accept regular expressions?
- accept comma-separated list of paths?
Since we are polling all HDFS events it could make sense to have the best filtering options
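As a rough sketch of the comma-separated-paths idea above (a hypothetical helper, not the processor's actual property handling), an event path could be matched against a list of path regexes. Note the caveat: splitting the property value on commas would break regexes that themselves contain commas, such as \d{1,2}, so a real implementation might need a different separator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Match an HDFS event path against a comma-separated list of path regexes,
// keeping the event only if at least one pattern matches.
public class PathFilter {
    static boolean matchesAny(String eventPath, String commaSeparatedPatterns) {
        return Arrays.stream(commaSeparatedPatterns.split(","))
                .map(String::trim)                          // tolerate spaces after commas
                .anyMatch(p -> Pattern.matches(p, eventPath));
    }

    public static void main(String[] args) {
        // Time-stamped directory pattern plus a second watched path.
        System.out.println(matchesAny("/data/2016-06-08/file.txt",
                "/data/\\d{4}-\\d{2}-\\d{2}/.*, /tmp/.*")); // true
        System.out.println(matchesAny("/other/file.txt",
                "/data/.*, /tmp/.*"));                      // false
    }
}
```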
[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site
Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/497 @markap14 Fixed the sending data issue. The reason was I didn't call ByteBuffer clear() and didn't close PipedOutputStream properly. Sorry for the inconvenience. Please try it again. Also, I fixed the Remote Process Group Port so that useCompression can be set from UI. It seems the UI configuration hasn't been used. I'll send another PR for 0.x branch for that.
NIFI ListenTCP Processor
Hi Team,

We started using NiFi data flow in our current project by replacing Node.js TCP listeners. We are using the NiFi ListenTCP processor.

Our use case: we have some devices that send messages to a TCP port; from NiFi we have to get each message and then place the data into Kafka.

Processors we are using:
ListenTCP processor: to receive data from the port and send to a Kafka topic
PutKafka: place data into the Kafka topic.

Problem we are facing: on the device side we have the message delimiter "$", but the NiFi ListenTCP processor accepts only "\n". We changed the delimiter in the NiFi admin UI but it is not reflected. Looking at the NiFi code, we saw the TCP_DELIMETER field is final; when we changed that class to use our delimiter, it works fine.

Path: https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SocketChannelHandler.java

Is this expected behavior, or are we missing something? If it is expected behavior, is there any chance to make that field configurable?

Thanks, Venkat