Questions about the ordering of the FlowFile.
Here's my use case. We have an application protocol between the start and end processors in a data flow that expects the FlowFiles to arrive in the order they are generated, e.g. a Start Record FlowFile, then the data FlowFiles, then an End Record FlowFile.

The first processor does the following:
1. Generates and transfers the StartRecord FlowFile.
2. Generates the data records and transfers them.
3. Generates and transfers the EndRecord FlowFile.

The last processor in the data flow does the following:
1. Looks for the StartRecord FlowFile and does its thing.
2. Looks for the DataRecord FlowFiles and does its thing.
3. Looks for the EndRecord FlowFile, then updates and cleans up the target state.

The first processor does multiple transfers on the session object before calling commit. We see that the FlowFiles are received in random order, so we are not able to execute the app protocol. We have tried the FirstInFirstOutPrioritizer and the OldestFlowFilePrioritizer. We would appreciate any insights into this, as it seems to be a blocking issue for us.

Thanks,
Paresh
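One possible workaround, assumed here rather than taken from this thread, is to have the first processor stamp every FlowFile it creates with an increasing sequence number (a hypothetical `seq` attribute) and have the last processor hold back FlowFiles until the next expected number has arrived. The plain-Java sketch below models only that buffering logic; `ReorderBuffer` and `Rec` are hypothetical names, not NiFi API, and a real processor would also need to persist the buffer and deal with sequence numbers that never arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical reorder buffer: releases records strictly in sequence order,
 * holding back any record that arrives before its predecessors.
 */
public class ReorderBuffer {
    /** Minimal stand-in for a FlowFile: a sequence number plus a type tag. */
    public record Rec(long seq, String type) {}

    private final TreeMap<Long, Rec> pending = new TreeMap<>(); // sorted by seq
    private long nextSeq = 0; // sequence number expected next

    /** Accepts one record and returns every record that is now releasable, in order. */
    public List<Rec> offer(Rec rec) {
        pending.put(rec.seq(), rec);
        List<Rec> ready = new ArrayList<>();
        // Drain consecutive sequence numbers starting at nextSeq.
        while (!pending.isEmpty() && pending.firstKey() == nextSeq) {
            ready.add(pending.pollFirstEntry().getValue());
            nextSeq++;
        }
        return ready;
    }

    public static void main(String[] args) {
        ReorderBuffer buf = new ReorderBuffer();
        // Arrival order is scrambled: DATA(1), END(2), START(0).
        List<Rec> arrivals = List.of(new Rec(1, "DATA"), new Rec(2, "END"), new Rec(0, "START"));
        for (Rec r : arrivals) {
            for (Rec out : buf.offer(r)) {
                System.out.println(out.seq() + " " + out.type());
            }
        }
    }
}
```

Run as-is, `main` prints `0 START`, `1 DATA`, `2 END`, even though the records arrive as DATA, END, START: nothing is released until the StartRecord shows up.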
Re: Regarding Nifi packaging and deployment
Shweta,

The primary mechanism is flow templates [1]. They do have some important limitations today, though, that you'll want to understand. First, some properties are sensitive, such as passwords, and are therefore not included in templates, so you'll have to re-enter them when you apply the template in the new environment. Second, there are sometimes properties you'd want to have different values for in different environments; we need to provide a property/environment-variable mapping mechanism. We intend to address both of these, but as far as I'm aware neither is actively being worked on at present.

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates

Thanks
Joe

On Thu, Dec 10, 2015 at 9:25 PM, shweta wrote:
> Hi all,
>
> I'm new to Nifi. I have created some sample flows. I want to know how can I
> package and deploy the same
> from development environment to testing environment or do I need to recreate
> the entire data flow again in different environment.
>
> Thanks,
> Shweta
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Regarding-Nifi-packaging-and-deployment-tp5716.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
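Joe notes that a property/environment-variable mapping mechanism does not exist yet. Purely to illustrate the idea, and not as a NiFi feature or API, the sketch below resolves hypothetical `${name}` placeholders in property values against a per-environment map and fails loudly when a value (for example, a password that templates deliberately omit) has not been supplied for the target environment. `EnvPropertyMapper`, the placeholder syntax, and the `hdfs.url` key are all assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: resolve ${name} placeholders in template property
 * values against a per-environment map before applying the template.
 */
public class EnvPropertyMapper {
    // Matches ${name}; names may contain letters, digits, '_' and '.'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Replaces each ${name} with env.get(name); throws if a name is unmapped. */
    public static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String replacement = env.get(key);
            if (replacement == null) {
                // e.g. a sensitive property nobody supplied for this environment
                throw new IllegalArgumentException("No value for '" + key + "' in this environment");
            }
            // quoteReplacement stops '$' or '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> testEnv = Map.of("hdfs.url", "hdfs://test-nn:8020");
        System.out.println(resolve("${hdfs.url}/landing", testEnv)); // hdfs://test-nn:8020/landing
    }
}
```

The same template text could then be applied to development and testing with a different map for each, which is the gap Joe describes.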
Regarding Nifi packaging and deployment
Hi all,

I'm new to NiFi. I have created some sample flows. I want to know how I can package and deploy them from the development environment to the testing environment, or whether I need to recreate the entire data flow in the other environment.

Thanks,
Shweta
Re: Asynchronous JMS Consumer for IBM MQ
Ian,

With run duration, the idea is that the processor is allowed to keep executing for that period of time and the framework keeps giving it the same process session. For the developer, this means they get to keep their logic very simple and discrete to a single operation, while the framework takes care of batching those operations together as one for up to 'X secs' of run duration.

With event driven, rather than telling the framework you want the processor to run every X units of time, as is the case with timer driven, the framework executes the processor (gives it a thread and calls onTrigger) whenever data is placed into its queue. It can be more efficient in some cases.

Thanks
Joe

On Thu, Dec 10, 2015 at 8:59 PM, ianwork wrote:
> Bryan/Aldrin, Adding yielding into my processor prevent the number of tasks
> was rapidly increasing. Thanks!
>
> Aldrin, I would like to dig a little more into the details. My application
> is basically set do process logs like logstash. The application is reading
> and parsing a high volume of logs. The listener is based up listenSyslog.
> The listener forwards the logs to various processors which run regex's on
> batches of logs. Increasing the number of regex processors reduces the
> performance so i'd like to determine how I can configure the system
> resources.
>
> I'm still struggling with run duration. Does setting run duration mean that
> when a thread is allocated to the ontrigger method of that processor it will
> run for a maximum of run duration? What if the method executes faster than
> the run duration, will ontrigger be called again if there is work to be
> done?
>
> Is the event driven mode something to consider in my type of processor?
> What use cases was that designed to satisfy and is there any documentation
> on that?
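Below is a conceptual model of the run-duration behaviour Joe describes, written in plain Java rather than against the NiFi API: `onTrigger` handles exactly one item, while the hypothetical `runTask` keeps calling it with the same `Session` until either the queue is empty or the run duration has elapsed, and then commits once. In this model, if `onTrigger` finishes early and work remains, it is simply called again, which matches Joe's description of the processor being allowed to keep executing for that period. `Session`, `runTask`, and the upper-casing "work" are stand-ins, not framework code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

/**
 * Conceptual model (not NiFi code) of run duration: one scheduled task keeps
 * invoking a small, single-item operation against the same session until the
 * run duration elapses or the queue is empty, then commits once.
 */
public class RunDurationModel {
    /** Stand-in for a process session that accumulates work until commit. */
    static class Session {
        final List<String> pending = new ArrayList<>();
        final List<List<String>> commits = new ArrayList<>();

        void commit() {
            if (!pending.isEmpty()) {
                commits.add(new ArrayList<>(pending)); // record one committed batch
                pending.clear();
            }
        }
    }

    /** The processor's logic stays simple: handle exactly one item per call. */
    static boolean onTrigger(Deque<String> queue, Session session) {
        String item = queue.poll();
        if (item == null) {
            return false; // nothing to do
        }
        session.pending.add(item.toUpperCase(Locale.ROOT));
        return true;
    }

    /** Framework side: repeat onTrigger with one session for up to runDurationNanos. */
    static void runTask(Deque<String> queue, Session session, long runDurationNanos) {
        long deadline = System.nanoTime() + runDurationNanos;
        do {
            if (!onTrigger(queue, session)) {
                break; // queue drained before the run duration elapsed
            }
        } while (System.nanoTime() < deadline);
        session.commit(); // one commit covers the whole batch
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("a", "b", "c"));
        Session session = new Session();
        runTask(queue, session, 1_000_000_000L); // 1-second run duration
        System.out.println(session.commits);
    }
}
```

With a 1-second budget the three queued items end up in a single commit, `[[A, B, C]]`; with a run duration of 0, each task would process and commit just one item.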
Re: Asynchronous JMS Consumer for IBM MQ
Bryan/Aldrin, adding yielding to my processor stopped the number of tasks from rapidly increasing. Thanks!

Aldrin, I would like to dig a little more into the details. My application is basically set to process logs like Logstash. The application reads and parses a high volume of logs. The listener is based on ListenSyslog. The listener forwards the logs to various processors, which run regexes on batches of logs. Increasing the number of regex processors reduces performance, so I'd like to determine how I should configure the system resources.

I'm still struggling with run duration. Does setting run duration mean that when a thread is allocated to the onTrigger method of that processor, it will run for at most the run duration? What if the method executes faster than the run duration: will onTrigger be called again if there is work to be done?

Is the event-driven mode something to consider for my type of processor? What use cases was it designed to satisfy, and is there any documentation on it?
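Ian reports that yielding stopped the task count from climbing. A rough model of why, assuming a timer-driven scheduler that ticks at a fixed rate: without yielding, an idle processor is triggered on every tick even though it has nothing to do; after a yield, the scheduler skips it until the yield period passes. On the processor side, NiFi exposes this as `context.yield()` called from `onTrigger` when there is no work, with the period taken from the processor's Yield Duration setting. The tick-based counter below is a hypothetical simplification, not the framework's scheduler.

```java
/**
 * Conceptual model (not NiFi code): a timer-driven scheduler that triggers a
 * processor every tick unless the processor has yielded, in which case it is
 * skipped until the yield period has passed.
 */
public class YieldModel {
    /** Counts how many times an always-idle processor is triggered over totalTicks. */
    static int countTriggers(int totalTicks, boolean yieldWhenIdle, int yieldTicks) {
        int triggers = 0;
        int skipUntil = 0; // first tick at which the processor may run again
        for (int tick = 0; tick < totalTicks; tick++) {
            if (tick < skipUntil) {
                continue; // still yielding: the scheduler does not call onTrigger
            }
            triggers++; // onTrigger runs and finds no work...
            if (yieldWhenIdle) {
                skipUntil = tick + yieldTicks; // ...so it yields for yieldTicks
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        System.out.println(countTriggers(1000, false, 100)); // 1000
        System.out.println(countTriggers(1000, true, 100));  // 10
    }
}
```

Over the same 1000 ticks, the idle processor runs 1000 tasks without yielding but only 10 with a 100-tick yield.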
Re: GetHTTP processor not working
Hello Shweta,

I think there is a combination of things going on. The error you're probably seeing first is "Illegal character in fragment at index 239". This is due to the "{" and "}" in your URL; they need to be URL-encoded as %7B and %7D respectively. The URL you should be using is below [1].

Second, are you sure the website is running? I tried to reach it and my connection times out, specifically with "Read timed out".

[1] http://unify.impetus.co.in/BigData/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx?RootFolder=%2FBigData%2FShared%20Documents%2F3%20CU%2FSkillset%20Analyzer%2FResumes&FolderCTID=0x012000D7E70BB8AE01E840A767ECB4D05AC5ED&View=%7B107FFCED-34CD-4354-B0D3-422058A26150%7D

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Thursday, December 10, 2015 7:18 AM, shweta wrote:
> I have a url as following
> http://unify.impetus.co.in/BigData/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx?RootFolder=%2FBigData%2FShared%20Documents%2F3%20CU%2FSkillset%20Analyzer%2FResumes&FolderCTID=0x012000D7E70BB8AE01E840A767ECB4D05AC5ED&View={107FFCED-34CD-4354-B0D3-422058A26150}
> and I'm trying to fetch some files from above URL using GetHTTP processor.
> But it fails. I have tried with decoded URL as well but no luck. Can anyone
> please help how to go about it.
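The first problem Joe describes can be reproduced with `java.net.URI`, which rejects `{` and `}` in a fragment. The sketch below percent-encodes only the braces, so existing escapes such as `%20` and `%2F` are left alone, and checks that the result then parses. `BraceEncoding` and its helpers are illustrative names; GetHTTP's own URL handling may differ in detail.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Shows why the raw URL is rejected and how encoding the braces fixes parsing. */
public class BraceEncoding {
    /** Percent-encodes only '{' and '}', leaving existing %XX escapes untouched. */
    static String encodeBraces(String url) {
        return url.replace("{", "%7B").replace("}", "%7D");
    }

    /** Returns null if the string parses as a java.net.URI, else the parser's message. */
    static String parseError(String url) {
        try {
            new URI(url);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String raw = "http://unify.impetus.co.in/BigData/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx"
                + "?RootFolder=%2FBigData%2FShared%20Documents%2F3%20CU%2FSkillset%20Analyzer%2FResumes"
                + "&FolderCTID=0x012000D7E70BB8AE01E840A767ECB4D05AC5ED"
                + "&View={107FFCED-34CD-4354-B0D3-422058A26150}";
        System.out.println(parseError(raw));               // reports an illegal character in the fragment
        System.out.println(parseError(encodeBraces(raw))); // null: the encoded URL parses
    }
}
```

One further point worth checking: everything after `#` is a URI fragment, which HTTP clients do not send to the server, so a request for this URL would only ever ask the server for `/BigData/_layouts/15/start.aspx`.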
Re: Facing Issue while connecting with HDFS
Site-to-Site is a direct connection between NiFi instances/clusters over a socket, so it is TCP-based. There will always have to be at least one local machine involved. When NiFi pulls or receives data from somewhere, it takes that data under its control and stores it in the NiFi content repository on disk (configured in nifi.properties). As a FlowFile moves through the flow, a pointer to this content is passed around until the content needs to be accessed. So when PutHDFS needs to send to the other cluster, it reads the content and sends it to the other HDFS. The data then eventually ages off from the NiFi content repository, depending on how it is configured. So NiFi does not have to hold all of the data on the local machine, but it will always have some portion of the most recent data that has been moved across.

Let us know if this doesn't make sense.

-Bryan

On Thu, Dec 10, 2015 at 1:52 AM, digvijayp wrote:
> Hi Bryan,
> So in edge node approach how data sent in site-to-site ?I mean to say is it
> using any protocol to transfer it like FTP,SFTP.
> As you are saying If both clusters can fully talk to each other than you
> don't need this edge node approach, you could just have a NiFi instance, or
> cluster, that pulls from one HDFS and pushes to the other.
> so my query is we have to use FetchHDFS/getHDFS process which get data from
> HDFS to local machine and putHDFS process which load data from local
> machine
> to HDFS.I dont have yo use the local machin in between .So how can we
> manage
> the transfer data without using local machine? Where can we do such
> configuration in nifi?
>
> Thanks in advance.
>
> Digvijay P.
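Bryan's point that a FlowFile carries a pointer to its content, rather than the content itself, can be sketched as follows. This is a conceptual model in plain Java, not NiFi's actual repository implementation: `ContentRepoModel`, the in-memory map standing in for the on-disk content repository, and the `send` method standing in for PutHDFS are all hypothetical. Changing attributes produces a new FlowFile that shares the same content claim, and the bytes are read only when the sending step needs them.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Conceptual model (not NiFi's implementation): FlowFiles carry attributes
 * plus a pointer (claim id) into a content repository; the bytes are read
 * only when a step such as a PutHDFS-like sender actually needs them.
 */
public class ContentRepoModel {
    /** Stand-in for the on-disk content repository. */
    static final Map<String, byte[]> CONTENT_REPO = new HashMap<>();

    /** A FlowFile holds metadata and a claim id, never the content bytes. */
    record FlowFile(Map<String, String> attributes, String claimId) {
        FlowFile withAttribute(String key, String value) {
            Map<String, String> copy = new HashMap<>(attributes);
            copy.put(key, value);
            return new FlowFile(copy, claimId); // same claim: content is not copied
        }
    }

    /** Ingest: store the bytes once and hand back a pointer-carrying FlowFile. */
    static FlowFile ingest(byte[] data) {
        String claim = UUID.randomUUID().toString();
        CONTENT_REPO.put(claim, data);
        return new FlowFile(Map.of(), claim);
    }

    /** Only the sending step dereferences the claim to read the bytes. */
    static String send(FlowFile ff) {
        return new String(CONTENT_REPO.get(ff.claimId()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FlowFile ff = ingest("hello hdfs".getBytes(StandardCharsets.UTF_8));
        FlowFile routed = ff.withAttribute("filename", "a.txt"); // metadata change only
        System.out.println(routed.claimId().equals(ff.claimId())); // true
        System.out.println(send(routed));                          // hello hdfs
        CONTENT_REPO.remove(ff.claimId()); // crude stand-in for age-off reclaiming the content
    }
}
```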
GetHTTP processor not working
I have a URL as follows:

http://unify.impetus.co.in/BigData/_layouts/15/start.aspx#/Shared%20Documents/Forms/AllItems.aspx?RootFolder=%2FBigData%2FShared%20Documents%2F3%20CU%2FSkillset%20Analyzer%2FResumes&FolderCTID=0x012000D7E70BB8AE01E840A767ECB4D05AC5ED&View={107FFCED-34CD-4354-B0D3-422058A26150}

I'm trying to fetch some files from the above URL using the GetHTTP processor, but it fails. I have also tried the decoded URL, with no luck. Can anyone please help with how to go about this?
Re: Facing Issue while connecting with HDFS
Hi Bryan,

In the edge node approach, how is data sent via Site-to-Site? I mean, does it use a protocol such as FTP or SFTP to transfer it?

As you said, if both clusters can fully talk to each other then you don't need this edge node approach; you could just have a NiFi instance, or cluster, that pulls from one HDFS and pushes to the other. So my question is: we have to use the FetchHDFS/GetHDFS processor, which gets data from HDFS to the local machine, and the PutHDFS processor, which loads data from the local machine to HDFS. I don't want to use the local machine in between. So how can we manage the data transfer without using the local machine? Where can we do such configuration in NiFi?

Thanks in advance.
Digvijay P.