PutDistributedMapCache
Hi all,

I have an error bulletin message that shows up now and then on the PutDistributedMapCache processor. The error disappears after a few minutes, but now and then it shows up again. It is self-resolving (I have the failure relationship routed back to the processor). I increased the connection timeout from 30 seconds to 60, but it doesn't help. Why is this happening?

[Screenshot of the error bulletin]

Thanks,
Tom
RE: Performance of adding many keys to redis with PutDistributedMapCache
Maybe something that used records and a record query on top of MSET would be the most efficient.

On April 2, 2020 at 06:27:53, Hesselmann, Brian (brian.hesselm...@cgi.com) wrote:
> Hi Bryan and Mike,
>
> Thanks for the responses. For now we have introduced an ExecuteStreamCommand to use the redis-cli and different commands directly. It seems to improve performance for now, but we will have to look into introducing a new processor or a different DB if necessary.
>
> Thanks,
> Brian
RE: Performance of adding many keys to redis with PutDistributedMapCache
Hi Bryan and Mike,

Thanks for the responses. For now we have introduced an ExecuteStreamCommand to use the redis-cli and different commands directly. It seems to improve performance for now, but we will have to look into introducing a new processor or a different DB if necessary.

Thanks,
Brian

From: Mike Thomsen [mikerthom...@gmail.com]
Sent: Wednesday, April 1, 2020 0:08
To: users@nifi.apache.org
Subject: Re: Performance of adding many keys to redis with PutDistributedMapCache

> Might be worth experimenting with KeyDB to see if that helps. It's a multi-threaded fork of Redis that is supposedly about as fast on a single node as a Redis cluster of the same size, when you compare the number of cluster nodes to the KeyDB thread pool size.
>
> https://keydb.dev/
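As a rough illustration of the redis-cli route described above, the sketch below batches key/value pairs into MSET commands encoded in the Redis wire protocol, which `redis-cli --pipe` can read from standard input. It is only a sketch under stated assumptions: the function names `resp_command` and `mset_batches`, the sample keys, and the batch size of 1000 are made up for the example and are not part of NiFi or of the flow discussed in this thread.

```python
import sys


def resp_command(*args: str) -> bytes:
    """Encode one Redis command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def mset_batches(entries: dict[str, str], batch_size: int = 1000):
    """Yield RESP-encoded MSET commands holding up to batch_size pairs each."""
    items = list(entries.items())
    for start in range(0, len(items), batch_size):
        # Flatten [(k1, v1), (k2, v2)] into [k1, v1, k2, v2] for MSET.
        flat = [part for pair in items[start:start + batch_size] for part in pair]
        yield resp_command("MSET", *flat)


if __name__ == "__main__":
    # Hypothetical sample data; a real flow would read its own key/value pairs.
    sample = {f"key:{i}": f"value-{i}" for i in range(5)}
    for command in mset_batches(sample, batch_size=2):
        sys.stdout.buffer.write(command)
```

Saved under a hypothetical name such as `make_mset.py`, the output could be piped with `python make_mset.py | redis-cli --pipe`. Fewer, larger commands reduce the number of round trips compared with one SET per flowfile, which is the effect MSET is being considered for here.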
Re: Performance of adding many keys to redis with PutDistributedMapCache
Might be worth experimenting with KeyDB to see if that helps. It's a multi-threaded fork of Redis that is supposedly about as fast on a single node as a Redis cluster of the same size, when you compare the number of cluster nodes to the KeyDB thread pool size.

https://keydb.dev/

On Tue, Mar 31, 2020 at 4:49 PM Bryan Bende wrote:
> Hi Brian,
>
> I'm not sure what can really be done with the existing processor besides what you have already done. Have you configured your overall Timer Driven thread pool appropriately?
>
> Most likely there would need to be a new PutRedis processor that didn't have to adhere to the DistributedMapCacheInterface and could use MSET or whatever specific Redis functionality was needed.
>
> Another option might be a record-based variation of PutDistributedMapCache where you could keep thousands of records together and stream them to the cache. It would take a record-path to specify the key for each record and serialize the record as the value (assuming your data fits into one of the record formats like JSON, Avro, CSV).
>
> -Bryan
Re: Performance of adding many keys to redis with PutDistributedMapCache
Hi Brian,

I'm not sure what can really be done with the existing processor besides what you have already done. Have you configured your overall Timer Driven thread pool appropriately?

Most likely there would need to be a new PutRedis processor that didn't have to adhere to the DistributedMapCacheInterface and could use MSET or whatever specific Redis functionality was needed.

Another option might be a record-based variation of PutDistributedMapCache where you could keep thousands of records together and stream them to the cache. It would take a record-path to specify the key for each record and serialize the record as the value (assuming your data fits into one of the record formats like JSON, Avro, CSV).

-Bryan

On Tue, Mar 31, 2020 at 4:23 PM Hesselmann, Brian wrote:
> Hi,
>
> We currently run a flow that puts about 700,000 entries/flowfiles into Redis every 5 minutes. I'm looking for ways to improve performance.
>
> Currently we've been upping the number of concurrent tasks and run duration of the PutDistributedMapCache processor to be able to process everything. I know Redis supports setting multiple keys at once using MSET (https://redis.io/commands/mset), but this command is not available through NiFi.
>
> Short of simply upgrading the system we run NiFi/Redis on, do you have any suggestions for improving performance of PutDistributedMapCache?
>
> Best,
> Brian
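The record-based idea from Bryan's reply, where a record path selects the key and the serialized record becomes the value, might look roughly like the sketch below. This is plain Python and not the NiFi record API: the slash-separated path handling only loosely imitates a record path, and `record_path_value`, `records_to_entries`, and the sample field names are hypothetical names chosen for the illustration.

```python
import json


def record_path_value(record: dict, path: str):
    """Resolve a simple slash-separated path such as '/device/id' in a nested dict."""
    value = record
    for field in path.strip("/").split("/"):
        value = value[field]
    return value


def records_to_entries(records: list[dict], key_path: str) -> dict[str, bytes]:
    """Build cache entries: key taken from key_path, value is the record as JSON bytes."""
    entries = {}
    for record in records:
        key = str(record_path_value(record, key_path))
        entries[key] = json.dumps(record, sort_keys=True).encode("utf-8")
    return entries


# Hypothetical records; many of them would travel together in one flowfile.
records = [
    {"device": {"id": "a1"}, "temp": 20},
    {"device": {"id": "b2"}, "temp": 25},
]
entries = records_to_entries(records, "/device/id")
```

The resulting mapping could then be written to the cache in one batch rather than one flowfile per key, which is the point of keeping thousands of records together.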
Performance of adding many keys to redis with PutDistributedMapCache
Hi,

We currently run a flow that puts about 700,000 entries/flowfiles into Redis every 5 minutes. I'm looking for ways to improve performance.

Currently we've been upping the number of concurrent tasks and run duration of the PutDistributedMapCache processor to be able to process everything. I know Redis supports setting multiple keys at once using MSET (https://redis.io/commands/mset), but this command is not available through NiFi.

Short of simply upgrading the system we run NiFi/Redis on, do you have any suggestions for improving performance of PutDistributedMapCache?

Best,
Brian
Append Value to cache with FetchDistributedMapCache / PutDistributedMapCache
Hello!

I am new to NiFi and I'm trying to find the best workflow to update or append content in the map cache. I created my DistributedMapCacheServer and the client service.

I have several simple flowfiles with content like this:

{ "eventType" : "api", "message" : "blabla" }

and a simple attribute "ServiceName" : "abcd".

1) At first I created an aggregation using MergeRecord with my correlation attribute set to ${ServiceName}. The output is something like this:

[ { "eventType" : "api", "message" : "blabla" }, { "eventType" : "api", "message" : "blabla" }, { "eventType" : "api", "message" : "blabla" } ]

I then add the value to the cache with PutDistributedMapCache, with the cache entry identifier set to ${ServiceName}, but I am blocked on the fetching. I want to combine the new aggregation with the values already in the cache, but if I link the MergeRecord output to FetchDistributedMapCache, the processor will replace my content with the cached value and I will lose the context of my entry.

I am looking at "Put Cache Value In Attribute" in order to keep both, but I'm not sure whether I can then merge the attribute into the content.

Thanks for your suggestions!

Yanna elina
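The append described in this question is a read-modify-write cycle: fetch the cached array, concatenate the new aggregation, and put the result back under the same key. The sketch below shows that logic with a plain dict standing in for the map cache; `append_to_cache` is a hypothetical helper, not a NiFi processor or API, and the sample messages are invented.

```python
import json


def append_to_cache(cache: dict[str, bytes], key: str, new_records: list[dict]) -> list[dict]:
    """Append new_records to the JSON array cached under key and store the result."""
    # Fetch: an absent key is treated as an empty array.
    existing = json.loads(cache[key]) if key in cache else []
    combined = existing + new_records
    # Put: overwrite the entry with the combined array.
    cache[key] = json.dumps(combined).encode("utf-8")
    return combined


cache: dict[str, bytes] = {}
append_to_cache(cache, "abcd", [{"eventType": "api", "message": "first"}])
result = append_to_cache(cache, "abcd", [{"eventType": "api", "message": "second"}])
```

Note that this pattern is not atomic: if two tasks fetch the same entry before either writes it back, one of the appends is lost, so a flow built this way would need to serialize updates per key.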
Re: PutDistributedMapCache
Hello Sudeep,

Sorry, I'm not following your emails; did you need more help importing the processor?

Currently the way to clear a DistributedMapCache is to remove the DistributedMapCacheServer controller service and create a new one.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Thursday, January 14, 2016 7:04 AM, sudeep mishra <sudeepshekh...@gmail.com> wrote:
> Thanks Joe. The GetDistributedMapCache seems to be working fine. Is there a way to clear DistributedMapCache on demand?
>
> Regards,
> Sudeep
Re: PutDistributedMapCache
Thanks Joe. The GetDistributedMapCache seems to be working fine. Is there a way to clear DistributedMapCache on demand?

Regards,
Sudeep

On Thu, Jan 14, 2016 at 12:42 PM, sudeep mishra <sudeepshekh...@gmail.com> wrote:
> Upon building the repository we get different .nar files, which can be updated in the lib directory for my requirement.
> Thanks for your help.
Re: PutDistributedMapCache
Is it possible to build the code for only a particular processor? Just curious whether we can build and deploy a particular processor in an existing NiFi environment.

On Wed, Jan 13, 2016 at 9:33 PM, sudeep mishra <sudeepshekh...@gmail.com> wrote:
> Thanks Joe. I will try out the patch.
Re: PutDistributedMapCache
Thank you very much Joe.

Can you please let me know how I can use the .patch file? I am using NiFi via the binaries. Do I need to set up the source code and build it along with the patch?

Thanks & Regards,
Sudeep

On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:
> Hello Sudeep,
>
> I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you think.
>
> The PutDistributedMapCache and GetDistributedMapCache processors work with the data as a byte[], so they should be format agnostic. That being said, it will be up to you to know what is in there in order to use it later.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
Re: PutDistributedMapCache
Thanks Joe. I will try out the patch.

On Wed, Jan 13, 2016 at 9:31 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:
> You would need to clone the nifi source from github and then apply the patch using git.
>
> Here is how to clone a repo: https://help.github.com/articles/cloning-a-repository/
> Along with the nifi repo itself: https://github.com/apache/nifi
> and how to apply a patch: http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
>
> Let me know if you have any other questions,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
Re: PutDistributedMapCache
Hello Sudeep,

I put up a patch on the GetDistributedMapCache ticket [1]. Let me know what you think.

The PutDistributedMapCache and GetDistributedMapCache processors work with the data as a byte[], so they should be format agnostic. That said, it will be up to you to know what is stored there in order to use it later.

[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <sudeepshekh...@gmail.com> wrote:

> Thanks Joe.
>
> I do not have a specific configuration as of now, as I am still exploring NiFi. I do think it would be helpful to let users store and retrieve the cache values in different formats (JSON, Avro, etc.).
>
> Thanks & Regards,
> Sudeep
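Joe's point that the cache handles values as a byte[] can be illustrated with a minimal sketch. This is not NiFi code: the dictionary, `put`, and `get` below are hypothetical stand-ins for a cache service, and JSON is just one format a writer might choose. The idea is only that the store sees bytes, so the reader has to know the format in advance.

```python
import json
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the cache service: it only ever sees bytes.
_cache: Dict[bytes, bytes] = {}


def put(key: str, value: bytes) -> None:
    """Store raw bytes under a UTF-8 encoded key."""
    _cache[key.encode("utf-8")] = value


def get(key: str) -> Optional[bytes]:
    """Return the raw bytes for a key, or None if the key is absent."""
    return _cache.get(key.encode("utf-8"))


# The writer chooses a format (JSON here) and serializes to bytes.
record = {"id": 42, "status": "validated"}
put("record-42", json.dumps(record).encode("utf-8"))

# The reader gets bytes back and must already know they hold JSON.
raw = get("record-42")
decoded = json.loads(raw.decode("utf-8"))
```

Swapping JSON for Avro or CSV would change only the serialize and deserialize steps; the stand-in cache itself would not need to change.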
PutDistributedMapCache
Hi,

I can cache some data to be used in a NiFi flow. I can see the PutDistributedMapCache processor in the documentation, which saves key-value pairs in the DistributedMapCache for NiFi, but I do not see any processor to read this data. How can I read data from the DistributedMapCache in my data flow?

Thanks & Regards,
Sudeep Shekhar Mishra
Re: PutDistributedMapCache
Sudeep,

I was a little off on my second scenario. The DetectDuplicate processor uses the distributed cache service all on its own. Files that are routed through it are loaded into the cache if they do not already exist in the cache; if they do already exist, they are routed to duplicate. The PutDistributedMapCache processor was a community contribution, and there is currently no processor that makes use of the information it caches.

We should probably build a processor that would make use of the data that can be loaded by the PutDistributedMapCache processor. Is there a particular use case you are trying to solve where this would be applicable?

Thanks,
Matt
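The DetectDuplicate behaviour Matt describes (load the entry if it is absent, otherwise route to duplicate) can be sketched as a check-then-put loop. This is only an illustration under stated assumptions: the `route` function, the plain `set` used as the cache, and the "non-duplicate" label are hypothetical and are not taken from the thread or from NiFi's source.

```python
from typing import Iterable, List, Set, Tuple


def route(identifiers: Iterable[str], cache: Set[str]) -> List[Tuple[str, str]]:
    """Label each identifier as 'duplicate' or 'non-duplicate'.

    An identifier seen for the first time is added to the cache;
    one that is already cached is labelled a duplicate.
    """
    results = []
    for ident in identifiers:
        if ident in cache:
            results.append((ident, "duplicate"))
        else:
            cache.add(ident)  # first sighting: remember it
            results.append((ident, "non-duplicate"))
    return results


seen: Set[str] = set()
routed = route(["file-a", "file-b", "file-a"], seen)
```

Because the check and the insert happen in the same step, no separate put is needed beforehand, which matches Matt's correction that the processor uses the cache service on its own.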
Re: PutDistributedMapCache
Sudeep,

The DistributedMapCache is typically used to prevent the consumption of duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, and ListSFTP). NiFi uses the service to keep a listing of what has been consumed so that the same files are not consumed multiple times.

The service can also be used to detect whether duplicate data already exists within a NiFi instance or cluster. This would be the scenario where some source is pushing data to your NiFi and perhaps pushes the same data more than once. You want to catch these duplicates so you can kick them out of your flow. For this you would use the PutDistributedMapCache processor to cache all incoming data and then use the DetectDuplicate processor to find those duplicates.

Was there a different use case you were looking to solve using the distributed cache service?

Thanks,
Matt
Re: PutDistributedMapCache
Thanks Matt.

In my data flow I am expected to perform certain validations on data. I am loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow). For each record in the HDFS file I have to query another database and then save the validated record back to HDFS, where it will be processed by some Spark jobs.

Since I have to run a query for each record, I was planning to cache the database records against which I have to validate the HDFS data. That is why I was evaluating the DistributedCacheServer, but it looks like its purpose is different. Alternatively, can we integrate Redis or another distributed cache with NiFi? I do not see any processor for it.

Appreciate your help.

Thanks & Regards,
Sudeep

--
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029
sudeepshekh...@gmail.com
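The caching idea in Sudeep's use case, keeping reference records in a cache so that each incoming record does not trigger its own database query, might look roughly like the sketch below. Everything here is hypothetical: `LookupCache`, `load_from_database`, the `customer_id` field, and the "ACTIVE" rule are invented for illustration and do not come from the thread.

```python
from typing import Callable, Dict, List, Optional


class LookupCache:
    """Read-through cache: the loader is called only on the first request per key."""

    def __init__(self, loader: Callable[[str], Optional[str]]) -> None:
        self._loader = loader
        self._entries: Dict[str, Optional[str]] = {}
        self.misses = 0  # how many times the loader (the "database") was hit

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._loader(key)
        return self._entries[key]


# Hypothetical reference data standing in for the second database.
_reference = {"C1": "ACTIVE", "C2": "CLOSED"}


def load_from_database(key: str) -> Optional[str]:
    return _reference.get(key)


def validate(records: List[dict], cache: LookupCache) -> List[dict]:
    """Keep only records whose customer is ACTIVE according to the cached lookup."""
    return [r for r in records if cache.get(r["customer_id"]) == "ACTIVE"]


records = [
    {"id": 1, "customer_id": "C1"},
    {"id": 2, "customer_id": "C2"},
    {"id": 3, "customer_id": "C1"},
    {"id": 4, "customer_id": "C9"},
]
lookups = LookupCache(load_from_database)
valid = validate(records, lookups)
```

In this sketch four records cause only three loader calls, since the repeated key is served from the cache; the saving grows with the amount of key repetition in the data.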
Re: PutDistributedMapCache
Hello Sudeep,

We are currently lacking a "GetDistributedMapCache" processor that corresponds to "PutDistributedMapCache". I created a ticket [1] and will be working on it today. If you have any comments, configuration suggestions, etc., please let me know or comment on the ticket.

[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com