How to calculate and set fragment.count attribute for MergeRecord
Hi All,

Below is my use case.

Flow:
1. I have multiple zip files and read them from a folder.
2. I use the CompressContent processor to unzip the content, which contains multiple JSON files.
3. Each JSON file is an array of JSON objects. I use the SplitJson processor to extract the individual JSON objects.
4. Each JSON object contains a nested JSON array. I extract each nested array object and write them to a single file using the MergeRecord processor.

MergeRecord is configured with the Defragment strategy, a CSVReader, a CSVRecordSetWriter and a schema registry. I set fragment.identifier to the filename (using an UpdateAttribute processor prior to MergeRecord) so that all records from a single seed file are kept in a single file.

My question is how to set fragment.count. Giving a round figure, say 1000, creates multiple files each with 1000 records, but the remainder stays in the queue.

Also, how can I get summary stats, such as the number of nested array records extracted across all JSON files?

Thanks & regards,
Hemal
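A minimal sketch, not taken from the thread, of one way to arrive at the two numbers asked about above. It assumes that with the Defragment strategy fragment.count is the number of FlowFiles expected for a given fragment.identifier, and that each nested record travels as its own FlowFile, so the per-seed-file record count is the value to set. The key name `items`, the file names and both helper functions are hypothetical, and the counting would have to happen before the split so that the total is known up front.

```python
import json


def count_nested_records(json_text, nested_key):
    """Count nested-array records in one JSON file (an array of objects)."""
    objects = json.loads(json_text)
    # A missing or null nested array contributes zero records.
    return sum(len(obj.get(nested_key) or []) for obj in objects)


def fragment_counts(seed_files, nested_key):
    """Map each seed file name to the total nested record count of its JSON files."""
    return {
        seed: sum(count_nested_records(text, nested_key) for text in texts)
        for seed, texts in seed_files.items()
    }


if __name__ == "__main__":
    # Hypothetical input: seed zip name -> contents of the JSON files inside it.
    sample = {
        "a.zip": ['[{"items": [1, 2]}, {"items": [3]}]', '[{"items": []}]'],
        "b.zip": ['[{"items": [4, 5, 6, 7]}]'],
    }
    counts = fragment_counts(sample, "items")
    print(counts)                # per-seed value to use as fragment.count
    print(sum(counts.values()))  # summary: nested records across all files
```

Under these assumptions, an exact per-file count means no remainder is left waiting in the queue, and summing the per-file counts gives the overall total.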
Re: New processors: PublishRedis and SubscribeRedis
You might want to refactor that bundle to reuse the existing Redis infrastructure. Even if it just means dropping the Java code in, it'll save on the build size of the overall NiFi assembly. That's becoming a serious problem with adding new bundles to the core build.

Other than that, go ahead and add a Jira ticket and send a PR!

On Tue, Jan 1, 2019 at 9:51 AM Букарев Александр wrote:
> Hi,
> will it be interesting for anybody to get one more Redis processor I'm
> developing for internal usage?
> The processor (actually 2 of them: PublishRedis and SubscribeRedis)
> implements topic and queue patterns on top of Redis. I have some plans to
> improve it in future, by the way, I'll be happy to contribute it to Apache
> NiFi. All the source code is here
> https://github.com/javajefe/nifi-redis-pubsub-bundle
Proposing NiFi-Fn
Hello,

I have not been very active on the NiFi mailing lists, but I have been working with NiFi for several years across dozens of companies. I have a great appreciation for NiFi’s value in real-world scenarios. Its growth over the last few years has been very impressive, and I would like to see a further expansion of NiFi’s capabilities.

Over the last few months, I have been working on a new NiFi run-time to address some of the limitations that I have seen in the field. Its intent is not to replace the existing NiFi engine, but rather to extend the possible applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an alternate run-time that expands NiFi’s reach to cloud scale. Given the similarities, MagNiFi might have been a better name, but it was already trademarked.

Here are some of the limitations that I have seen in the field. In many cases, there are entirely valid reasons for this behavior, but this behavior also prevents NiFi from being used for certain use cases.

- NiFi flows do not succeed or fail as a unit. Part of a flow can succeed while the other part fails.
  - For example, ConsumeKafka acks before downstream processing even starts.
  - Given this behavior, data delivery guarantees require writing all incoming data to local disk in order to handle node failures.
    - While this helps to accommodate non-resilient sources (e.g. TCP), it has downsides:
      - Increases cost significantly as throughput requirements rise (especially in the cloud)
      - Increases HA complexity, because the state on each node must be durable
        - e.g. content repository replication similar to Kafka is a common ask to improve this
      - Reduces flexibility, because data has to be migrated off of nodes to scale down
- NiFi environments must be sized for the peak expected volumes given the complexity of scaling up and down.
  - Resources are wasted when use cases have periods of lower volume (such as overnight or on weekends)
  - This improved in 1.8, but it is nowhere near as fluid as DistCp or Sqoop (i.e. MapReduce)
- Flow-specific error handling is required (such as this processor group)
  - NiFi’s content repository is now the source of truth and the flow cannot be restarted easily.
  - This is useful for multi-destination flows, because errors can be handled individually, but unnecessary in other cases (e.g. Kafka to Solr).
- Job/task oriented data movement use cases do not fit well with NiFi
  - For example: triggering data movement as part of a scheduler job
    - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a Spark ETL job to load it into Hive, then run a report and send it to users.
  - In every other way, NiFi fits this use case. It just needs a job-oriented interface/runtime that returns success or fail and allows for timeouts.
  - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs, but it should be a first-class runtime option
- NiFi does not provide resource controls for multi-tenancy, requiring organizations to have multiple clusters
  - Granular authorization policies are possible, but there are no resource usage policies such as what YARN and other container engines provide.
  - The items listed in #1 make this even more challenging to accommodate than it would be otherwise.

NiFi-Fn is a library for running NiFi flows as stateless functions. It provides delivery guarantees similar to NiFi’s without the need for on-disk repositories, by waiting to confirm receipt of incoming data until it has been written to the destination. This is similar to Storm’s acking mechanism and Spark’s interface for committing Kafka offsets, except that in NiFi-Fn this is completely handled by the framework while still supporting all NiFi processors and controller services natively without change. This results in the ability to run NiFi flows as ephemeral, stateless functions, and should be able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency, and scalability while leveraging the vast library of NiFi processors and the NiFi UI for building custom flows.
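The acknowledgement ordering described above can be illustrated with a small sketch. This is not NiFi-Fn's API: the `InMemorySource` class, the `run_once` function and the idea of modelling processors as plain callables are all hypothetical, chosen only to show that receipt is confirmed after the write to the destination succeeds and is withheld on any failure.

```python
class InMemorySource:
    """Hypothetical source that remembers which records were acknowledged."""

    def __init__(self, records):
        self.pending = list(records)
        self.acked = []

    def poll(self):
        # Return the next record without removing it; it stays pending until acked.
        return self.pending[0] if self.pending else None

    def ack(self):
        # Confirm receipt of the head record.
        self.acked.append(self.pending.pop(0))


def run_once(source, steps, sink):
    """Run one record through the flow; ack only after the sink accepted it.

    Returns True if a record was delivered and acknowledged, False if there
    was nothing to do or if any step or the sink raised an exception.
    """
    record = source.poll()
    if record is None:
        return False
    try:
        value = record
        for step in steps:  # stand-ins for processors
            value = step(value)
        sink(value)         # write to the destination
    except Exception:
        return False        # no ack: the record stays pending for a retry
    source.ack()
    return True


if __name__ == "__main__":
    delivered = []
    source = InMemorySource(["a", "b"])
    print(run_once(source, [str.upper], delivered.append))
    print(delivered, source.acked, source.pending)
```

Because nothing is acknowledged until the sink call returns, a crash mid-flow leaves the record with the source rather than in a local repository, which is the property the paragraph above relies on.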

By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn flows can be deployed that take full advantage of the platform’s scale and multi-tenancy features. By leveraging Function as a Service (FaaS) engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event sources (or just cron) for event-driven data movement where flows only run when triggered and pricing is measured at 100 ms granularity. By combining the two, large-scale batch processing could also be performed.

An additional opportunity is to integrate NiFi-Fn back into NiFi. This could provide a clean solution for a NiFi jobs interface. A user could select a run-time on a per-process-group basis to take advantage of the NiFi-Fn efficiency and job-like execution when appropriate without requiring a contai
Re: Proposing NiFi-Fn
Hi Sam,

Thanks for writing all this up. I’m wondering if you are prepared to share the code you referenced below so people can take a look. Do you have a preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once there is more discussion from the community on this, I think (if it moves forward), the standard platform choices would apply.

Thanks.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt wrote:
> [...]
Re: Proposing NiFi-Fn
Hi Andy,

I just submitted a JIRA and PR. I also put a pre-built Docker image on Docker Hub. Here are the links:

https://issues.apache.org/jira/browse/NIFI-5922
https://github.com/apache/nifi/pull/3241
https://hub.docker.com/r/samhjelmfelt/nifi-fn

I am open to communication on any platform.

Thanks,
Sam Hjelmfelt

On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto wrote:
> [...]
[ANNOUNCE] New Apache NiFi Committer Nathan Gough
On behalf of the Apache NiFi PMC, I am very pleased to announce that Nathan has accepted the PMC's invitation to become a committer on the Apache NiFi project. We greatly appreciate all of Nathan's hard work and generous contributions to the project. We look forward to his continued involvement in the project.

What stood out for the PMC was Nathan's long history of code contribution, especially in the area of security, and his always helpful conduct on the mailing lists, the Jiras, reviews, and releases. Thanks Nathan!

Welcome and congratulations!

- Tony
[ANNOUNCE] New Apache NiFi Committer Ed Berezitsky
On behalf of the Apache NiFi PMC, I am very pleased to announce that Ed has accepted the PMC's invitation to become a committer on the Apache NiFi project. We greatly appreciate all of Ed's hard work and generous contributions to the project. We look forward to his continued involvement in the project.

Ed has been contributing code to the project through most of 2018, in areas such as HBase and HDFS, and by fixing some long-standing bugs. Also, many of you have had the pleasure of interacting with him on the dev and users mailing lists, where he epitomizes The Apache Way.

Welcome and congratulations!

Tony
Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky
Congrats and thanks Ed!

On Wed, Jan 2, 2019 at 9:36 PM Tony Kurc wrote:
> [...]
Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough
Congrats and thanks Nathan!

On Wed, Jan 2, 2019 at 9:30 PM Tony Kurc wrote:
> [...]
Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky
Congratulations, Ed!

- Sivaprasanna

On Thu, 3 Jan 2019 at 8:12 AM, Joe Witt wrote:
> Congrats and thanks Ed!
>
> On Wed, Jan 2, 2019 at 9:36 PM Tony Kurc wrote:
> > [...]
Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough
Congratulations, Nathan!

On Thu, 3 Jan 2019 at 8:12 AM, Joe Witt wrote:
> Congrats and thanks Nathan!
>
> On Wed, Jan 2, 2019 at 9:30 PM Tony Kurc wrote:
> > [...]
Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough
Congrats, Nathan! Thanks for all the contributions to the project!

On 1/2/19, 22:33, "Sivaprasanna" wrote:
> Congratulations, Nathan!
>
> On Thu, 3 Jan 2019 at 8:12 AM, Joe Witt wrote:
> > Congrats and thanks Nathan!
> >
> > On Wed, Jan 2, 2019 at 9:30 PM Tony Kurc wrote:
> > > [...]
Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky
Congrats, Ed! Thanks for all the hard work!

On 1/2/19, 22:32, "Sivaprasanna" wrote:
> Congratulations, Ed!
>
> - Sivaprasanna
>
> On Thu, 3 Jan 2019 at 8:12 AM, Joe Witt wrote:
> > Congrats and thanks Ed!
> >
> > On Wed, Jan 2, 2019 at 9:36 PM Tony Kurc wrote:
> > > [...]