date:20190102

How to calculate and set fragment.count attribute for MergeRecord

2019-01-02 Thread Hemal Padhiar

Hi All,

Below is my use case:
Flow:
1. I have multiple zip files, which I read from a folder.
2. I use the CompressContent processor to unzip the content, which contains
multiple JSON files.
3. Each JSON file is an array of JSON objects. I use the SplitJson processor
to extract the individual JSON objects.
4. Each JSON object contains a nested JSON array. I extract each nested array
object and write them all to a single file using the MergeRecord processor.

I use MergeRecord with the Defragment strategy, a CSVReader, a
CSVRecordSetWriter, and a schema registry. Prior to MergeRecord, I use an
UpdateAttribute processor to set fragment.identifier to the filename, so that
all records from a single seed file are kept in a single file. My question is
how to set fragment.count. (Giving a round figure, say 1000, creates multiple
files each with 1000 records, but the remainder stays in the queue.)
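
A simplified model of the behavior described above, written in Python purely
for illustration. It is not NiFi code, and the helper names are hypothetical.
It assumes that in Defragment mode a bundle is released once the number of
FlowFiles sharing a fragment.identifier reaches their fragment.count, which is
how the MergeRecord documentation describes the attribute. Under that
assumption, a guessed round number leaves a remainder waiting, while the true
per-file count does not:

```python
from collections import defaultdict


def defragment(flowfiles):
    """Toy model of a Defragment merge (an assumption, not NiFi's implementation).

    flowfiles: iterable of dicts carrying 'fragment.identifier' and
    'fragment.count'. Returns (merged_bundles, still_queued).
    """
    bins = defaultdict(list)
    merged = []
    for ff in flowfiles:
        key = ff["fragment.identifier"]
        bins[key].append(ff)
        # Release the bundle as soon as it holds fragment.count FlowFiles.
        if len(bins[key]) == int(ff["fragment.count"]):
            merged.append(bins.pop(key))
    # Whatever is left in a bin never reached its declared count.
    queued = [ff for group in bins.values() for ff in group]
    return merged, queued


def make_fragments(filename, n, declared_count):
    """Hypothetical helper: n fragments that all declare the same count."""
    return [
        {"fragment.identifier": filename, "fragment.count": str(declared_count)}
        for _ in range(n)
    ]


# 2500 records from one seed file, fragment.count guessed as 1000.
guessed_merged, guessed_queued = defragment(make_fragments("seed.zip", 2500, 1000))
print(len(guessed_merged), len(guessed_queued))

# Same records, fragment.count set to the real number of fragments.
exact_merged, exact_queued = defragment(make_fragments("seed.zip", 2500, 2500))
print(len(exact_merged), len(exact_queued))
```

If this model holds, fragment.count has to be the real number of fragments
produced from each seed file, so it needs to be computed upstream rather than
hard-coded.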

Also, how can I get summary stats, such as the number of nested array records
extracted across all JSON files?
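
One way to picture the summary statistic is to count the nested array elements
outside NiFi. The sketch below is only an illustration in Python: the key name
"items" and the sample documents are invented, and it assumes each file is a
top-level JSON array of objects as described in the flow above.

```python
import json


def count_nested_records(json_text, nested_key):
    """Count the elements of the nested arrays found under nested_key.

    json_text is expected to be a JSON array of objects; objects that lack
    nested_key contribute zero.
    """
    objects = json.loads(json_text)
    return sum(len(obj.get(nested_key, [])) for obj in objects)


# Hypothetical sample data standing in for the unzipped JSON files.
files = {
    "a.json": '[{"id": 1, "items": [{"x": 1}, {"x": 2}]}, {"id": 2, "items": []}]',
    "b.json": '[{"id": 3, "items": [{"x": 3}]}]',
}

per_file = {name: count_nested_records(text, "items") for name, text in files.items()}
total = sum(per_file.values())
print(per_file, total)
```

The per-file figure is also the number of nested records that would have to be
merged back together for that file, assuming each nested record travels as its
own FlowFile.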

Thanks & regards,
Hemal

Re: New processors: PublishRedis and SubscribeRedis

2019-01-02 Thread Mike Thomsen

You might want to refactor that bundle to reuse the existing Redis
infrastructure. Even if it just means dropping the Java code in, it'll save
on the build size of the overall NiFi assembly. That's becoming a serious
problem with adding new bundles to the core build. Other than that, go
ahead and add a Jira ticket and send a PR!

On Tue, Jan 1, 2019 at 9:51 AM Букарев Александр  wrote:

> Hi,
> will it be interesting for anybody to get one more Redis processor I'm
> developing for internal usage?
> The processor (actually 2 of them: PublishRedis and SubscribeRedis)
> implements topic and queue patterns on top of Redis. I have some plans to
> improve it in future, by the way, I'll be happy to contribute it to Apache
> NiFi. All the source code is here
> https://github.com/javajefe/nifi-redis-pubsub-bundle
>

Proposing NiFi-Fn

2019-01-02 Thread Samuel Hjelmfelt

 
Hello,

I have not been very active on the NiFi mailing lists, but I have been working
with NiFi for several years across dozens of companies. I have a great
appreciation for NiFi’s value in real-world scenarios. Its growth over the last
few years has been very impressive, and I would like to see a further expansion
of NiFi’s capabilities.

Over the last few months, I have been working on a new NiFi run-time to address
some of the limitations that I have seen in the field. Its intent is not to
replace the existing NiFi engine, but rather to extend the possible
applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an
alternate run-time that expands NiFi’s reach to cloud scale. Given the
similarities, MagNiFi might have been a better name, but it was already
trademarked.

Here are some of the limitations that I have seen in the field. In many cases,
there are entirely valid reasons for this behavior, but this behavior also
prevents NiFi from being used for certain use cases.
   
1. NiFi flows do not succeed or fail as a unit. Part of a flow can succeed
   while the other part fails.
   - For example, ConsumeKafka acks before downstream processing even starts.
   - Given this behavior, data delivery guarantees require writing all
     incoming data to local disk in order to handle node failures.
     - While this helps to accommodate non-resilient sources (e.g. TCP), it
       has downsides:
       - Increases cost significantly as throughput requirements rise
         (especially in the cloud)
       - Increases HA complexity, because the state on each node must be
         durable
         - e.g. content repository replication similar to Kafka is a common
           ask to improve this
       - Reduces flexibility, because data has to be migrated off of nodes
         to scale down
   - NiFi environments must be sized for the peak expected volumes given the
     complexity of scaling up and down.
     - Resources are wasted when use cases have periods of lower volume
       (such as overnight or on weekends)
     - This improved in 1.8, but it is nowhere near as fluid as DistCp or
       Sqoop (i.e. MapReduce)
   - Flow-specific error handling is required (such as this processor group)
     - NiFi’s content repository is now the source of truth and the flow
       cannot be restarted easily.
     - This is useful for multi-destination flows, because errors can be
       handled individually, but unnecessary in other cases (e.g. Kafka to
       Solr).
2. Job/task oriented data movement use cases do not fit well with NiFi
   - For example: triggering data movement as part of a scheduler job
     - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a
       Spark ETL job to load it into Hive, then run a report and send it to
       users.
   - In every other way, NiFi fits this use case. It just needs a
     job-oriented interface/runtime that returns success or fail and allows
     for timeouts.
   - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs,
     but it should be a first-class runtime option
3. NiFi does not provide resource controls for multi-tenancy, requiring
   organizations to have multiple clusters
   - Granular authorization policies are possible, but there are no resource
     usage policies such as what YARN and other container engines provide.
   - The items listed in #1 make this even more challenging to accommodate
     than it would be otherwise.
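
To make the job-oriented interface from the list above more concrete (one that
returns success or fail and allows for timeouts), here is a minimal Python
sketch. It is an illustration only and does not reflect NiFi-Fn's actual API:
run_job and the flow callables are hypothetical names, and a flow is modeled
as a plain function.

```python
import concurrent.futures
import time


def run_job(flow, payload, timeout_s):
    """Run flow(payload) once and report 'success', 'fail', or 'timeout'.

    Hypothetical interface: a scheduler could branch on the returned string.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(flow, payload)
    try:
        future.result(timeout=timeout_s)
        return "success"
    except concurrent.futures.TimeoutError:
        # The flow did not finish in time; the caller decides what to do next.
        return "timeout"
    except Exception:
        # Any error raised inside the flow fails the whole job as a unit.
        return "fail"
    finally:
        # Do not block the caller on a flow that is still running.
        pool.shutdown(wait=False)


print(run_job(lambda p: p.upper(), "extract", 1.0))
print(run_job(lambda p: 1 / 0, "extract", 1.0))
print(run_job(lambda p: time.sleep(0.3), "extract", 0.05))
```

Note that a thread cannot be forcibly stopped in Python, so in this sketch a
timed-out flow keeps running in the background until it returns; a real
runtime would need its own cancellation mechanism.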


NiFi-Fn is a library for running NiFi flows as stateless functions. It provides
delivery guarantees similar to NiFi’s without the need for on-disk repositories
by waiting to confirm receipt of incoming data until it has been written to the
destination. This is similar to Storm’s acking mechanism and Spark’s interface
for committing Kafka offsets, except that in NiFi-Fn, this is completely handled
by the framework while still supporting all NiFi processors and controller
services natively without change. This results in the ability to run NiFi flows
as ephemeral, stateless functions and should be able to rival MirrorMaker,
DistCp, and Sqoop for performance, efficiency, and scalability while leveraging
the vast library of NiFi processors and the NiFi UI for building custom flows.
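
The idea of confirming receipt only after the data reaches the destination can
be sketched as follows. This is a toy Python model under stated assumptions,
not NiFi-Fn's implementation: InMemorySource stands in for a replayable source
such as a Kafka partition, and run_stateless, transform, and deliver are
hypothetical names.

```python
from collections import deque


class InMemorySource:
    """Hypothetical replayable source: messages stay pending until acked."""

    def __init__(self, messages):
        self.pending = deque(messages)
        self.acked = []

    def ack(self):
        # Confirm receipt of the oldest pending message.
        self.acked.append(self.pending.popleft())


def run_stateless(source, transform, deliver):
    """Process pending messages in order, acking each only after delivery.

    Returns True if everything was delivered. On the first failure it stops
    and returns False, leaving the failed message unacked so it can be retried
    without any local on-disk copy.
    """
    while source.pending:
        message = source.pending[0]
        try:
            deliver(transform(message))
        except Exception:
            return False
        source.ack()
    return True


sink = []


def flaky_deliver(item):
    """Simulated destination that rejects one specific item."""
    if item == "B!":
        raise IOError("destination unavailable")
    sink.append(item)


source = InMemorySource(["a", "b", "c"])
first_run = run_stateless(source, lambda m: m.upper() + "!", flaky_deliver)
print(first_run, source.acked, list(source.pending))

# A later run against a healthy destination picks up from the unacked message.
second_run = run_stateless(source, lambda m: m.upper() + "!", sink.append)
print(second_run, source.acked, sink)
```

Because nothing is acked before it is delivered, a crash between delivery and
ack would replay that message, so this sketch gives at-least-once behavior
rather than exactly-once.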




By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn
flows can be deployed that take full advantage of the platform’s scale and
multi-tenancy features. By leveraging Function as a Service (FaaS) engines
(e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event
sources (or just cron) for event-driven data movement where flows only run
when triggered and pricing is measured at the 100 ms granularity. By combining
the two, large-scale batch processing could also be performed.

An additional opportunity is to integrate NiFi-Fn back into NiFi. This could
provide a clean solution for a NiFi jobs interface. A user could select a
run-time on a per process group basis to take advantage of the NiFi-Fn
efficiency and job-like execution when appropriate without requiring a contai

Re: Proposing NiFi-Fn

2019-01-02 Thread Andy LoPresto

Hi Sam,

Thanks for writing all this up. I’m wondering if you are prepared to share the 
code you referenced below so people can take a look. Do you have a preferred 
communication mechanism (GitHub issues, direct PRs, etc.?). Once there is more 
discussion from the community on this, I think (if it moves forward), the 
standard platform choices would apply. Thanks. 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: Proposing NiFi-Fn

2019-01-02 Thread Samuel Hjelmfelt

Hi Andy,

I just submitted a JIRA and PR. I also put a pre-built Docker image on
Docker Hub. Here are the links:
https://issues.apache.org/jira/browse/NIFI-5922
https://github.com/apache/nifi/pull/3241
https://hub.docker.com/r/samhjelmfelt/nifi-fn

I am open to communication on any platform.

Thanks,
Sam Hjelmfelt
 


[ANNOUNCE] New Apache NiFi Committer Nathan Gough

2019-01-02 Thread Tony Kurc

On behalf of the Apache NiFi PMC, I am very pleased to announce that Nathan
has accepted the PMC's invitation to become a committer on the Apache NiFi
project. We greatly appreciate all of Nathan's hard work and generous
contributions to the project. We look forward to his continued involvement
in the project.

What stood out for the PMC was Nathan's long history of code contribution
especially in the area of security, and his always helpful conduct on the
mailing lists, the jiras, reviews, and releases. Thanks Nathan!

Welcome and congratulations!

- Tony

[ANNOUNCE] New Apache NiFi Committer Ed Berezitsky

2019-01-02 Thread Tony Kurc

On behalf of the Apache NiFi PMC, I am very pleased to announce that Ed has
accepted the PMC's invitation to become a committer on the Apache NiFi
project. We greatly appreciate all of Ed's hard work and generous
contributions to the project. We look forward to his continued involvement
in the project.

Ed has been contributing code to the project through most of 2018, in areas
such as HBase, HDFS, and fixing some long standing bugs. Also, many of you
have had the pleasure of interacting with him on the dev and users mailing
lists, epitomizing The Apache Way.

Welcome and congratulations!

Tony

Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky

2019-01-02 Thread Joe Witt

Congrats and thanks Ed!


Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough

2019-01-02 Thread Joe Witt

Congrats and thanks Nathan!


Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky

2019-01-02 Thread Sivaprasanna

Congratulations, Ed!

-
Sivaprasanna


Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough

2019-01-02 Thread Sivaprasanna

Congratulations, Nathan!


Re: [ANNOUNCE] New Apache NiFi Committer Nathan Gough

2019-01-02 Thread Kevin Doran

Congrats, Nathan! Thanks for all the contributions to the project!


Re: [ANNOUNCE] New Apache NiFi Committer Ed Berezitsky

2019-01-02 Thread Kevin Doran

Congrats, Ed! Thanks for all the hard work!

