Re: Processor logic

2017-03-16 Thread Andy LoPresto
Hi Uwe,

I believe a lot of this is covered in the Developer Guide [1]. Specifically, 
there are discussions of various processor patterns, including Split Content 
[2], and a section on Cohesion and Reusability [3], which states:

> In order to avoid these issues, and make Processors more reusable, a 
> Processor should always stick to the principle of "do one thing and do it 
> well." Such a Processor should be broken into two separate Processors: one to 
> convert the data from Format X to Format Y, and another Processor to send 
> data to the remote resource.


I call this the “Unix model” — it is better to join several small, specific 
tools together to accomplish a task than re-invent larger tools every time a 
small modification is required. In general, that would lead me to develop 
processors that operate on the smallest unit of data necessary — a single line, 
element, or record makes sense — unless more context is needed for 
completeness, or the performance is so grossly different that it is inefficient 
to operate on such small quantities.
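
To make that concrete, here is a minimal, hedged sketch of a single-purpose 
processor that does exactly one thing: split incoming content into one flow 
file per line. The class name, relationship names, and the line-splitting 
behavior are illustrative assumptions, not an existing NiFi processor:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.UUID;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    // Hypothetical single-purpose processor: split content into one flow file per line.
    public class SplitLines extends AbstractProcessor {

        static final Relationship REL_SPLITS = new Relationship.Builder()
                .name("splits").description("One flow file per input line").build();
        static final Relationship REL_ORIGINAL = new Relationship.Builder()
                .name("original").description("The unmodified input").build();

        @Override
        public Set<Relationship> getRelationships() {
            return new HashSet<>(Arrays.asList(REL_SPLITS, REL_ORIGINAL));
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile original = session.get();
            if (original == null) {
                return;
            }

            // Read all content as lines (fine for small files; stream for large ones).
            final List<String> lines = new ArrayList<>();
            session.read(original, in -> {
                final BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            });

            // The fragment.* attributes let a downstream MergeContent re-assemble the pieces.
            final String fragmentId = UUID.randomUUID().toString();
            for (int i = 0; i < lines.size(); i++) {
                FlowFile split = session.create(original);  // child flow file keeps provenance lineage
                final byte[] bytes = lines.get(i).getBytes(StandardCharsets.UTF_8);
                split = session.write(split, out -> out.write(bytes));
                split = session.putAttribute(split, "fragment.identifier", fragmentId);
                split = session.putAttribute(split, "fragment.index", String.valueOf(i));
                split = session.putAttribute(split, "fragment.count", String.valueOf(lines.size()));
                session.transfer(split, REL_SPLITS);
            }
            session.transfer(original, REL_ORIGINAL);
        }
    }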

Finally, with regard to your question about which provenance events to use in 
various scenarios, I agree the documentation is lacking. Luckily, Drew Lim did 
some great work improving this documentation. While it has not been released in
an official version, both the Developer Guide [4] and User Guide [5] have 
received substantial enhancements, describing the complete list of provenance 
event types and their usage/meaning. This work is available on master and will 
be released in 1.2.0.
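
As a rough illustration of the distinction in code, here is a hedged fragment 
assumed to run inside a processor's onTrigger(), with flowFile and session in 
scope. (In practice the framework usually generates these events automatically 
when the session is committed; explicit calls are mainly useful to override or 
annotate them. The attribute name and bytes are illustrative.)

    // Attribute-only change: report an ATTRIBUTES_MODIFIED event.
    flowFile = session.putAttribute(flowFile, "record.count", "42");
    session.getProvenanceReporter().modifyAttributes(flowFile);

    // Content change: report a CONTENT_MODIFIED event.
    final byte[] newBytes = "rewritten".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    flowFile = session.write(flowFile, out -> out.write(newBytes));
    session.getProvenanceReporter().modifyContent(flowFile);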

The project has certainly evolved over a long lifetime, and you are correct 
that different processors have different philosophies. Sometimes that is the 
result of different authors, sometimes it is a legitimate result of the wide 
variety of scenarios that these processors interact with. Improving the user 
experience and documentation is always important, and getting started with and 
maximizing the usefulness of these processors is one of our top priorities.

I would also reference Chesterton’s Fence [6] here. There are definitely 
improvements to be made, I do not disagree. But I would caution against making 
changes to improve a system without understanding how it got to its current 
state, a mistake I have made myself in the past. Once one has a firm grasp on 
the history, a reasonable plan can be made to improve things. We always 
welcome suggestions to improve the experience for the community.

Hope this helps and I’d love to get your feedback on where else we can be 
better. Thanks.

[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
[2] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#split-content-one-to-many
[3] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#cohesion-and-reusability
[4] https://github.com/andrewmlim/nifi/blob/bd9eb0ac6009845de9d5a34bd5384ade1945befd/nifi-docs/src/main/asciidoc/developer-guide.adoc#provenance-events
[5] https://github.com/andrewmlim/nifi/blob/bd9eb0ac6009845de9d5a34bd5384ade1945befd/nifi-docs/src/main/asciidoc/user-guide.adoc#data-provenance
[6] https://en.wikipedia.org/wiki/Wikipedia:Chesterton's_fence


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 16, 2017, at 3:28 PM, Uwe Geercken  wrote:
> 
> Hello,
> 
> I am having a bit of a hard time designing processors correctly. I find it 
> difficult to decide whether a processor should, e.g., process a single line 
> from a flow file or also handle flow files with multiple lines of data (e.g. 
> in the case of CSV files). Another point is the handling of header rows. One 
> other point is data provenance events: what is the correct event to use when 
> modifying attributes, content, or both?
> 
> Is there a guide which outlines the best practices for such cases? I have the 
> feeling that many of the processors handle these issues quite differently. I 
> think there should either be a sort of standard, or otherwise it should be 
> well documented. And although there is very good documentation available for 
> the project, for some of the processors one has to play around quite a bit to 
> get it right, because they behave differently or have a different philosophy 
> and one has to understand it first to get it right.

Re: Kafkaesque Output port

2017-03-16 Thread Andre F de Miranda
Aldrin,



Another point for consideration is the scope of this information.  Core
NiFi flows and those components are very much about data whereas those IPs
may not necessarily be data for consumption, per se, but context that
governs how the data flow is operating.  In this case, there is a different
plane of information exchange and may deserve a discrete logical channel to
make this available.



Not sure what you mean by governing the context, but maybe I should explain
myself a bit better: from a NiFi point of view the data (IPs) are still
data to be consumed; the only difference in this case is that the data is
very small and the final delivery mechanism is a Unix command (instead of a
PutElasticSearch processor, for example).

The fact that we update firewall ACLs from the sensors themselves is largely
irrelevant to the overall flow.

As an example: the IPs could be sent from sensors m1 and m2 to core and
then to a completely independent set of sensors mm1-mmN where the same data
is simply saved to disk (instead of running ExecuteScript).

I may be wrong, but I was under the impression that from a MiNiFi point of
view, what we do with the data at the end of the data flow is not a matter
of relevance. If so, then at least in this case, this would still count as
the "data in movement" scenario that we try to cover.


Would you agree?


Processor logic

2017-03-16 Thread Uwe Geercken
Hello,

I am having a bit of a hard time designing processors correctly. I find it 
difficult to decide whether a processor should, e.g., process a single line 
from a flow file or also handle flow files with multiple lines of data (e.g. in 
the case of CSV files). Another point is the handling of header rows. One other 
point is data provenance events: what is the correct event to use when 
modifying attributes, content, or both?

Is there a guide which outlines the best practices for such cases? I have the 
feeling that many of the processors handle these issues quite differently. I 
think there should either be a sort of standard, or otherwise it should be well 
documented. And although there is very good documentation available for the 
project, for some of the processors one has to play around quite a bit to get 
it right, because they behave differently or have a different philosophy, and 
one has to understand it first to get it right.

I would appreciate some feedback and advice, or pointers to documentation.

Uwe


Re: All Partitions have been blacklisted due to failures when attempting to update. If the Write-Ahead Log is able to perform a checkpoint, this issue may resolve itself. Otherwise, manual intervention will be required.

2017-03-16 Thread James Wing
Would it be possible to configure EvaluateJsonPath to place the selected
JSON fragment in the flowfile content instead of an attribute?  Or to break
the selection across multiple attributes rather than one big one?
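
For reference, a hedged sketch of the first suggestion as EvaluateJsonPath
properties (the JSONPath shown is an illustrative placeholder):

    EvaluateJsonPath
      Destination:  flowfile-content     (instead of flowfile-attribute)
      Return Type:  json
      (one dynamic property, e.g.)  $.records[*]   <- illustrative path

With Destination set to flowfile-content, the selected fragment replaces the
flow file's content rather than being stored in a very large attribute.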

Thanks,

James

On Thu, Mar 16, 2017 at 11:57 AM, srini  wrote:

> Hi James,
>
> Yes, EvaluateJsonPath is creating attributes exceeding 64 KB. What should I
> do to avoid this?
> thanks
> Srini


Re: All Partitions have been blacklisted due to failures when attempting to update. If the Write-Ahead Log is able to perform a checkpoint, this issue may resolve itself. Otherwise, manual intervention will be required.

2017-03-16 Thread srini
Hi James,

Yes, EvaluateJsonPath is creating attributes exceeding 64 KB. What should I
do to avoid this?
thanks
Srini





Re: Need help in JSON generate automation

2017-03-16 Thread Andy LoPresto
Anshuman,

You can use the GenerateFlowFile processor to generate arbitrary amounts of 
binary or text data at any interval you want. You can copy some static JSON 
template into the processor properties as default content (Lorem ipsum, if you 
will), and then use a follow-on ReplaceText to perform token replacement with 
dynamic values (times, randoms, etc.).
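
For example, a hedged sketch of the ReplaceText step (the __TS__ token is a 
made-up placeholder you would put in the static template):

    ReplaceText
      Replacement Strategy:  Regex Replace
      Search Value:          __TS__
      Replacement Value:     ${now():toNumber()}
      Evaluation Mode:       Entire text

Each flow file emitted upstream then gets the token swapped for the current 
epoch timestamp via Expression Language.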


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 16, 2017, at 2:12 AM, Anshuman Ghosh  
> wrote:
> 
> Hello all,
> 
> Trust you are doing great!
> One quick question: I need some help simulating a testing pipeline.
> 
> I want to create an automatic pipeline where there is a continuous
> flow of JSON messages/records that I would publish onto a Kafka topic and
> later consume for further processing.
> I have found the JSON generator site "http://www.json-generator.com/", but
> that is a static process.
> Can someone please guide me here?
> 
> 
> Thank you!
> 
> __
> 
> *Kind Regards,*
> *Anshuman Ghosh*
> *Contact - +49 179 9090964*



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: All Partitions have been blacklisted due to failures when attempting to update. If the Write-Ahead Log is able to perform a checkpoint, this issue may resolve itself. Otherwise, manual intervention will be required.

2017-03-16 Thread James Wing
Srini,

The error message "FlowFile Repository failed to update" matches a known
issue where NiFi has trouble persisting attributes larger than 64 KB (
https://issues.apache.org/jira/browse/NIFI-3389).  Is it possible that your
EvaluateJsonPath is creating attributes exceeding 64 KB?  From your error
file:

o.a.n.p.standard.EvaluateJsonPath
EvaluateJsonPath[id=b71b17e5-1046-115a-5d5c-e4f368141f13] Failed to process
session due to org.apache.nifi.processor.exception.ProcessException:
FlowFile Repository failed to update:
org.apache.nifi.processor.exception.ProcessException: FlowFile Repository
failed to update


Thanks,

James

On Thu, Mar 16, 2017 at 11:40 AM, srini  wrote:

> Hi,
> We have a single NiFi instance, and this is our production environment.
> Yesterday I increased the heap from 512m to 4096m in bootstrap.conf for both
> Xms and Xmx, and restarted NiFi.
> Then I saw this error [1]. Then I deleted the folder
> ../nifi-1.1.0/flowfile_repository and restarted NiFi.
> Then everything was fine, and it has been working for close to 20 hours
> without any issue.
>
> Now, all of a sudden, I am seeing the same error. What should I do?
>
> [1] All Partitions have been blacklisted due to failures when attempting to
> update. If the Write-Ahead Log is able to perform a checkpoint, this issue
> may resolve itself. Otherwise, manual intervention will be required.
>
> thanks
> Srini
> (attachment: error.txt; link lost in archive)


All Partitions have been blacklisted due to failures when attempting to update. If the Write-Ahead Log is able to perform a checkpoint, this issue may resolve itself. Otherwise, manual intervention will be required.

2017-03-16 Thread srini
Hi,
We have a single NiFi instance, and this is our production environment.
Yesterday I increased the heap from 512m to 4096m in bootstrap.conf for both
Xms and Xmx, and restarted NiFi.
Then I saw this error [1]. Then I deleted the folder
../nifi-1.1.0/flowfile_repository and restarted NiFi.
Then everything was fine, and it has been working for close to 20 hours
without any issue.

Now, all of a sudden, I am seeing the same error. What should I do?

[1] All Partitions have been blacklisted due to failures when attempting to
update. If the Write-Ahead Log is able to perform a checkpoint, this issue
may resolve itself. Otherwise, manual intervention will be required.
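
For reference, the heap settings mentioned above live in conf/bootstrap.conf;
a hedged sketch of that change (the java.arg indices can differ between
installs):

    # conf/bootstrap.conf -- JVM heap settings
    java.arg.2=-Xms4096m
    java.arg.3=-Xmx4096m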

thanks
Srini
(attachment: error.txt; link lost in archive)





Re: [VOTE] Release Apache NiFi nifi-nar-maven-plugin-1.2.0

2017-03-16 Thread James Wing
+1 Release this package as nifi-nar-maven-plugin-1.2.0

Went through the release helper, built NiFi with the new plugin, ran NiFi
to make sure it didn't explode (it didn't).


On Tue, Mar 14, 2017 at 9:21 AM, Bryan Bende  wrote:

> Hello,
>
> I am pleased to be calling this vote for the source release of Apache
> NiFi nifi-nar-maven-plugin-1.2.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1101
>
> The Git tag is nifi-nar-maven-plugin-1.2.0-RC1
> The Git commit ID is d0c9d46d25a3eb8d3dbeb2783477b1a7c5b2f345
> https://git-wip-us.apache.org/repos/asf?p=nifi-maven.git;a=commit;h=
> d0c9d46d25a3eb8d3dbeb2783477b1a7c5b2f345
>
> Checksums of nifi-nar-maven-plugin-1.2.0-source-release.zip:
> MD5: a20b62075f79bb890c270445097dc337
> SHA1: 68e4739c9a4c4b2c69ff4adab8e1fdb0e7840923
> SHA256: f5d4acbaa38460bcf19e9b33f385aa643798788026875bd034ee837e5d9d45a8
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/bbende.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 3 issues were closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12316020&version=12339193
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/NIFI/
> Release+Notes#ReleaseNotes-NiFiNARMavenPluginVersion1.2.0
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build from source, and test.
>
> Then please vote:
>
> [ ] +1 Release this package as nifi-nar-maven-plugin-1.2.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because because...
>


Re: When should MergeContent stop and proceed to next processor?

2017-03-16 Thread Oleg Zhurakousky
Ok, can you please set the “Correlation Attribute Name” property to
“fragment.identifier”?
That is what I was trying to explain in the previous email.
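
A hedged sketch of the settings being described (values are illustrative and
depend on the rest of the flow):

    MergeContent
      Merge Strategy:              Bin-Packing Algorithm
      Correlation Attribute Name:  fragment.identifier

    Alternatively, Merge Strategy "Defragment" groups flow files using the
    fragment.identifier / fragment.count attributes that SplitJson writes.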

Cheers
Oleg

> On Mar 16, 2017, at 11:06 AM, srini  wrote:
> 
> Hi Oleg,
> 
> Here is a MergeContent screenshot. My flow files don't give any clue about
> which record they belong to. I have an attribute called recordId which
> distinguishes each record. But I shouldn't add recordId to the flow files to
> be merged.
> 
>  
> 
> thanks
> Srini



Re: When should MergeContent stop and proceed to next processor?

2017-03-16 Thread srini
Hi Oleg,

Here is a MergeContent screenshot. My flow files don't give any clue about
which record they belong to. I have an attribute called recordId which
distinguishes each record. But I shouldn't add recordId to the flow files to
be merged.

 

thanks
Srini





Re: Old Data Provenance. Even though there is new, it is not showing.

2017-03-16 Thread Matt Gilman
Srini,

Looking at the screenshot, it appears that you have a search applied. The
criteria have resulted in more than the maximum supported number of results
(1000). If you want more recent results, try updating the search criteria
to home in more closely on the desired timeframe.

The sorting that is applied simply sorts the results of the search. This is
meant to allow the user to more easily navigate to the desired events.

Matt

On Fri, Mar 10, 2017 at 11:41 AM, srini  wrote:

> Hi,
>
> The screenshot shows yesterday's data. Even though there is new data, it is
> not showing. I tried clicking the refresh icon, but nothing changed.
>
> But when I click on 'please refine the search', it shows ALL the latest data,
> but that data is not specific to the particular process I am looking for.
>
> https://screencast.com/t/TqHrPxB7Kr
>
> thanks
> Srini


Re: Kafkaesque Output port

2017-03-16 Thread Aldrin Piri
Interesting points.

Certainly agree with the difference between the two classes as well as
where output ports are now.  Whether or not there is an extension of the
output port or a whole new component, the shared references/data set is a
common one.  There are a lot of options out there that provide some form of
the functionality but can often be an issue from a sizing and complexity
perspective, especially from the perspective Andre covered with MiNiFi
instances.

Another point for consideration is the scope of this information.  Core
NiFi flows and those components are very much about data whereas those IPs
may not necessarily be data for consumption, per se, but context that
governs how the data flow is operating.  In this case, there is a different
plane of information exchange and may deserve a discrete logical channel to
make this available.

I think we also need to consider how data transits between the two roles as
a logical follow-up.  I receive some data, do some processing/enrichment
and now want to make this available for usage in flow control elsewhere.


On Thu, Mar 16, 2017 at 10:08 AM, Bryan Bende  wrote:

> Just wanted to throw out a couple of other ideas...
>
> I ran into a similar situation and ended up creating a web-service at
> the core (HandleHttpRequest/HandleHttpResponse) where the edge
> instances could poll for the latest instructions [1][2]. This works
> well when there's basically one new piece of information that you want
> to push out, like the latest results of some calculation, but it is
> obviously not a general purpose queue.
>
> I believe Joey Frazee was also working on an approach that involved a
> PutSiteToSite processor [3] that functioned similar to a
> RemoteProcessGroup, except the host name property could be set to
> expression language and the preceding processor could fan-out a flow
> file for each host to push to. In this case you would need to know the
> list of MiNiFi hosts to push to.
>
> [1] https://github.com/bbende/nifi-streaming-examples
> [2] https://www.slideshare.net/BryanBende/integrating-nifi-and-flink/23
> [3] https://github.com/jfrazee/nifi-put-site-to-site-bundle
>
> On Thu, Mar 16, 2017 at 9:29 AM, Simon Lucy  wrote:
> > There are two different classes of queues really and you can't mix them
> > semantically.
> >
> > The pubsub model
> >
> >  * where ordering isn't guaranteed,
> >  * messages may appear at least once but can be duplicates
> >  * messages need to be explicitly deleted or aged
> >  * messages may or may not be persisted
> >
> > The event model
> >
> >  * ordering is necessary and guaranteed
> >  * messages appear only once
> >  * once read a message is discarded from the queue
> >  * messages are probably persisted
> >
> > Kafka can be used for both models but Nifi Output Ports are meant to be
> > Event Queues. What you could do though is have an external message broker
> > connect to the Output Port and distribute that to subscribers. It could
> be
> > Kafka, Rabbit, AMQ, AWS's SQS/SNS, whatever makes sense in the context.
> >
> > There's no need to modify or extend Nifi then.
> >
> > S
> >
> >
> >
> >> Aldrin Piri 
>> Thursday, March 16, 2017 1:07 PM
> >>
> >> Hey Andre,
> >>
> >> Interesting scenario and certainly can understand the need for such
> >> functionality. As a bit of background, my mind traditionally goes to
> >> custom controller services used for referencing datasets typically
> served
> >> up via some service. This means we don't get the Site to Site goodness
> and
> >> may be duplicating effort in terms of transiting information. Aspects of
> >> this need are emerging in some of the initial C2 efforts where we have
> >> this
> >> 1-N dispersion of flows/updates to instances, initial approaches are
> >> certainly reminiscent of the above. I am an advocate for Site to Site
> >> being "the way" data transport is done amongst NiFi/MiNiFi instances and
> >> this is interlaced, in part, with the JIRA to make this an extension
> point
> >> [1]. Perhaps, more simply and to frame in the sense of messaging, we
> have
> >> no way of providing topic semantics between instances and only support
> >> queueing whether that is push/pull. This new component or mode would be
> >> very compelling in conjunction with the introduction of new protocols
> each
> >> with their own operational guarantees/caveats.
> >>
> >> Some of the first thoughts/questions that come to mind are:
> >> * what buffering/age off looks like in context of a connection. In part,
> >> I think we have the framework functionality already in place, but
> requires
> >> a slightly different thought process and context.
> >> * management of progress through the "queue", for lack of a better word,
> >> on a given topic and how/where that gets stored/managed. this would be
> the
> >> analog of offsets
> 

Re: Regarding changes in Apache Nifi

2017-03-16 Thread Dave Hirko
We’ve built custom UI abstractions on top of NiFi using the REST APIs 
exclusively.  The API documentation is very good, and as people have suggested, 
we became very effective at using the browser developer console to understand 
the REST calls in the native UI so we could better understand how and when 
those API calls are made.  Toggling back and forth between watching the 
requests in the browser developer console and the API documentation was key 
to making us effective at building our own UIs.  If you go down this path, you 
quickly realize just how many API calls are made to perform very basic actions 
in the native NiFi UI.  


Dave Hirko | d...@b23.io | 571.421.7729



On 3/16/17, 10:19 AM, "Bryan Rosander"  wrote:

Hi Sunil,

Everything that the NiFi UI does is performed via a REST API [1].  You
could write your own front end that utilizes that API to perform operations.

There are also processors that contribute their own UI [2] so you could
potentially go down that road if you wanted.

Thanks,
Bryan

[1] https://nifi.apache.org/docs.html  (search for rest in the bottom left
text box)
[2]

https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-bundle/nifi-jolt-transform-json-ui

On Thu, Mar 16, 2017 at 8:37 AM, Suneel Marthi  wrote:

> Forwarding this to dev@nifi.a.o
>
> If I understood the question here, the ask is for a white-labeled NiFi that
> could be customized per deployment!
>
>
>
> On Thu, Mar 16, 2017 at 8:23 AM, Sunil Neurgaonkar <
> sunil.neurgaon...@techprimelab.com> wrote:
>
> > Hey Suneel,
> >
> > I am a developer @ Techprimelab Software Pvt. Ltd., and I was working on
> > Apache Nifi (https://github.com/apache/nifi)
> >
> > I wanted to know whether there is any way we could add a custom UI to
> > Apache NiFi, i.e. which files/folders to make the changes in for them to
> > be reflected in the app.
> >
> > It would be great if you could share anything related to this. I look
> > forward to hearing from you soon.
> >
> > Thanks and Regards
> >
> > --
> > Sunil Neurgaonkar
> >
> >
>




Re: Regarding changes in Apache Nifi

2017-03-16 Thread Bryan Rosander
Hi Sunil,

Everything that the NiFi UI does is performed via a REST API [1].  You
could write your own front end that utilizes that API to perform operations.

There are also processors that contribute their own UI [2] so you could
potentially go down that road if you wanted.
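
As a small hedged example of driving that API from your own front end or
tooling (the endpoint shown is the flow status endpoint the UI itself polls;
adjust host/port, and a secured instance would additionally need TLS/auth):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class NiFiStatusCheck {
        public static void main(String[] args) throws Exception {
            // Anonymous access assumes an unsecured, default-port instance.
            URL url = new URL("http://localhost:8080/nifi-api/flow/status");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);  // raw JSON status payload
                }
            }
        }
    }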

Thanks,
Bryan

[1] https://nifi.apache.org/docs.html  (search for rest in the bottom left
text box)
[2]
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-bundle/nifi-jolt-transform-json-ui

On Thu, Mar 16, 2017 at 8:37 AM, Suneel Marthi  wrote:

> Forwarding this to dev@nifi.a.o
>
> If I understood the question here, the ask is for a white-labeled NiFi that
> could be customized per deployment!
>
>
>
> On Thu, Mar 16, 2017 at 8:23 AM, Sunil Neurgaonkar <
> sunil.neurgaon...@techprimelab.com> wrote:
>
> > Hey Suneel,
> >
> > I am a developer @ Techprimelab Software Pvt. Ltd., and I was working on
> > Apache Nifi (https://github.com/apache/nifi)
> >
> > I wanted to know whether there is any way we could add a custom UI to
> > Apache NiFi, i.e. which files/folders to make the changes in for them to
> > be reflected in the app.
> >
> > It would be great if you could share anything related to this. I look forward
> > to hearing from you soon.
> >
> > Thanks and Regards
> >
> > --
> > Sunil Neurgaonkar
> >
> >
>


Re: Kafkaesque Output port

2017-03-16 Thread Andre
Simon,

Thank you for your comments... I was aware of the use of Kafka and
alternatives, but I think that, limitations aside (e.g. the difficulty of
transferring files over Kafka, a broken provenance chain, etc.), many would
refrain from using yet another piece of infrastructure to reach the 1-n
clients.

This is especially undesirable when you consider that many MiNiFi users are
likely to be using S2S to send data into NiFi.

Cheers

On 17 Mar 2017 12:29 AM, "Simon Lucy"  wrote:

There are two different classes of queues really and you can't mix them
semantically.

The pubsub model

 * where ordering isn't guaranteed,
 * messages may appear at least once but can be duplicates
 * messages need to be explicitly deleted or aged
 * messages may or may not be persisted

The event model

 * ordering is necessary and guaranteed
 * messages appear only once
 * once read a message is discarded from the queue
 * messages are probably persisted

Kafka can be used for both models but NiFi Output Ports are meant to be
Event Queues. What you could do though is have an external message broker
connect to the Output Port and distribute that to subscribers. It could be
Kafka, Rabbit, AMQ, AWS's SQS/SNS, whatever makes sense in the context.

There's no need to modify or extend NiFi then.

S



Aldrin Piri 
> Thursday, March 16, 2017 1:07 PM
>
> Hey Andre,
>
> Interesting scenario and certainly can understand the need for such
> functionality. As a bit of background, my mind traditionally goes to
> custom controller services used for referencing datasets typically served
> up via some service. This means we don't get the Site to Site goodness and
> may be duplicating effort in terms of transiting information. Aspects of
> this need are emerging in some of the initial C2 efforts where we have this
> 1-N dispersion of flows/updates to instances, initial approaches are
> certainly reminiscent of the above. I am an advocate for Site to Site
> being "the way" data transport is done amongst NiFi/MiNiFi instances and
> this is interlaced, in part, with the JIRA to make this an extension point
> [1]. Perhaps, more simply and to frame in the sense of messaging, we have
> no way of providing topic semantics between instances and only support
> queueing whether that is push/pull. This new component or mode would be
> very compelling in conjunction with the introduction of new protocols each
> with their own operational guarantees/caveats.
>
> Some of the first thoughts/questions that come to mind are:
> * what buffering/age off looks like in context of a connection. In part,
> I think we have the framework functionality already in place, but requires
> a slightly different thought process and context.
> * management of progress through the "queue", for lack of a better word,
> on a given topic and how/where that gets stored/managed. this would be the
> analog of offsets
> * is prioritization still a possibility? at first blush, it seems like
> this would no longer make sense and/or be applicable
> * what impact does this have on provenance? seems like it would still map
> correctly; just many child send events for a given piece of data
> * what would the sequence of receiving input port look like? just use run
> schedule we have currently? Presumably this would be used for updates, so
> I schedule it to check every N minutes and get all the updates since then?
> (this could potentially be mitigated with backpressure/expiration
> configuration on the associated connection).
>
> I agree there is a certain need to fulfill that seems applicable to a
> number of situations and finding a way to support this general data
> exchange pattern in a framework level would be most excellent. Look
> forward to discussing and exploring a bit more.
>
> --aldrin
>
> [1] https://issues.apache.org/jira/browse/NIFI-1820
>
>
> Andre 
> Thursday, March 16, 2017 12:27 PM
>
> dev,
>
> I recently created a demo environment where two remote MiNiFi instances (m1
> and m2) were sending a diverse range of security telemetry (suspicious email
> attachments, syslog streams, individual session honeypot logs, merged
> honeypot session logs, etc) from edge to DC via S2S Input ports
>
> Once some of this data was processed at the hub I then used Output ports to
> send contents back to the spokes, where the MiNiFi instances use the
> flowfile contents as arguments of OS commands (called via Groovy
> String.execute().text via ExecuteScript).
>
> The idea being to show how NiFi can be used in basic security orchestration
> (in this case updating m1's firewall tables with malicious IPs observed in
> m2 and vice versa).
>
>
> While crafting the demo I noticed the Output ports operate like queues,
> therefore if one client consumed data from the port, the other was unable
> to obtain the same flowfiles.

Re: Kafkaesque Output port

2017-03-16 Thread Bryan Bende
Just wanted to throw out a couple of other ideas...

I ran into a similar situation and ended up creating a web-service at
the core (HandleHttpRequest/HandleHttpResponse) where the edge
instances could poll for the latest instructions [1][2]. This works
well when theres basically one new piece of information that you want
to push out, like the latest results of some calculation, but it is
obviously not a general purpose queue.
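
Roughly, the shape of that pattern (a sketch; the processor arrangement is
illustrative):

    Core instance:
      HandleHttpRequest -> (look up latest instructions) -> HandleHttpResponse
    Each edge instance:
      InvokeHTTP (GET, scheduled every N seconds) -> route/act on the response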

I believe Joey Frazee was also working on an approach that involved a
PutSiteToSite processor [3] that functioned similar to a
RemoteProcessGroup, except the host name property could be set to
expression language and the preceding processor could fan-out a flow
file for each host to push to. In this case you would need to know the
list of MiNiFi hosts to push to.

[1] https://github.com/bbende/nifi-streaming-examples
[2] https://www.slideshare.net/BryanBende/integrating-nifi-and-flink/23
[3] https://github.com/jfrazee/nifi-put-site-to-site-bundle

On Thu, Mar 16, 2017 at 9:29 AM, Simon Lucy  wrote:
> There are two different classes of queues really and you can't mix them
> semantically.
>
> The pubsub model
>
>  * where ordering isn't guaranteed,
>  * messages may appear at least once but can be duplicates
>  * messages need to be explicitly deleted or aged
>  * messages may or may not be persisted
>
> The event model
>
>  * ordering is necessary and guaranteed
>  * messages appear only once
>  * once read a message is discarded from the queue
>  * messages are probably persisted
>
> Kafka can be used for both models but Nifi Output Ports are meant to be
> Event Queues. What you could do though is have an external message broker
> connect to the Output Port and distribute that to subscribers. It could be
> Kafka, Rabbit, AMQ, AWS's SQS/SNS, whatever makes sense in the context.
>
> There's no need to modify or extend Nifi then.
>
> S
>
>
>
>> Aldrin Piri 
>> Thursday, March 16, 2017 1:07 PM
>>
>> Hey Andre,
>>
>> Interesting scenario and certainly can understand the need for such
>> functionality. As a bit of background, my mind traditionally goes to
>> custom controller services used for referencing datasets typically served
>> up via some service. This means we don't get the Site to Site goodness and
>> may be duplicating effort in terms of transiting information. Aspects of
>> this need are emerging in some of the initial C2 efforts where we have
>> this
>> 1-N dispersion of flows/updates to instances, initial approaches are
>> certainly reminiscent of the above. I am an advocate for Site to Site
>> being "the way" data transport is done amongst NiFi/MiNiFi instances and
>> this is interlaced, in part, with the JIRA to make this an extension point
>> [1]. Perhaps, more simply and to frame in the sense of messaging, we have
>> no way of providing topic semantics between instances and only support
>> queueing whether that is push/pull. This new component or mode would be
>> very compelling in conjunction with the introduction of new protocols each
>> with their own operational guarantees/caveats.
>>
>> Some of the first thoughts/questions that come to mind are:
>> * what buffering/age off looks like in context of a connection. In part,
>> I think we have the framework functionality already in place, but requires
>> a slightly different thought process and context.
>> * management of progress through the "queue", for lack of a better word,
>> on a given topic and how/where that gets stored/managed. this would be the
>> analog of offsets
>> * is prioritization still a possibility? at first blush, it seems like
>> this would no longer make sense and/or be applicable
>> * what impact does this have on provenance? seems like it would still map
>> correctly; just many child send events for a given piece of data
>> * what would the sequence of receiving input port look like? just use run
>> schedule we have currently? Presumably this would be used for updates, so
>> I schedule it to check every N minutes and get all the updates since then?
>> (this could potentially be mitigated with backpressure/expiration
>> configuration on the associated connection).
>>
>> I agree there is a certain need to fulfill that seems applicable to a
>> number of situations and finding a way to support this general data
>> exchange pattern in a framework level would be most excellent. Look
>> forward to discussing and exploring a bit more.
>>
>> --aldrin
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1820
>>
>>
>> Andre 
>> Thursday, March 16, 2017 12:27 PM
>> dev,
>>
>> I recently created a demo environment where two remote MiNiFi instances
>> (m1
>> and m2) were sending a diverse range of security telemetry (suspicious email
>> attachments, syslog streams, individual session honeypot logs, merged
>> honeypot session logs, etc) from edge to DC via S2S Input ports

Re: Kafkaesque Output port

2017-03-16 Thread Simon Lucy
There are two different classes of queues really and you can't mix them 
semantically.


The pubsub model

 * where ordering isn't guaranteed,
 * messages may appear at least once but can be duplicates
 * messages need to be explicitly deleted or aged
 * messages may or may not be persisted

The event model

 * ordering is necessary and guaranteed
 * messages appear only once
 * once read a message is discarded from the queue
 * messages are probably persisted

Kafka can be used for both models but NiFi Output Ports are meant to be 
Event Queues. What you could do though is have an external message 
broker connect to the Output Port and distribute that to subscribers. It 
could be Kafka, Rabbit, AMQ, AWS's SQS/SNS, whatever makes sense in the 
context.


There's no need to modify or extend Nifi then.

S




Aldrin Piri 
Thursday, March 16, 2017 1:07 PM


Hey Andre,

Interesting scenario and certainly can understand the need for such
functionality. As a bit of background, my mind traditionally goes to
custom controller services used for referencing datasets typically served
up via some service. This means we don't get the Site to Site goodness and
may be duplicating effort in terms of transiting information. Aspects of
this need are emerging in some of the initial C2 efforts where we have this
1-N dispersion of flows/updates to instances, initial approaches are
certainly reminiscent of the above. I am an advocate for Site to Site
being "the way" data transport is done amongst NiFi/MiNiFi instances and
this is interlaced, in part, with the JIRA to make this an extension point
[1]. Perhaps, more simply and to frame in the sense of messaging, we have
no way of providing topic semantics between instances and only support
queueing whether that is push/pull. This new component or mode would be
very compelling in conjunction with the introduction of new protocols each
with their own operational guarantees/caveats.

Some of the first thoughts/questions that come to mind are:
* what buffering/age off looks like in context of a connection. In part,
I think we have the framework functionality already in place, but requires
a slightly different thought process and context.
* management of progress through the "queue", for lack of a better word,
on a given topic and how/where that gets stored/managed. this would be the
analog of offsets
* is prioritization still a possibility? at first blush, it seems like
this would no longer make sense and/or be applicable
* what impact does this have on provenance? seems like it would still map
correctly; just many child send events for a given piece of data
* what would the sequence of receiving input port look like? just use run
schedule we have currently? Presumably this would be used for updates, so
I schedule it to check every N minutes and get all the updates since then?
(this could potentially be mitigated with backpressure/expiration
configuration on the associated connection).

I agree there is a certain need to fulfill that seems applicable to a
number of situations and finding a way to support this general data
exchange pattern in a framework level would be most excellent. Look
forward to discussing and exploring a bit more.

--aldrin

[1] https://issues.apache.org/jira/browse/NIFI-1820


Andre 
Thursday, March 16, 2017 12:27 PM


dev,

I recently created a demo environment where two remote MiNiFi instances (m1
and m2) were sending a diverse range of security telemetry (suspicious email
attachments, syslog streams, individual session honeypot logs, merged
honeypot session logs, etc) from edge to DC via S2S Input ports

Once some of this data was processed at the hub I then used Output ports to
send contents back to the spokes, where the MiNiFi instances use the
flowfile contents as arguments of OS commands (called via Groovy
String.execute().text via ExecuteScript).

The idea being to show how NiFi can be used in basic security orchestration
(in this case updating m1's firewall tables with malicious IPs observed in
m2 and vice versa).


While crafting the demo I noticed the Output ports operate like queues,
therefore if one client consumed data from the port, the other was unable
to obtain the same flowfiles.

This is obviously not an issue when using 2 MiNiFi clients (where I can
just create another output port and clone the content), but it wouldn't
scale very well with hundreds of clients.

I wonder if anyone would have a suggestion of how to achieve an N-to-1
Output port like that? And if not, I wonder if we should create one?

Cheers



--
Simon Lucy
Technologist
G30 Consultants Ltd
+44 77 20 29 4658



Re: Kafkaesque Output port

2017-03-16 Thread Aldrin Piri
Hey Andre,

Interesting scenario and certainly can understand the need for such
functionality.  As a bit of background, my mind traditionally goes to
custom controller services used for referencing datasets typically served
up via some service.  This means we don't get the Site to Site goodness and
may be duplicating effort in terms of transiting information.  Aspects of
this need are emerging in some of the initial C2 efforts where we have this
1-N dispersion of flows/updates to instances, initial approaches are
certainly reminiscent of the above.  I am an advocate for Site to Site
being "the way" data transport is done amongst NiFi/MiNiFi instances and
this is interlaced, in part, with the JIRA to make this an extension point
[1].  Perhaps, more simply and to frame in the sense of messaging, we have
no way of providing topic semantics between instances and only support
queueing whether that is push/pull.  This new component or mode would be
very compelling in conjunction with the introduction of new protocols each
with their own operational guarantees/caveats.

Some of the first thoughts/questions that come to mind are:
 * what buffering/age off looks like in context of a connection.  In part,
I think we have the framework functionality already in place, but requires
a slightly different thought process and context.
 * management of progress through the "queue", for lack of a better word,
on a given topic and how/where that gets stored/managed.  this would be the
analog of offsets
 * is prioritization still a possibility?  at first blush, it seems like
this would no longer make sense and/or be applicable
 * what impact does this have on provenance?  seems like it would still map
correctly; just many child send events for a given piece of data
 * what would the sequence of receiving input port look like?  just use run
schedule we have currently?  Presumably this would be used for updates, so
I schedule it to check every N minutes and get all the updates since then?
 (this could potentially be mitigated with backpressure/expiration
configuration on the associated connection).

I agree there is a certain need to fulfill that seems applicable to a
number of situations and finding a way to support this general data
exchange pattern in a framework level would be most excellent.  Look
forward to discussing and exploring a bit more.

--aldrin

[1] https://issues.apache.org/jira/browse/NIFI-1820

On Thu, Mar 16, 2017 at 8:27 AM, Andre  wrote:

> dev,
>
> I recently created a demo environment where two remote MiNiFi instances (m1
> and m2) were sending a diverse range of security telemetry (suspicious email
> attachments, syslog streams, individual session honeypot logs, merged
> honeypot session logs, etc) from edge to DC via S2S Input ports
>
> Once some of this data was processed at the hub I then used Output ports to
> send contents back to the spokes, where the MiNiFi instances use the
> flowfile contents as arguments of OS commands (called via Groovy
> String.execute().text via ExecuteScript).
>
> The idea being to show how NiFi can be used in basic security orchestration
> (in this case updating m1's firewall tables with malicious IPs observed in
> m2 and vice versa).
>
>
> While crafting the demo I noticed the Output ports operate like queues,
> therefore if one client consumed data from the port, the other was unable
> to obtain the same flowfiles.
>
> This is obviously not an issue when using 2 MiNiFi clients (where I can
> just create another output port and clone the content), but it wouldn't
> scale very well with hundreds of clients.
>
> I wonder if anyone would have a suggestion of how to achieve an N-to-1
> Output port like that? And if not, I wonder if we should create one?
>
> Cheers
>


Re: When should MergeContent stop and proceed to next processor?

2017-03-16 Thread Oleg Zhurakousky
Hi

Is there any chance you can share your processor’s configuration? I am curious 
as to what you are using as the “Correlation Attribute Name” in the MergeContent 
processor.
Basically, this attribute makes it possible to distinguish groups of flow files, 
so since you have SplitJson as an upstream processor feeding MergeContent, you 
can use “fragment.identifier” as the correlation attribute.
Anyway, please share what you can.

Cheers
Oleg

> On Mar 15, 2017, at 5:55 PM, srini  wrote:
> 
> Hi,
> I have a subflow like this. From SplitJson to MergeContent it is in a loop. I
> expect it to loop based on the number of splits of that record. How does it
> know that the splits for that record are over, and that it needs to proceed
> to the next processor, which is ExtractText?
> 
> I have 3 records. In my case it is merging 25 items (8 + 5 + 12 = 25). It is
> merging all the records' data into one record. It shouldn't merge them all;
> instead, after each record it should proceed to the next processor.
> 1st record: Merge 8 items
> 2nd record: Merge 5 items
> 3rd record: Merge 12 items
> 
> 
>  
> 
> What changes do you recommend to my flow?
> Here is my MergeContent screenshot.
> 
>  
> 



Re: Regarding changes in Apache Nifi

2017-03-16 Thread Suneel Marthi
Forwarding this to dev@nifi.a.o

If I understood the question here, the ask is for a white-labeled NiFi that
could be customized per deployment!



On Thu, Mar 16, 2017 at 8:23 AM, Sunil Neurgaonkar <
sunil.neurgaon...@techprimelab.com> wrote:

> Hey Suneel,
>
> I am a developer @ Techprimelab Software Pvt. Ltd., and I was working on
> Apache Nifi (https://github.com/apache/nifi)
>
> I wanted to know whether there is any way we could add a custom UI to Apache
> NiFi, i.e. which files/folders to make the changes in for them to be
> reflected in the app.
>
> It would be great if you could share anything related to this. I look forward
> to hearing from you soon.
>
> Thanks and Regards
>
> --
> Sunil Neurgaonkar
> m: +91 904 976 3339
> e: sunil.neurgaon...@techprimelab.com
>
>


Kafkaesque Output port

2017-03-16 Thread Andre
dev,

I recently created a demo environment where two remote MiNiFi instances (m1
and m2) were sending a diverse range of security telemetry (suspicious email
attachments, syslog streams, individual session honeypot logs, merged
honeypot session logs, etc) from edge to DC via S2S Input ports

Once some of this data was processed at the hub I then used Output ports to
send contents back to the spokes, where the MiNiFi instances use the
flowfile contents as arguments of OS commands (called via Groovy
String.execute().text via ExecuteScript).

The idea being to show how NiFi can be used in basic security orchestration
(in this case updating m1's firewall tables with malicious IPs observed in
m2 and vice versa).


While crafting the demo I noticed the Output ports operate like queues,
therefore if one client consumed data from the port, the other was unable
to obtain the same flowfiles.

This is obviously not an issue when using 2 MiNiFi clients (where I can
just create another output port and clone the content), but it wouldn't
scale very well with hundreds of clients.

I wonder if anyone would have a suggestion of how to achieve an N-to-1
Output port like that? And if not, I wonder if we should create one?

Cheers


Re: [VOTE] Release Apache NiFi nifi-nar-maven-plugin-1.2.0

2017-03-16 Thread Koji Kawamura
+1 Release this package as nifi-nar-maven-plugin-1.2.0

Verified checksums and git commit id.
Building NiFi using the new NAR plugin was successful.
Created simple NiFi flow, confirmed it works as expected.

On Thu, Mar 16, 2017 at 3:31 AM, Scott Aslan  wrote:
> Built sample NAR, built NiFi with new NAR and with pr/1585. Ran through
> building a flow with a secured NiFi using versioned components.
>
> +1 Release this package as nifi-nar-maven-plugin-1.2.0
>
> On Wed, Mar 15, 2017 at 10:35 AM, Oleg Zhurakousky <
> ozhurakou...@hortonworks.com> wrote:
>
>> Build successful, built sample NAR, all is good
>> +1
>>
>> > On Mar 15, 2017, at 10:25 AM, Matt Burgess  wrote:
>> >
>> > +1 Release this package as nifi-nar-maven-plugin-1.2.0
>> >
>> > Verified checksums, verified and built from commit, built a NAR with
>> > the updated plugin.
>> >
>> > On Tue, Mar 14, 2017 at 12:21 PM, Bryan Bende  wrote:
>> >> Hello,
>> >>
>> >> I am pleased to be calling this vote for the source release of Apache
>> >> NiFi nifi-nar-maven-plugin-1.2.0.
>> >>
>> >> The source zip, including signatures, digests, etc. can be found at:
>> >> https://repository.apache.org/content/repositories/orgapachenifi-1101
>> >>
>> >> The Git tag is nifi-nar-maven-plugin-1.2.0-RC1
>> >> The Git commit ID is d0c9d46d25a3eb8d3dbeb2783477b1a7c5b2f345
>> >> https://git-wip-us.apache.org/repos/asf?p=nifi-maven.git;a=commit;h=
>> d0c9d46d25a3eb8d3dbeb2783477b1a7c5b2f345
>> >>
>> >> Checksums of nifi-nar-maven-plugin-1.2.0-source-release.zip:
>> >> MD5: a20b62075f79bb890c270445097dc337
>> >> SHA1: 68e4739c9a4c4b2c69ff4adab8e1fdb0e7840923
>> >> SHA256: f5d4acbaa38460bcf19e9b33f385aa643798788026875bd034ee837e5d9d45a8
>> >>
>> >> Release artifacts are signed with the following key:
>> >> https://people.apache.org/keys/committer/bbende.asc
>> >>
>> >> KEYS file available here:
>> >> https://dist.apache.org/repos/dist/release/nifi/KEYS
>> >>
>> >> 3 issues were closed/resolved for this release:
>> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> projectId=12316020&version=12339193
>> >>
>> >> Release note highlights can be found here:
>> >> https://cwiki.apache.org/confluence/display/NIFI/
>> Release+Notes#ReleaseNotes-NiFiNARMavenPluginVersion1.2.0
>> >>
>> >> The vote will be open for 72 hours.
>> >> Please download the release candidate and evaluate the necessary items
>> >> including checking hashes, signatures, build from source, and test.
>> >>
>> >> Then please vote:
>> >>
>> >> [ ] +1 Release this package as nifi-nar-maven-plugin-1.2.0
>> >> [ ] +0 no opinion
>> >> [ ] -1 Do not release this package because because...
>> >
>>
>>
>
>
> --
> *Scott Aslan = new WebDeveloper(*
> *{"location": {"city": "Saint Cloud","state": "FL",
> "zip": "34771"},"contact": {"email":
> "scottyas...@gmail.com ","linkedin":
> "http://www.linkedin.com/in/scottyaslan
> "}});*


Need help in JSON generate automation

2017-03-16 Thread Anshuman Ghosh
Hello all,

Trust you are doing great!
One quick question: I need some help simulating a testing pipeline.

I want to create an automatic pipeline where there is a continuous
flow of JSON messages/records that I would publish onto a Kafka topic and
later consume for further processing.
I have found the JSON generator site "http://www.json-generator.com/", but
that is a static process.
Can someone please guide me here?


Thank you!
__

*Kind Regards,*
*Anshuman Ghosh*
*Contact - +49 179 9090964*