Re: [DISCUSS] git branching model

2016-02-13 Thread Matthew Burgess
The release branch approach (at least the happy path) is IMHO the way to go. I’d 
just like to address, from personal experience, those times when commits don’t 
cherry-pick over cleanly. Do we put the onus on the developer to push PRs for the 
intended branches? In either case I would think the original patch/PR would go 
against master, then whoever is tasked with getting it back to the appropriate 
branches does the cherry-pick (or whatever is the prudent way to apply 
the change) and gets it into “past” branches. At my last place, the onus was on 
the developer to open the PR for master and a PR for every previous branch 
he/she wanted the fix/feature in, and approval was given per branch with 
community review. Working backwards in time like this reduces 
regressions and forces a per-PR look at the commit in the appropriate context.

Regards,
Matt
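
As a sketch, the backport flow described above might look like this (assuming
the v0.x-stable branch naming from this thread, with a placeholder commit id):

git checkout master             # the fix lands on master first
git merge pr-branch
git checkout v0.x-stable        # then backport to each supported release branch
git cherry-pick -x <commit-id>  # -x records the original commit id in the message
# resolve any conflicts, then open a PR against v0.x-stable for per-branch review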



On 2/13/16, 9:26 PM, "Joe Skora"  wrote:

>+1 for a branching model with master as current dev, release branches for
>release development and fixes, and tags for marking release points.
>>
>> ** What will we call our branches?*
>> -> development (currently master)
>> -> v0.x-stable
>> -> v1.x-stable
>> -> v2.x-stable (v0.x-stable is deleted)
>> -> v3.x-stable (v1.x-stable is deleted)
>> Each branch would have multiple tags marking the minor releases.
>
>
>In general I don't care what the dev branch is called, but in my
>experience, Git-life is easier when master is the branch where development
>occurs.
>
>On Sat, Feb 13, 2016 at 8:44 PM, Andre  wrote:
>
>> I am not a git specialist but I can share my view as a user:
>>
>> ** What will master look like while we're doing this?*
>> I've noticed that, depending on the project, a master branch can be a stable
>> branch or a development branch, and as long as the behaviour of the branches
>> is clearly documented, the approach used is secondary (except on golang
>> projects, where specific rules apply).
>>
>> ** What will we call our branches?*
>>
>> -> development (currently master)
>> -> v0.x-stable
>> -> v1.x-stable
>> -> v2.x-stable (v0.x-stable is deleted)
>> -> v3.x-stable (v1.x-stable is deleted)
>>
>> Each branch would have multiple tags marking the minor releases.
>>
>> Ditching master as a name would clearly state the intent of each branch and
>> let the user / developer know that by running on the development branch you
>> are on the cutting edge.
>>
>> Having said that, I suspect there are some minor issues with getting a git
>> repository without a master branch to work on GitHub [1], but given the
>> project uses the ASF bot and GitHub replication it may be worth checking if
>> this is possible.
>>
>> Independently of the name, master would be cutting edge and things could
>> break.
>>
>> ** Who would integrate patches and PRs into multiple versions?
>> Reviewer? Submitter? Or would this be another ticket?*
>>
>> If it is a new feature (e.g. a new listener) it should be up to the
>> submitter to decide if support would be extended to the currently stable
>> release or would reside just on the development branch.
>>
>> The key IMHO isn't the features but changes to shared code; as long as we
>> prevent changes to existing classes and method signatures I think we would
>> be on the right track.
>>
>> It should be paramount to provide stability to code crafted outside the
>> project (a perfect example being the NATS messaging processor that was never
>> merged into the project [2]) without hindering development of the product
>> within minor releases.
>>
>> Regarding bug fixes, I think anyone would be welcome to submit a fix to any
>> of the supported branches.
>>
>> ** What project does this well and could be a model?*
>>
>> I think a good model to look at is the one adopted by rsyslog project.
>>
>> If I'm not mistaken they adopt a release branch model.
>>
>> v7.x is no longer improved but is still available for bug fix backports
>> into minor releases (controlled via tags).
>> v8.x stable is there and has tags for each of the minor releases.
>> master is the development tree
>>
>> ** Should we decide to only have one version "supported" at a time to
>> avoid this?*
>>
>> I reckon that nowadays the minimum expected by the user base is major - 1,
>> as this avoids forcing users to adopt rolling releases.
>>
>> Also, by supported I mean security fixes and critical issues that may lead
>> to data loss and system crashes. Features, nice-to-haves and other things
>> are up to a number of factors and may or may not get backported.
>>
>> Those who have ever dealt with RHEL know that you may ask RH to backport
>> feature blah to "version - 1"... you may ask, but the truth is that sometimes
>> you will get it, sometimes you won't.
>>
>> Cheers
>>
>>
>> [1] https://matthew-brett.github.io/pydagogue/gh_delete_master.html
>> [2] https://github.com/mring33621/nats-messaging-for-nifi
>>



Re: [VOTE] Release Apache NiFi 0.5.0 (RC2)

2016-02-11 Thread Matthew Burgess
Ran through the helper (for which I have a shell script :), but still manually 
verified the artifacts.
Verified the checksums, keys, and git tag, removed the m2 repo, did a 
contrib-check mvn build, and then ran the resulting build with some scripting 
processor templates.


+1 (non-binding)
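
The verification steps above, roughly as commands (a sketch; the contrib-check
profile name comes from this thread, and the file names assume the RC2
artifacts):

md5sum nifi-0.5.0-source-release.zip     # compare against the MD5 in the vote email
sha1sum nifi-0.5.0-source-release.zip    # compare against the SHA1
gpg --import KEYS                        # import the release manager's key
gpg --verify nifi-0.5.0-source-release.zip.asc
rm -rf ~/.m2/repository                  # clean the local repo for an honest build
mvn clean install -Pcontrib-check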




On 2/11/16, 8:47 AM, "Tony Kurc"  wrote:

>Hello
>I am pleased to be calling this vote for the source release of Apache NiFi
>nifi-0.5.0.
>
>The source zip, including signatures, digests, etc. can be found at:
>https://repository.apache.org/content/repositories/orgapachenifi-1072
>
>
>The Git tag is nifi-0.5.0-RC2
>The Git commit ID is 9f0433888b9b87ab6c0e031a544cfc56e036083d
>https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=9f0433888b9b87ab6c0e031a544cfc56e036083d
>
>Checksums of nifi-0.5.0-source-release.zip:
>MD5: 4546a6af95211d66696c29724a14b1fd
>SHA1: 711559b772885d65f0f1e00107ab913ab071f530
>
>Release artifacts are signed with the following key:
>https://people.apache.org/keys/committer/tkurc.asc
>
>KEYS file available here:
>https://dist.apache.org/repos/dist/dev/nifi/KEYS
>
>112 issues were closed/resolved for this release:
>https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12334158
>
>Release note highlights can be found here:
>https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.5.0
>
>Migration guidance can be found here:
>https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance
>
>The vote will be open for 72 hours.
>Please download the release candidate and evaluate the necessary items
>including checking hashes, signatures, build from source, and test.  Then
>please vote:
>
>[ ] +1 Release this package as nifi-0.5.0
>[ ] +0 no opinion
>[ ] -1 Do not release this package because...



Re: java.lang.UnsatisfiedLinkError in PutHDFS with snappy compression.

2016-02-09 Thread Matthew Burgess
When you say you pointed LD_LIBRARY_PATH to the location of libsnappy.so, do 
you mean just the setting of the “mapreduce.admin.user.env” property in 
mapred-site.xml, or the actual environment variable before starting NiFi?  The 
mapred-site settings won’t be used as PutHDFS does not use MapReduce. If you do 
something like:

export LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native
bin/nifi.sh start

That should let PutHDFS know about the appropriate libraries.
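
To sanity-check the native libraries before starting NiFi, something like this
should work (hadoop checknative reports whether snappy was found; the
bootstrap.conf line is an alternative, and the arg number just needs to be
unique among the existing java.arg.* entries):

export LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native
hadoop checknative -a    # look for "snappy: true" in the output

# or, in conf/bootstrap.conf:
# java.arg.15=-Djava.library.path=/usr/hdp/2.2.0.0-1084/hadoop/lib/native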




On 2/9/16, 4:38 AM, "shweta"  wrote:

>Hi Jeremy,
>
>Even after copying libsnappy.so to java_home/jre/lib it did not help much. I
>also pointed LD_LIBRARY_PATH to the location of libsnappy.so. I even went to
>the extent of modifying bootstrap.conf with JVM params 
> -Djava.library.path=//.
>
>But received the same error again. I have configured the following properties 
>in the Hadoop config files:
>
>core-site.xml
>
><property>
>  <name>io.compression.codecs</name>
>  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
></property>
>
>mapred-site.xml
>
><property>
>  <name>mapreduce.map.output.compress</name>
>  <value>true</value>
></property>
>
><property>
>  <name>mapred.map.output.compress.codec</name>
>  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
></property>
>
><property>
>  <name>mapreduce.admin.user.env</name>
>  <value>LD_LIBRARY_PATH=/usr/hdp/2.2.0.0-1084/hadoop/lib/native</value>
></property>
>
>Anything else I'm missing to get this issue fixed?
>
>Thanks,
>Shweta
>
>
>
>--
>View this message in context: 
>http://apache-nifi-developer-list.39713.n7.nabble.com/java-lang-UnsatisfiedLinkError-in-PutHDFS-with-snappy-compression-tp7182p7236.html
>Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.



Re: How to capture more than 40 groups in Extract Text

2016-02-03 Thread Matthew Burgess
Shweta,

The ExecuteScript processor (coming in NiFi 0.5.0) will allow you to do this in 
code without having to build a whole processor bundle. I have a template where 
I do something similar, although I only grab two columns, not 135 :)  However it 
seems very possible and likely more efficient than the regex approach.
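
Along the lines of Bryan's custom-processor suggestion below, the core of the
conversion might look like this sketch (a naive split that assumes a flat CSV
with a header row and no embedded commas or quotes; the relationship name is
illustrative, and imports from java.io, java.nio.charset, and the
org.apache.nifi.processor packages are omitted for brevity):

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session)
        throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    flowFile = session.write(flowFile, new StreamCallback() {
        @Override
        public void process(final InputStream in, final OutputStream out) throws IOException {
            final BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            final PrintWriter writer =
                    new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            final String[] header = reader.readLine().split(",");
            writer.print("[");
            String line;
            boolean firstRow = true;
            while ((line = reader.readLine()) != null) {
                // -1 keeps trailing empty columns so all 135 fields line up
                final String[] values = line.split(",", -1);
                writer.print(firstRow ? "{" : ",{");
                for (int i = 0; i < header.length; i++) {
                    if (i > 0) {
                        writer.print(",");
                    }
                    final String value = i < values.length ? values[i] : "";
                    writer.print("\"" + header[i] + "\":\"" + value.replace("\"", "\\\"") + "\"");
                }
                writer.print("}");
                firstRow = false;
            }
            writer.print("]");
            writer.flush();
        }
    });
    session.transfer(flowFile, REL_SUCCESS);
}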




On 2/3/16, 9:30 AM, "Bryan Bende"  wrote:

>Hi Shweta,
>
>You may want to consider a custom processor at this point.
>The csv-to-json example works ok for smaller csv files, but admittedly is
>not a great solution when there are a lot of columns.
>There has been interest from the community in the past on having a
>ConvertCsvToJson processor, but no one has taken on the task yet [1].
>
>-Bryan
>
>[1] https://issues.apache.org/jira/browse/NIFI-1398
>
>
>On Tue, Feb 2, 2016 at 11:40 PM, shweta  wrote:
>
>> Hi All,
>>
>> I have a requirement wherein I need to convert a csv file to JSON. The input
>> csv file has 135 attributes.
>> I referred to the nifi example template csv-to-json.xml which uses a
>> combination
>> of ReplaceText and ExtractText processors.
>> But I think ExtractText has a limitation of capturing no more than 40
>> groups.
>> Is there a workaround to handle this scenario?
>>
>> Regards,
>> Shweta
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-capture-more-than-40-groups-in-Extract-Text-tp7115.html
>> Sent from the Apache NiFi Developer List mailing list archive at
>> Nabble.com.
>>



Re: Component documentation improvements

2016-01-18 Thread Matthew Burgess
I’d like to add comments to the wiki, may I have permissions for that?

Thanks in advance,
Matt



On 1/18/16, 6:27 PM, "Joe Witt"  wrote:

>Very much agreed Dan.  I submitted my comments on the wiki.
>
>On Mon, Jan 18, 2016 at 6:06 PM, dan bress  wrote:
>> I just tossed some comments on the wikipage.
>>
>> TL;DR I took a look at generating docs as part of the build months ago.  I
>> think it's doable, and someone should pursue it.
>>
>> On Mon, Jan 18, 2016 at 2:38 PM Joe Witt  wrote:
>>
>>> Oleg,
>>>
>>> "Yes, there will be breaking changes. Its not a question of IF, but
>>> rather WHEN."
>>>
>>> I disagree.  It is always a question of IF.
>>>
>>> We have to be extremely judicious in the use of breaking changes and
>>> we owe the user/developer based excellent justification in such cases.
>>>
>>> I will comment on the Wiki page for the substance of this particular
>>> proposal (documentation generation).
>>>
>>> Joe
>>>
>>> On Mon, Jan 18, 2016 at 1:22 PM, Oleg Zhurakousky
>>>  wrote:
>>> > Josh
>>> >
>>> > FWIW, let’s use WIKI comments to maintain a discussion. It will be
>>> simpler in the end to compile a resolution and move on.
>>> >
>>> > Yet, I’ll reply here anyway.
>>> > Yes, there will be breaking changes. It's not a question of IF, but
>>> rather WHEN.
>>> > What we can do is make it less painful by introducing certain changes
>>> gradually with deprecations and clear communication to the community on
>>> what is about to change. Other mechanics could be applied here as well, but
>>> before we get into the mechanics, I’d like to see if there are any more
>>> ideas, concerns, etc., so we can have a joint resolution as to what is a
>>> sustainable documentation model for the future NiFi, then we can figure out
>>> how to get there.
>>> >
>>> > Cheers
>>> > Oleg
>>> >
>>> >> On Jan 18, 2016, at 1:08 PM, Joshua Davis 
>>> wrote:
>>> >>
>>> >> Oleg,
>>> >>
>>> >> Interesting document, what impact would it have on existing
>>> installations
>>> >> of NIFI?
>>> >>
>>> >> What would be the upgrade path for Custom Processors?
>>> >>
>>> >> Are we breaking compatibility with the previous way of doing
>>> documentation?
>>> >>
>>> >> Why not create a simple content repository that can hold the
>>> documentation
>>> >> information?
>>> >>
>>> >> Is there a plan for multiple languages?
>>> >>
>>> >> Joshua Davis
>>> >> Senior Consultant
>>> >> Hortonworks Professional Services
>>> >> (407)476-6752
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On 1/18/16, 12:09 PM, "Oleg Zhurakousky" 
>>> >> wrote:
>>> >>
>>> >>> Guys
>>> >>>
>>> >>> I've just finished an initial draft of the proposal to improve our
>>> >>> component documentation mechanisms (e.g., Processors, ControllerServices
>>> >>> etc).
>>> >>> https://cwiki.apache.org/confluence/display/NIFI/Component+documentation+improvements
>>> >>> Please give it a read and let's get discussion going.
>>> >>>
>>> >>> Cheers
>>> >>> Oleg
>>> >>
>>> >>
>>> >
>>>



Re: Component documentation improvements

2016-01-18 Thread Matthew Burgess
I’m in! Thanks :)




On 1/18/16, 6:54 PM, "Joe Witt" <joe.w...@gmail.com> wrote:

>doing now.  Try again in 30 secs.
>
>On Mon, Jan 18, 2016 at 6:54 PM, Matthew Burgess <mattyb...@gmail.com> wrote:
>> I’d like to add comments to the wiki, may I have permissions for that?
>>
>> Thanks in advance,
>> Matt



Re: Syslog Classes

2016-01-05 Thread Matthew Burgess
That’s my bad, and I can’t blame bourbon this time :)

Hadoop has an annotation class for InterfaceStability [1]. It is used to 
annotate interfaces with a “contract” about whether they are likely to change 
or not (example [2]). They use values like Stable, Unstable, and Evolving, 
explained in javadoc [3].  I thought maybe this was the kind of thing you were 
referring to when you mentioned annotating NiFi classes with a sort of contract 
about their potential volatility?

Regards,
Matt

[1] 
https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/classification/InterfaceStability.html
[2] 
https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/yarn/api/records/ContainerReport.html
[3] 
https://hadoop.apache.org/docs/r2.7.0/api/src-html/org/apache/hadoop/classification/InterfaceStability.html#line.42
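
For reference, a minimal sketch of what such a marker could look like (a
hypothetical NiFi annotation patterned on Hadoop's InterfaceStability; the name
is illustrative):

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Marked elements may change incompatibly between minor releases. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD})
public @interface Evolving {
}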





On 1/5/16, 7:43 PM, "Tony Kurc"  wrote:

>I think I parsed your sentence differently than you intended. Was your
>"this" in your opening sentence "what Tony described" or "what Matt is
>going to describe"?
>On Jan 5, 2016 7:35 PM, "Matt Burgess"  wrote:
>
>> Roger that. This is what Hadoop does, for an API method (class, etc.) in
>> Java it is annotated as @Stable or @Unstable. I was just referring to the
>> semantics of when you might expect an @Unstable method to change, for
>> example. Or am I still misunderstanding what you mean?
>>
>> Regards,
>> Matt
>>
>> Sent from my iPhone
>>
>> > On Jan 5, 2016, at 7:29 PM, Tony Kurc  wrote:
>> >
>> > Matt,
>> > What I'm talking about is annotating individual fields,  methods, and
>> > classes, giving some contract other than the access modifiers of java.
>>



Re: Processor name and type

2015-12-26 Thread Matthew Burgess
Maybe getIdentifier()? If the Processor subclasses AbstractProcessor or 
AbstractSessionFactoryProcessor, it also extends AbstractConfigurableComponent 
and the identifier will be set at initialization and available via 
getIdentifier().  I don’t have a debug instance handy so I can’t verify that’s 
what’s returned, but it might be worth a try :)
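
A quick way to check (a sketch, assuming a processor extending
AbstractProcessor; the log message is illustrative):

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session)
        throws ProcessException {
    // getIdentifier() is set when the framework initializes the processor
    // instance, as described above
    getLogger().info("onTrigger called for processor instance {}",
            new Object[]{getIdentifier()});
    // ... normal processing ...
}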




On 12/26/15, 2:00 PM, "Jagannathrao Mudda"  
wrote:

>Oleg,
>
>The type of the processor is known (which is the class name), however the
>processor name can be different for every instance of the processor and
>would like to know if there is any way I can get the processor name which
>is given while creating the processor from UI.
>
>Thanks a lot
>Mudda
>
>On 12/26/15, 5:37 AM, "Oleg Zhurakousky" 
>wrote:
>
>>Mudda
>>
>>I am not sure I understand the question, since you have all the info
>>about the processor when you implement its onTrigger method.
>>
>>Oleg
>>
>>> On Dec 26, 2015, at 2:59 AM, Jagannathrao Mudda
>>> wrote:
>>>
>>> Hi,
>>>
>>> How do I get the processor name and type in the onTrigger method? Please
>>>let me know.
>>>
>>> I really appreciate your help.
>>>
>>> Thanks
>>> Mudda
>>>
>>
>



Re: nifi-integration-tests

2015-12-21 Thread Matthew Burgess
Does it use Failsafe/Surefire to treat them as integration tests vs “regular” 
unit tests?

http://maven.apache.org/surefire/maven-failsafe-plugin/integration-test-mojo.html


This has advantages like -DskipITs which would still run through unit tests but 
would skip potentially long-running integration tests when not desired.
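
A typical wiring (a sketch; by default the failsafe plugin picks up tests named
*IT.java and binds to the integration-test and verify phases):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>

With that in place, mvn verify runs both unit and integration tests, while
mvn verify -DskipITs runs only the unit tests.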



On 12/21/15, 3:22 PM, "Oleg Zhurakousky"  wrote:

>Not on github yet. Basically another module right at the root of nifi, hence 
>it will be subject to global build, test etc.
>Oleg
>
>> On Dec 21, 2015, at 3:12 PM, Tony Kurc  wrote:
>> 
>> I certainly am interested in such a thing. How do you see this fitting in
>> the source tree/ build cycle?
>> On Dec 21, 2015 3:07 PM, "Oleg Zhurakousky" 
>> wrote:
>> 
>>> Guys
>>> 
>>> I’ve created a module called nifi-integration-tests. The goal of this
>>> module is to facilitate unit testing of things that require collaboration
>>> between the modules (e.g., site-to-site) as well as to discover potential
>>> improvements that could be made to the code base (e.g., NIFI-1318).
>>> It helps me quite a bit so I was wondering if there is any interest in
>>> adding such module to NiFi?
>>> 
>>> Cheers
>>> Oleg
>



Re: Support for Elastic Search in Future releases

2015-12-18 Thread Matthew Burgess
Shweta,

There is a Jira case for Elasticsearch processors:

https://issues.apache.org/jira/browse/NIFI-1275


I plan to work on these (with other folks in the NiFi community if interested) 
very soon.

Regards,
Matt



On 12/18/15, 2:17 AM, "shweta"  wrote:

>Hi,
>
>I wanted to know if there are any plans to have custom processors supporting 
>data ingestion/egestion 
>for Elasticsearch, just like there is for Solr.
>
>Thanks,
>Shweta
>
>
>
>--
>View this message in context: 
>http://apache-nifi-developer-list.39713.n7.nabble.com/Support-for-Elastic-Search-in-Future-releases-tp5849.html
>Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.



Re: Testing handling of static class methods

2015-12-18 Thread Matthew Burgess
Definitely a topic ripe for debate :) In my view there’s a whole spectrum, with 
one side being the case Oleg describes, where the existing encapsulation is not 
compromised solely for the sake of testing. On the far side is pure 
design-by-contract. For example, the case could be made that the JMS processor 
should not be so tightly coupled to a particular client, and certainly not to a 
class but rather to an interface. Another upside for moving the client call to 
a protected method is not just for testing but so that child classes can 
override, which is not an encapsulation thing but inheritance. That might not 
be useful in this particular case, but if we’re talking OO in general then it 
applies.

Since Bryan has cited precedent for the inner class approach in NiFi, I tend 
towards that as a consistent approach. Then again, to quote my close friend 
Oscar Wilde ;) "Consistency is the last refuge of the unimaginative" lol

Cheers!
Matt
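
The wrap-and-override pattern under discussion, as a sketch (the method and
class names are illustrative; PutJMS, JmsFactory, and WrappedMessageProducer
come from the thread below):

// In PutJMS, wrap the static factory call in a protected instance method:
protected WrappedMessageProducer createProducer(final ProcessContext context)
        throws JMSException {
    return JmsFactory.createMessageProducer(context, true);
}

// In the test, subclass and substitute a mock without touching JmsFactory:
static class TestablePutJMS extends PutJMS {
    final WrappedMessageProducer mockProducer = Mockito.mock(WrappedMessageProducer.class);

    @Override
    protected WrappedMessageProducer createProducer(final ProcessContext context) {
        return mockProducer;
    }
}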



On 12/18/15, 5:54 PM, "Oleg Zhurakousky"  wrote:

>Personally I am with Joe on this one.
>
>Relaxing visibility on a method just for testing is very dangerous, as it 
>breaks encapsulation. There are different expectations and considerations for 
>things that are made private, protected and public. Yes, all of that is 
>meaningless when one uses reflection, but that’s a whole other discussion. 
>Relaxing visibility implies advertisement that something is up for grabs. And 
>in fact in Joe’s case, while his intentions were noble, the fallout could be 
>anything but (see my comments here https://github.com/apache/nifi/pull/145).
>
>Just my opinion
>
>Cheers
>Oleg
>
>On Dec 18, 2015, at 5:42 PM, Joe Skora 
>> wrote:
>
>Wrapping createMessageProducer() in an instance method is a good
>suggestion, but it seems overkill just to enable testing.  Prompted by
>Oleg's suggestion, I got around instance variable visibility with
>Reflection, which is nice because it doesn't require "private" be changed
>to "protected" in the class under test and doesn't require an inner test
>class.  But, that doesn't appear to be possible for static methods, so
>wrapping with class methods may be the only choice.  Hopefully, I've missed
>something.
>
>
>On Fri, Dec 18, 2015 at 3:58 PM, Bryan Bende 
>> wrote:
>
>If you get it into a protected instance method, you can also make an inner
>class in your test, something like TestablePutJMS extends PutJMS, and
>overrides that method to return a mock or whatever you want. That is a
>common pattern in a lot of the processor tests.
>
>On Fri, Dec 18, 2015 at 3:44 PM, Matt Burgess 
>> wrote:
>
>You could move the one static call into an instance method of PutJMS, and
>use Mockito.spy() to get a partial mock of the processor, then use when()
>to override the instance method in the test. Not sure if that's how it's
>done in other places but it's worked for me in the past.
>
>Regards,
>Matt
>
>Sent from my iPhone
>
>On Dec 18, 2015, at 3:20 PM, Joe Skora 
>> wrote:
>
>For unit testing, one problem I've run into is overriding the returns
>from
>static class methods.
>
>For instance, PutJMS contains this code:
>
>try {
>  wrappedProducer = JmsFactory.createMessageProducer(context, true);
>  logger.info("Connected to JMS server {}",
>  new Object[]{context.getProperty(URL).getValue()});
>} catch (final JMSException e) {
>  logger.error("Failed to connect to JMS Server due to {}", new
>Object[]{e});
>  session.transfer(flowFiles, REL_FAILURE);
>  context.yield();
>  return;
>}
>
>where the JmsFactory.createMessageProducer call is defined as
>
>public static WrappedMessageProducer createMessageProducer(...
>
>which presents a problem since it can't be easily overridden for a unit
>test.
>
>How do you handle this problem?
>
>Regards,
>Joe
>
>
>



Re: How to iterate through complex JSON objects.

2015-12-16 Thread Matthew Burgess
All,

I have submitted a patch for NIFI-210 to offer scripting capabilities, my 
GitHub feature branch is at:

https://github.com/mattyb149/nifi/tree/script-processors


I would truly appreciate any comments, questions, or suggestions about this 
capability.

Regards,
Matt




On 12/16/15, 11:41 AM, "Joe Witt"  wrote:

>It is a fair criticism that sometimes the cohesion level of processors
>can be simply too much.  Early on I used to 'fight' to find the right
>abstraction and argue that others do the same.  But what I've found is
>that it is better to let it happen naturally and to offer options.
>Matt, I think your approach of giving yourself an option to break into
>scripting in the middle of the flow, in a way that lets you mangle data
>as needed while benefitting from the strength of the framework, is
>perfect.  Matt Burgess is working on NIFI-210 to incorporate those
>languages and many others.
>
>Thanks
>Joe
>
>On Wed, Dec 16, 2015 at 8:27 AM, Angry Duck Studio
> wrote:
>> Shweta,
>>
>> I think your issue demonstrates one of my minor complaints with NiFi --
>> that you always have to think in terms of several little, built-in pieces
>> to get a simple job done. Sometimes it's fun, like a puzzle, but other
>> times, I don't feel like dealing with it. That's why I wrote this:
>> https://github.com/mring33621/scripting-for-nifi. A short, custom JS or
>> Groovy script could have handled your JSON data munging in a single stroke.
>>
>> -Matt
>>
>> On Tue, Dec 15, 2015 at 8:40 PM, shweta  wrote:
>>
>>> Thanks Bryan!! In fact I followed the exact approach you described. Just
>>> that I was clueless about using the MergeContent processor. So I wrote my
>>> custom script to combine the different outputs and executed it using
>>> ExecuteStreamCommand.
>>> Will try the same with MergeContent.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-iterate-through-complex-JSON-objects-tp5776p5806.html
>>> Sent from the Apache NiFi Developer List mailing list archive at
>>> Nabble.com.
>>>



Re: [VOTE] Release Apache NiFi 0.4.0 (rc2)

2015-12-08 Thread Matthew Burgess
I ran through Joe’s helper procedure:

Signatures, digests, etc. look good. Source downloaded and built (with tests 
successful) and started successfully. Tried importing, instantiating, and 
running templates. Added, configured, and deleted processors. Looks good to me 
:)

+1 (non-binding)




On 12/8/15, 3:29 PM, "Joe Witt"  wrote:

>Hello NiFi Community,
>
>I am pleased to be calling this vote for the source release of Apache
>NiFi 0.4.0.
>
>The source zip, including signatures, digests, and associated
>convenience binaries can be found at
>  https://dist.apache.org/repos/dist/dev/nifi/nifi-0.4.0/
>
>The staged maven artifacts of the build can be found at
>  https://repository.apache.org/content/repositories/orgapachenifi-1065
>
>The Git tag is nifi-0.4.0-RC2
>The Git commit ID is b66c029090f395c0cbc001fd483e86895b133e46
>  
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=b66c029090f395c0cbc001fd483e86895b133e46
>
>Checksums of NiFi 0.4.0 Source Release
>MD5: da733f8fdb520a0346dcda59940b2c12
>SHA1: 82fffbc5f8d7e4724bbe2f794bdde39396dae745
>
>Release artifacts are signed with the following key
>  https://people.apache.org/keys/committer/joewitt.asc
>
>KEYS file available here
>  https://dist.apache.org/repos/dist/release/nifi/KEYS
>
>161 issues were closed/resolved for this release
>  
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12333070
>
>Release note highlights
>  
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0
>
>Migration/Upgrade guidance
>  https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance
>  https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi
>
>The vote will be open for 72 hours.
>Please download the release candidate and evaluate the necessary items
>including checking hashes, signatures, build from source, and test.
>
>Then please vote:
>
>[ ] +1 Release this package as Apache NiFi 0.4.0
>[ ] +0 no opinion
>[ ] -1 Do not release this package because...



Re: remote command execution via SSH?

2015-12-02 Thread Matthew Burgess
Sumo,

NIFI-210 [1] has been re-opened to enable support for scripting language 
processors, and I believe it is intended to supersede the others (NIFI-684 
(Groovy), NIFI-935 (Jython), and NIFI-1167 (Javascript)).

I’ve been doing some work on this, part of it (like yours) is based on Ricky 
Saltzer’s POC (in NIFI-684). Are you interested in collaborating on this?

My feature branch is at [2], I’ve written two processors:

InvokeScriptProcessor:  For JSR-223 script engines that are Invocable (Jython, 
Javascript, Groovy, JRuby, etc.), the user can provide a script that defines an 
implementation of the Processor interface, and assigns an instance of that to 
the “processor” variable. The InvokeScriptProcessor will delegate its own 
Processor methods out to the ones supplied by the script.

ExecuteScript: This is much like your ExecuteGroovy processor, but is extended 
to all registered JSR-223 script engines, which so far includes Jython, 
Javascript, Groovy, JRuby, Lua, and Scala.

I’ve got a template I can provide, and will be making a video demo soon to show 
the config and execution of these processors.  I look forward to your comments 
and possibly working together!

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-210
[2] https://github.com/mattyb149/nifi/tree/script-processors





On 12/1/15, 12:52 AM, "Sumanth Chinthagunta"  wrote:

>Sure Joe. I will create Jira tickets for those processors. I am also working 
>on moving the groovy lib dependency to the parent nar level to keep the 
>processor nars sleek.
>Sumo 
>
>Sent from my iPhone
>
>> On Nov 30, 2015, at 7:25 AM, Joe Percivall  
>> wrote:
>> 
>> Hey Sumo,
>> 
>> I don't know much about this use-case, but just taking a quick look at the 
>> processors in that github repo, they seem to be potentially a great addition 
>> to NiFi!
>> 
>> I think you should consider creating a Jira and working this there. It would 
>> be a lot easier to get feedback and have a record of it on Jira than just on 
>> the Dev list.
>> 
>> Joe
>> - - - - - - 
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joeperciv...@yahoo.com
>> 
>> 
>> 
>> 
>> On Wednesday, November 25, 2015 2:12 PM, Sumanth Chinthagunta 
>>  wrote:
>> I have a first-cut implementation of the ExecuteRemoteProcess processor at:
>>
>> https://github.com/xmlking/nifi-scripting/releases
>>
>> I tried to provide all capabilities offered by groovy-ssh
>> (https://gradle-ssh-plugin.github.io/docs/) to the ExecuteRemoteProcess user.
>> It takes three attributes:
>> 1. SSH Config DSL (run once on OnScheduled)
>> remotes {
>>web01 {
>>role 'masterNode'
>>host = '192.168.1.5'
>>user = 'sumo'
>>password = ‘fake'
>>knownHosts = allowAnyHosts
>>}
>>web02 {
>>host = '192.168.1.5'
>>user = 'sumo'
>>knownHosts = allowAnyHosts
>>}
>> }
>> 2. Run DSL ( run on each onTrigger)
>> ssh.run {
>>session(ssh.remotes.web01) {
>>  result = execute 'uname -a' 
>>}
>> }
>> 3. User supplied Arguments which will be available in Run DSL 
>> 
>> Anything that is assigned to ‘result’ in the Run DSL is passed as a flowfile 
>> to the success relationship.
>> 
>> Any suggestions for improvements are welcome.
>> 
>> -Sumo
>> 
>> 
>>> On Nov 24, 2015, at 8:19 PM, Adam Taft  wrote:
>>> 
>>> Sumo,
>>> 
>>> On Tue, Nov 24, 2015 at 10:27 PM, Sumanth Chinthagunta 
>>> wrote:
>>> 
 I think you guys may have configured passwordless login for SSH (keys?)
 
>>> 
>>> ​Correct.  I'm using SSH key exchange for authentication.  It's usually
>>> done password-less, true, but it doesn't necessarily have to be (if using
>>> ssh-agent).
>>> 
>>> ​
>>> 
>>> 
 In my case the  edge node is managed by different team and they don’t
 allow me to add my SSH key.
 
>>> 
>>> ​Yikes.  Someone should teach them the benefits of ssh keys!  :)​
>>> 
>>> 
>>> 
 I am thinking we need ExecuteRemoteCommand processor (based on
 https://github.com/int128/groovy-ssh) that will take care of key or
 password base SSH login.
 
>>> 
>>> ​+1  - this would be a pretty nice contribution.  Recommend building the
>>> processor and then posting here for review. I'm sure this would be a useful
>>> processor for many people.
>>> 
>>> 
>>> ExecuteRemoteCommand should have configurable attributes and return command
 output as flowfile
 
 host : Hostname or IP address.
 port : Port. Defaults to 22.
 user : User name.
 password: A password for password authentication.
 identity : A private key file for public-key authentication.
 execute - Execute a command.
 executeBackground - Execute a command in background.
 executeSudo - Execute a command with sudo support.
 shell - Execute a shell.
 
 
>>> As we do for

End of stream?

2015-11-06 Thread Matthew Burgess
Does NiFi have the concept of an "end of stream" or is it designed to pretty
much always be running? For example if I use a GetFile processor pointing at
a single directory (with remove files = true), once all the files have been
processed, can downstream processors know that?

I'm working on a ReservoirSampling processor, and I have it successfully
building the reservoir from all incoming FlowFiles. However it never gets to
the logic that sends the sampled FlowFiles to the downstream processor (just
a PutFile at this point). I have the logic in a block like:

FlowFile flowFile = session.get();
if(flowFile == null) {
  // send reservoir
}
else {
 // build reservoir
}

But the if-clause never gets entered.  Is there a different approach and/or
am I misunderstanding how the data flow works?

Thanks in advance,
Matt
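
For reference, the reservoir-update step being described is classic Algorithm R;
a minimal, NiFi-agnostic sketch (the item type and size are illustrative, and
NiFi session-lifecycle concerns are out of scope here):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Reservoir<T> {
    private final int k;                       // fixed reservoir size
    private final List<T> items = new ArrayList<>();
    private final Random random = new Random();
    private long seen = 0;                     // total items offered so far

    Reservoir(final int k) { this.k = k; }

    /** Offers an item; returns the evicted item (to be dropped), or null. */
    T offer(final T item) {
        seen++;
        if (items.size() < k) {
            items.add(item);                   // fill phase: keep the first k
            return null;
        }
        final long j = (long) (random.nextDouble() * seen); // uniform in [0, seen)
        if (j < k) {
            return items.set((int) j, item);   // newcomer replaces a random slot
        }
        return item;                           // newcomer is not sampled
    }

    List<T> contents() { return items; }
}

Each of the n items offered ends up in the reservoir with probability k/n, which
gives the equal-probability, per-batch sampling described above.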




Re: End of stream?

2015-11-06 Thread Matthew Burgess
No that makes sense, thanks much!

So for my case, I'm thinking I'd want another attribute from GetFile called
"lastInStream" or something? It would be set once processing of the current
directory is complete (for the time being), and reset each time the
onTrigger is called.  At that point it's really more of a "lastInBatch", so
maybe instead I could use the batch size somehow as a hint to the
ReservoirSampling processor that the current reservoir is ready to send
along?  The use case is a kind of burst processing (or per-batch filtering),
where FlowFiles are available in "groups", where I could sample from the
incoming group with equal probability to give a smaller output group.


From:  Joe Witt <joe.w...@gmail.com>
Reply-To:  <dev@nifi.apache.org>
Date:  Friday, November 6, 2015 at 11:38 AM
To:  <dev@nifi.apache.org>
Subject:  Re: End of stream?

Matt,

For processors in the middle of the flow the null check is important
for race conditions where it is told it can run but by the time it
does there are no flowfiles left.  The framework though in general
will avoid this because it is checking if there is work to do.  So, in
short you can't use that mechanism to know there are no items left to
process.

The only way to know that a given flowfile was the last in a bunch
would be for that fact to be an attribute on a given flow file.

There is really no concept of an end of stream so to speak from a
processor perspective.  Processors are either running on not running.
You can, as i mentioned before though, use attributes of flowfiles to
annotate their relative position in a stream.

Does that help explain it at all or did I make it more confusing?

Thanks
Joe





Re: Incorporation of other Maven repositories

2015-11-03 Thread Matthew Burgess
Bintray JCenter (https://bintray.com/bintray/jcenter/) is also moderated and
claims to be "the repository with the biggest collection of Maven artifacts
in the world". I think Bintray itself proxies out to Maven Central, but it
appears that for JCenter you choose to sync your artifacts with Maven
Central: http://blog.bintray.com/tag/maven-central/

I imagine trust is still a per-organization or per-artifact issue, but
Bintray claims to be even safer and more trustworthy than Maven Central
(source: 
http://blog.bintray.com/2014/08/04/feel-secure-with-ssl-think-again/).  For
my (current) work and home projects, I still resolve from Maven Central, but
I have been publishing my own artifacts to Bintray.

Regards,
Matt

From:  Aldrin Piri 
Reply-To:  
Date:  Tuesday, November 3, 2015 at 12:34 PM
To:  
Subject:  Incorporation of other Maven repositories

I am writing to see what the general guidance and posture is on
incorporating additional repositories into the build process.

Obviously, Maven Central provides a very known quantity.  Are there other
repositories that are viewed with the same level of trust?  If so, is there
a listing? If not, how do we vet new sources as they bring libraries that aid
our project, and how is this accomplished?

Incorporating other repos brings up additional areas of concern,
specifically availability but also some additional security considerations
to the binaries that are being retrieved.

Any thoughts on this front would be much appreciated.





Re: Common data exchange formats and tabular data

2015-11-02 Thread Matthew Burgess
Hello all,

I am new to the NiFi community but I have a good amount of experience with
ETL tools and applications that process lots of tabular data. In my
experience, JSON is only useful as the common format for tabular data if it
has a "flat" schema, in which case there aren't any advantages for JSON over
other formats such as CSV. However, I've seen lots of "CSV" files that don't
seem to adhere to any standard, so I would presume NiFi would need a rigid
format such as RFC-4180 (http://www.rfc-base.org/txt/rfc-4180.txt).

However CSV isn't a natural way to express the schema of the rows, so JSON
or YAML is probably a better choice. There's a format called Tabular Data
Package that combines CSV and JSON for tabular data serialization:
http://dataprotocols.org/tabular-data-package/

Avro is similar, but the schema must always be provided with the data. In
the case of NiFi DataFlows, it's likely more efficient to send the schema
once as an initialization packet (I can't remember the real term in NiFi),
then the rows can be streamed individually, in batches of user-defined size,
sampled, etc.

Having said all that, there are projects like Apache Drill that can handle
non-flat JSON files and still present them in tabular format. They have
functions like KVGEN and FLATTEN to transform the document(s) into tabular
format. In the use cases you present below, you already know the data is
tabular and as such, the extra data model transformation is not needed.  If
this is desired, it should be apparent that a Streaming JSON processor would
be necessary; otherwise, for large tabular datasets you'd have to read the
whole JSON file into memory to parse individual rows.

Regards,
Matt
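
For illustration, a minimal Tabular Data Package descriptor pairing a CSV file
with a JSON schema might look like this (field names and types are
illustrative):

{
  "name": "example-package",
  "resources": [
    {
      "path": "data.csv",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "name", "type": "string"},
          {"name": "created", "type": "date"}
        ]
      }
    }
  ]
}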

From:  Toivo Adams 
Reply-To:  
Date:  Monday, November 2, 2015 at 5:12 AM
To:  
Subject:  Common data exchange formats and tabular data

All,
Some processors get/put data in tabular form (PutSQL, ExecuteSQL, soon
Cassandra).
It would be very nice to be able to use such processors in a pipeline, where the
previous processor's output is the next processor's input. To achieve this,
processors should use a common data exchange format.

JSON is most widely used; it's simple and readable. But JSON lacks a schema.
A schema can be very useful to automate data inserts/updates.

Avro has a schema, but is somewhat more complicated and not widely used
(yet?).

Please see also:

https://issues.apache.org/jira/browse/NIFI-978

https://issues.apache.org/jira/browse/NIFI-901

Opinions?

Thanks
Toivo




--
View this message in context:
http://apache-nifi-developer-list.39713.n7.nabble.com/Common-data-exchange-formats-and-tabular-data-tp3508.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.