Re: ListSFTP incoming relationship

2018-03-29 Thread Andy LoPresto
You would still need to store the state per-directory-scanned, and this would 
scale with the number of directories used — this raises resolution questions as 
well, like does “~” == “/usr/xyz/home”, “/Users/xyz”, etc.? Is “~” the user 
NiFi is running as? 

So eventually you will end up with a map of some kind using resolved or 
unresolved directories as the keys and some state indicator (timestamp or 
otherwise) as the value. How long do you wait to age these values out? What if 
it scales to the hundreds of thousands of different key entries? The incoming 
attribute can have unbounded range, so there is no guarantee on the upper 
limit. 

I think the “minimum value” idea scales for a single directory listing, but not 
on the orthogonal axis for many possible directory values. 
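
To make the scaling concern concrete, here is a rough sketch of the kind of 
map described above. It is illustrative only: `DirectoryState`, `max_entries`, 
and `max_age_seconds` are hypothetical names, not NiFi APIs, and it assumes 
the directory string is used as the key exactly as it arrives on the flowfile.

```python
import time
from collections import OrderedDict


class DirectoryState:
    """Hypothetical per-directory listing state; not a NiFi API."""

    def __init__(self, max_entries, max_age_seconds):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        # key: directory string as given (unresolved)
        # value: (last listed timestamp, time the entry was last touched)
        self._entries = OrderedDict()

    def update(self, directory, listed_timestamp, now=None):
        now = time.time() if now is None else now
        self._entries[directory] = (listed_timestamp, now)
        self._entries.move_to_end(directory)
        self._evict(now)

    def get(self, directory):
        entry = self._entries.get(directory)
        return None if entry is None else entry[0]

    def _evict(self, now):
        # Age out entries that have not been touched recently.
        stale = [d for d, (_, touched) in self._entries.items()
                 if now - touched > self.max_age_seconds]
        for d in stale:
            del self._entries[d]
        # Enforce a hard cap by dropping the least recently touched keys.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

Note that "~" and "/Users/xyz" are separate keys here unless something 
resolves them first, and any entry that is evicted loses its timestamp, so 
the next listing of that directory would start over from scratch.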

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 29, 2018, at 13:15, Charlie Meyer wrote:
> 
> Just a thought,
> 
> 
> Could a processor that did the scan and stored state be implemented similar
> to GenerateTableFetch, where there is a minimum value attribute that is
> specified that could be read from the source (such as created date, updated
> date, etc)? That way the state could potentially be manageable.
> 
>> On Thu, Mar 29, 2018 at 2:43 PM, Andy LoPresto  wrote:
>> 
>> Bryan,
>> 
>> No, that was exactly what I was referencing regarding the attribute
>> output. It would have been clearer if I had said it like you did. Thanks.
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On Mar 29, 2018, at 10:46 AM, Bryan Bende  wrote:
>> 
>> Scott,
>> 
>> You are correct that the overall discussion is about allowing incoming
>> flow files to ListSFTP.
>> 
>> However, the previous discussion on this thread highlighted that the
>> main reason ListSFTP currently doesn't allow incoming flow files is
>> because of challenges when storing state.
>> 
>> This led to the proposal of a new processor that allowed incoming flow
>> files, but did not store state, thus avoiding the challenges mentioned
>> above. If we were going to store state in this new processor, then
>> we'd be back to the exact same challenges.
>> 
>> Providing an option to turn on state also doesn't really help, because
>> if there is an option provided to users, then the option will be used,
>> and it needs to work when it is used.
>> 
>> If we can come up with something that stores state and works well for
>> all scenarios, then we aren't against it, we just need to handle the
>> challenges highlighted by Joe's original email.
>> 
>> Regarding some of the other ideas...
>> 
>> The current output of ListSFTP already includes flow file attributes
>> for each listing that include the full path, filename, last update
>> time, owner, group, permissions, and file size. Were you thinking
>> of something different than that?
>> 
>> See the "Writes Attributes" section here:
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.ListSFTP/index.html
>> 
>> Thanks,
>> 
>> Bryan
>> 
>> 
>> 
>> On Thu, Mar 29, 2018 at 12:43 PM, Andy LoPresto 
>> wrote:
>> 
>> Scott,
>> 
>> I think there are two conversations going on here. You are finding the
>> requirements for your specific use case, and that’s great. But I echo
>> Bryan’s point that a community processor for this scenario should not store
>> state at all. Sivaprasanna’s point that given dynamic directory input,
>> storing state based on that can cause massive data ingestion problems still
>> stands.
>> 
>> For your specific use case, you can prototype (or possibly even get to a
>> stable and robust-enough point) using ExecuteScript to model the behavior
>> you need.
>> 
>> In regards to the desired output format, I would suggest a few items:
>> 
>> * Avro requires a schema to be defined, and this raises the barrier to use
>> of the processor. Also, unless being sent to a processor that understands
>> Avro, the result will need to be converted anyway using Record* processors.
>> * If the output is individual flowfiles on a 1:1 basis, each should have as
>> many attributes populated with the parsed information as possible (i.e.
>> file.name, file.path, file.size, file.owner, file.permissions, etc.). This
>> allows for easily-consumable and routable flowfiles.
>> * If the output is a full directory listing, I would suggest `ls -al` type
>> raw text output, or JSON (arbitrary human-readable and machine-readable
>> format with many consuming/transforming processors).
>> 
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On Mar 29, 2018, at 9:34 AM, scott  wrote:
>> 
>> Sorry Bryan, but I disagree with you. Not storing state is NOT the main
>> point of this ne

[CANCEL][VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-29 Thread Joe Witt
The Apache NiFi 1.6.0 RC2 is cancelled due to NIFI-5033.

Thanks all - really appreciate your efforts to review here!

Joe

On Thu, Mar 29, 2018 at 6:38 PM, Joe Witt  wrote:
> Well, bummer.  Good find though, Pierre.  Let's get that fixed.  I'll
> send the RC2 cancel email.
>
> I'll turn around RC3 as soon as it is fixed.
>
> thanks
>
> On Thu, Mar 29, 2018 at 5:36 PM, Pierre Villard
>  wrote:
>> Guys... I'll have to change my vote to -1.
>>
>> While playing with NiFi, I think I found an issue: when updating a variable
>> at pg level that references a restricted component it will fail. It seems
>> the code is the same for secured and unsecured instance and it fails when
>> NiFi is unsecured since the user is unknown. Even though that's not a
>> blocker, it'd certainly require a 1.6.1 version so it's probably best to
>> fix it before releasing 1.6.0.
>>
>> The issue has probably been introduced with NIFI-4885 [1] (I confirmed that
>> it's working as expected with a NiFi 1.5.0 instance). I opened a JIRA to
>> give more details [2].
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-4885
>> [2] https://issues.apache.org/jira/browse/NIFI-5033
>>
>> Pierre
>>
>>
>> 2018-03-29 22:25 GMT+02:00 Matt Gilman :
>>
>>> +1 (binding) Release this package as nifi-1.6.0
>>>
>>> - Ran through release help
>>> - Verified issue discovered by Andrew Lim in previous RC
>>>
>>> Thanks for RMing Joe!
>>>
>>> Matt
>>>
>>> On Thu, Mar 29, 2018 at 8:36 AM, Aldrin Piri  wrote:
>>>
>>> > +1, binding
>>> >
>>> >
>>> > comments:
>>> >
>>> > * commit looks correct
>>> > * signature and hashes check out
>>> > * build and tests verified on CentOS 7 and OS X 10.12
>>> > * ran some simple flows on each and things looked fine
>>> >
>>> > On Thu, Mar 29, 2018 at 1:06 AM, Sivaprasanna >> >
>>> > wrote:
>>> >
>>> > > Thanks, Scott. That helped.
>>> > >
>>> > > On Thu, 29 Mar 2018 at 10:09 AM, James Wing  wrote:
>>> > >
>>> > > > +1 (binding).  Ran through the release helper, worked with a test
>>> flow.
>>> > > > Thanks for putting this together.
>>> > > >
>>> > > > On Mon, Mar 26, 2018 at 8:34 PM, Joe Witt 
>>> wrote:
>>> > > >
>>> > > > > Hello,
>>> > > > >
>>> > > > > I am pleased to be calling this vote for the source release of
>>> Apache
>>> > > > > NiFi nifi-1.6.0.
>>> > > > >
>>> > > > > The source zip, including signatures, digests, etc. can be found
>>> at:
>>> > > > > https://repository.apache.org/content/repositories/
>>> > orgapachenifi-1123
>>> > > > >
>>> > > > > The Git tag is nifi-1.6.0-RC2
>>> > > > > The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
>>> > > > > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
>>> > > > > b5935ec81a7cbc048820781ac62cd96bbea5b232
>>> > > > >
>>> > > > > Checksums of nifi-1.6.0-source-release.zip:
>>> > > > > SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
>>> > > > > SHA256: 39941a5b25427e2b4cc5ba8206084f
>>> f92df58863f29ddd097d4ac1e85424
>>> > > beb9
>>> > > > > SHA512: 1773417a48665e3cda22180ea7f401
>>> bc8190ebddbf3f7bc29831e46e7ab0
>>> > > > > a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab736484
>>> 7a07e234
>>> > > > >
>>> > > > > Release artifacts are signed with the following key:
>>> > > > > https://people.apache.org/keys/committer/joewitt.asc
>>> > > > >
>>> > > > > KEYS file available here:
>>> > > > > https://dist.apache.org/repos/dist/release/nifi/KEYS
>>> > > > >
>>> > > > > 146 issues were closed/resolved for this release:
>>> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>>> > > > > projectId=12316020&version=12342422
>>> > > > >
>>> > > > > Release note highlights can be found here:
>>> > > > > https://cwiki.apache.org/confluence/display/NIFI/
>>> > > > > Release+Notes#ReleaseNotes-Version1.6.0
>>> > > > >
>>> > > > > The vote will be open for 72 hours.
>>> > > > > Please download the release candidate and evaluate the necessary
>>> > items
>>> > > > > including checking hashes, signatures, build
>>> > > > > from source, and test.  The please vote:
>>> > > > >
>>> > > > > [ ] +1 Release this package as nifi-1.6.0
>>> > > > > [ ] +0 no opinion
>>> > > > > [ ] -1 Do not release this package because...
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-29 Thread Joe Witt
Well, bummer.  Good find though, Pierre.  Let's get that fixed.  I'll
send the RC2 cancel email.

I'll turn around RC3 as soon as it is fixed.

thanks

On Thu, Mar 29, 2018 at 5:36 PM, Pierre Villard wrote:
> Guys... I'll have to change my vote to -1.
>
> While playing with NiFi, I think I found an issue: when updating a variable
> at pg level that references a restricted component it will fail. It seems
> the code is the same for secured and unsecured instance and it fails when
> NiFi is unsecured since the user is unknown. Even though that's not a
> blocker, it'd certainly require a 1.6.1 version so it's probably best to
> fix it before releasing 1.6.0.
>
> The issue has probably been introduced with NIFI-4885 [1] (I confirmed that
> it's working as expected with a NiFi 1.5.0 instance). I opened a JIRA to
> give more details [2].
>
> [1] https://issues.apache.org/jira/browse/NIFI-4885
> [2] https://issues.apache.org/jira/browse/NIFI-5033
>
> Pierre
>
>
> 2018-03-29 22:25 GMT+02:00 Matt Gilman :
>
>> +1 (binding) Release this package as nifi-1.6.0
>>
>> - Ran through release help
>> - Verified issue discovered by Andrew Lim in previous RC
>>
>> Thanks for RMing Joe!
>>
>> Matt
>>
>> On Thu, Mar 29, 2018 at 8:36 AM, Aldrin Piri  wrote:
>>
>> > +1, binding
>> >
>> >
>> > comments:
>> >
>> > * commit looks correct
>> > * signature and hashes check out
>> > * build and tests verified on CentOS 7 and OS X 10.12
>> > * ran some simple flows on each and things looked fine
>> >
>> > On Thu, Mar 29, 2018 at 1:06 AM, Sivaprasanna > >
>> > wrote:
>> >
>> > > Thanks, Scott. That helped.
>> > >
>> > > On Thu, 29 Mar 2018 at 10:09 AM, James Wing  wrote:
>> > >
>> > > > +1 (binding).  Ran through the release helper, worked with a test
>> flow.
>> > > > Thanks for putting this together.
>> > > >
>> > > > On Mon, Mar 26, 2018 at 8:34 PM, Joe Witt 
>> wrote:
>> > > >
>> > > > > Hello,
>> > > > >
>> > > > > I am pleased to be calling this vote for the source release of
>> Apache
>> > > > > NiFi nifi-1.6.0.
>> > > > >
>> > > > > The source zip, including signatures, digests, etc. can be found
>> at:
>> > > > > https://repository.apache.org/content/repositories/
>> > orgapachenifi-1123
>> > > > >
>> > > > > The Git tag is nifi-1.6.0-RC2
>> > > > > The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
>> > > > > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
>> > > > > b5935ec81a7cbc048820781ac62cd96bbea5b232
>> > > > >
>> > > > > Checksums of nifi-1.6.0-source-release.zip:
>> > > > > SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
>> > > > > SHA256: 39941a5b25427e2b4cc5ba8206084f
>> f92df58863f29ddd097d4ac1e85424
>> > > beb9
>> > > > > SHA512: 1773417a48665e3cda22180ea7f401
>> bc8190ebddbf3f7bc29831e46e7ab0
>> > > > > a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab736484
>> 7a07e234
>> > > > >
>> > > > > Release artifacts are signed with the following key:
>> > > > > https://people.apache.org/keys/committer/joewitt.asc
>> > > > >
>> > > > > KEYS file available here:
>> > > > > https://dist.apache.org/repos/dist/release/nifi/KEYS
>> > > > >
>> > > > > 146 issues were closed/resolved for this release:
>> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> > > > > projectId=12316020&version=12342422
>> > > > >
>> > > > > Release note highlights can be found here:
>> > > > > https://cwiki.apache.org/confluence/display/NIFI/
>> > > > > Release+Notes#ReleaseNotes-Version1.6.0
>> > > > >
>> > > > > The vote will be open for 72 hours.
>> > > > > Please download the release candidate and evaluate the necessary
>> > items
>> > > > > including checking hashes, signatures, build
>> > > > > from source, and test.  The please vote:
>> > > > >
>> > > > > [ ] +1 Release this package as nifi-1.6.0
>> > > > > [ ] +0 no opinion
>> > > > > [ ] -1 Do not release this package because...
>> > > > >
>> > > >
>> > >
>> >
>>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-29 Thread Pierre Villard
Guys... I'll have to change my vote to -1.

While playing with NiFi, I think I found an issue: when updating a variable
at pg level that references a restricted component it will fail. It seems
the code is the same for secured and unsecured instance and it fails when
NiFi is unsecured since the user is unknown. Even though that's not a
blocker, it'd certainly require a 1.6.1 version so it's probably best to
fix it before releasing 1.6.0.

The issue has probably been introduced with NIFI-4885 [1] (I confirmed that
it's working as expected with a NiFi 1.5.0 instance). I opened a JIRA to
give more details [2].

[1] https://issues.apache.org/jira/browse/NIFI-4885
[2] https://issues.apache.org/jira/browse/NIFI-5033

Pierre


2018-03-29 22:25 GMT+02:00 Matt Gilman :

> +1 (binding) Release this package as nifi-1.6.0
>
> - Ran through release help
> - Verified issue discovered by Andrew Lim in previous RC
>
> Thanks for RMing Joe!
>
> Matt
>
> On Thu, Mar 29, 2018 at 8:36 AM, Aldrin Piri  wrote:
>
> > +1, binding
> >
> >
> > comments:
> >
> > * commit looks correct
> > * signature and hashes check out
> > * build and tests verified on CentOS 7 and OS X 10.12
> > * ran some simple flows on each and things looked fine
> >
> > On Thu, Mar 29, 2018 at 1:06 AM, Sivaprasanna  >
> > wrote:
> >
> > > Thanks, Scott. That helped.
> > >
> > > On Thu, 29 Mar 2018 at 10:09 AM, James Wing  wrote:
> > >
> > > > +1 (binding).  Ran through the release helper, worked with a test
> flow.
> > > > Thanks for putting this together.
> > > >
> > > > On Mon, Mar 26, 2018 at 8:34 PM, Joe Witt 
> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am pleased to be calling this vote for the source release of
> Apache
> > > > > NiFi nifi-1.6.0.
> > > > >
> > > > > The source zip, including signatures, digests, etc. can be found
> at:
> > > > > https://repository.apache.org/content/repositories/
> > orgapachenifi-1123
> > > > >
> > > > > The Git tag is nifi-1.6.0-RC2
> > > > > The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
> > > > > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
> > > > > b5935ec81a7cbc048820781ac62cd96bbea5b232
> > > > >
> > > > > Checksums of nifi-1.6.0-source-release.zip:
> > > > > SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
> > > > > SHA256: 39941a5b25427e2b4cc5ba8206084f
> f92df58863f29ddd097d4ac1e85424
> > > beb9
> > > > > SHA512: 1773417a48665e3cda22180ea7f401
> bc8190ebddbf3f7bc29831e46e7ab0
> > > > > a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab736484
> 7a07e234
> > > > >
> > > > > Release artifacts are signed with the following key:
> > > > > https://people.apache.org/keys/committer/joewitt.asc
> > > > >
> > > > > KEYS file available here:
> > > > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > > > >
> > > > > 146 issues were closed/resolved for this release:
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > > projectId=12316020&version=12342422
> > > > >
> > > > > Release note highlights can be found here:
> > > > > https://cwiki.apache.org/confluence/display/NIFI/
> > > > > Release+Notes#ReleaseNotes-Version1.6.0
> > > > >
> > > > > The vote will be open for 72 hours.
> > > > > Please download the release candidate and evaluate the necessary
> > items
> > > > > including checking hashes, signatures, build
> > > > > from source, and test.  The please vote:
> > > > >
> > > > > [ ] +1 Release this package as nifi-1.6.0
> > > > > [ ] +0 no opinion
> > > > > [ ] -1 Do not release this package because...
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-29 Thread Matt Gilman
+1 (binding) Release this package as nifi-1.6.0

- Ran through release help
- Verified issue discovered by Andrew Lim in previous RC

Thanks for RMing Joe!

Matt

On Thu, Mar 29, 2018 at 8:36 AM, Aldrin Piri  wrote:

> +1, binding
>
>
> comments:
>
> * commit looks correct
> * signature and hashes check out
> * build and tests verified on CentOS 7 and OS X 10.12
> * ran some simple flows on each and things looked fine
>
> On Thu, Mar 29, 2018 at 1:06 AM, Sivaprasanna 
> wrote:
>
> > Thanks, Scott. That helped.
> >
> > On Thu, 29 Mar 2018 at 10:09 AM, James Wing  wrote:
> >
> > > +1 (binding).  Ran through the release helper, worked with a test flow.
> > > Thanks for putting this together.
> > >
> > > On Mon, Mar 26, 2018 at 8:34 PM, Joe Witt  wrote:
> > >
> > > > Hello,
> > > >
> > > > I am pleased to be calling this vote for the source release of Apache
> > > > NiFi nifi-1.6.0.
> > > >
> > > > The source zip, including signatures, digests, etc. can be found at:
> > > > https://repository.apache.org/content/repositories/orgapachenifi-1123
> > > >
> > > > The Git tag is nifi-1.6.0-RC2
> > > > The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
> > > > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=b5935ec81a7cbc048820781ac62cd96bbea5b232
> > > >
> > > > Checksums of nifi-1.6.0-source-release.zip:
> > > > SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
> > > > SHA256: 39941a5b25427e2b4cc5ba8206084ff92df58863f29ddd097d4ac1e85424beb9
> > > > SHA512: 1773417a48665e3cda22180ea7f401bc8190ebddbf3f7bc29831e46e7ab0a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab7364847a07e234
> > > >
> > > > Release artifacts are signed with the following key:
> > > > https://people.apache.org/keys/committer/joewitt.asc
> > > >
> > > > KEYS file available here:
> > > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > > >
> > > > 146 issues were closed/resolved for this release:
> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12342422
> > > >
> > > > Release note highlights can be found here:
> > > > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.6.0
> > > >
> > > > The vote will be open for 72 hours.
> > > > Please download the release candidate and evaluate the necessary items
> > > > including checking hashes, signatures, build
> > > > from source, and test.  Then please vote:
> > > >
> > > > [ ] +1 Release this package as nifi-1.6.0
> > > > [ ] +0 no opinion
> > > > [ ] -1 Do not release this package because...
> > > >
> > >
> >
>


Re: ListSFTP incoming relationship

2018-03-29 Thread Charlie Meyer
Just a thought,


Could a processor that did the scan and stored state be implemented similar
to GenerateTableFetch, where there is a minimum value attribute that is
specified that could be read from the source (such as created date, updated
date, etc)? That way the state could potentially be manageable.
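
As a rough sketch of what I mean (illustrative only; `select_newer` and the
`(path, modified_time)` tuple shape are hypothetical, not an existing NiFi
API), the processor would only need to carry one value forward per listing:

```python
def select_newer(entries, minimum_value):
    """Return the entries newer than minimum_value and the next minimum.

    entries: iterable of (path, modified_time) tuples from one listing.
    minimum_value: the value read from the incoming attribute.
    """
    selected = [(path, modified) for path, modified in entries
                if modified > minimum_value]
    # If nothing is newer, keep the old minimum so no progress is lost.
    next_minimum = max((modified for _, modified in selected),
                       default=minimum_value)
    return selected, next_minimum
```

One caveat with the strict `>` comparison: a file that shows up later with
the same timestamp as the current minimum would be skipped, so a real
implementation would need to decide how to handle ties.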

On Thu, Mar 29, 2018 at 2:43 PM, Andy LoPresto  wrote:

> Bryan,
>
> No, that was exactly what I was referencing regarding the attribute
> output. It would have been clearer if I had said it like you did. Thanks.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Mar 29, 2018, at 10:46 AM, Bryan Bende  wrote:
>
> Scott,
>
> You are correct that the overall discussion is about allowing incoming
> flow files to ListSFTP.
>
> However, the previous discussion on this thread highlighted that the
> main reason ListSFTP currently doesn't allow incoming flow files is
> because of challenges when storing state.
>
> This led to the proposal of a new processor that allowed incoming flow
> files, but did not store state, thus avoiding the challenges mentioned
> above. If we were going to store state in this new processor, then
> we'd be back to the exact same challenges.
>
> Providing an option to turn on state also doesn't really help, because
> if there is an option provided to users, then the option will be used,
> and it needs to work when it is used.
>
> If we can come up with something that stores state and works well for
> all scenarios, then we aren't against it, we just need to handle the
> challenges highlighted by Joe's original email.
>
> Regarding some of the other ideas...
>
> The current output of ListSFTP already includes flow file attributes
> for each listing that include the full path, filename, last update
> time, owner, group, permissions, and file size. Were you thinking
> of something different than that?
>
> See the "Writes Attributes" section here:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.ListSFTP/index.html
>
> Thanks,
>
> Bryan
>
>
>
> On Thu, Mar 29, 2018 at 12:43 PM, Andy LoPresto 
> wrote:
>
> Scott,
>
> I think there are two conversations going on here. You are finding the
> requirements for your specific use case, and that’s great. But I echo
> Bryan’s point that a community processor for this scenario should not store
> state at all. Sivaprasanna’s point that given dynamic directory input,
> storing state based on that can cause massive data ingestion problems still
> stands.
>
> For your specific use case, you can prototype (or possibly even get to a
> stable and robust-enough point) using ExecuteScript to model the behavior
> you need.
>
> In regards to the desired output format, I would suggest a few items:
>
> * Avro requires a schema to be defined, and this raises the barrier to use
> of the processor. Also, unless being sent to a processor that understands
> Avro, the result will need to be converted anyway using Record* processors.
> * If the output is individual flowfiles on a 1:1 basis, each should have as
> many attributes populated with the parsed information as possible (i.e.
> file.name, file.path, file.size, file.owner, file.permissions, etc.). This
> allows for easily-consumable and routable flowfiles.
> * If the output is a full directory listing, I would suggest `ls -al` type
> raw text output, or JSON (arbitrary human-readable and machine-readable
> format with many consuming/transforming processors).
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Mar 29, 2018, at 9:34 AM, scott  wrote:
>
> Sorry Bryan, but I disagree with you. Not storing state is NOT the main
> point of this new processor. The main point is to allow an incoming
> relationship flowfile to trigger the action, and allow variables to be used
> from the attributes therein.
>
> I agree that if the NiFi community deems it too risky to distribute this
> processor with state keeping optionally available, even if the default is to
> disable it, then so be it. If state is not included optionally, then how
> about making the output flowfile content include more than just the file
> names? Have it include last updated time along with the filename. If it
> searches recursively, you'll want to include the path to the file also.
> Maybe it would be best to output the results into a structured format, such
> as AVRO? Or, maybe it would just be best to output one flowfile per remote
> file found, and include updated time and fully qualified path as attributes?
>
> Scott
>
>
> On 03/29/2018 04:32 AM, Bryan Bende wrote:
>
> The main point of the new processor is to NOT store state so that it
> becomes more reasonable to allow incoming flow files.
>
> You could probably implement your own custom processor that does both
> be

Re: ListSFTP incoming relationship

2018-03-29 Thread Andy LoPresto
Bryan,

No, that was exactly what I was referencing regarding the attribute output. It 
would have been clearer if I had said it like you did. Thanks.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 29, 2018, at 10:46 AM, Bryan Bende  wrote:
> 
> Scott,
> 
> You are correct that the overall discussion is about allowing incoming
> flow files to ListSFTP.
> 
> However, the previous discussion on this thread highlighted that the
> main reason ListSFTP currently doesn't allow incoming flow files is
> because of challenges when storing state.
> 
> This led to the proposal of a new processor that allowed incoming flow
> files, but did not store state, thus avoiding the challenges mentioned
> above. If we were going to store state in this new processor, then
> we'd be back to the exact same challenges.
> 
> Providing an option to turn on state also doesn't really help, because
> if there is an option provided to users, then the option will be used,
> and it needs to work when it is used.
> 
> If we can come up with something that stores state and works well for
> all scenarios, then we aren't against it, we just need to handle the
> challenges highlighted by Joe's original email.
> 
> Regarding some of the other ideas...
> 
> The current output of ListSFTP already includes flow file attributes
> for each listing that include the full path, filename, last update
> time, owner, group, permissions, and file size. Were you thinking
> of something different than that?
> 
> See the "Writes Attributes" section here:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.ListSFTP/index.html
> 
> Thanks,
> 
> Bryan
> 
> 
> 
> On Thu, Mar 29, 2018 at 12:43 PM, Andy LoPresto  wrote:
>> Scott,
>> 
>> I think there are two conversations going on here. You are finding the
>> requirements for your specific use case, and that’s great. But I echo
>> Bryan’s point that a community processor for this scenario should not store
>> state at all. Sivaprasanna’s point that given dynamic directory input,
>> storing state based on that can cause massive data ingestion problems still
>> stands.
>> 
>> For your specific use case, you can prototype (or possibly even get to a
>> stable and robust-enough point) using ExecuteScript to model the behavior
>> you need.
>> 
>> In regards to the desired output format, I would suggest a few items:
>> 
>> * Avro requires a schema to be defined, and this raises the barrier to use
>> of the processor. Also, unless being sent to a processor that understands
>> Avro, the result will need to be converted anyway using Record* processors.
>> * If the output is individual flowfiles on a 1:1 basis, each should have as
>> many attributes populated with the parsed information as possible (i.e.
>> file.name, file.path, file.size, file.owner, file.permissions, etc.). This
>> allows for easily-consumable and routable flowfiles.
>> * If the output is a full directory listing, I would suggest `ls -al` type
>> raw text output, or JSON (arbitrary human-readable and machine-readable
>> format with many consuming/transforming processors).
>> 
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On Mar 29, 2018, at 9:34 AM, scott  wrote:
>> 
>> Sorry Bryan, but I disagree with you. Not storing state is NOT the main
>> point of this new processor. The main point is to allow an incoming
>> relationship flowfile to trigger the action, and allow variables to be used
>> from the attributes therein.
>> 
>> I agree that if the NiFi community deems it too risky to distribute this
>> processor with state keeping optionally available, even if the default is to
>> disable it, then so be it. If state is not included optionally, then how
>> about making the output flowfile content include more than just the file
>> names? Have it include last updated time along with the filename. If it
>> searches recursively, you'll want to include the path to the file also.
>> Maybe it would be best to output the results into a structured format, such
>> as AVRO? Or, maybe it would just be best to output one flowfile per remote
>> file found, and include updated time and fully qualified path as attributes?
>> 
>> Scott
>> 
>> 
>> On 03/29/2018 04:32 AM, Bryan Bende wrote:
>> 
>> The main point of the new processor is to NOT store state so that it
>> becomes more reasonable to allow incoming flow files.
>> 
>> You could probably implement your own custom processor that does both
>> because you can make assumptions about how you are going to use it, but if
>> the NiFi community provides one then it needs to work well for all
>> situations, such as dynamically listing hundreds of directories, which is
>> problematic when state is involved.
>> 
>> On Thu, Mar 29, 2018 at 1:05 AM Sivaprasann

Re: ListSFTP incoming relationship

2018-03-29 Thread Bryan Bende
Scott,

You are correct that the overall discussion is about allowing incoming
flow files to ListSFTP.

However, the previous discussion on this thread highlighted that the
main reason ListSFTP currently doesn't allow incoming flow files is
because of challenges when storing state.

This led to the proposal of a new processor that allowed incoming flow
files, but did not store state, thus avoiding the challenges mentioned
above. If we were going to store state in this new processor, then
we'd be back to the exact same challenges.

Providing an option to turn on state also doesn't really help, because
if there is an option provided to users, then the option will be used,
and it needs to work when it is used.

If we can come up with something that stores state and works well for
all scenarios, then we aren't against it, we just need to handle the
challenges highlighted by Joe's original email.

Regarding some of the other ideas...

The current output of ListSFTP already includes flow file attributes
for each listing that include the full path, filename, last update
time, owner, group, permissions, and file size. Were you thinking
of something different than that?

See the "Writes Attributes" section here:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.ListSFTP/index.html

Thanks,

Bryan



On Thu, Mar 29, 2018 at 12:43 PM, Andy LoPresto  wrote:
> Scott,
>
> I think there are two conversations going on here. You are finding the
> requirements for your specific use case, and that’s great. But I echo
> Bryan’s point that a community processor for this scenario should not store
> state at all. Sivaprasanna’s point that given dynamic directory input,
> storing state based on that can cause massive data ingestion problems still
> stands.
>
> For your specific use case, you can prototype (or possibly even get to a
> stable and robust-enough point) using ExecuteScript to model the behavior
> you need.
>
> In regards to the desired output format, I would suggest a few items:
>
> * Avro requires a schema to be defined, and this raises the barrier to use
> of the processor. Also, unless being sent to a processor that understands
> Avro, the result will need to be converted anyway using Record* processors.
> * If the output is individual flowfiles on a 1:1 basis, each should have as
> many attributes populated with the parsed information as possible (e.g.
> file.name, file.path, file.size, file.owner, file.permissions, etc.). This
> allows for easily-consumable and routable flowfiles.
> * If the output is a full directory listing, I would suggest `ls -al` type
> raw text output, or JSON (arbitrary human-readable and machine-readable
> format with many consuming/transforming processors).
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Mar 29, 2018, at 9:34 AM, scott  wrote:
>
> Sorry Bryan, but I disagree with you. Not storing state is NOT the main
> point of this new processor. The main point is to allow an incoming
> relationship flowfile to trigger the action, and allow variables to be used
> from the attributes therein.
>
> I agree that if the NiFi community deems it too risky to distribute this
> processor with state keeping optionally available, even if the default is to
> disable it, then so be it. If state is not included optionally, then how
> about making the output flowfile content include more than just the file
> names? Have it include last updated time along with the filename. If it
> searches recursively, you'll want to include the path to the file also.
> Maybe it would be best to output the results into a structured format, such
> as AVRO? Or, maybe it would just be best to output one flowfile per remote
> file found, and include updated time and fully qualified path as attributes?
>
> Scott
>
>
> On 03/29/2018 04:32 AM, Bryan Bende wrote:
>
> The main point of the new processor is to NOT store state so that it
> becomes more reasonable to allow incoming flow files.
>
> You could probably implement your own custom processor that does both
> because you can make assumptions about how you are going to use it, but if
> the NiFi community provides one then it needs to work well for all
> situations, such as dynamically listing hundreds of directories, which is
> problematic when state is involved.
>
> On Thu, Mar 29, 2018 at 1:05 AM Sivaprasanna 
> wrote:
>
> Should we really have to have an optional state saving functionality? If
> the user is unaware of the implications and proceeds to store the state then
> what Andrew Grande mentioned will happen - the possibility of a never-ending
> stream of state information being stored. If we still go with the optional
> state management approach, the documentation has to be clear in explaining the
> implications.
>
> Sivaprasanna
>
> On Thu, 29 Mar 2018 at 9:28 AM, scott  wrote:
>
> Okay. So, a

Re: ListSFTP incoming relationship

2018-03-29 Thread Andy LoPresto
Scott,

I think there are two conversations going on here. You are finding the 
requirements for your specific use case, and that’s great. But I echo Bryan’s 
point that a community processor for this scenario should not store state at 
all. Sivaprasanna’s point that given dynamic directory input, storing state 
based on that can cause massive data ingestion problems still stands.

For your specific use case, you can prototype (or possibly even get to a stable 
and robust-enough point) using ExecuteScript to model the behavior you need.

In regards to the desired output format, I would suggest a few items:

* Avro requires a schema to be defined, and this raises the barrier to use of 
the processor. Also, unless being sent to a processor that understands Avro, 
the result will need to be converted anyway using Record* processors.
* If the output is individual flowfiles on a 1:1 basis, each should have as 
many attributes populated with the parsed information as possible (e.g. 
file.name, file.path, file.size, file.owner, file.permissions, etc.). This 
allows for easily-consumable and routable flowfiles.
* If the output is a full directory listing, I would suggest `ls -al` type raw 
text output, or JSON (arbitrary human-readable and machine-readable format with 
many consuming/transforming processors).
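
As a rough illustration of the last two bullets, here is a minimal sketch in
plain Java (no NiFi classes). The class, field, and method names are made up
for this example, and the sample paths and owner are invented; only the
attribute keys come from the list above. It shows the same listing rendered
once as per-file attribute maps and once as a single text summary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no NiFi classes. All names here are hypothetical.
final class ListingOutputSketch {

    /** One remote file as a listing might describe it. */
    static final class RemoteFile {
        final String path;
        final String name;
        final long size;
        final String owner;
        final String permissions;

        RemoteFile(String path, String name, long size, String owner, String permissions) {
            this.path = path;
            this.name = name;
            this.size = size;
            this.owner = owner;
            this.permissions = permissions;
        }
    }

    /** "Individual flowfile" mode: one attribute map per remote file. */
    static Map<String, String> toAttributes(RemoteFile file) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("file.name", file.name);
        attributes.put("file.path", file.path);
        attributes.put("file.size", Long.toString(file.size));
        attributes.put("file.owner", file.owner);
        attributes.put("file.permissions", file.permissions);
        return attributes;
    }

    /** "Full listing" mode: one text line per file, loosely in the spirit of `ls -al`. */
    static String toListingText(List<RemoteFile> files) {
        StringBuilder text = new StringBuilder();
        for (RemoteFile file : files) {
            text.append(file.permissions).append(' ')
                .append(file.owner).append(' ')
                .append(file.size).append(' ')
                .append(file.path).append('/').append(file.name)
                .append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<RemoteFile> files = new ArrayList<>();
        files.add(new RemoteFile("/data/in", "a.csv", 2048L, "nifi", "rw-r--r--"));
        files.add(new RemoteFile("/data/in", "b.csv", 512L, "nifi", "rw-r--r--"));

        // Per-file output: each map would become the attributes of one flowfile.
        for (RemoteFile file : files) {
            System.out.println(toAttributes(file));
        }
        // Summary output: the whole listing as the content of one flowfile.
        System.out.print(toListingText(files));
    }
}
```

The per-file form is what makes routing on individual attributes easy; the
summary form keeps the whole listing together for a downstream consumer.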


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 29, 2018, at 9:34 AM, scott  wrote:
> 
> Sorry Bryan, but I disagree with you. Not storing state is NOT the main point 
> of this new processor. The main point is to allow an incoming relationship 
> flowfile to trigger the action, and allow variables to be used from the 
> attributes therein.
> 
> I agree that if the NiFi community deems it too risky to distribute this 
> processor with state keeping optionally available, even if the default is to 
> disable it, then so be it. If state is not included optionally, then how 
> about making the output flowfile content include more than just the file 
> names? Have it include last updated time along with the filename. If it 
> searches recursively, you'll want to include the path to the file also. Maybe 
> it would be best to output the results into a structured format, such as 
> AVRO? Or, maybe it would just be best to output one flowfile per remote file 
> found, and include updated time and fully qualified path as attributes?
> 
> Scott
> 
> 
> On 03/29/2018 04:32 AM, Bryan Bende wrote:
>> The main point of the new processor is to NOT store state so that it
>> becomes more reasonable to allow incoming flow files.
>> 
>> You could probably implement your own custom processor that does both
>> because you can make assumptions about how you are going to use it, but if
>> the NiFi community provides one then it needs to work well for all
>> situations, such as dynamically listing hundreds of directories, which is
>> problematic when state is involved.
>> 
>> On Thu, Mar 29, 2018 at 1:05 AM Sivaprasanna 
>> wrote:
>> 
>>> Should we really have to have an optional state saving functionality? If
>>> the user is unaware of the implications and proceeds to store the state then
>>> what Andrew Grande mentioned will happen - the possibility of a never-ending
>>> stream of state information being stored. If we still go with the optional
>>> state management approach, the documentation has to be clear in explaining the
>>> implications.
>>> 
>>> Sivaprasanna
>>> 
>>> On Thu, 29 Mar 2018 at 9:28 AM, scott  wrote:
>>> 
>>>> Okay. So, a new processor called "ScanSFTP", allow incoming relationship
>>>> where the content of the flow file is replaced with the list of matching
>>>> files from the remote directory, then the list is filtered by the usual
>>>> regex parameters like today. Optional state information is kept to
>>>> additionally filter the list of files older than the newest file
>>>> observed during the last run. Does that sound okay to everyone? If so,
>>>> what's the next step?
>>>>
>>>> Scott
>>>>
>>>>
>>>> On 03/27/2018 06:21 PM, scott wrote:
>>>>> This is a great discussion, and I appreciate the interest in my problem.
>>>>> I think there are workarounds if you decide not to store state, but
>>>>> I'd recommend keeping it. I think state should be kept optionally,
>>>>> even turned off by default. Several times I've had issues where the
>>>>> state has caused me to miss files, because files get moved into the
>>>>> source folder out of order, and I've wished I could turn the state
>>>>> feature off.
>>>>>
>>>>> In my current use-case, I would not be frequently, dynamically
>>>>> changing the source directory, though I can see the use-cases where it
>>>>> would be. In my current use-case, I want to use an external database
>>>>> table to control the configuration of all my flows. I do this by first
>>>>> reading the content of the table for this particular flow ID, then
>>>>> assign the result as attributes to the flowfile, essen

Re: ListSFTP incoming relationship

2018-03-29 Thread scott
Sorry Bryan, but I disagree with you. Not storing state is NOT the main 
point of this new processor. The main point is to allow an incoming 
relationship flowfile to trigger the action, and allow variables to be 
used from the attributes therein.


I agree that if the NiFi community deems it too risky to distribute this 
processor with state keeping optionally available, even if the default 
is to disable it, then so be it. If state is not included optionally, 
then how about making the output flowfile content include more than just 
the file names? Have it include last updated time along with the 
filename. If it searches recursively, you'll want to include the path to 
the file also. Maybe it would be best to output the results into a 
structured format, such as AVRO? Or, maybe it would just be best to 
output one flowfile per remote file found, and include updated time and 
fully qualified path as attributes?


Scott


On 03/29/2018 04:32 AM, Bryan Bende wrote:

> The main point of the new processor is to NOT store state so that it
> becomes more reasonable to allow incoming flow files.
>
> You could probably implement your own custom processor that does both
> because you can make assumptions about how you are going to use it, but if
> the NiFi community provides one then it needs to work well for all
> situations, such as dynamically listing hundreds of directories, which is
> problematic when state is involved.
>
> On Thu, Mar 29, 2018 at 1:05 AM Sivaprasanna wrote:
>
>> Should we really have to have an optional state saving functionality? If
>> the user is unaware of the implications and proceeds to store the state then
>> what Andrew Grande mentioned will happen - the possibility of a never-ending
>> stream of state information being stored. If we still go with the optional
>> state management approach, the documentation has to be clear in explaining the
>> implications.
>>
>> Sivaprasanna
>>
>> On Thu, 29 Mar 2018 at 9:28 AM, scott wrote:
>>
>>> Okay. So, a new processor called "ScanSFTP", allow incoming relationship
>>> where the content of the flow file is replaced with the list of matching
>>> files from the remote directory, then the list is filtered by the usual
>>> regex parameters like today. Optional state information is kept to
>>> additionally filter the list of files older than the newest file
>>> observed during the last run. Does that sound okay to everyone? If so,
>>> what's the next step?
>>>
>>> Scott
>>>
>>>
>>> On 03/27/2018 06:21 PM, scott wrote:
>>>> This is a great discussion, and I appreciate the interest in my problem.
>>>> I think there are workarounds if you decide not to store state, but
>>>> I'd recommend keeping it. I think state should be kept optionally,
>>>> even turned off by default. Several times I've had issues where the
>>>> state has caused me to miss files, because files get moved into the
>>>> source folder out of order, and I've wished I could turn the state
>>>> feature off.
>>>>
>>>> In my current use-case, I would not be frequently, dynamically
>>>> changing the source directory, though I can see the use-cases where it
>>>> would be. In my current use-case, I want to use an external database
>>>> table to control the configuration of all my flows. I do this by first
>>>> reading the content of the table for this particular flow ID, then
>>>> assign the result as attributes to the flowfile, essentially creating
>>>> variables I can use throughout the flow to control its behavior. This
>>>> works great with flows that initiate with HTTP or SQL, but not
>>>> ListSFTP or ListFile.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On 03/27/2018 02:05 PM, Andy LoPresto wrote:
>>>>> I think Bryan’s point is a good one and when I first saw this
>>>>> question (and thought of the previous times it’s been asked), my
>>>>> initial response was to propose a second processor.
>>>>>
>>>>> Something like “ScanSFTP”/“IndexSFTP”/“SnapshotSFTP” which operates
>>>>> differently from ListSFTP — it does not maintain state, and performs
>>>>> a one-time tabulation/chronicling of the state of that directory at
>>>>> the given point in time.
>>>>>
>>>>> The responsibility to maintain and compare state across time is no
>>>>> longer a requirement. There could even be a setting in the processor
>>>>> to allow for “individual flowfile output” (i.e. act the same as
>>>>> ListSFTP and output one flowfile per item listed) or “summary
>>>>> flowfile output” where a single flowfile is generated containing the
>>>>> directory listing information for all the items there. (Another
>>>>> option is to output both on two different relationships).
>>>>>
>>>>> I think this would enable the types of workflows that users have
>>>>> asked about in the past without compromising the mechanism by which
>>>>> List* processors work and adding undue complexity to those processors.
>>>>>
>>>>> Absolutely crystal clear documentation (and a standard verb for the
>>>>> new processor family) would be necessary (not only because these
>>>>> processors solve different problems, but to avoid a million variants
>>>>> of “I used ScanSFTP processor and it’s not tracking state”/“How do I
>>>>> provide a directory in an attribute to ListSFTP” mailing list
>>>>> questions).
>>>>>
>>>>>
>>>>> Andy LoPresto
>>>>> alopre...@apache.org
>>>>> alopresto.apa...@gmail.com

Re: Read processor property in init()

2018-03-29 Thread Sivaprasanna
Yep. That’s correct.

On Thu, 29 Mar 2018 at 6:45 PM, Jeff Zemerick  wrote:

> Thanks! Just to confirm, each time the processor is started the
> @OnScheduled annotated method is executed, right?
>
> Jeff
>
>
> On Wed, Mar 28, 2018 at 9:07 AM, Sivaprasanna 
> wrote:
>
> > Just to add on top of what Mike said. The @OnScheduled annotation indicates
> > that the method marked with this annotation will run every time the
> > processor is started. So basically the setup() will be called and
> > executed every time the processor is started from the UI.
> >
> > On Wed, 28 Mar 2018 at 6:34 PM, Jeff Zemerick 
> > wrote:
> >
> > > I will give that a go. Thanks for the quick answer, Mike!
> > >
> > > On Wed, Mar 28, 2018 at 9:01 AM, Mike Thomsen 
> > > wrote:
> > > > Just do...
> > > >
> > > > @OnScheduled
> > > > public void setup(ProcessContext context) {
> > > >     // Read properties and do setup.
> > > > }
> > > >
> > > > On Wed, Mar 28, 2018 at 8:57 AM, Jeff Zemerick
> > > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > > >> Is there a recommended method for making user-configurable property
> > > > >> values available to a processor's init()? I would like to load a large
> > > > >> index file but allow the user to specify the index's path. I am
> > > > >> guessing that init() is executed too early to read user properties.
> > > >>
> > > >> Thanks for any suggestions.
> > > >>
> > > >> Jeff
> > > >>
> > >
> >
>


Re: Read processor property in init()

2018-03-29 Thread Jeff Zemerick
Thanks! Just to confirm, each time the processor is started the
@OnScheduled annotated method is executed, right?

Jeff


On Wed, Mar 28, 2018 at 9:07 AM, Sivaprasanna 
wrote:

> Just to add on top of what Mike said. The @OnScheduled annotation indicates
> that the method marked with this annotation will run every time the
> processor is started. So basically the setup() will be called and
> executed every time the processor is started from the UI.
>
> On Wed, 28 Mar 2018 at 6:34 PM, Jeff Zemerick 
> wrote:
>
> > I will give that a go. Thanks for the quick answer, Mike!
> >
> > On Wed, Mar 28, 2018 at 9:01 AM, Mike Thomsen 
> > wrote:
> > > Just do...
> > >
> > > @OnScheduled
> > > public void setup(ProcessContext context) {
> > >     // Read properties and do setup.
> > > }
> > >
> > > On Wed, Mar 28, 2018 at 8:57 AM, Jeff Zemerick 
> > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Is there a recommended method for making user-configurable property
> > >> values available to a processor's init()? I would like to load a large
> > >> index file but allow the user to specify the index's path. I am
> > >> guessing that init() is executed too early to read user properties.
> > >>
> > >> Thanks for any suggestions.
> > >>
> > >> Jeff
> > >>
> >
>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-29 Thread Aldrin Piri
+1, binding


comments:

* commit looks correct
* signature and hashes check out
* build and tests verified on CentOS 7 and OS X 10.12
* ran some simple flows on each and things looked fine

On Thu, Mar 29, 2018 at 1:06 AM, Sivaprasanna 
wrote:

> Thanks, Scott. That helped.
>
> On Thu, 29 Mar 2018 at 10:09 AM, James Wing  wrote:
>
> > +1 (binding).  Ran through the release helper, worked with a test flow.
> > Thanks for putting this together.
> >
> > On Mon, Mar 26, 2018 at 8:34 PM, Joe Witt  wrote:
> >
> > > Hello,
> > >
> > > I am pleased to be calling this vote for the source release of Apache
> > > NiFi nifi-1.6.0.
> > >
> > > The source zip, including signatures, digests, etc. can be found at:
> > > https://repository.apache.org/content/repositories/orgapachenifi-1123
> > >
> > > The Git tag is nifi-1.6.0-RC2
> > > The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
> > > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=b5935ec81a7cbc048820781ac62cd96bbea5b232
> > >
> > > Checksums of nifi-1.6.0-source-release.zip:
> > > SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
> > > SHA256: 39941a5b25427e2b4cc5ba8206084ff92df58863f29ddd097d4ac1e85424beb9
> > > SHA512: 1773417a48665e3cda22180ea7f401bc8190ebddbf3f7bc29831e46e7ab0a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab7364847a07e234
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/joewitt.asc
> > >
> > > KEYS file available here:
> > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > >
> > > 146 issues were closed/resolved for this release:
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12342422
> > >
> > > Release note highlights can be found here:
> > > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.6.0
> > >
> > > The vote will be open for 72 hours.
> > > Please download the release candidate and evaluate the necessary items
> > > including checking hashes, signatures, build from source, and test.
> > > Then please vote:
> > >
> > > [ ] +1 Release this package as nifi-1.6.0
> > > [ ] +0 no opinion
> > > [ ] -1 Do not release this package because...
> > >
> >
>


Re: ListSFTP incoming relationship

2018-03-29 Thread Bryan Bende
The main point of the new processor is to NOT store state so that it
becomes more reasonable to allow incoming flow files.

You could probably implement your own custom processor that does both
because you can make assumptions about how you are going to use it, but if
the NiFi community provides one then it needs to work well for all
situations, such as dynamically listing hundreds of directories, which is
problematic when state is involved.
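
To make the scaling concern concrete, here is a minimal sketch in plain Java.
It is not NiFi code and not how ListSFTP is implemented; the class and method
names are hypothetical. It assumes a lister that remembers, per directory,
the newest file timestamp it has seen. With a fixed directory the state holds
one entry, but when the directory comes from an incoming flow file attribute,
every distinct value adds an entry that nothing ever ages out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how ListSFTP is implemented.
final class PerDirectoryListingState {

    // directory path -> newest file timestamp (epoch millis) seen so far
    private final Map<String, Long> newestSeen = new HashMap<>();

    /** Returns true if the file is newer than anything seen in that directory, and records it. */
    boolean isNew(String directory, long fileTimestamp) {
        Long previous = newestSeen.get(directory);
        if (previous != null && fileTimestamp <= previous) {
            return false; // at or before the stored timestamp: treated as already listed
        }
        newestSeen.put(directory, fileTimestamp);
        return true;
    }

    /** Number of directories for which state is currently held. */
    int trackedDirectories() {
        return newestSeen.size();
    }

    public static void main(String[] args) {
        PerDirectoryListingState state = new PerDirectoryListingState();
        // Simulate 100,000 incoming flow files, each naming a different directory.
        for (int i = 0; i < 100_000; i++) {
            state.isNew("/data/incoming/" + i, 1_000L);
        }
        System.out.println("state entries: " + state.trackedDirectories());
    }
}
```

The same sketch also shows the out-of-order problem raised earlier in the
thread: a file that arrives with a timestamp older than the stored one is
reported as not new and would be skipped.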

On Thu, Mar 29, 2018 at 1:05 AM Sivaprasanna 
wrote:

> Should we really have to have an optional state saving functionality? If
> the user is unaware of the implications and proceeds to store the state then
> what Andrew Grande mentioned will happen - the possibility of a never-ending
> stream of state information being stored. If we still go with the optional
> state management approach, the documentation has to be clear in explaining the
> implications.
>
> Sivaprasanna
>
> On Thu, 29 Mar 2018 at 9:28 AM, scott  wrote:
>
> > Okay. So, a new processor called "ScanSFTP", allow incoming relationship
> > where the content of the flow file is replaced with the list of matching
> > files from the remote directory, then the list is filtered by the usual
> > regex parameters like today. Optional state information is kept to
> > additionally filter the list of files older than the newest file
> > observed during the last run. Does that sound okay to everyone? If so,
> > what's the next step?
> >
> > Scott
> >
> >
> > On 03/27/2018 06:21 PM, scott wrote:
> > >
> > > This is a great discussion, and I appreciate the interest in my problem.
> > > I think there are workarounds if you decide not to store state, but
> > > I'd recommend keeping it. I think state should be kept optionally,
> > > even turned off by default. Several times I've had issues where the
> > > state has caused me to miss files, because files get moved into the
> > > source folder out of order, and I've wished I could turn the state
> > > feature off.
> > >
> > > In my current use-case, I would not be frequently, dynamically
> > > changing the source directory, though I can see the use-cases where it
> > > would be. In my current use-case, I want to use an external database
> > > table to control the configuration of all my flows. I do this by first
> > > reading the content of the table for this particular flow ID, then
> > > assign the result as attributes to the flowfile, essentially creating
> > > variables I can use throughout the flow to control its behavior. This
> > > works great with flows that initiate with HTTP or SQL, but not
> > > ListSFTP or ListFile.
> > >
> > > Scott
> > >
> > >
> > > On 03/27/2018 02:05 PM, Andy LoPresto wrote:
> > >> I think Bryan’s point is a good one and when I first saw this
> > >> question (and thought of the previous times it’s been asked), my
> > >> initial response was to propose a second processor.
> > >>
> > >> Something like “ScanSFTP”/“IndexSFTP”/“SnapshotSFTP” which operates
> > >> differently from ListSFTP — it does not maintain state, and performs
> > >> a one-time tabulation/chronicling of the state of that directory at
> > >> the given point in time.
> > >>
> > >> The responsibility to maintain and compare state across time is no
> > >> longer a requirement. There could even be a setting in the processor
> > >> to allow for “individual flowfile output” (i.e. act the same as
> > >> ListSFTP and output one flowfile per item listed) or “summary
> > >> flowfile output” where a single flowfile is generated containing the
> > >> directory listing information for all the items there. (Another
> > >> option is to output both on two different relationships).
> > >>
> > >> I think this would enable the types of workflows that users have
> > >> asked about in the past without compromising the mechanism by which
> > >> List* processors work and adding undue complexity to those processors.
> > >>
> > >> Absolutely crystal clear documentation (and a standard verb for the
> > >> new processor family) would be necessary (not only because these
> > >> processors solve different problems, but to avoid a million variants
> > >> of “I used ScanSFTP processor and it’s not tracking state”/“How do I
> > >> provide a directory in an attribute to ListSFTP” mailing list
> > >> questions).
> > >>
> > >>
> > >> Andy LoPresto
> > >> alopre...@apache.org 
> > >> alopresto.apa...@gmail.com
> > >> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> > >>
> > >>> On Mar 27, 2018, at 8:33 AM, Andrew Grande wrote:
> > >>>
> > >>> The key here is that ListXXX processor maintains state. A directory
> > >>> is part of such state. Allowing arbitrary directories via an
> > >>> expression would create never ending stream of new entries in the
> > >>> state storage, effectively engineering a distributed DoS attack on
> > >>> the NiFi node or shared ZK quorum (for when sta

Re: NiFi 1.6 build failure nifi-web-ui

2018-03-29 Thread Sivaprasanna
The latest master should be able to build without this error. I recently
faced this problem.  Please make sure you’re using the latest master. Take
a look at this PR :
https://github.com/apache/nifi/pull/2571

Sivaprasanna

On Thu, 29 Mar 2018 at 1:07 PM, luis_size 
wrote:

> Hi,
> I am trying to build NiFi from source but the build fails with the below
> error. Can anyone help me with this?
> Sorry if it's a newbie question. I am new to the NiFi development world.
> Thanks
>
> [INFO]
> [INFO] BUILD FAILURE
> [INFO]
> [INFO] Total time: 04:48 min
> [INFO] Finished at: 2018-03-29T09:31:54+02:00
> [INFO] Final Memory: 288M/1219M
> [INFO]
> [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task: 'npm --cache-min Infinity install' failed. (error code 1) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
>     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
>     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
>     at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
>     at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
>     at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
>     at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>     at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> Caused by: org.apache.maven.plugin.MojoFailureException: Failed to run task
>     at com.github.eirslett.maven.plugins.frontend.mojo.AbstractFrontendMojo.execute(AbstractFrontendMojo.java:95)
>     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
>     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
>     ... 20 more
> Caused by: com.github.eirslett.maven.plugins.frontend.lib.TaskRunnerException: 'npm --cache-min Infinity install' failed. (error code 1)
>     at com.github.eirslett.maven.plugins.frontend.lib.NodeTaskExecutor.execute(NodeTaskExecutor.java:60)
>     at com.github.eirslett.maven.plugins.frontend.mojo.NpmMojo.execute(NpmMojo.java:62)
>     at com.github.eirslett.maven.plugins.frontend.mojo.AbstractFrontendMojo.execute(AbstractFrontendMojo.java:89)
>     ... 22 more
> [ERROR]
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the command
> [ERROR]   mvn  -rf :nifi-web-ui
>
>


NiFi 1.6 build failure nifi-web-ui

2018-03-29 Thread luis_size
Hi,
I am trying to build NiFi from source but the build fails with the below error. 
Can anyone help me with this?
Sorry if it's a newbie question. I am new to the NiFi development world.
Thanks

[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 04:48 min
[INFO] Finished at: 2018-03-29T09:31:54+02:00
[INFO] Final Memory: 288M/1219M
[INFO]
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task: 'npm --cache-min Infinity install' failed. (error code 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.1:npm (npm install) on project nifi-web-ui: Failed to run task
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoFailureException: Failed to run task
    at com.github.eirslett.maven.plugins.frontend.mojo.AbstractFrontendMojo.execute(AbstractFrontendMojo.java:95)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    ... 20 more
Caused by: com.github.eirslett.maven.plugins.frontend.lib.TaskRunnerException: 'npm --cache-min Infinity install' failed. (error code 1)
    at com.github.eirslett.maven.plugins.frontend.lib.NodeTaskExecutor.execute(NodeTaskExecutor.java:60)
    at com.github.eirslett.maven.plugins.frontend.mojo.NpmMojo.execute(NpmMojo.java:62)
    at com.github.eirslett.maven.plugins.frontend.mojo.AbstractFrontendMojo.execute(AbstractFrontendMojo.java:89)
    ... 22 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :nifi-web-ui