[GitHub] nifi pull request #1407: NIFI-2881: Added EL support to DB Fetch processors,...

2017-01-12 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/1407#discussion_r95919672
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java ---
@@ -127,6 +129,7 @@
 .defaultValue("0")
 .required(true)
 
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.expressionLanguageSupported(true)
 .build();
 
 public QueryDatabaseTable() {
--- End diff --

Thanks! JIRA is still undergoing maintenance..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request #1407: NIFI-2881: Added EL support to DB Fetch processors,...

2017-01-12 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/1407#discussion_r95919431
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
@@ -115,20 +128,36 @@ public GenerateTableFetch() {
 
 @OnScheduled
 public void setup(final ProcessContext context) {
+// The processor is invalid if there is an incoming connection and max-value columns are defined
+if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
+throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
--- End diff --

That is a good point regarding supporting the older format and migration. However, the same problem exists even now. Suppose the processor was configured to fetch from `users` using `last_updated` as the max-value column and ran; the processor then holds `last_updated` state. The user may later change the table name to `purchase_histories`. Since the processor doesn't implement the `onPropertyModified` method to handle such changes, I expect it will use state that actually belongs to a different table.
Maybe we can implement something intelligent by capturing the old configuration in `onPropertyModified`. It may be a bit difficult though, since we can't access the state manager from the `onPropertyModified` method.

For the size of the state map, there is a check in [ZooKeeperStateProvider](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/state/providers/zookeeper/ZooKeeperStateProvider.java#L310), so once a state map grows greater than 1MB in serialized size, the processor would get a specific StateTooLargeException. Although it is too late to roll back the process session because it has already been committed (I think this ordering is correct as it is now; prefer duplicates over loss), we can throw StateTooLargeException so that the NiFi framework yields the processor. The user can then see what went wrong by looking at the bulletin or the error log message. Those indicators will keep alerting the user until they fix it, for example by splitting the tables to fetch into smaller groups distributed across multiple GenerateTableFetch processors to reduce the state size.
Since this is an edge case that won't affect other parts of the flow, and it is hard to predict the optimal maximum number of state entries, I think throwing StateTooLargeException to the framework and yielding the processor would be sufficient handling.
Having a `state map full` relationship would be overkill: in most cases it is unnecessary, yet it would force the user to auto-terminate it or route it somewhere.
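The size check and yield behavior described above can be sketched without any NiFi dependencies. Below is a minimal, self-contained illustration in plain Java; the length-prefixed UTF encoding, the `tooLarge` helper, and the composite `table.column` keys are illustrative assumptions, not the actual ZooKeeperStateProvider serialization:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class StateSizeGuard {
    // 1 MB ceiling, mirroring the limit checked by ZooKeeperStateProvider
    // (ZooKeeper's default jute.maxbuffer is 1 MB).
    static final int MAX_STATE_BYTES = 1024 * 1024;

    // Rough serialized size of a state map: each key and value written as
    // length-prefixed UTF. The exact wire format is an assumption; the point
    // is that the size grows with the number of table/column entries.
    static int serializedSize(Map<String, String> state) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            for (Map.Entry<String, String> e : state.entrySet()) {
                dos.writeUTF(e.getKey());
                dos.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory stream
        }
        return bos.size();
    }

    static boolean tooLarge(Map<String, String> state) {
        return serializedSize(state) > MAX_STATE_BYTES;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("users.last_updated", "2017-01-12 16:28:34");
        System.out.println(tooLarge(state)); // false: one small entry

        // Simulate tens of thousands of table/column entries pushing past 1 MB;
        // in NiFi this is roughly where StateTooLargeException would surface
        // and the framework would yield the processor.
        for (int i = 0; i < 30000; i++) {
            state.put("table_" + i + ".last_updated", "2017-01-12 16:28:34");
        }
        System.out.println(tooLarge(state)); // true
    }
}
```

With entries around ~45 bytes each, the limit is only reached at tens of thousands of table/column pairs, which supports the argument that this is an edge case best handled by yielding rather than a dedicated relationship.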




[GitHub] nifi pull request #1412: NIFI-2861 ControlRate should accept more than one f...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/1412




[GitHub] nifi issue #1412: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread mosermw
Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1412
  
+1 will merge, thanks jskora




[GitHub] nifi issue #1412: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread mosermw
Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1412
  
reviewing




[GitHub] nifi issue #1128: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread mosermw
Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1128
  
benchmarked ControlRate before and after the patch; with the patch, ControlRate could handle a lot more small files.  +1 will merge.




[GitHub] nifi issue #1412: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/1412
  
@mosermw this is the replacement for 
https://github.com/apache/nifi/pull/1127, rebased and squashed.




[GitHub] nifi pull request #1412: NIFI-2861 ControlRate should accept more than one f...

2017-01-12 Thread jskora
GitHub user jskora opened a pull request:

https://github.com/apache/nifi/pull/1412

NIFI-2861 ControlRate should accept more than one flow file per execution

* Support multiple files per onTrigger call.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [X] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [X] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [X] Is your initial contribution a single, squashed commit?

### For code changes:
- [X] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [X] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jskora/nifi NIFI-2861-1.x-v2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/1412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1412


commit e2f167379a9ffc84537ad60cc130848989f49110
Author: Joe Skora 
Date:   2017-01-12T16:28:34Z

NIFI-2861 ControlRate should accept more than one flow file per execution
* Support multiple files per onTrigger call.






[GitHub] nifi pull request #1411: NIFI-3309 ensures that CS are deleted when a proces...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/1411




[GitHub] nifi issue #1411: NIFI-3309 ensures that CS are deleted when a process group...

2017-01-12 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/1411
  
I was able to resolve the checkstyle issues and folded them into your commit, 
so I'm a +1 and will merge it to master, thanks!




[GitHub] nifi issue #1411: NIFI-3309 ensures that CS are deleted when a process group...

2017-01-12 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/1411
  
Looks like a checkstyle failure when I ran the contrib-check:

[WARNING] src/main/java/org/apache/nifi/groups/StandardProcessGroup.java[621] (regexp) RegexpSinglelineJava: Line has trailing whitespace.

[WARNING] src/main/java/org/apache/nifi/groups/StandardProcessGroup.java[2330] (regexp) RegexpSinglelineJava: Line has trailing whitespace.




[GitHub] nifi-minifi-cpp pull request #36: MINIFI-180 Duplicate Variable causes MINIF...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi-minifi-cpp/pull/36




[GitHub] nifi-minifi-cpp issue #36: MINIFI-180 Duplicate Variable causes MINIFI_HOME ...

2017-01-12 Thread apiri
Github user apiri commented on the issue:

https://github.com/apache/nifi-minifi-cpp/pull/36
  
reviewing




[GitHub] nifi issue #1128: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread mosermw
Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1128
  
reviewing




[jira] [Commented] (NIFI-2861) ControlRate should accept more than one flow file per execution

2017-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821221#comment-15821221
 ] 

ASF GitHub Bot commented on NIFI-2861:
--

Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1128
  
reviewing


> ControlRate should accept more than one flow file per execution
> ---
>
> Key: NIFI-2861
> URL: https://issues.apache.org/jira/browse/NIFI-2861
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Joe Skora
>Assignee: Joe Skora
>
> The {{ControlRate}} processor implements a {{FlowFileFilter}} that returns 
> the {{FlowFileFilter.ACCEPT_AND_TERMINATE}} result if the {{FlowFile}} fits 
> within the rate limit, effectively limiting it to one {{FlowFile}} per 
> {{ControlRate.onTrigger()}} invocation.  This is a significant bottleneck when 
> processing very large quantities of small files, making it unlikely to hit the 
> rate limits.
> It should allow multiple files, perhaps with a configurable maximum, per 
> {{ControlRate.onTrigger()}} invocation by issuing the 
> {{FlowFileFilter.ACCEPT_AND_CONTINUE}} result until the limits are reached.  
> In a preliminary test this eliminated the bottleneck.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
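The batching change described in the issue above can be sketched in plain Java. This is not ControlRate's actual FlowFileFilter (which operates on FlowFiles and internal rate state); `acceptBatch`, the byte budget, and `maxPerTrigger` are illustrative names showing how ACCEPT_AND_CONTINUE-style filtering admits multiple files per trigger:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFilterSketch {
    // Accept flow files (represented here only by their sizes) until either
    // the rate budget or the per-trigger maximum is reached. Returning
    // ACCEPT_AND_CONTINUE-equivalents until a limit is hit is what lets one
    // onTrigger() call process many small files instead of exactly one.
    static List<Long> acceptBatch(List<Long> queue, long rateBudgetBytes, int maxPerTrigger) {
        List<Long> accepted = new ArrayList<>();
        long used = 0;
        for (long size : queue) {
            if (accepted.size() >= maxPerTrigger || used + size > rateBudgetBytes) {
                break; // the real filter would return a terminate result here
            }
            used += size;
            accepted.add(size); // analogous to ACCEPT_AND_CONTINUE
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Long> queue = List.of(100L, 200L, 300L, 400L);
        // Budget of 600 bytes admits the first three files in one trigger.
        System.out.println(acceptBatch(queue, 600, 10)); // [100, 200, 300]
    }
}
```

A configurable `maxPerTrigger` keeps any single trigger bounded, which is the "perhaps with a configurable maximum" suggestion from the issue.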


[GitHub] nifi issue #1127: NIFI-2861 ControlRate should accept more than one flow fil...

2017-01-12 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/1127
  
@mosermw, I will close and resubmit a clean request with squashed commits.





[GitHub] nifi pull request #1127: NIFI-2861 ControlRate should accept more than one f...

2017-01-12 Thread jskora
Github user jskora closed the pull request at:

https://github.com/apache/nifi/pull/1127





[jira] [Commented] (NIFI-2881) Allow Database Fetch processor(s) to accept incoming flow files and use Expression Language

2017-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821114#comment-15821114
 ] 

ASF GitHub Bot commented on NIFI-2881:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/1407#discussion_r95801717
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
@@ -115,20 +128,36 @@ public GenerateTableFetch() {
 
 @OnScheduled
 public void setup(final ProcessContext context) {
+// The processor is invalid if there is an incoming connection and max-value columns are defined
+if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
+throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
--- End diff --

I thought about supporting the older format, but that could lead to problems depending on which table name you pass in. Using your "users" and "purchase_histories" tables above, let's say I was running the old version with a hard-coded "purchase_histories" table, which stores "last_updated" in the state map. Then with the new version, the first table name I pass in via an attribute is "users". I will not find "users.last_updated", so I would check for just "last_updated", whose value is associated not with the users table but with the purchase_histories table. This is an edge case, but I would hate to see the very first run of the processor fail when it used to work.

I do like your example of a mapping of tables to max-value columns; I think other products (GoldenGate or Sqoop, perhaps?) allow for this flexibility (you just have to provide your own map). If we end up supporting max-value columns with incoming connections, I will make sure this capability is present.

I'm most worried about the arbitrary number of entries in the state map. Once the total size gets above 1MB, I think ZooKeeper starts acting strangely, and I certainly wouldn't want this processor to affect the entire NiFi system. This too is an edge case, since I imagine most entries are small (~64 bytes max?), so it would only happen if the number of tables/columns was very large, or if for some reason the max-values were large. Historically I've seen limits placed on other NiFi resources (threads, for example) to ensure a discrete maximum and avoid issues with arbitrarily large things.

I would very much like to allow for the max-value columns. Do you have any suggestions on how the processor should behave in the face of an arbitrarily large state map? Perhaps we could set an artificial limit on the number of entries (implying a limit on the number of tables/columns) and route future flow files (whose table is not yet present in the state map) to a "state map full" relationship or something.
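The migration hazard described here can be made concrete. A minimal sketch in plain Java (no NiFi classes); `maxValueFor` and the `legacyTable` hint are hypothetical names, and the hint is exactly the information the real processor does not have, which is the crux of the problem:

```java
import java.util.HashMap;
import java.util.Map;

public class StateKeyLookup {
    // New-format state qualifies keys as "table.column"; legacy state stored
    // the bare column name. Falling back blindly to the bare key can hand one
    // table's max-value to a different table.
    static String maxValueFor(Map<String, String> state, String table, String column,
                              String legacyTable) {
        String qualified = table + "." + column;
        if (state.containsKey(qualified)) {
            return state.get(qualified);
        }
        // Only honor the legacy unqualified key when we know which table it
        // belonged to; otherwise start fresh rather than reuse a wrong value.
        if (table.equals(legacyTable)) {
            return state.get(column);
        }
        return null; // no prior max-value for this table
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        // Legacy entry written while hard-coded to purchase_histories.
        state.put("last_updated", "2016-12-31");

        // Guarded lookup: users starts fresh instead of inheriting the value.
        System.out.println(maxValueFor(state, "users", "last_updated", "purchase_histories"));
        // prints: null
        System.out.println(maxValueFor(state, "purchase_histories", "last_updated", "purchase_histories"));
        // prints: 2016-12-31
    }
}
```

Since the legacy table name is not actually recorded in the old state format, a real migration would need some other signal (for example, migrating the bare key once at startup using the currently configured table name), which is the trade-off being debated in this thread.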


> Allow Database Fetch processor(s) to accept incoming flow files and use 
> Expression Language
> ---
>
> Key: NIFI-2881
> URL: https://issues.apache.org/jira/browse/NIFI-2881
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>
> The QueryDatabaseTable and GenerateTableFetch processors do not allow 
> Expression Language to be used in the properties, mainly because they also do 
> not allow incoming connections. This means if the user desires to fetch from 
> multiple tables, they currently need one instance of the processor for each 
> table, and those table names must be hard-coded.
> To support the same capabilities for multiple tables and more flexible 
> configuration via Expression Language, these processors should have 
> properties that accept Expression Language, and GenerateTableFetch should 
> accept (optional) incoming connections.
> Conversation about the behavior of the processors is welcomed and encouraged. 
> For example, if an incoming flow file is available, do we also still run the 
> incremental fetch logic for tables that aren't specified by this flow file, 
> or do we just do incremental fetching when the processor is scheduled but 
> there is no incoming flow file. The latter implies a denial-of-service could 
> take place, by flooding the processor with flow files and not letting it do 
> its original job of querying the table, keeping track of maximum values, etc.
> This is likely a breaking change to the processors because of how state 
> management is implemented. Currently since the table name is hard coded, only 
> the column name comprises the key in the state. This would have to be 
> extended to have a compound key that represents table name, max-value column 
> name, etc.


[jira] [Commented] (NIFI-2881) Allow Database Fetch processor(s) to accept incoming flow files and use Expression Language

2017-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821087#comment-15821087
 ] 

ASF GitHub Bot commented on NIFI-2881:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/1407#discussion_r95799167
  
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java ---
@@ -127,6 +129,7 @@
 .defaultValue("0")
 .required(true)
 
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.expressionLanguageSupported(true)
 .build();
 
 public QueryDatabaseTable() {
--- End diff --

I agree that Initial Max Value would be useful for GenerateTableFetch as 
well. https://issues.apache.org/jira/browse/NIFI-2583 was written and 
implemented by a community member whose focus at the time was on 
QueryDatabaseTable. JIRA is down at the moment, but I will write an 
improvement Jira for GenerateTableFetch and post the link here.






[GitHub] nifi-minifi-cpp issue #40: MINIFI-182 Implemented event-driven scheduler

2017-01-12 Thread achristianson
Github user achristianson commented on the issue:

https://github.com/apache/nifi-minifi-cpp/pull/40
  
@benqiu2016 Good point. Fixed.


---


[jira] [Commented] (NIFI-3313) First deployment of NiFi can hang on VMs without sufficient entropy if using /dev/random

2017-01-12 Thread Anders Breindahl (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820702#comment-15820702
 ] 

Anders Breindahl commented on NIFI-3313:


The thing that bugs me about this is that _if_ the problem really is that 
/dev/random blocks for lack of entropy, then sourcing data from /dev/urandom 
is _exactly_ the case in which /dev/urandom behaves predictably. I.e., there 
could be something to it.

What do we require entropy for? If it is, e.g., to generate the sensitive-keys 
protection key (a long-lived secret) during encrypt-config, then having it be 
predictably generated could be up there with "Debian OpenSSL" bad.

Have we tried giving the reporter a binary to re-produce with, where 
instrumentation is in place so that we can find out which callers of GetRandom 
are getting blocked?

Just my two cents, though. Please shoot me down. :)
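
Short of shipping a fully instrumented binary, a minimal standalone check can 
at least show which SecureRandom implementation the JVM picked and whether a 
seed request stalls on a given host (a sketch; the class name is illustrative 
and nothing below comes from the NiFi codebase):

```java
import java.security.SecureRandom;

public class SeedCheck {
    public static void main(String[] args) {
        // The no-arg constructor selects the highest-priority provider
        // algorithm; on Linux with the SUN provider this is typically
        // NativePRNG.
        SecureRandom def = new SecureRandom();
        System.out.println("algorithm: " + def.getAlgorithm());

        // Under NativePRNG, generateSeed() is the call that reads
        // /dev/random, so timing it shows whether this host's entropy
        // pool is the bottleneck.
        long start = System.nanoTime();
        byte[] seed = def.generateSeed(32);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("generateSeed(32): " + seed.length
                + " bytes in " + elapsedMs + " ms");
    }
}
```

On a starved VM the generateSeed call is where the delay shows up; running the 
same check under strace would additionally reveal which read blocks.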

> First deployment of NiFi can hang on VMs without sufficient entropy if using 
> /dev/random
> 
>
> Key: NIFI-3313
> URL: https://issues.apache.org/jira/browse/NIFI-3313
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.1.1
>Reporter: Andy LoPresto
>Assignee: Andy LoPresto
>Priority: Critical
>  Labels: entropy, security, virtual-machine
>
> h1. Analysis of Issue
> h2. Statement of Problem:
> NiFi deployed on headless VM (little user interaction by way of keyboard and 
> mouse I/O) can take 5-10 minutes (reported) to start up. User reports this 
> occurs on a "secure" cluster. Further examination is required to determine 
> which specific process requires the large amount of random input (no steps to 
> reproduce, configuration files, logs, or VM environment information 
> provided). 
> h2. Context
> The likely cause of this issue is that a process is attempting to read from 
> _/dev/random_, a \*nix "device" providing a pseudo-random number generator 
> (PRNG). Also available is _/dev/urandom_, a related PRNG. Despite common 
> misperceptions, _/dev/urandom_ is not "less-secure" than _/dev/random_ for 
> all general use cases. _/dev/random_ blocks if the entropy *estimate* (a 
> "guess" of the existing entropy introduced into the pool) is lower than the 
> amount of random data requested by the caller. In contrast, _/dev/urandom_ 
> does not block, but provides the output of the same cryptographically-secure 
> PRNG (CSPRNG) that _/dev/random_ reads from \[myths\]. After as little as 256 
> bytes of initial seeding, accessing _/dev/random_ and _/dev/urandom_ are 
> functionally equivalent, as the long period of random data generated will not 
> require re-seeding before sufficient entropy can be provided again. 
> As mentioned earlier, further examination is required to determine if the 
> process requiring random input occurs at application boot or only at 
> "machine" (hardware or VM) boot. On the first deployment of the system with 
> certificates, the certificate generation process will require substantial 
> random input. However, on application launch and connection to a cluster, 
> even the TLS/SSL protocol requires some amount of random input. 
> h2. Proposed Solutions
> h3. rngd
> A software toolset called _rng-tools_ \[rngtools\] exists for Linux for 
> accessing dedicated hardware RNGs (*true* RNGs, or TRNGs). Specialized 
> hardware, as well as Intel chips from Ivy Bridge (2012) onward, can provide 
> hardware-generated random input to the kernel. Using the daemon _rngd_ to 
> seed the _/dev/random_ and _/dev/urandom_ entropy pool is the simplest 
> solution. 
> *Note: Do not use _/dev/urandom_ to seed _/dev/random_ using _rngd_. This is 
> like running a garden hose from a car's exhaust back into its gas tank and 
> trying to drive.*
> h3. Instruct Java to use /dev/urandom
> The Java Runtime Environment (JRE) can be instructed to use _/dev/urandom_ 
> for all invocations of {{SecureRandom}}, either on a per-Java process basis 
> \[jdk-urandom\] or in the JVM configuration \[oracle-urandom\], which means 
> it will not block on server startup. The NiFi {{bootstrap.conf}} file can be 
> modified to contain an additional Java argument directing the JVM to use 
> _/dev/urandom_. 
> h2. Other Solutions
> h3. Entropy Gathering Tools
> Tools to gather entropy from non-standard sources (audio card noise, video 
> capture from webcams, etc.) have been developed such as audio-entropyd 
> \[wagner\], but these tools are not verified or well-examined -- usually when 
> tested, they are only tested for the strength of their PRNG, not the ability 
> of the tool to capture entropy and generate sufficiently random data 
> unavailable to an attacker who may be able to determine the internal state. 
> h3. haveg