[jira] [Updated] (NIFI-1008) NiFi should swap out FlowFiles to disk even before the session is committed

2015-09-29 Thread Mark Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-1008:
-
Fix Version/s: 0.4.0

> NiFi should swap out FlowFiles to disk even before the session is committed
> ---
>
> Key: NIFI-1008
> URL: https://issues.apache.org/jira/browse/NIFI-1008
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Mark Payne
> Fix For: 0.4.0
>
>
> Currently, NiFi will swap out FlowFiles if there are a large number in a 
> FlowFile Queue. This is done to avoid running out of JVM heap space. However, 
> if we have a simple flow like GetFile -> SplitText and GetFile pulls in a 
> large file, SplitText can quickly cause OutOfMemoryError. This is not because 
> it buffers the content of the FlowFile in memory but rather because it holds 
> the millions of FlowFile objects in memory. We can do better.
> When we call session.transfer for the FlowFiles, once we hit a magical 
> threshold (say 10,000), we should swap those FlowFiles to disk and the 
> session should transfer them to the queue as "swapped out" FlowFiles, rather 
> than buffering all of them in memory and then swapping them out once 
> they land in the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (NIFI-1008) NiFi should swap out FlowFiles to disk even before the session is committed

2015-09-29 Thread Mark Payne (JIRA)
Mark Payne created NIFI-1008:


 Summary: NiFi should swap out FlowFiles to disk even before the 
session is committed
 Key: NIFI-1008
 URL: https://issues.apache.org/jira/browse/NIFI-1008
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Reporter: Mark Payne


Currently, NiFi will swap out FlowFiles if there are a large number in a 
FlowFile Queue. This is done to avoid running out of JVM heap space. However, 
if we have a simple flow like GetFile -> SplitText and GetFile pulls in a large 
file, SplitText can quickly cause OutOfMemoryError. This is not because it 
buffers the content of the FlowFile in memory but rather because it holds the 
millions of FlowFile objects in memory. We can do better.

When we call session.transfer for the FlowFiles, once we hit a magical 
threshold (say 10,000), we should swap those FlowFiles to disk and the session 
should transfer them to the queue as "swapped out" FlowFiles, rather than 
buffering all of them in memory and then swapping them out once they land in 
the queue.
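
To make the proposal concrete, here is a rough sketch of threshold-based swapping at transfer time. It is only an illustration: SwappingTransferBuffer, inTempDirectory, swapIn and the one-ID-per-line swap file format are made-up names and choices, not NiFi's ProcessSession or swap manager, and FlowFiles are reduced to plain String IDs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: this is not NiFi's ProcessSession or its swap manager.
final class SwappingTransferBuffer {
    private final int swapThreshold;
    private final Path swapDirectory;
    private final List<String> inMemory = new ArrayList<>();
    private final List<Path> swapFiles = new ArrayList<>();

    SwappingTransferBuffer(int swapThreshold, Path swapDirectory) {
        if (swapThreshold < 1) {
            throw new IllegalArgumentException("swapThreshold must be at least 1");
        }
        this.swapThreshold = swapThreshold;
        this.swapDirectory = swapDirectory;
    }

    // Convenience factory that swaps to a fresh temporary directory.
    static SwappingTransferBuffer inTempDirectory(int swapThreshold) {
        try {
            return new SwappingTransferBuffer(swapThreshold, Files.createTempDirectory("swap-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stands in for session.transfer(...): buffer the ID, then swap out once the threshold is hit.
    void transfer(String flowFileId) {
        inMemory.add(flowFileId);
        if (inMemory.size() >= swapThreshold) {
            swapOut();
        }
    }

    // Writes the buffered IDs to a new swap file (one ID per line) and frees the heap copy.
    private void swapOut() {
        try {
            Path swapFile = Files.createTempFile(swapDirectory, "swap-", ".txt");
            Files.write(swapFile, inMemory, StandardCharsets.UTF_8);
            swapFiles.add(swapFile);
            inMemory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one swap file back, as a queue would when it swaps records in.
    List<String> swapIn(Path swapFile) {
        try {
            return Files.readAllLines(swapFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    List<Path> swapFiles() {
        return new ArrayList<>(swapFiles);
    }
}
```

With a threshold of 3, transferring 7 IDs leaves two swap files on disk and one ID on the heap, so heap usage is bounded by the threshold rather than by the number of splits.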





[jira] [Commented] (NIFI-997) Kerberos tickets are not being renewed by Hadoop

2015-09-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935580#comment-14935580
 ] 

ASF GitHub Bot commented on NIFI-997:
-

GitHub user rickysaltzer opened a pull request:

https://github.com/apache/nifi/pull/97

NIFI-997: Periodically Renew Kerberos Tickets

Adding a patch to renew the ticket every 4 hours to avoid inactive Kerberos 
tickets. This was an issue found when running Kerberos-enabled Hadoop 
processors for a long period of time. This technically _should_ have been 
handled by the Hadoop library, but due to unknown issues, the renewal thread 
inside of Hadoop doesn't seem to be doing that. 

This patch is fairly simplistic, and applies to all Hadoop processors as 
it's implemented in AbstractHadoopProcessor. The Kerberos ticket age is 
checked against a threshold (4 hours is a safe bet) when getFileSystem() is 
called. If the age exceeds the threshold, we re-login using the 
UserGroupInformation class before passing back the filesystem. 
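
The age check described above can be sketched roughly as follows. This is not the code from the pull request: TicketRenewingFileSystemProvider is a made-up name, the Runnable stands in for whatever re-login call is made through UserGroupInformation, and the injectable clock exists only so the behaviour can be exercised without waiting 4 hours.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical sketch; the actual patch lives in AbstractHadoopProcessor.
final class TicketRenewingFileSystemProvider<F> {
    private final F fileSystem;
    private final Duration renewalThreshold;
    private final Runnable relogin;          // placeholder for the UserGroupInformation re-login
    private final Supplier<Instant> clock;   // injectable so the age check can be tested
    private Instant lastLogin;

    TicketRenewingFileSystemProvider(F fileSystem, Duration renewalThreshold,
                                     Runnable relogin, Supplier<Instant> clock) {
        this.fileSystem = fileSystem;
        this.renewalThreshold = renewalThreshold;
        this.relogin = relogin;
        this.clock = clock;
        this.lastLogin = clock.get();        // assume we are freshly logged in at construction
    }

    // Re-login first if the ticket age exceeds the threshold, then hand back the filesystem.
    synchronized F getFileSystem() {
        Instant now = clock.get();
        Duration ticketAge = Duration.between(lastLogin, now);
        if (ticketAge.compareTo(renewalThreshold) > 0) {
            relogin.run();
            lastLogin = now;
        }
        return fileSystem;
    }
}
```

Checking on every getFileSystem() call keeps the renewal independent of any background thread, at the cost that a processor which never asks for the filesystem never renews.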

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rickysaltzer/nifi kerberos-renewal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/97.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #97


commit b2eb61dca1204afb317bd40346065aa6a0e97647
Author: ricky 
Date:   2015-09-25T18:15:09Z

NIFI-997: Periodically Renew Kerberos Tickets

- Renew ticket every 4 hours to avoid inactive Kerberos tickets.




> Kerberos tickets are not being renewed by Hadoop
> 
>
> Key: NIFI-997
> URL: https://issues.apache.org/jira/browse/NIFI-997
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Ricky Saltzer
>Assignee: Ricky Saltzer
>
> After running Kerberos-enabled processors for some time, I've discovered that 
> the Kerberos ticket is not being renewed as it should be. This is strange 
> because according to HADOOP-6656, this should be automatically taken care of 
> by a utility thread. I examined the NiFi jstack and saw that the renewal 
> thread was present, so I'm not sure what's going on.
> Does NiFi do something with the processor threads that causes child threads to 
> suspend? I have a patch that I'm currently testing (currently looking good) 
> that will renew the Kerberos ticket on getFileSystem() if a threshold is 
> reached (e.g. 4 hours). 





[jira] [Commented] (NIFI-997) Kerberos tickets are not being renewed by Hadoop

2015-09-29 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935593#comment-14935593
 ] 

Joseph Witt commented on NIFI-997:
--

Ricky thanks for finding this and following up on it!






[jira] [Commented] (NIFI-992) Couchbase Server Processors

2015-09-29 Thread Koji Kawamura (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935403#comment-14935403
 ] 

Koji Kawamura commented on NIFI-992:


[~bende] Thanks for reviewing the code. I just pushed the new commit to the PR. 
 Please check that again!

> Couchbase Server Processors
> ---
>
> Key: NIFI-992
> URL: https://issues.apache.org/jira/browse/NIFI-992
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Core Framework
>Reporter: Koji Kawamura
>  Labels: processor
>
> Processors providing data access interface with a Couchbase Server cluster.
> I've started writing a set of processors for interacting with Couchbase 
> Server. There are several ways to integrate with Couchbase such as:
> 1. Key/Value CRUD operations
> 2. View (Map/Reduce) queries
> 3. N1QL queries
> For the first step, I'm implementing the Key/Value CRUD operations. I will 
> send a pull request once the code and tests are cleaned up.





[jira] [Commented] (NIFI-817) Create Processors to interact with HBase

2015-09-29 Thread Mark Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935349#comment-14935349
 ] 

Mark Payne commented on NIFI-817:
-

[~ndimiduk] [~nmaillard] - the basic premise is to provide an ETL-like 
functionality to pull data from HBase, as well as provide the ability to stream 
data into HBase. I very much like the idea of plugging into an HBase Firehose. 
Is this something that is configured on the HBase instance itself, to allow 
NiFi access to the stream? I will have to look more deeply into how those 
semantics work for sure. Would also like to have [~bende] looking into this 
from the NiFi perspective. I definitely agree as well that we need to look into 
the filter language.

Kite is not a NiFi thing. It is a set of libraries developed by Cloudera (quick 
intro at http://kitesdk.org/docs/1.0.0/Kite-SDK-Guide.html). Some of the 
Cloudera guys provided some Kite Processors for NiFi to help push "Kite Datasets" 
(Avro-based datasets) to HDFS. 


> Create Processors to interact with HBase
> 
>
> Key: NIFI-817
> URL: https://issues.apache.org/jira/browse/NIFI-817
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Mark Payne
>Assignee: Mark Payne
> Fix For: 0.4.0
>
> Attachments: 
> 0001-NIFI-817-Initial-implementation-of-HBase-processors.patch
>
>






[jira] [Created] (NIFI-1007) Stats Configuration

2015-09-29 Thread Debbie Marcin (JIRA)
Debbie Marcin created NIFI-1007:
---

 Summary: Stats Configuration
 Key: NIFI-1007
 URL: https://issues.apache.org/jira/browse/NIFI-1007
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Documentation & Website
Reporter: Debbie Marcin


In your User Guide, under "Historical Statistics of a Component," it is 
mentioned that "The amount of historical information that is stored is 
configurable in the NiFi properties..." but it's not mentioned what that 
particular property is.





[jira] [Created] (NIFI-1006) Change format for configuring repositories in nifi.properties file

2015-09-29 Thread Mark Payne (JIRA)
Mark Payne created NIFI-1006:


 Summary: Change format for configuring repositories in 
nifi.properties file
 Key: NIFI-1006
 URL: https://issues.apache.org/jira/browse/NIFI-1006
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Configuration, Core Framework
Reporter: Mark Payne
 Fix For: 1.0.0


Currently, the Content Repository, FlowFile Repository, and Provenance 
Repository are all configured within the nifi.properties file. This includes 
the repository implementation to use and all properties for those repositories. 
This becomes quite confusing to configure, and it makes it very difficult to 
provide examples of each of these repositories.

I would like to see this changed to a format like the one used to configure the 
Authority Providers. This way, in the nifi.properties file, we would configure 
just two things: the .xml file that includes the repository configurations 
(with IDs) and the ID of the repository to use.

This also makes it far easier to create a repository that may wrap one or more 
other repositories, by defining them all in the .xml file and then specifying 
the ID of the 'wrapping' repository as the one to use.






[jira] [Created] (NIFI-1005) ControllerStatusReportingTask logger name is incorrect

2015-09-29 Thread Matt Gilman (JIRA)
Matt Gilman created NIFI-1005:
-

 Summary: ControllerStatusReportingTask logger name is incorrect
 Key: NIFI-1005
 URL: https://issues.apache.org/jira/browse/NIFI-1005
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Reporter: Matt Gilman
Priority: Trivial
 Fix For: 0.4.0


The documentation for the ControllerStatusReportingTask indicates that the 
messages will be logged to

{noformat}
org.apache.nifi.controller.ControllerStatusReportingTask.Processors
org.apache.nifi.controller.ControllerStatusReportingTask.Connections
{noformat}

however they are actually written to 

{noformat}
ControllerStatusReportingTask.Processors
ControllerStatusReportingTask.Connections
{noformat}
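
A rough illustration of why the name matters: in a hierarchical logging framework, configuration attached to a package or class prefix only reaches loggers whose names actually start with that prefix. The sketch below uses java.util.logging purely to stay self-contained (it is not the logging setup NiFi ships with), and LoggerNameSketch and inheritsLevel are made-up names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging; only the logger names come from the report above.
final class LoggerNameSketch {

    // Temporarily sets FINEST on the logger named configuredPrefix and reports
    // whether the logger named loggerName picks that level up through the hierarchy.
    static boolean inheritsLevel(String configuredPrefix, String loggerName) {
        Logger configured = Logger.getLogger(configuredPrefix);
        Level previous = configured.getLevel();
        configured.setLevel(Level.FINEST);
        try {
            return Logger.getLogger(loggerName).isLoggable(Level.FINEST);
        } finally {
            configured.setLevel(previous);   // restore whatever was set before
        }
    }
}
```

Assuming the default root level, a level configured on org.apache.nifi.controller.ControllerStatusReportingTask is inherited by the documented name ending in .Processors, but not by the short name ControllerStatusReportingTask.Processors, which hangs directly off the root logger.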





[jira] [Commented] (NIFI-810) Create Annotation that indicates that a Processor cannot be scheduled to run without an incoming connection

2015-09-29 Thread Rob Moran (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935206#comment-14935206
 ] 

Rob Moran commented on NIFI-810:


Functionality looks good.

More user-friendly messaging could help a lot. I think what I suggested in my 
earlier comment for the validation error language is clearer than what is 
currently being generated. It uses language seen in the UI, such as 'upstream 
connection', making it more relatable to a user's workflow. The current language 
is clear for the most part, but it is difficult to read and therefore probably 
takes longer to comprehend.

It would also be great to include follow-on actions directly from tooltips - 
where applicable of course. For example, if there is a validation error stating 
some configuration has not been made, provide a link at the end of the message 
(e.g., 'Configure') that would open the configuration dialog, select the 
correct tab, and put browser focus on the relevant input needed to correct the 
issue.

> Create Annotation that indicates that a Processor cannot be scheduled to run 
> without an incoming connection
> ---
>
> Key: NIFI-810
> URL: https://issues.apache.org/jira/browse/NIFI-810
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Mark Payne
>
> Currently, if a Processor has no incoming connections but is started, it will 
> run continually without ever accomplishing anything. We should have an 
> annotation, perhaps @RequiresInput, that indicates that the Processor should 
> not be scheduled to run unless it has an incoming connection.
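
One possible shape for such an annotation and the check a scheduler might make is sketched below. Only the name @RequiresInput comes from the description; SplitLinesProcessor, FetchFilesProcessor and SchedulingCheck.canSchedule are hypothetical stand-ins, not NiFi framework classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marker annotation: the processor only makes sense with an incoming connection.
@Retention(RetentionPolicy.RUNTIME)   // must be visible to the framework via reflection
@Target(ElementType.TYPE)
@interface RequiresInput {
}

// Hypothetical processor that consumes FlowFiles from upstream.
@RequiresInput
final class SplitLinesProcessor {
}

// Hypothetical source processor that creates its own FlowFiles.
final class FetchFilesProcessor {
}

// Hypothetical stand-in for the check the framework would run before scheduling.
final class SchedulingCheck {
    static boolean canSchedule(Class<?> processorClass, int incomingConnections) {
        boolean requiresInput = processorClass.isAnnotationPresent(RequiresInput.class);
        return !requiresInput || incomingConnections > 0;
    }
}
```

Here SchedulingCheck.canSchedule(SplitLinesProcessor.class, 0) is false, while the unannotated FetchFilesProcessor can be scheduled with no incoming connections.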



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)