[jira] [Updated] (NIFI-1008) NiFi should swap out FlowFiles to disk even before the session is committed
[ https://issues.apache.org/jira/browse/NIFI-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Payne updated NIFI-1008:

    Fix Version/s: 0.4.0

> NiFi should swap out FlowFiles to disk even before the session is committed
> ---
>
>     Key: NIFI-1008
>     URL: https://issues.apache.org/jira/browse/NIFI-1008
>     Project: Apache NiFi
>     Issue Type: Improvement
>     Components: Core Framework
>     Reporter: Mark Payne
>     Fix For: 0.4.0
>
> Currently, NiFi will swap out FlowFiles if there are a large number in a
> FlowFile Queue. This is done to avoid running out of JVM heap space. However,
> if we have a simple flow like GetFile -> SplitText and GetFile pulls in a
> large file, SplitText can quickly cause OutOfMemoryError. This is not because
> it buffers the content of the FlowFile in memory but rather because it holds
> the millions of FlowFile objects in memory. We can do better.
> When we call session.transfer for the FlowFiles, once we hit a magical
> threshold (say 10,000), we should swap those FlowFiles to disk and the
> session should transfer them to the queue as "swapped out" FlowFiles, rather
> than having to buffer all of these in memory and then swap them out once
> they land in the queue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (NIFI-1008) NiFi should swap out FlowFiles to disk even before the session is committed
Mark Payne created NIFI-1008:

    Summary: NiFi should swap out FlowFiles to disk even before the session is committed
    Key: NIFI-1008
    URL: https://issues.apache.org/jira/browse/NIFI-1008
    Project: Apache NiFi
    Issue Type: Improvement
    Components: Core Framework
    Reporter: Mark Payne

Currently, NiFi will swap out FlowFiles if there are a large number in a FlowFile Queue. This is done to avoid running out of JVM heap space. However, if we have a simple flow like GetFile -> SplitText and GetFile pulls in a large file, SplitText can quickly cause OutOfMemoryError. This is not because it buffers the content of the FlowFile in memory but rather because it holds the millions of FlowFile objects in memory. We can do better.

When we call session.transfer for the FlowFiles, once we hit a magical threshold (say 10,000), we should swap those FlowFiles to disk and the session should transfer them to the queue as "swapped out" FlowFiles, rather than having to buffer all of these in memory and then swap them out once they land in the queue.
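The threshold idea proposed in NIFI-1008 can be sketched roughly as follows. This is only an illustration under stated assumptions, not NiFi's session or swap code: `SwappingTransferBuffer`, its methods, and the text-file swap format are all hypothetical, and FlowFiles are stood in for by single-line string IDs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from NiFi itself.
class SwappingTransferBuffer {
    private final int swapThreshold;                          // the issue suggests "say 10,000"
    private final List<String> inMemory = new ArrayList<>();  // records still held on the heap
    private final List<Path> swapFiles = new ArrayList<>();   // batches already written to disk

    SwappingTransferBuffer(int swapThreshold) {
        if (swapThreshold < 1) {
            throw new IllegalArgumentException("swapThreshold must be at least 1");
        }
        this.swapThreshold = swapThreshold;
    }

    // Stand-in for session.transfer(): buffer the record and spill the
    // whole in-memory batch to disk as soon as the threshold is reached.
    // IDs are assumed to contain no line breaks.
    void transfer(String flowFileId) {
        inMemory.add(flowFileId);
        if (inMemory.size() >= swapThreshold) {
            swapOut();
        }
    }

    // Write the buffered IDs to a temp file, one per line, then free the heap copy.
    private void swapOut() {
        try {
            Path swapFile = Files.createTempFile("flowfile-swap-", ".txt");
            swapFile.toFile().deleteOnExit();
            Files.write(swapFile, inMemory, StandardCharsets.UTF_8);
            swapFiles.add(swapFile);
            inMemory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    int swapFileCount() {
        return swapFiles.size();
    }

    // Total records transferred so far, counting both heap and swapped batches.
    long totalCount() {
        long total = inMemory.size();
        try {
            for (Path swapFile : swapFiles) {
                total += Files.readAllLines(swapFile, StandardCharsets.UTF_8).size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With this shape the heap never holds more than `swapThreshold` records at once, which is the property the issue is after; a real implementation would also have to hand the swap files to the destination queue on commit and discard them on rollback.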
[jira] [Commented] (NIFI-997) Kerberos tickets are not being renewed by Hadoop
[ https://issues.apache.org/jira/browse/NIFI-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935580#comment-14935580 ]

ASF GitHub Bot commented on NIFI-997:

GitHub user rickysaltzer opened a pull request:

    https://github.com/apache/nifi/pull/97

    NIFI-997: Periodically Renew Kerberos Tickets

    Adding a patch to renew the ticket every 4 hours to avoid inactive Kerberos tickets. This was an issue found when running Kerberos-enabled Hadoop processors for a long period of time. This technically _should_ have been handled by the Hadoop library, but due to unknown issues, the renewal thread inside of Hadoop doesn't seem to be doing that.

    This patch is fairly simplistic, and applies to all Hadoop processors as it's implemented on the AbstractHadoopProcessor. The Kerberos ticket age is checked against a threshold (4 hours is a safe bet) when getFileSystem() is called. If the age exceeds the threshold, we re-login using the UserGroupInformation class before passing back the filesystem.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rickysaltzer/nifi kerberos-renewal

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/97.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #97

commit b2eb61dca1204afb317bd40346065aa6a0e97647
Author: ricky
Date: 2015-09-25T18:15:09Z

    NIFI-997: Periodically Renew Kerberos Tickets

    - Renew ticket every 4 hours to avoid inactive Kerberos tickets.

> Kerberos tickets are not being renewed by Hadoop
>
>     Key: NIFI-997
>     URL: https://issues.apache.org/jira/browse/NIFI-997
>     Project: Apache NiFi
>     Issue Type: Bug
>     Reporter: Ricky Saltzer
>     Assignee: Ricky Saltzer
>
> I've discovered after some time of having Kerberos-enabled processors that
> the Kerberos ticket is not being renewed as it should. This is strange
> because according to HADOOP-6656, this should be automatically taken care of
> with a utility thread. I examined the NiFi jstack and saw that the renewal
> thread was present, so I'm not sure what's going on.
> Does NiFi do something with the processor threads that causes child threads to
> suspend? I have a patch that I'm currently testing (currently looking good),
> that will renew the Kerberos ticket on getFileSystem() if a threshold is
> reached (e.g. 4 hours).
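The age check described in the pull request can be sketched as below. This is a minimal, hypothetical illustration rather than the code in the patch: `TicketRenewalGuard` and `renewIfStale` are invented names, the clock is injected so the logic can be exercised without waiting, and the re-login action is passed in as a `Runnable` (in the patch it would be a re-login through Hadoop's UserGroupInformation class).

```java
import java.util.function.LongSupplier;

// Hypothetical helper; the names are illustrative and not taken from the NiFi patch.
class TicketRenewalGuard {
    private final long thresholdMillis;  // maximum ticket age before re-login, e.g. 4 hours
    private final LongSupplier clock;    // source of "now" in milliseconds
    private final Runnable relogin;      // assumed to perform the actual Kerberos re-login
    private long lastLoginMillis;

    TicketRenewalGuard(long thresholdMillis, LongSupplier clock, Runnable relogin) {
        if (thresholdMillis <= 0) {
            throw new IllegalArgumentException("thresholdMillis must be positive");
        }
        this.thresholdMillis = thresholdMillis;
        this.clock = clock;
        this.relogin = relogin;
        // Treat construction time as the moment of the initial login.
        this.lastLoginMillis = clock.getAsLong();
    }

    // Intended to be called at the start of something like getFileSystem().
    // Returns true if the ticket age exceeded the threshold and a re-login ran.
    synchronized boolean renewIfStale() {
        long now = clock.getAsLong();
        if (now - lastLoginMillis > thresholdMillis) {
            relogin.run();
            lastLoginMillis = now;
            return true;
        }
        return false;
    }
}
```

A caller would construct the guard once, for example with `System::currentTimeMillis` as the clock, and invoke `renewIfStale()` before handing back the filesystem. Checking on access rather than on a timer means an idle processor never renews, which matches the "renew on getFileSystem()" approach described in the issue.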
[jira] [Commented] (NIFI-997) Kerberos tickets are not being renewed by Hadoop
[ https://issues.apache.org/jira/browse/NIFI-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935593#comment-14935593 ]

Joseph Witt commented on NIFI-997:

Ricky, thanks for finding this and following up on it!

> Kerberos tickets are not being renewed by Hadoop
>
>     Key: NIFI-997
>     URL: https://issues.apache.org/jira/browse/NIFI-997
>     Project: Apache NiFi
>     Issue Type: Bug
>     Reporter: Ricky Saltzer
>     Assignee: Ricky Saltzer
>
> I've discovered after some time of having Kerberos-enabled processors that
> the Kerberos ticket is not being renewed as it should. This is strange
> because according to HADOOP-6656, this should be automatically taken care of
> with a utility thread. I examined the NiFi jstack and saw that the renewal
> thread was present, so I'm not sure what's going on.
> Does NiFi do something with the processor threads that causes child threads to
> suspend? I have a patch that I'm currently testing (currently looking good),
> that will renew the Kerberos ticket on getFileSystem() if a threshold is
> reached (e.g. 4 hours).
[jira] [Commented] (NIFI-992) Couchbase Server Processors
[ https://issues.apache.org/jira/browse/NIFI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935403#comment-14935403 ]

Koji Kawamura commented on NIFI-992:

[~bende] Thanks for reviewing the code. I just pushed a new commit to the PR. Please check it again!

> Couchbase Server Processors
> ---
>
>     Key: NIFI-992
>     URL: https://issues.apache.org/jira/browse/NIFI-992
>     Project: Apache NiFi
>     Issue Type: New Feature
>     Components: Core Framework
>     Reporter: Koji Kawamura
>     Labels: processor
>
> Processors providing a data access interface to a Couchbase Server cluster.
> I've started writing a set of processors for interacting with Couchbase
> Server. There are several ways to integrate with Couchbase such as:
> 1. Key/Value CRUD operations
> 2. View (Map/Reduce) queries
> 3. N1QL queries
> For the first step, I'm implementing the Key/Value CRUD operations. I will
> send a pull request once the code and tests are clean.
[jira] [Commented] (NIFI-817) Create Processors to interact with HBase
[ https://issues.apache.org/jira/browse/NIFI-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935349#comment-14935349 ]

Mark Payne commented on NIFI-817:

[~ndimiduk] [~nmaillard] - the basic premise is to provide ETL-like functionality to pull data from HBase, as well as the ability to stream data into HBase.

I very much like the idea of plugging into an HBase Firehose. Is this something that is configured on the HBase instance itself, to allow NiFi access to the stream? I will have to look more deeply into how those semantics work for sure. I would also like to have [~bende] look into this from the NiFi perspective.

I definitely agree as well that we need to look into the filter language.

Kite is not a NiFi thing. It is a set of libraries developed by Cloudera (quick intro at http://kitesdk.org/docs/1.0.0/Kite-SDK-Guide.html). Some of the Cloudera guys provided some Kite Processors for NiFi to help push "Kite Datasets" (Avro-based datasets) to HDFS.

> Create Processors to interact with HBase
>
>     Key: NIFI-817
>     URL: https://issues.apache.org/jira/browse/NIFI-817
>     Project: Apache NiFi
>     Issue Type: New Feature
>     Components: Extensions
>     Reporter: Mark Payne
>     Assignee: Mark Payne
>     Fix For: 0.4.0
>     Attachments: 0001-NIFI-817-Initial-implementation-of-HBase-processors.patch
[jira] [Created] (NIFI-1007) Stats Configuration
Debbie Marcin created NIFI-1007:

    Summary: Stats Configuration
    Key: NIFI-1007
    URL: https://issues.apache.org/jira/browse/NIFI-1007
    Project: Apache NiFi
    Issue Type: Improvement
    Components: Documentation & Website
    Reporter: Debbie Marcin

In your User Guide, under "Historical Statistics of a Component," it is mentioned that "The amount of historical information that is stored is configurable in the NiFi properties..." but it's not mentioned what that particular property is.
[jira] [Created] (NIFI-1006) Change format for configuring repositories in nifi.properties file
Mark Payne created NIFI-1006:

    Summary: Change format for configuring repositories in nifi.properties file
    Key: NIFI-1006
    URL: https://issues.apache.org/jira/browse/NIFI-1006
    Project: Apache NiFi
    Issue Type: Improvement
    Components: Configuration, Core Framework
    Reporter: Mark Payne
    Fix For: 1.0.0

Currently, the Content Repository, FlowFile Repository, and Provenance Repository are all configured within the nifi.properties file. This includes the repository implementation to use and all properties for those repositories. This becomes quite confusing to configure, and it makes it very difficult to provide examples of each of these repositories.

I would like to see this changed to a format like the one used to configure the Authority Providers. This way, in the nifi.properties file, we would configure just two things: the .xml file that includes the repository configurations (with IDs) and the ID of the repository to use.

This also makes it far easier to create a repository that may wrap one or more other repositories, by defining them all in the .xml file and then specifying the ID of the 'wrapping' repository as the one to use.
[jira] [Created] (NIFI-1005) ControllerStatusReportingTask logger name is incorrect
Matt Gilman created NIFI-1005:

    Summary: ControllerStatusReportingTask logger name is incorrect
    Key: NIFI-1005
    URL: https://issues.apache.org/jira/browse/NIFI-1005
    Project: Apache NiFi
    Issue Type: Bug
    Components: Extensions
    Reporter: Matt Gilman
    Priority: Trivial
    Fix For: 0.4.0

The documentation for the ControllerStatusReportingTask indicates that the messages will be logged to
{noformat}
org.apache.nifi.controller.ControllerStatusReportingTask.Processors
org.apache.nifi.controller.ControllerStatusReportingTask.Connections
{noformat}
however they are actually written to
{noformat}
ControllerStatusReportingTask.Processors
ControllerStatusReportingTask.Connections
{noformat}
[jira] [Commented] (NIFI-810) Create Annotation that indicates that a Processor cannot be scheduled to run without an incoming connection
[ https://issues.apache.org/jira/browse/NIFI-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935206#comment-14935206 ]

Rob Moran commented on NIFI-810:

Functionality looks good. More user-friendly messaging could help a lot.

I think what I suggested in my earlier comment for the validation error language is clearer than what is currently being generated. It uses language seen in the UI, such as 'upstream connection', making it more relatable to a user's workflow. The current language is clear for the most part, but it is difficult to read and therefore probably takes longer to comprehend.

It would also be great to include follow-on actions directly from tooltips, where applicable of course. For example, if there is a validation error stating some configuration has not been made, provide a link at the end of the message (e.g., 'Configure') that would open the configuration dialog, select the correct tab, and put browser focus on the relevant input needed to correct the issue.

> Create Annotation that indicates that a Processor cannot be scheduled to run
> without an incoming connection
> ---
>
>     Key: NIFI-810
>     URL: https://issues.apache.org/jira/browse/NIFI-810
>     Project: Apache NiFi
>     Issue Type: Improvement
>     Components: Extensions
>     Reporter: Mark Payne
>
> Currently, if a Processor has no incoming connections but is started, it will
> run continually without ever accomplishing anything. We should have an
> annotation, perhaps @RequiresInput, that indicates that the Processor should
> not be scheduled to run unless it has an incoming connection.
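The annotation idea in NIFI-810 can be sketched as follows. The issue only floats the name `@RequiresInput` ("perhaps"), so everything here is an assumption for illustration: the annotation definition, the two example processor classes, and the `SchedulingCheck.canSchedule` helper are hypothetical and do not reflect how NiFi's framework actually validates or schedules processors.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marker annotation; RUNTIME retention is needed so the check below can see it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RequiresInput {
}

// Hypothetical processor that only makes sense with upstream data.
@RequiresInput
class SplitLikeProcessor {
}

// Hypothetical source processor that generates its own data.
class SourceLikeProcessor {
}

// Hypothetical framework-side check, not NiFi's scheduler.
class SchedulingCheck {
    // A processor may be scheduled unless it is marked @RequiresInput
    // and has no incoming connections.
    static boolean canSchedule(Class<?> processorClass, int incomingConnections) {
        boolean requiresInput = processorClass.isAnnotationPresent(RequiresInput.class);
        return !requiresInput || incomingConnections > 0;
    }
}
```

The same boolean could drive the validation message discussed in the comment above, for example by reporting a missing upstream connection whenever `canSchedule` returns false.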