[ 
https://issues.apache.org/jira/browse/NIFI-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411727#comment-16411727
 ] 

Joseph Witt commented on NIFI-5004:
-----------------------------------

[~gss2002] In your case, where you're operating inside a Hadoop cluster and the 
data you want is one hop out from (adjacent to) your cluster, you have lots of options. 
 I'd certainly encourage you to go with the simplest, easiest to manage, and 
properly secured answer you can.  What NiFi gives you is a single place to 
command and control flows - focus on flow management.  In doing its job it will 
pull a copy of the data locally and use that to drive delivery to one or more 
places.  If all you want is to copy from one place to another, tools like 
distcp, or even scripts, will be more appropriate.  What NiFi gives you is a flow 
management tool to do these routing, transformation, and mediation functions, 
and it offers clustering and highly configurable security options for all 
kinds of things, including encrypted content, authentication, authorization, 
etc.  It also gives you data provenance, click-to-content, replayability, 
visual command and control, versioned flows, etc.  You might not need all these 
things or even most of them.  If that is the case and something simpler can do 
it and meet your security needs, then that makes sense to go with.

If your needs are more advanced and your flow management needs are likely to 
grow, then going with NiFi makes a lot of sense even if you're operating just 
within the Hadoop cluster.  I don't believe the community has any plans to 
implement NiFi to execute as MapReduce or YARN-managed tasks at this point, 
though there are some strong reasons to advance our story around Docker 
containers and K8S.

Hopefully that helps you a bit more.  What Pierre suggests for your flow sounds 
quite feasible and will give a lot of benefits.  NiFi is, can be, and should be 
used for a lot of flow management cases even if it is not going to modify the 
content.

> Ability to Execute File (FTP/CIFS/SFTP) Copy jobs on Mapreduce From Nifi
> ------------------------------------------------------------------------
>
>                 Key: NIFI-5004
>                 URL: https://issues.apache.org/jira/browse/NIFI-5004
>             Project: Apache NiFi
>          Issue Type: Wish
>            Reporter: Greg Senia
>            Priority: Critical
>
> Would like to see NiFi run programs on MapReduce, examples of these being 
> FTP2HDFS [https://github.com/gss2002/ftp2hdfs] and CIFS2HDFS 
> [https://github.com/gss2002/cifs2hdfs], as a MapReduce application where the 
> final resting place is HDFS without any type of data transform on the way in. 
> This would reduce overhead on the NiFi node and move the incoming data 
> directly to the datanode via short-circuit reads/writes. I currently have 
> these two applications running as MR jobs, and would like to be able to do 
> this from within NiFi pointing at HDFS/YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
