Geoff,

To establish some shared understanding, let's summarize a few of the components at play.

SourceSystem: This is where the data you want to grab lives. It sounds like it is accessible via some protocol, and the data is stored as files.

DataSourceProcess: This sounds like some process that is actively creating/writing data to files in the SourceSystem. It is very important to understand how this process behaves while it writes, because you don't want to consume data that is still being written, renamed, etc. A common technique is for such a process to use a special file-naming convention to indicate that it is still working on a file, such as starting the name with a '.', and then you can configure the List* processors in NiFi to ignore such filenames. Often, though, this is not possible, so you end up checking the file modification age and waiting a while before picking the file up, in the hope that this means the file is done being written. That can fail too, for fun reasons, so you can resort to monitoring file age. All of these joyful techniques are understood and baked into the NiFi List/Fetch/etc. processors where possible, because these are just the joys of file-based IO.

Protocol: How the NiFi server will communicate with the SourceSystem. It could be FTP, SFTP, a file share if one is accessible to the NiFi server, etc. That will be important to specify.

List*: These processors in NiFi use the appropriate protocol to list the files in the SourceSystem that meet some criteria. They pull only metadata, not the actual content of the files.

Fetch*: Given a list of one or more files, these processors go and actually pull the content of those files to the NiFi server itself. Optionally, and usually, they are configured to delete the raw source data once it has been successfully pulled.

Put*: Given a flowfile with content/metadata, these processors write/send/etc. the file/metadata to the appropriate target using the appropriate protocol.
That is, data is copied from NiFi's internal repositories to the target.

Get*: These processors tend to combine the logic of listing and fetching. They are used much less often these days because List/Fetch tends to be more powerful and performant.

You mention wanting a way to pull from the initial location the DataSourceProcess writes to and then write that data to some holding point. I don't recommend this, as it does not make the problem easier. Instead, just have NiFi be the staging/holding point. That is its job in this equation.

From there you can send the data wherever you like, and yes, a Put* processor's output relationships (often success) can go wherever you like.

Once you have run a Fetch* processor, the mental model you want is that the data is IN NiFi, and NiFi is responsible for keeping it safe and moving it to the next part of your flow.

Thanks
Joe

On Wed, Nov 1, 2023 at 1:25 PM Buthorne, Geoffrey <[email protected]> wrote:

> Dev Team,
>
> You have a wonderful product, and I use it every day to move files to and
> from a variety of servers. It works well and thank you.
> Recently, I've come across a situation that I can't seem to find a
> processor for and I'm hoping you can help point me in the right direction.
>
> I am trying to move files that are in an active directory.
> The number of files and their size, dictate that I first copy the files
> out of the active directory while on a remote server to some other
> directory on that remote server that is not actively moving/deleting the
> files from. I would like to create a directory tree to help keep files
> organized and manageable, as I prepare to pick them up and bring to my
> server.
> Once I have created the needed directories, and copied the files out of
> the active directory(ies) - then I can safely begin to move them.
>
> I have looked at the various get/fetch/put processors to help with this -
> but so far - the processors will always first pick up the files and bring
> them to my server from the remote system. It will then place them back on
> the remote server, to whatever directory I configure it to. - I could then
> pick it up again, but I'm wasting bandwidth and resources transferring the
> same file 3 times.
> I thought a putfile - once on a remote server - would work however, after
> the put file, that task chain is done and my only options appear to be
> successfully terminate or failure and retry. I don't seem to be able to
> add another task/processor after a put.
>
> Is there a processor that will allow remote directory management (make
> directories at remote server, copy/move remote files to those directories)
> and then Get/Fetch?
>
> I'm still a little new at this and I certainly do not have experience with
> all the processors available - I'm hoping there is a processor that will
> allow some remote file management/set up before bringing the files to my
> servers.
>
> Thank you and have a great week!
>
> Respectfully,
>
> Geoff Buthorne
> ELINT SME
> Epsilon C5I
> (808) 366 9397
> [email protected]
> www.epsilonc5i.com
>
> Epsilon C5I, Inc., a Subsidiary of
> Epsilon Systems Solutions, Inc.
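
As a rough illustration of the listing behaviour described in the reply above (skip files whose names mark them as still in progress, skip files that were modified too recently, and collect only metadata rather than content), here is a minimal Python sketch. It is not NiFi code and does not claim to show how the List* processors are implemented; in NiFi this is handled through processor configuration. The function name `list_ready_files`, the 60-second default, and the dictionary keys are all assumptions made up for this example.

```python
import time
from pathlib import Path


def list_ready_files(directory, min_age_seconds=60.0, now=None):
    """Return metadata for files that look finished; file content is never read.

    Hypothetical helper for illustration only (not part of NiFi).
    """
    now = time.time() if now is None else now
    ready = []
    for entry in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        if not entry.is_file():
            continue
        # Convention from the reply: a leading '.' means "still being written".
        if entry.name.startswith("."):
            continue
        stat = entry.stat()
        age = now - stat.st_mtime
        # A recently modified file may still be in progress, so leave it
        # for a later listing pass.
        if age < min_age_seconds:
            continue
        ready.append(
            {"name": entry.name, "size": stat.st_size, "age_seconds": age}
        )
    return ready
```

A later step would then take each returned entry and pull the actual content, which mirrors the List/Fetch split. As the reply notes, an age check is only a heuristic: a writer that pauses for longer than the threshold can still be picked up mid-write.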
