Re: Processor Help

Joe Witt Wed, 01 Nov 2023 13:41:09 -0700

Geoff

To establish some shared understanding, let's summarize a few of the
components at play.

SourceSystem: This is where the data you want to grab lives and it sounds
like it is accessible via some protocol and as files.

DataSourceProcess: This sounds like some process which is actively
creating/writing data to files in the SourceSystem.  It is very important
to understand how this process behaves as it actively writes data because
you don't want to consume the data while it is still being
written/renamed/etc.  A common technique is for such processes to write
data under a special file name that indicates the file is still being
worked on, such as one starting with a '.', and then you can configure
the List* processors in NiFi to ignore such filenames.  Often though this
is not possible, so you'll end up having to use things like checking file
modification age and waiting to pick it up after a while with the hope this
means the file is done being written.  That can fail too for fun reasons
and so you can resort to monitoring file age.  All these various joyful
techniques are understood and baked into the NiFi processors where possible
for List/Fetch/etc. because this is just the joys of file based IO.
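
As a rough illustration of the two techniques above (skip dot-prefixed
names, skip files modified too recently), here is a small sketch in plain
Python.  It is not NiFi code; the function name list_ready_files and the
min_age_seconds parameter are made up for this example, and it assumes the
files sit on a locally accessible filesystem.

```python
import os
import time
from pathlib import Path


def list_ready_files(directory, min_age_seconds=60.0, now=None):
    """Return names of files that look finished being written.

    A file is skipped if its name starts with '.' (the writer's
    "still working on it" convention) or if it was modified less
    than min_age_seconds ago.
    """
    now = time.time() if now is None else now
    ready = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue  # ignore subdirectories and other non-files
        if entry.name.startswith("."):
            continue  # writer marks in-progress files with a leading dot
        age = now - entry.stat().st_mtime
        if age < min_age_seconds:
            continue  # touched too recently; may still be open for writing
        ready.append(entry.name)
    return ready
```

Neither check is a guarantee.  A writer that pauses for longer than the
minimum age will still fool the age test, which is why knowing how the
DataSourceProcess behaves matters so much.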

Protocol: How the NiFi server will communicate with the SourceSystem.
Could be FTP, SFTP, a file share if accessible to the NiFi server, etc.  That
will be important to specify.

List*: These processors in NiFi use the appropriate protocol to
conduct a listing of what files are present, meeting some criteria, in
the source system.  They just pull metadata, not the actual
content of the files.

Fetch*: Given a list of one or more files, these processors go
and actually pull the content of those files to the NiFi server itself.
They can optionally be configured, and usually are, to delete the raw
source data once it is successfully pulled.

Put*: Given a flowfile with content/metadata, these processors
write/send/etc. the file/metadata to the appropriate target using the
appropriate protocol.  That is, data is copied from NiFi's internal
repositories to the target.

Get*: These processors tend to be a combination of the logic in Listing and
Fetching and are used much less often these days because List/Fetch tends
to be more powerful and performant.
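
To make the List/Fetch split concrete, here is a rough sketch in plain
Python against a local directory.  It is only an analogy for how the two
steps divide the work, not how NiFi implements them; the names list_files,
fetch_file, staging_dir and delete_source are invented for the example.

```python
import shutil
from pathlib import Path


def list_files(source_dir):
    """'List' step: collect metadata only; no file content is read."""
    listing = []
    for entry in sorted(Path(source_dir).iterdir()):
        if entry.is_file():
            info = entry.stat()
            listing.append({
                "path": str(entry),
                "name": entry.name,
                "size": info.st_size,
                "mtime": info.st_mtime,
            })
    return listing


def fetch_file(meta, staging_dir, delete_source=False):
    """'Fetch' step: pull one listed file's content into the staging area.

    The source is deleted only after the copy has completed and the
    copied size matches the size recorded at listing time.
    """
    target = Path(staging_dir) / meta["name"]
    shutil.copy2(meta["path"], target)
    if delete_source and target.stat().st_size == meta["size"]:
        Path(meta["path"]).unlink()
    return target
```

The point of the split is that the cheap metadata pass can run often and
on its own schedule, while the expensive content transfer happens once
per listed file.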

You mention wanting to have a way to pull from the initial location the
DataSourceProcess writes to then write that data to some holding point.  I
don't recommend this as it does not make the problem easier.  Instead just
have NiFi be the staging/holding point.  That is its job in this equation.
From there you can send the data wherever you like, and yes, Put* processors'
output relationships (often success) can go wherever you like.  Once you
have run a Fetch* processor the mental model you want to have is the data
is IN NiFi and it is responsible to keep it safe and move it to your next
part of the flow.

Thanks
Joe

On Wed, Nov 1, 2023 at 1:25 PM Buthorne, Geoffrey <
[email protected]> wrote:

> Dev Team,
>
> You have a wonderful product, and I use it every day to move files to and
> from a variety of servers.  It works well and thank you.
> Recently, I've come across a situation that I can't seem to find a
> processor for and I'm hoping you can help point me in the right direction.
>
> I am trying to move files that are in an active directory.
> The number of files and their size dictate that I first copy the files
> out of the active directory while on a remote server to some other
> directory on that remote server that files are not actively being
> moved/deleted from.  I would like to create a directory tree to help keep files
> organized and manageable, as I prepare to pick them up and bring to my
> server.
> Once I have created the needed directories, and copied the files out of
> the active directory(ies) - then I can safely begin to move them.
>
> I have looked at the various get/fetch/put processors to help with this -
> but so far - the processors will always first pick up the files and bring
> them to my server from the remote system.  It will then place them back on
> the remote server, to whatever directory I configure it to. - I could then
> pick it up again, but I'm wasting bandwidth and resources transferring the
> same file 3 times.
> I thought a putfile - once on a remote server - would work however, after
> the put file, that task chain is done and my only options appear to be
> successfully terminate or failure and retry.  I don't seem to be able to
> add another task/processor after a put.
>
> Is there a processor that will allow remote directory management (make
> directories at remote server, copy/move remote files to those directories)
> and then Get/Fetch?
>
> I'm still a little new at this and I certainly do not have experience with
> all the processors available - I'm hoping there is a processor that will
> allow some remote file management/set up before bringing the files to my
> servers.
>
> Thank you and have a great week!
>
> Respectfully,
>
> Geoff Buthorne
> ELINT SME
> Epsilon C5I
> (808) 366 9397
> [email protected]<mailto:[email protected]>
> www.epsilonc5i.com<http://www.epsilonc5i.com/>
>
> Epsilon C5I, Inc., a Subsidiary of
> Epsilon Systems Solutions, Inc.
