Joe Witt wrote
> On May 4, 2016, at 8:56 AM, Joe Witt <joe.witt@...> wrote:
> 
> Dale,
> 
> Where there is a FetchFile there is usually a ListFile.  And while
> the symptom of memory issues is showing up in FetchFile, I am curious
> whether the issue might actually be caused in ListFile.  How many
> files are in the directory being listed?
> 
> Mark,
> 
> Are we using a stream-friendly API to list files, and do we know
> whether that API is really doing things in a stream-friendly way on
> all platforms?
> 
> Thanks
> Joe

So I will explain my flow first and then answer your question about how
I am using ListFile and FetchFile.

To begin my process, I ingest a CSV file that contains a list of
filenames. The first (and only) ListFile starts off the flow and passes the
listing to the first FetchFile, which retrieves the contents of the CSV.
Afterward, I use ExtractText to pull out all of the file names and put
them as attributes on individual FlowFiles. THEN I use a second
FetchFile (this is the processor that has trouble allocating memory), which
uses expression language to reference that file name attribute and retrieve
the corresponding text document.
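
To make that more concrete, here is a rough outline of the chain. The
attribute name doc.path is just a placeholder (not my actual attribute
name), and I have left out the exact regex:

    ListFile      -> lists the directory containing the CSV
    FetchFile #1  -> File to Fetch: e.g. ${absolute.path}/${filename}  (reads the CSV)
    ExtractText   -> doc.path: <regex capturing one path/filename>     (attribute per FlowFile)
    FetchFile #2  -> File to Fetch: ${doc.path}                        (reads each referenced document)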

The CSV file (189 MB) contains metadata and path/filenames for over 200,000
documents, and I am having trouble reading from a directory of about 85,000
documents with the second FetchFile (each document is usually a few KB). I
get stuck at around 20 MB, and then NiFi slows to a crawl.

I can give you a picture of our actual flow if you need it.


Mark Payne wrote
> ListFile performs a listing using Java's File.listFiles(). This will
> provide a list of all files in the directory. I do not believe this to
> be related, though. Googling indicates that when this error occurs, it
> is related to the ability to create a native process in order to
> interact with the file system. I don't think the issue is related to
> the Java heap but rather to the available RAM on the box. How much RAM
> is actually available on the box? You mentioned IOPS - are you running
> in a virtual cloud environment? Using remote storage such as Amazon EBS?

I am running six Linux VMs on a Windows 8 machine. Three of the VMs (one
NCM, two nodes) run NiFi, and those VMs have 20 GB of RAM assigned to them.
Looking through Ambari and monitoring the memory on the nodes, I have a
little more than 4 GB of free RAM on the nodes. It looks like the free
memory dipped severely during my NiFi flow, but no swap memory was used.
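
For reference on Joe's earlier stream-friendliness question: File.listFiles()
(which Mark says ListFile uses) materializes the entire directory listing as
an array up front, while the java.nio DirectoryStream API hands out entries
one at a time. A minimal sketch of the difference - the /data/docs path is
just a placeholder, and this is an illustration, not the actual ListFile code:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ListingExample {
        public static void main(String[] args) throws IOException {
            // Eager: builds a File[] holding every entry before returning.
            File[] all = new File("/data/docs").listFiles();
            System.out.println("Eager listing: "
                    + (all == null ? 0 : all.length) + " entries");

            // Stream-friendly: entries are produced as the iterator advances,
            // so the full listing never has to sit in a single array.
            try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(Paths.get("/data/docs"))) {
                for (Path entry : stream) {
                    System.out.println(entry.getFileName());
                }
            }
        }
    }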


