Commons FileUpload uses the FileCleaner class from Commons IO to ensure that all temporary files are deleted when the corresponding Java object goes out of scope. FileCleaner works well in most situations, but can lead to OutOfMemoryExceptions in long running active programs. For example, in the situation where we have a web server with 50 threads uploading files, the low priority single cleaning thread doesn't get enough CPU time to empty its queue. Thus the queue holding the list of files to be deleted grows until memory is exhausted. This is even more ironic in our environment because we are careful to delete all the temporary files, so the FileCleaner never finds anything that needs deleting anyway on its long queue.

We have seen up to 300,000 objects on the FileCleaner queue, each of which references a large amount of memory, including a couple of IO buffers, keeping it from being garbage collected. One possible solution is to minimize the size of the marker object put on the ReferenceQueue, perhaps just the file name instead of the whole DiskFileItem, but that only postpones the problem since the queue may still not get enough time to empty.

Another potential solution is to remove the file from the FileCleaner queue when it is explicitly deleted, but I don't see any mechanism in the underlying ReferenceQueue to remove arbitrary items.

So the approach we took was to subclass DiskFileItem with an alternative implementation that 1) does not enqueue the DiskFileItem with the FileCleaner and 2) has a empty finalize method (to speed up garbage collection). We called that class UnmanagedDeleteDiskFileItem to indicate that the user must explicitly clean up the temporary file (typically with finally blocks). We also coded an alternative DiskFileItemFactory whose constructor took a boolean indicating whether the returned items were to be implicitly or explicitly managed, choosing either the old or new DiskFileItem implementation respectively.

I'd like to ask the FileUpload maintainers to consider these options:
1) Adopt a strategy like the one we use to allow the coder to ask the factory for either managed or unmanaged DiskFileItems, both of which are supplied by the FileUpload package. I have attached our DiskFileItemFactory and UnmanagedDeleteDiskFileItem to use as a starting point of this enhancement. 2) Make DiskFileItem more supportive of subclassing, by adding a protected getTempFileName method which the subclass can call. This would allow users to cleanly subclass DiskFileItem. Currently, our UnmanagedDeleteDiskFileItem implementation has to duplicate some of the code from DiskFileItem because too much was marked private.
--keenan

package edu.mit.broad.trace.util;

import java.io.File;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItem;

/**
 * A FileItemFactory that allows the creator to decide whether the
 * FileItems that are constucted will automatically delete their
 * underlying files or not.
 * 
 * @author keenan
 *
 */
public class DiskFileItemFactory extends
                org.apache.commons.fileupload.disk.DiskFileItemFactory {

        private boolean managedDelete;
        
        /**
         * Constructor that allows the specification of whether the
         * factory will create FileItems that manage their deletes or not.
         * @param managedDelete true if the file should be deleted when the
         * FileItem is garbage collected, false if the programmer is
         * explicitly responsible for the deletes.
         */
        public DiskFileItemFactory(boolean managedDelete) {
                this.managedDelete = managedDelete;
        }

        /**
         * Constructor that allows the specification of whether the
         * factory will create FileItems that manage their deletes or not,
         * along with the size threshhold and the repository location.
         * @param managedDelete true if the file should be deleted when the
         * FileItem is garbage collected, false if the programmer is
         * explicitly responsible for the deletes.
         * @param sizeThreshold The threshold, in bytes, below which items will 
be
     *                      retained in memory and above which they will be
     *                      stored as a file.
     * @param repository    The data repository, which is the directory in
     *                      which files will be created, should the item size
     *                      exceed the threshold.
         */
        public DiskFileItemFactory(boolean managedDelete, int sizeThreshold, 
File repository) {
                super(sizeThreshold, repository);
                this.managedDelete = managedDelete;
        }
        
    public FileItem createItem(
            String fieldName,
            String contentType,
            boolean isFormField,
            String fileName
            ) {
        
        if (managedDelete) {
                return new DiskFileItem(fieldName, contentType,
                isFormField, fileName, getSizeThreshold(), getRepository());
        } else {
                return new UnmanagedDeleteDiskFileItem(fieldName, contentType,
                    isFormField, fileName, getSizeThreshold(), 
getRepository());                
        }
    }
} // DiskFileItemFactory
package edu.mit.broad.trace.util;

import java.io.File;
import java.rmi.server.UID;

import org.apache.commons.fileupload.disk.DiskFileItem;

/**
 * A streamlined version of a DiskFileItem that spends no time or resources
 * tracking when to delete the underlying disk file, instead putting the
 * burden on the programmer to delete the underlying file. Unlike the
 * base class, the underlying file is not automatically deleted when
 * these FileItems go out of scope and are garbage collected. In particular,
 * this class does not have a finalize nor does it enqueue on a ReferenceQueue
 * waiting for garbage collection, both of which attempt to perform a
 * file delete in the base class. By eliminating these features, the
 * memory and CPU requirements are minimized although the programmer is
 * now responsible for deleting the files.
 * 
 * This class was explicitly created in response to a situation where there
 * were many threads creating files and only one low priority thread
 * deleting them, which could not keep up with the demand and eventually
 * ran out of memory. The problem was moot anyway because in that
 * case, the programmer was explicitly deleting the files anyway, so all
 * the tracking was for naught.
 * 
 * Derived from, with some private members copied, from the base class
 * which is covered by the Apache Software License.
 * 
 * @author keenan
 *
 */
public class UnmanagedDeleteDiskFileItem extends DiskFileItem {

    /**
     * UID used in unique file name generation.
     */
    private static final String UID =
            new UID().toString().replace(':', '_').replace('-', '_');
    
    /**
     * Counter used in unique identifier generation.
     */
    private static int counter = 0;

    /**
     * The repository where the files will be created, stored
     * directly because it is not exposed by the base class.
     */
    private File repository;

        public UnmanagedDeleteDiskFileItem(String fieldName, String contentType,
            boolean isFormField, String fileName, int sizeThreshold,
            File repository) {
                super(fieldName, contentType,
                    isFormField, fileName, sizeThreshold,
                    repository);
                this.repository = repository;
        }
        
    /**
     * Copied from the base class, with the deletion of the code that
     * enqued the item on a FileCleaner.
     */
        protected File getTempFile() {
        File tempDir = repository;
        if (tempDir == null) {
            tempDir = new File(System.getProperty("java.io.tmpdir"));
        }

        String fileName = "upload_" + UID + "_" + getUniqueId() + ".tmp";

        File f = new File(tempDir, fileName);
        return f;
    }   
        
    /**
     * Do nothing on finalization, overriding delete code in the base
     * class.
     */
        protected void finalize() {
    }
    
    /**
     * Returns an identifier that is unique within the class loader used to
     * load this class, but does not have random-like apearance.
     *
     * @return A String with the non-random looking instance identifier.
     */
    private static String getUniqueId() {
        final int limit = 100000000;
        int current;
        synchronized (DiskFileItem.class) {
            current = counter++;
        }
        String id = Integer.toString(current);

        // If you manage to get more than 100 million of ids, you'll
        // start getting ids longer than 8 characters.
        if (current < limit) {
            id = ("00000000" + id).substring(id.length());
        }
        return id;
    }

} // UnmanagedDeleteDiskFileItem

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to