[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492522#comment-15492522 ]

Tim Barrett commented on TIKA-2058:
-----------------------------------

private void processMsgEmbeddedInMsg(InformationGranule msgGranule, Path resourceFilePath,
        ResourceSet parentResourceSet, AttachmentChunks attachment) throws Throwable {

    InputStream embeddedMsgFilePathInputStream = null;
    OutputStream outStream = null;
    POIFSFileSystem poifsFileSystem = null;

    try {
        MAPIMessage embeddedMAPIMessage = attachment.getEmbeddedMessage();

        poifsFileSystem = new POIFSFileSystem();
        EntryUtils.copyNodes(attachment.attachmentDirectory.getDirectory(), poifsFileSystem.getRoot());

        Path targetDir = FileSystems.getDefault().getPath(resourceFilePath.getParent().toString() + "/attachments");

        /*
         * Creates directory if not already there
         */
        try {
            Files.createDirectory(targetDir);
        } catch (IOException ignore) {
        }

        String embeddedMessageName = null;

        try {
            String conversationTopic = embeddedMAPIMessage.getConversationTopic();
            conversationTopic = NalandaStringUtilities.stripSpecialCharactersFromString(conversationTopic);
            embeddedMessageName = conversationTopic + ".msg";
        } catch (ChunkNotFoundException cnfe) {
            embeddedMessageName = this.messageNameCounter + ".msg";
            this.messageNameCounter++;
        }

        if (embeddedMessageName != null) {

            if (embeddedMessageName.length() > 200) {
                logger.warn("Embedded attachment has filename longer than 200 characters: " + embeddedMessageName);

                StringBuilder strBldrEmbeddedFileName = new StringBuilder();
                strBldrEmbeddedFileName.append(UUID.randomUUID().toString());
                strBldrEmbeddedFileName.append(".msg");
                embeddedMessageName = strBldrEmbeddedFileName.toString();

                logger.warn("Embedded attachment has filename with long name saved as " + embeddedMessageName);
            }

            File msgFileToWrite = new File(targetDir.toString() + "/" + embeddedMessageName);

            outStream = new FileOutputStream(msgFileToWrite);
            poifsFileSystem.writeFilesystem(outStream);
            outStream.close();

            Path embeddedMsgFilePath = FileSystems.getDefault().getPath(msgFileToWrite.getPath());
            embeddedMsgFilePathInputStream = Files.newInputStream(embeddedMsgFilePath);

            NalandaResourceHandler attachmentResourceHandler = new NalandaResourceHandler(this.parentResourceSet,
                    this.jsonParseFailures, this.jsonPasswordFailures, this.filesCouldNotParseList);

            boolean isEmbeddedInMsg = true;

            attachmentResourceHandler.processEmbeddedResource(msgGranule, msgFileToWrite.getName(),
                    embeddedMsgFilePathInputStream, parentResourceSet, embeddedMsgFilePath, null, null, null,
                    isEmbeddedInMsg);
        }

    } catch (Throwable t) {
        logger.warn("Exception occurred processing embedded message in: " + msgGranule.getValue()
                + " embedded message has not been processed", t);
    } finally {
        if (poifsFileSystem != null) {
            // poifsFileSystem.close();
        }
        if (embeddedMsgFilePathInputStream != null) {
            embeddedMsgFilePathInputStream.close();
        }
        if (outStream != null) {
            outStream.close();
        }
    }
}
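
For comparison, here is a minimal sketch of the same write-then-reread step using try-with-resources, so that each stream is closed even when a later call throws. It uses only java.nio and plain byte arrays; the class name EmbeddedCopySketch and the helper writeAndReread are hypothetical and are not part of the application above. The POIFSFileSystem itself is deliberately left out: whether it can be declared as a resource in the same way depends on the POI version in use, which I have not verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class EmbeddedCopySketch {

    /**
     * Hypothetical helper: writes payload to parentDir/attachments/fileName,
     * then reads it back through a fresh InputStream.
     */
    static byte[] writeAndReread(Path parentDir, String fileName, byte[] payload) throws IOException {
        // createDirectories does not fail if the directory already exists,
        // so no empty catch block is needed.
        Path targetDir = Files.createDirectories(parentDir.resolve("attachments"));
        Path target = targetDir.resolve(fileName);

        // The output stream is closed when this block exits, normally or not.
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(payload);
        }

        // The same holds for the input stream used to re-read the file.
        try (InputStream in = Files.newInputStream(target);
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("embedded-msg-sketch");
        byte[] back = writeAndReread(tmp, "1.msg", "demo".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

With this shape the finally block and the null checks are no longer needed, and a failure while closing one stream cannot prevent the other from being closed.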





> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>
>                 Key: TIKA-2058
>                 URL: https://issues.apache.org/jira/browse/TIKA-2058
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Tim Barrett
>         Attachments: Yourkit screenshot.png, poi-3.15-beta1-p1.jar, 
> poi-3.15-beta1-p1.pom, prevents-OOM-when-writable-is-false.patch, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> We have an application using Tika that parses roughly 7,000,000 files of 
> different types; many of them are MSG files with attachments. This works 
> correctly with Tika 1.9 and has been in production for over a year, with 
> parsing runs taking place every few weeks. With Tika 1.13 the same 
> application runs out of memory (Java heap).
> I have used lsof and file leak detector to track down open files, but 
> neither shows any open files while the application is running. I did find an 
> issue with open files, https://issues.apache.org/jira/browse/TIKA-2015, 
> but there was a workaround for it and it is not the cause here.
> I am sorry to have to report this so vaguely, but with lsof 
> turning nothing up I am a bit stuck as to how to investigate further. We are 
> more than willing to help by testing any ideas provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
