Well, the results are in. As long as the file can be handled by iText, which means under 2 GB, my new process is several times faster than the old one. The general steps are:

1) Preprocess the pages into a HashMap.
2) Use the bookmarks to determine which pages go into which final PDF, removing found pages from the HashMap.
3) Once 1000 of these are collected, create a new PDF containing only those pages and save the page and file info for later use. I also release the large file reader at this point, since each subsequent thread will create its own smaller reader.
4) Start multiple threads that use the previously collected info to split the smaller PDFs into the final PDFs.
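The planning in steps 2-3 and the fan-out in step 4 can be sketched roughly as below. This is plain Java with no iText calls, so it only models the bookkeeping, not the actual PDF reading or writing. The names ChunkPlanner, Chunk, plan and splitAll are hypothetical, the bookmark map stands in for whatever the real bookmark parsing produces, and it assumes that "1000 of these" means 1000 final documents per intermediate PDF. The worker body only counts outputs where real code would open the smaller PDF and write the final files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkPlanner {

    /** One intermediate PDF: the final documents it holds and the pages each needs. */
    static final class Chunk {
        final Map<String, List<Integer>> outputs = new LinkedHashMap<>();

        int pageCount() {
            int n = 0;
            for (List<Integer> pages : outputs.values()) {
                n += pages.size();
            }
            return n;
        }
    }

    /** Steps 2-3: walk the bookmark entries in order and close a chunk once it holds maxOutputs documents. */
    static List<Chunk> plan(Map<String, List<Integer>> bookmarks, int maxOutputs) {
        if (maxOutputs < 1) {
            throw new IllegalArgumentException("maxOutputs must be at least 1");
        }
        List<Chunk> chunks = new ArrayList<>();
        Chunk current = new Chunk();
        for (Map.Entry<String, List<Integer>> e : bookmarks.entrySet()) {
            current.outputs.put(e.getKey(), new ArrayList<>(e.getValue()));
            if (current.outputs.size() == maxOutputs) {
                chunks.add(current);      // real code would write the intermediate PDF here
                current = new Chunk();
            }
        }
        if (!current.outputs.isEmpty()) {
            chunks.add(current);          // last, partially filled chunk
        }
        return chunks;
    }

    /** Step 4: one task per chunk on a fixed pool; returns the number of final documents handled. */
    static int splitAll(List<Chunk> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Chunk c : chunks) {
                // Placeholder for the real split: each worker would open its own small reader.
                Callable<Integer> task = () -> c.outputs.size();
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("split failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> bookmarks = new LinkedHashMap<>();
        for (int i = 0; i < 2500; i++) {
            List<Integer> pages = new ArrayList<>();
            pages.add(i + 1);             // one page per document, purely for illustration
            bookmarks.put("doc" + i, pages);
        }
        List<Chunk> chunks = plan(bookmarks, 1000);
        System.out.println(chunks.size() + " chunks, " + splitAll(chunks, 4) + " documents");
    }
}
```

The thread count and chunk size are the two knobs: the chunk size bounds how much each worker has to load, and the pool size bounds how many of those loads happen at once.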
Steps 1-3 run in a single thread to minimize memory usage and maximize the file size that we can process. Step 4 is easier to tune because we are working with smaller files that use less memory and the page data has been preparsed. Plus, we have a general idea of how much memory those processes are going to use because we control the file size. This way we took a process that ran for 14 hours down to 3 hours. Hope this helps.

Edward W. Rouse

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Edward W. Rouse
Sent: Thursday, August 14, 2008 10:50 AM
To: 'Post all your questions about iText here'
Subject: Re: [iText-questions] iText & multithreading delays

We are having trouble multi-threading iText because of memory issues. When we parse normal-size files, our single-threaded and multi-threaded programs work fine. Once the file sizes get bigger (1.5 GB in some cases), even the single-threaded program can run out of memory. With a 500 MB file, anything more than 2 threads causes OOM. We can get rid of the OOM condition by not using getRandomAccessFileOrArray, but then it runs so much slower that it negates the whole reason for using multiple threads. Our current plan is to try to break the file up into smaller 'chunks' and then multi-thread the processing of the chunks. That is what I am in the process of doing right now. I'll let ya know how that works when I get it finished and tested.

Edward W. Rouse

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of robert meyer
Sent: Wednesday, August 13, 2008 7:11 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText & multithreading delays

Talmage: thank you for your response. I don't believe that making significant modifications to iText is the way we'd like to go about solving this problem.
I was hoping to hear from someone who has used iText in a multi-thread / multi-processor environment, and whether iText performed better for them than it has for us in our simple test. We would also appreciate some input from the iText developers. Thanks in advance to all.

rm.

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.1t3xt.com/docs/book.php