Re: [iText-questions] iText & multithreading delays
Oops, step 3 is missing a bit. I continue to process the original file, creating many intermediate-size PDFs until the entire original PDF is parsed, THEN I release the original reader to free up the memory.

Edward W. Rouse
-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
Re: [iText-questions] iText & multithreading delays
Well, the results are in. As long as the file can be handled by iText, which means under 2 GB, my new process is several times faster than the old one. The general steps are:

1) Preprocess the pages into a HashMap.
2) Use the bookmarks to determine which pages go into which end-result PDF, and remove the pages found from the HashMap.
3) Once 1000 of these are collected, create a new PDF containing only those pages and save the page and file info for later use. I also release the large file reader at this point, since each subsequent thread will be creating its own smaller reader.
4) Start multiple threads that use the previously collected info to split the smaller PDFs into the final PDFs.

Steps 1-3 are run in a single thread to minimize memory usage and maximize the file size that we can process. Step 4 is easier to tune because we are working with smaller files that use less memory, and the page data has been preparsed. Plus we have a general idea of how much memory those processes are going to use, because we control the file size.

We took a process that was running in 14 hours down to 3 hours this way. Hope this helps.

Edward W. Rouse
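The four steps above boil down to a single-threaded planning phase followed by a multi-threaded processing phase. The following is only a rough sketch of that control flow in plain Java, not Edward's actual code: the class name, `planBatches`, `processAll`, and the worker function are hypothetical, and the real iText work (building each intermediate PDF and splitting it into final PDFs) is represented by a placeholder function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ChunkedSplitSketch {

    /** Phase 1 (single thread): group the planned outputs into batches of at most batchSize. */
    static <T> List<List<T>> planBatches(List<T> outputs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < outputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, outputs.size());
            batches.add(new ArrayList<>(outputs.subList(start, end)));
        }
        return batches;
    }

    /** Phase 2 (thread pool): apply the worker to every batch; results come back in batch order. */
    static <T, R> List<R> processAll(List<List<T>> batches, int threads,
                                     Function<List<T>, R> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> batch : batches) {
                Callable<R> task = () -> worker.apply(batch);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // waits for that batch to finish
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("batch processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: 2500 planned output documents, identified by number.
        List<Integer> outputs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            outputs.add(i);
        }
        List<List<Integer>> batches = planBatches(outputs, 1000);
        // Placeholder worker: a real one would split one intermediate PDF.
        List<Integer> sizes = processAll(batches, 4, List::size);
        System.out.println(sizes);
    }
}
```

Keeping the planning phase on one thread means only one large reader is alive at a time, while the batch size gives a knob for how much memory each worker in the second phase is likely to need.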
Re: [iText-questions] iText & multithreading delays
We are having issues trying to multi-thread iText due to memory issues. When we parse normal-size files, our single- and multi-threaded programs work fine. Once the file sizes get bigger (1.5 GB in some cases), even the single-threaded program can run out of memory. With a 500 MB file, anything more than 2 threads causes an OOM. We can get rid of the OOM condition by not using the getRandomAccessFileOrArray, but then it runs so much slower that it negates the whole reason for using multiple threads.

Our current plan is to try to break the file up into smaller 'chunks' and then multi-thread the processing of the chunks. That is what I am in the process of doing right now. I'll let ya know how that works when I get it finished and tested.

Edward W. Rouse
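The thread limit described above suggests sizing the worker count from available heap rather than from the CPU count alone. The sketch below is a hypothetical helper, not something from the thread: `maxWorkers` and the per-task memory estimate are assumptions, and the real footprint of a reader on a given file would have to be measured.

```java
final class MemoryBoundedWorkers {

    /**
     * Largest worker count whose combined estimated footprint fits in the heap
     * budget, capped at cpuCount. Always returns at least 1, even when a single
     * task is estimated not to fit (that task may still run out of memory).
     */
    static int maxWorkers(long heapBudgetBytes, long estimatedBytesPerTask, int cpuCount) {
        if (estimatedBytesPerTask <= 0 || cpuCount <= 0) {
            throw new IllegalArgumentException("estimate and cpuCount must be positive");
        }
        long byMemory = heapBudgetBytes / estimatedBytesPerTask;
        long bounded = Math.max(1L, Math.min(byMemory, (long) cpuCount));
        return (int) bounded;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        int cpus = Runtime.getRuntime().availableProcessors();
        long perTask = 512L * 1024 * 1024; // assumed footprint per task, for illustration only
        System.out.println("workers: " + maxWorkers(heap, perTask, cpus));
    }
}
```

With illustrative numbers, a 1024 MB budget and a 500 MB estimate per task give 2 workers on an 8-CPU machine, so memory rather than CPU count sets the ceiling.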
Re: [iText-questions] iText & multithreading delays
Talmage: thank you for your response. I don't believe that making significant modifications to iText is the way we'd like to go about solving this problem.

I was hoping to hear from someone who has used iText in a multi-thread / multi-processor environment, and to learn whether iText performed better for them than it has for us in our simple test. We would also appreciate some input from the iText developers.

Thanks in advance to all.

rm.
Re: [iText-questions] iText & multithreading delays
You mentioned that you replaced the iText code with some simple file IO. Have you tried replacing that with a set of objects that call new several hundred (thousand?) times, to emulate what is likely going on behind the scenes in the iText library? I do not claim to be an expert on the JVM or iText, but you may try experimenting with the following scenarios:

1) Create an object that calls new to construct several sizeable objects (have those objects call new too). Also, make sure you end up "freeing" the objects by not referencing them any more (so the garbage collector comes into play). Time how long it takes for your application to go through this test scenario.

2) Modify the objects from the above example to use object "pools" on a per-thread basis. That is, allocate a bunch of empty objects at the start of each of your threads (per thread, to avoid throwing in lock complexity). In places where you would call new, grab an object off your pool. When you are done with the object, return it to your thread-local pool. Time how long it takes for your application to go through this test scenario, and make sure you don't include the time it takes to initially create your object pools.

I believe you will find method #2 is quite a bit faster than method #1. This is called object pooling, and it is often used in multi-threaded applications across all languages to avoid the bottlenecks associated with real-time memory allocation (and the contention created when multiple threads want to talk to the memory allocator) and garbage collection (where applicable).
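The two scenarios above could be set up along these lines. This is a minimal sketch with hypothetical names (`Pool`, `scenarioAllocate`, `scenarioPooled`) and an arbitrary 8 KB buffer as the pooled object; it makes no claim about which variant is faster, since that depends on the JVM and its garbage collector and has to be measured.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class ThreadLocalPoolSketch {

    /** A tiny pool with no locking; each thread is meant to own its own instance. */
    static final class Pool<T> {
        private final ArrayDeque<T> free = new ArrayDeque<>();
        private final Supplier<T> factory;

        Pool(Supplier<T> factory, int preallocate) {
            this.factory = factory;
            for (int i = 0; i < preallocate; i++) {
                free.push(factory.get());
            }
        }

        /** Takes a pooled object, or creates one if the pool is empty. */
        T acquire() {
            T obj = free.poll();
            return obj != null ? obj : factory.get();
        }

        /** Returns an object to the pool for later reuse. */
        void release(T obj) {
            free.push(obj);
        }

        int available() {
            return free.size();
        }
    }

    // One pool per thread, so no synchronization is needed.
    static final ThreadLocal<Pool<byte[]>> BUFFERS =
            ThreadLocal.withInitial(() -> new Pool<byte[]>(() -> new byte[8192], 16));

    /** Scenario 1: allocate a fresh buffer on every iteration and let the GC reclaim it. */
    static long scenarioAllocate(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[8192];
            buf[0] = (byte) i;
            checksum += buf[0];
        }
        return checksum;
    }

    /** Scenario 2: borrow a buffer from the calling thread's pool and give it back. */
    static long scenarioPooled(int iterations) {
        Pool<byte[]> pool = BUFFERS.get();
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] buf = pool.acquire();
            buf[0] = (byte) i;
            checksum += buf[0];
            pool.release(buf);
        }
        return checksum;
    }

    public static void main(String[] args) {
        BUFFERS.get(); // create this thread's pool before timing starts
        long t0 = System.nanoTime();
        scenarioAllocate(1_000_000);
        long t1 = System.nanoTime();
        scenarioPooled(1_000_000);
        long t2 = System.nanoTime();
        System.out.println("allocate: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("pooled:   " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

To reproduce the multi-threaded comparison, each scenario would be run from several threads at once; a single timed pass like the one in `main` is also sensitive to JIT warm-up, so repeated runs are advisable.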
Re: [iText-questions] iText & multithreading delays
Paulo Soares writes:
> That's probably related to IO contention, iText does a lot of small reads. If you do everything in memory
> what do you get? In any case, iText doesn't lock anything and any thread performance problems should be
> addressed to the JVM.
>
> Paulo

Paulo:

Thank you for your quick response.

I just tried changing my output stream from FileOutputStream to ByteArrayOutputStream, and ran the test a few times; total elapsed time (for a test using 4 threads) went from 12400 ms to 11200 ms, only about 1 second faster.

As for the JVM, I mentioned in my original posting that I tried using the same thread experiment replacing the iText code with simple Java statements to open a PDF file, read characters one at a time, and count them; this test clearly proved that the multi-threading/multi-processing capabilities of my system are working well.
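A control experiment of the kind described above might look like the following. It is a guess at the shape of such a test, not Robert's code: the class and method names are hypothetical, and the input is passed as a stream supplier so that the same harness can read from files or from memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ByteCountControl {

    /** Reads the stream one byte at a time, closes it, and returns the byte count. */
    static long countBytes(InputStream in) {
        try (InputStream s = in) {
            long n = 0;
            while (s.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts one counting thread per source; returns elapsed ms per thread, in source order. */
    static long[] runThreads(List<Supplier<InputStream>> sources) {
        long[] elapsedMs = new long[sources.size()];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            final int idx = i;
            Thread t = new Thread(() -> {
                long start = System.nanoTime();
                countBytes(sources.get(idx).get());
                elapsedMs[idx] = (System.nanoTime() - start) / 1_000_000;
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join also makes the thread's write to elapsedMs visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the input files; each thread gets its own stream.
        List<Supplier<InputStream>> sources = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            sources.add(() -> new ByteArrayInputStream(new byte[1_000_000]));
        }
        for (long ms : runThreads(sources)) {
            System.out.println(ms + " ms");
        }
    }
}
```

If per-thread times stay roughly flat as the number of sources grows, the machine and the JVM are scaling for this workload, which is the conclusion drawn in the message above.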
Re: [iText-questions] iText & multithreading delays
That's probably related to IO contention, iText does a lot of small reads. If you do everything in memory what do you get? In any case, iText doesn't lock anything and any thread performance problems should be addressed to the JVM.

Paulo

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Robert Meyer
> Sent: Sunday, August 03, 2008 10:52 PM
> To: itext-questions@lists.sourceforge.net
> Subject: [iText-questions] iText & multithreading delays
>
> I'm having problems using iText in a multi-threaded environment.
>
> I built a small test app to see how iText would perform when
> used on a machine with multiple cpu's, and the results were
> not what I expected.
>
> The test code is fairly simple; I built a class that extends
> Thread:
>
>    class mythread extends Thread
>    {
>    }
>
> and within that class's run method, I do some simple iText
> things, like:
>
>    create a PdfReader.
>    create a PdfStamper.
>    loop thru all pages of the input document, stamping
>    some text on each page.
>    close the Stamper.
>    close the Reader.
>    output a line to the console reporting the total elapsed time
>    since the run() method was entered.
>
> The main code of this test application handles the details of
> creating a fixed number of instances of the 'mythread' object,
> (the actual number is hardcoded in the module), and reads a simple
> text file that drives the process (the text file contains one
> line for each pdf file to be created).
>
> I've been careful to ensure that none of the threads attempt to
> use the same input file, or create the same output file.
>
> The text file contains 10 lines of input, meaning 10 pdf files
> will be created.
>
> I'm running the test on a machine with 2 quad-core Intel processors,
> for a total of 8 cpu's.
>
> Here's the interesting (and not so thrilling) part: as I increase
> the number of threads that will be used to process files (and therefore
> increase the number of cpu's that will be used), the time required for
> each thread to complete increases significantly.
>
> Here are some approximate values of thread count vs. elapsed run
> time per thread:
>
>    threads - elapsed time (in ms) per thread
>       1    -  800
>       2    -  1600
>       4    -  4500
>       8    -  1
>
> This seemed very strange to me, and I immediately assumed the problem
> was that I was doing something wrong regarding my use of threads.
>
> With that in mind, I replaced the 'mythread' class and all its iText
> calls with something that was iText-free, but still opened files and
> gobbled up a bunch of cpu time.
>
> This time I got what I was expecting: as more threads/cpus were allocated
> to the job, the amount of work accomplished increased accordingly, and not
> just a little. And this time, the amount of elapsed time taken for each
> thread to complete remained consistent, regardless of the total number
> of threads being executed.
>
> Believing now that my thread handling was not the problem, I put the iText
> code back in place, and started looking at where the delays were occurring.
> I soon discovered that the delays were happening in the Stamper.close()
> function, in PdfStamperImp.java.
>
> I grabbed the iText source, added some print statements in various places
> throughout the Stamper.close() function, and determined that the code
> causing the delays is within this loop (in the close() method in
> PdfStamperImp.java):
>
>    for (int k = 1; k < reader.getXrefSize(); ++k) {
>        PdfObject obj = reader.getPdfObjectRelease(k);
>        if (obj != null && skip != k) {
>            addToBody(obj, getNewObjectNumber(reader, k, 0), k != rootN);
>        }
>    }
>
> At this point I thought it would make more sense to post my problem here
> and hope that someone can offer some explanation as to why processing
> slows down drastically as more threads are used.
>
> Thanks in advance for any assistance.
> Robert Meyer.
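One way to read the timing table in the quoted post is to convert it into aggregate throughput. If n threads run at once and each takes t ms per file, the combined rate is n * 1000 / t files per second. The sketch below works through that arithmetic under the assumption (not stated in the post) that each reported time covers one file per thread; the class and method names are hypothetical, and the 8-thread row is left out because its value is incomplete in the archived text.

```java
final class ThroughputCheck {

    /** Files completed per second when `threads` threads each finish one file every perThreadMs. */
    static double filesPerSecond(int threads, double perThreadMs) {
        return threads * 1000.0 / perThreadMs;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4};
        double[] perThreadMs = {800, 1600, 4500}; // values from the quoted table
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d thread(s): %.2f files/s%n",
                    threads[i], filesPerSecond(threads[i], perThreadMs[i]));
        }
    }
}
```

Under that assumption, 1 and 2 threads both work out to 1.25 files per second and 4 threads to about 0.89, so adding threads yields no gain and then a loss. That is the pattern expected when the work is serialized on some shared resource rather than running in parallel.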