Re: [iText-questions] iText & multithreading delays

2008-08-14 Thread Edward W. Rouse
Oops, step 3 is missing a bit. I continue to process the original file,
creating many intermediate-size PDFs until the entire original PDF is
parsed, THEN I release the original reader to free up the memory.

Edward W. Rouse


-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great
prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php



Re: [iText-questions] iText & multithreading delays

2008-08-14 Thread Edward W. Rouse
Well, the results are in. As long as the file can be handled by iText, which
means under 2 GB, my new process is several times faster than the old one.
The general steps are:
1) Preprocess the pages into a HashMap
2) Use the bookmarks to determine which pages go into which end-result PDF,
and remove the found pages from the HashMap
3) Once 1000 of these are collected, create a new PDF containing only those
pages and save the page and file info for later use. I also release the
large file reader at this point, since each subsequent thread will be
creating its own smaller reader.
4) Start multiple threads that use the previously collected info to split the
smaller PDFs into the final PDFs
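
The steps above can be sketched roughly as follows. This is only a minimal sketch, not the actual code: pages are plain integers, the bookmark lookup is a hypothetical map from page number to target document name, and "1000 of these" is assumed to mean 1000 pages. The names BatchPlanner, planBatches, splitBatch and splitAll are invented for illustration, and the real iText work (reading pages, writing the intermediate and final PDFs) is replaced by placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchPlanner {

    // Steps 1-3 (single thread): walk the page -> target-document map in
    // iteration order and cut it into batches of at most batchSize pages.
    // Each batch stands in for one intermediate PDF plus its split info.
    static List<Map<Integer, String>> planBatches(Map<Integer, String> pageToTarget, int batchSize) {
        List<Map<Integer, String>> batches = new ArrayList<>();
        Map<Integer, String> current = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : pageToTarget.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Step 4 placeholder: a real worker would open its own small reader here
    // and write the final PDFs. This stand-in only counts pages per target.
    static Map<String, Integer> splitBatch(Map<Integer, String> batch) {
        Map<String, Integer> pagesPerTarget = new LinkedHashMap<>();
        for (String target : batch.values()) {
            pagesPerTarget.merge(target, 1, Integer::sum);
        }
        return pagesPerTarget;
    }

    // Step 4: run the per-batch work on a fixed-size thread pool and
    // collect the results in batch order.
    static List<Map<String, Integer>> splitAll(List<Map<Integer, String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (Map<Integer, String> batch : batches) {
                futures.add(pool.submit(() -> splitBatch(batch)));
            }
            List<Map<String, Integer>> results = new ArrayList<>();
            for (Future<Map<String, Integer>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while splitting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a split task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the split is that only planBatches ever needs the big reader, so the thread count passed to splitAll can be tuned against the much smaller batches.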

Steps 1-3 are run in a single thread to minimize memory usage and maximize
the file size that we can process.

Step 4 is easier to tune because we are working with smaller files that use
less memory, and the page data has been preparsed. Plus, we have a general
idea of how much memory those processes are going to use because we control
the file size.
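
One way to turn that into a number: because the chunk size is under our control, the worker count can be capped by how much memory each worker is expected to need. The sketch below assumes you have measured a per-worker figure yourself; WorkerSizing, maxWorkers and the 200 MB value are illustrative only and not part of the process described above.

```java
class WorkerSizing {

    // Upper bound on worker threads: limited by the CPU count and by how
    // many workers fit into the heap budget at once. Never returns less than 1.
    static int maxWorkers(long heapBudgetBytes, long bytesPerWorker, int cpus) {
        if (bytesPerWorker <= 0 || cpus <= 0) {
            throw new IllegalArgumentException("bytesPerWorker and cpus must be positive");
        }
        long byMemory = heapBudgetBytes / bytesPerWorker;
        return (int) Math.max(1, Math.min(cpus, byMemory));
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();          // maximum heap the JVM will use
        int cpus = Runtime.getRuntime().availableProcessors();
        long perWorker = 200L * 1024 * 1024;                   // assumed: 200 MB per worker
        System.out.println("workers: " + maxWorkers(heap, perWorker, cpus));
    }
}
```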

We took a process that was running in 14 hours down to 3 hours this way.
Hope this helps.

Edward W. Rouse



Re: [iText-questions] iText & multithreading delays

2008-08-14 Thread Edward W. Rouse
We are having trouble multi-threading iText because of memory issues. When
we parse normal-size files, our single- and multi-threaded programs
work fine. Once the files get bigger (1.5 GB in some cases), even the
single-threaded program can run out of memory. With a 500 MB file, anything
more than 2 threads causes an OOM. We can get rid of the OOM condition by not
using getRandomAccessFileOrArray, but then it runs so much slower that it
negates the whole reason for using multiple threads.

Our current plan is to try to break the file up into smaller 'chunks' and
then multi-thread the processing of the chunks. That is what I am in the
process of doing right now. I'll let ya know how that works when I get it
finished and tested.

Edward W. Rouse




Re: [iText-questions] iText & multithreading delays

2008-08-13 Thread robert meyer
Talmage:  thank you for your response.

I don't believe that making significant modifications
to iText is the way we'd like to go about solving this
problem.

I was hoping to hear from someone who has used iText in
a multi-thread / multi-processor environment, and to learn whether
iText performed better for them than it has for us in
our simple test.

We would also appreciate some input from the iText
developers.

Thanks in advance to all.
rm.





Re: [iText-questions] iText & multithreading delays

2008-08-04 Thread Talmage
You mentioned that you replaced the iText code with some simple file
I/O. Have you tried replacing that with a set of objects that call
new several hundred (thousand?) times to emulate what is likely
going on behind the scenes in the iText library?

I do not claim to be an expert on the JVM or iText, but you may try
experimenting with the following scenarios:

1) Create an object that uses new to construct several sizable
objects (have those objects call new too). Also, make sure you end
up "freeing" the objects by no longer referencing them (so the
garbage collector comes into play).

Time how long it takes for your application to go through the above
test scenario.


2) Modify the objects from the above example to use object "pools" on
a per-thread basis. That is, allocate a bunch of empty objects at the
start of each of your threads (per thread, to avoid adding lock
complexity). In places where you would call new, grab an object from
your pool. In places where you are done with the object, return it
to your thread-local pool.

Time how long it takes for your application to go through the above
test scenario -- make sure you don't include the time it takes to
initially create your object pools.


I believe you will find method #2 is quite a bit faster than method #1.

This is called object pooling and is often used in multi-threaded
applications across all languages to avoid the bottlenecks
associated with real-time memory allocation (and the contention
created by multiple threads wanting to talk to the memory
allocator) and garbage collection (where applicable).
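
A per-thread pool along the lines of scenario 2 could look something like the sketch below. ThreadLocalPool and its methods are hypothetical names made up for this example, not anything from iText or the JDK. Each thread lazily gets its own ArrayDeque, so no locking is involved. Whether this actually beats plain allocation depends on the JVM and the objects involved, so it needs to be timed as described above.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class ThreadLocalPool<T> {

    private final Supplier<T> factory;   // creates new objects; must not return null
    private final int initialSize;       // objects pre-allocated per thread
    private final ThreadLocal<ArrayDeque<T>> pools;

    ThreadLocalPool(Supplier<T> factory, int initialSize) {
        this.factory = factory;
        this.initialSize = initialSize;
        // The pool for a thread is built the first time that thread touches it.
        this.pools = ThreadLocal.withInitial(this::newPool);
    }

    private ArrayDeque<T> newPool() {
        ArrayDeque<T> pool = new ArrayDeque<>(Math.max(1, initialSize));
        for (int i = 0; i < initialSize; i++) {
            pool.push(factory.get());
        }
        return pool;
    }

    // Take an object from the calling thread's pool, or create one if it is empty.
    T acquire() {
        T obj = pools.get().poll();
        return obj != null ? obj : factory.get();
    }

    // Return an object to the calling thread's pool. The caller is
    // responsible for resetting its state before it is reused.
    void release(T obj) {
        pools.get().push(obj);
    }

    // Number of idle objects in the calling thread's pool.
    int available() {
        return pools.get().size();
    }
}
```

Usage would be something like `ThreadLocalPool<StringBuilder> pool = new ThreadLocalPool<>(StringBuilder::new, 100);`, then `acquire()` where the code used to call new and `release()` when done with the object.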




Re: [iText-questions] iText & multithreading delays

2008-08-04 Thread robert meyer

Paulo Soares writes:

> That's probably related to IO contention, iText does a lot of small reads. If
> you do everything in memory what do you get? In any case, iText doesn't lock
> anything and any thread performance problems should be addressed to the JVM.
>
> Paulo

Paulo:

Thank you for your quick response.

I just tried changing my output stream from FileOutputStream
to ByteArrayOutputStream, and ran the test a few times.

Total elapsed time (for a test using 4 threads) went from
12400 ms to 11200 ms, only about 1 second faster.

As for the JVM, I mentioned in my original posting that I tried
the same thread experiment with the iText code replaced by
simple Java statements that open a PDF file, read characters one
at a time, and count them. That test clearly showed that the
multi-threading/multi-processing capabilities of my system are
working well.
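
For anyone who wants to repeat that kind of iText-free check, here is a minimal sketch of it. It is a reconstruction under assumptions, not the code from the test above: the class and method names are invented, one thread is started per input file, and the stream is buffered (the original may not have been).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ByteCountTest {

    // Read the file one byte at a time and count the bytes.
    static long countBytes(Path file) {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Start one thread per file and return each thread's elapsed time in ms.
    static long[] timeThreads(List<Path> files) {
        long[] elapsedMs = new long[files.size()];
        Thread[] threads = new Thread[files.size()];
        for (int i = 0; i < threads.length; i++) {
            final int idx = i;
            threads[i] = new Thread(() -> {
                long start = System.nanoTime();
                countBytes(files.get(idx));
                elapsedMs[idx] = (System.nanoTime() - start) / 1_000_000;
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();   // join() also makes the worker's write to elapsedMs visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        long[] ms = timeThreads(files);
        for (int i = 0; i < ms.length; i++) {
            System.out.println(files.get(i) + ": " + ms[i] + " ms");
        }
    }
}
```

If the per-thread times stay roughly flat as more files (and therefore threads) are passed in, the threading setup itself is scaling.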







Re: [iText-questions] iText & multithreading delays

2008-08-04 Thread Paulo Soares
That's probably related to I/O contention; iText does a lot of small reads. If
you do everything in memory, what do you get? In any case, iText doesn't lock
anything, and any thread performance problems should be addressed to the JVM.

Paulo

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Robert Meyer
> Sent: Sunday, August 03, 2008 10:52 PM
> To: itext-questions@lists.sourceforge.net
> Subject: [iText-questions] iText & multithreading delays
>
> I'm having problems using iText in a multi-threaded environment.
>
> I built a small test app to see how iText would perform when
> used on a machine with multiple CPUs, and the results were
> not what I expected.
>
> The test code is fairly simple; I built a class that extends
> Thread:
>
> class mythread extends Thread
> {
> }
>
> and within that class's run method, I do some simple iText
> things, like:
>
> create a PdfReader.
> create a PdfStamper.
> loop thru all pages of the input document, stamping
> some text on each page.
> close the Stamper.
> close the Reader.
> output a line to the console reporting the total elapsed time
> since the run() method was entered.
>
> The main code of this test application handles the details of
> creating a fixed number of instances of the 'mythread' object,
> (the actual number is hardcoded in the module), and reads a simple
> text file that drives the process (the text file contains one
> line for each pdf file to be created).
>
> I've been careful to ensure that none of the threads attempt to
> use the same input file, or create the same output file.
>
> The text file contains 10 lines of input, meaning 10 pdf files
> will be created.
>
> I'm running the test on a machine with 2 quad-core Intel processors,
> for a total of 8 CPUs.
>
> Here's the interesting (and not so thrilling) part: as I increase
> the number of threads that will be used to process files (and therefore
> increase the number of CPUs that will be used), the time required for
> each thread to complete increases significantly.
>
> here are some approximate values of thread count vs. elapsed run
> time per thread:
>
> threads - elapsed time (in ms) per thread
> 1 - 800
> 2 - 1600
> 4 - 4500
> 8 - 1
>
> This seemed very strange to me, and I immediately assumed the problem
> was that I was doing something wrong regarding my use of threads.
>
> With that in mind, I replaced the 'mythread' class and all its iText
> calls with something that was iText-free, but still opened files and
> gobbled up a bunch of CPU time.
>
> This time I got what I was expecting: as more threads/CPUs were allocated
> to the job, the amount of work accomplished increased accordingly, and not
> just a little. And this time, the amount of elapsed time taken for each
> thread to complete remained consistent, regardless of the total number
> of threads being executed.
>
> Believing now that my thread handling was not the problem, I put the iText
> code back in place, and started looking at where the delays were occurring.
> I soon discovered that the delays were happening in the Stamper.close()
> function, in PdfStamperImp.java.
>
> I grabbed the iText source, added some print statements in various places
> throughout the Stamper.close() function, and determined that the code
> causing the delays is within this loop (in the close() method in
> PdfStamperImp.java):
>
> for (int k = 1; k < reader.getXrefSize(); ++k) {
>     PdfObject obj = reader.getPdfObjectRelease(k);
>     if (obj != null && skip != k) {
>         addToBody(obj, getNewObjectNumber(reader, k, 0), k != rootN);
>     }
> }
>
>
> At this point I thought it would make more sense to post my problem here
> and hope that someone can offer some explanation as to why processing
> slows down drastically as more threads are used.
>
> Thanks in advance for any assistance.
> Robert Meyer.
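
To put the per-thread timings quoted above in perspective, a rough wall-clock estimate for the 10-file run can be worked out as follows. This is only a back-of-the-envelope sketch: it assumes the files are handed out in rounds of one file per thread and that every round takes the reported per-thread time. WallClockEstimate and estimateTotalMs are names made up for this example, and the 8-thread row is left out because its value is cut off in the quoted message.

```java
class WallClockEstimate {

    // Files are assumed to be processed in rounds of `threads` files each,
    // with every round taking `perThreadMs`.
    static long estimateTotalMs(int files, int threads, long perThreadMs) {
        int rounds = (files + threads - 1) / threads;   // ceiling division
        return rounds * perThreadMs;
    }

    public static void main(String[] args) {
        int files = 10;
        int[] threads = {1, 2, 4};
        long[] perThreadMs = {800, 1600, 4500};         // values from the quoted table
        for (int i = 0; i < threads.length; i++) {
            System.out.println(threads[i] + " thread(s): ~"
                    + estimateTotalMs(files, threads[i], perThreadMs[i]) + " ms");
        }
    }
}
```

Under those assumptions the estimate is about 8000 ms with 1 thread, 8000 ms with 2 threads, and 13500 ms with 4 threads, so adding threads buys nothing and the 4-thread run is slower overall. That is in the same range as the 12400 ms total reported for 4 threads elsewhere in this thread.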


Disclaimer:
This message is destined exclusively to the intended receiver. It may contain 
confidential or legally protected information. The incorrect transmission of 
this message does not mean the loss of its confidentiality. If this message is 
received by mistake, please send it back to the sender and delete it from your 
system immediately. It is forbidden to any person who is not the intended 
receiver to use, distribute or copy any part of this message.

