Hi Thomas,


Thanks for your help. I have more or less solved it by triggering the garbage collector more than once, as you suggested earlier. My answers are inline below.


Many thanks!

On 05/15/2012 12:59 AM, DeWeese Thomas wrote:
Hi Hilbert,

On May 14, 2012, at 2:33 AM, Hilbert Mostert wrote:

Thanks for paying attention to this. I've measured the values using Runtime.getRuntime().freeMemory() and friends. Indeed the process size is very misleading: in one case we had a process size of 5 GB while the free memory was more than 3 GB.
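For completeness, this is how I compute the numbers; freeMemory() alone only reports slack inside the current heap, so the used figure is total minus free:

    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory(); // bytes actually occupied on the heap
    long max  = rt.maxMemory();                     // the -Xmx ceiling
    System.out.println("used=" + (used >> 20) + " MB, max=" + (max >> 20) + " MB");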
        Have you tried explicitly triggering a garbage collection?  Often this does nothing, but if you call it a few dozen times you can sometimes get it to do something :)
A more useful way to tell if you really have a leak is to watch a graph of memory usage over time.  There is one we use in Squiggle that can be useful to watch.  Basically, any time you do anything memory usage will grow, so what you look for is where the memory lands after a full GC, which will occur every now and then.
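If you want to pop that graph up from your own code, something like this should do it (a sketch; I believe the class is org.apache.batik.util.gui.MemoryMonitor, a Swing JFrame, and that it takes the sampling interval in milliseconds):

    import org.apache.batik.util.gui.MemoryMonitor;

    // shows the memory-usage graph that Squiggle uses
    MemoryMonitor monitor = new MemoryMonitor(1000); // assumed: interval in ms
    monitor.setVisible(true);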
I have implemented a thread which checks the memory usage every 100 milliseconds. When more than 512 MB is in use, it triggers the garbage collector. If the process stays above 512 MB, the thread backs off and leaves at least a whole second between collections. This strategy works: during conversion the memory never grows above 550 MB anymore, and after Batik's task has finished the process even uses less than 128 MB. Thanks for pointing me to this; I had tried starting the garbage collector before, but a single run was never sufficient.
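In outline the watchdog looks like this (a sketch of what I described; the 512 MB threshold, 100 ms poll and one-second back-off are just the values that worked for me):

    Thread watchdog = new Thread("gc-watchdog") {
        public void run() {
            final long threshold = 512L << 20; // 512 MB
            long lastGc = 0;
            Runtime rt = Runtime.getRuntime();
            try {
                for (;;) {
                    long used = rt.totalMemory() - rt.freeMemory();
                    long now = System.currentTimeMillis();
                    // back off: at most one forced collection per second
                    if (used > threshold && now - lastGc > 1000) {
                        System.gc();
                        lastGc = now;
                    }
                    Thread.sleep(100);
                }
            } catch (InterruptedException e) { /* stop */ }
        }
    };
    watchdog.setDaemon(true);
    watchdog.start();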
I do have large images, but not a large number of them: about three, plus a QR code for every student, each about 150 bytes (PNG). These are embedded as base64-encoded data.
        Do you have any idea if the images can be shared between the documents or not?  One thing about the PDF transcoder is that it often ends up rasterizing the images, which, depending on what it thinks it can get away with, can cause the images to grow quite considerably.  Are the PDFs you generate on the large side?
I made a mistake above: I use a unique QR code per student per exam (this is necessary), so the number of images is large but their size is not. The size of the final PDF file is not really important. I have limited the output to blocks of 500 students, and this works.
I am not using many features of Batik; mainly I replace text content in specific SVG elements and then convert the documents to PDF using the PDFTranscoder provided in the Apache Batik 1.7 package.
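The per-student step is roughly the following sketch; Student, renderStudentPdf, the element id "studentName" and the output file name are made-up illustrations, not my real code:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.batik.transcoder.TranscoderInput;
    import org.apache.batik.transcoder.TranscoderOutput;
    import org.apache.fop.svg.PDFTranscoder;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    void renderStudentPdf(Document doc, Student student) throws Exception {
        // doc is this worker's private copy of the SVG template;
        // the replacement is plain W3C DOM, no Batik-specific API needed
        Element field = doc.getElementById("studentName"); // hypothetical id
        field.setTextContent(student.getName());

        PDFTranscoder transcoder = new PDFTranscoder(); // pdf-transcoder.jar in Batik 1.7
        OutputStream out = new FileOutputStream("exam-" + student.getId() + ".pdf");
        try {
            transcoder.transcode(new TranscoderInput(doc), new TranscoderOutput(out));
        } finally {
            out.close();
        }
    }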

What I have noticed is that a CleanerThread is created when I start generating PDF files, but it never runs; it is always in the waiting state. Is there a command which triggers this thread?
        The cleaner thread is used to clear out caches when soft-referenced objects are cleared by a run of the garbage collector.  It may just mean that your images aren't generating much cached stuff, or it may mean that the garbage collector hasn't felt the need to be particularly aggressive about clearing out memory (although I would have thought that by the time you reach 5 GB it would have felt the need a few times).
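For the curious, the pattern is roughly the following sketch (illustrative only, not Batik's actual code); the thread sits blocked in ReferenceQueue.remove(), which is exactly why you see it in the waiting state until the GC actually clears something:

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;

    final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // a cache entry the GC may reclaim under memory pressure
    SoftReference<Object> cacheEntry = new SoftReference<Object>(new byte[16 << 20], queue);

    Thread cleaner = new Thread("Cleaner") {
        public void run() {
            try {
                for (;;) {
                    // blocks (waiting state) until the GC clears a reference
                    Reference<?> cleared = queue.remove();
                    System.out.println("evicting cache entry for " + cleared);
                }
            } catch (InterruptedException e) { /* shut down */ }
        }
    };
    cleaner.setDaemon(true);
    cleaner.start();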
That's indeed interesting; when I have time I will check whether it is triggered in the new situation.
        Also, does it just use a lot of memory, or does the memory usage grow consistently over time, so that no matter how large you set the heap it eventually runs out of memory?
At the moment it only runs out of memory when I merge all the pages using PDFBox. I have solved this by producing files of 500 students each; memory usage is much, much lower now.
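The merge step is essentially this sketch (batchFiles and the output name are made up); PDFBox 1.6's PDFMergerUtility loads the source documents, so memory grows with the total size, which is why the smaller batches keep the peak down:

    import java.io.File;
    import java.util.List;

    import org.apache.pdfbox.util.PDFMergerUtility;

    void mergeBatches(List<File> batchFiles) throws Exception {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (File f : batchFiles) {
            merger.addSource(f);          // one intermediate PDF per block of students
        }
        merger.setDestinationFileName("all-exams.pdf"); // hypothetical output
        merger.mergeDocuments();
    }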
        Thomas

On 05/12/2012 07:17 PM, DeWeese Thomas wrote:
Hi Hilbert,

        How are you measuring how much memory you are using after step 4?  If you are just looking at process size, that can be very misleading: typically the JVM will grow, and even if it has freed most of the memory it will hold onto the larger memory block, partly because it may be fragmented and partly because it may need the memory again shortly.

        There are caches in Batik for documents, images and other assets, but unless you have a lot of large images it is unlikely they would reach 1 GB.  Filter effects may also cache some intermediate results, but typically those are cleaned when the filter is disposed of (which, given the lazy nature of the JVM, may not happen for a while).  A general outline of the features you are using from SVG might help identify the areas responsible for the memory bloat.

        Which, by the way, raises the other issue: are you forcing a GC?  If not, lots of currently unused stuff will hang around until the memory is needed for something else.  Finally, remember that just calling for a single GC doesn't typically do much to clear out memory.

        Thomas

On May 11, 2012, at 8:39 AM, Hilbert Mostert wrote:

Recently I have started using Apache Batik to create PDF files from SVG templates. The application is used to generate exam pages for students. It works great, but it uses a huge amount of memory. This is sometimes annoying because I have to increase the memory limit to over 4 GB to have it complete the task. There are in general lots of students (500+), and in one case 2000+ students. This will, of course, eat memory like an elephant; I accept that.

I want to reduce this memory footprint and have found one issue in my program that I need help with.

I am using the Java JRE 1.6.0_32, Batik 1.7 and PDFBox 1.6.0.

The program has the following flow:

1.    fetch students from source (Excel file)
2.    create workers to generate PDF from SVG
3.    while not all students have been processed do
3.1      replace information in the SVG document (using W3C DOM functions from the Document class) (done by a worker)
3.2      generate a PDF from the SVG document (using the PDFTranscoder)
3.3      check if there are more students; true: go to 3.1; false: continue with step 4
4.   clean up workers
5.   generate a single PDF from all generated PDFs using PDFBox
6.   done

It is a multi-threaded environment: all the workers run in their own threads, each worker has its own copy of the SVG document, and they don't share anything (for obvious reasons).
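In outline the worker setup looks like the following sketch; it reuses the hypothetical Student and renderStudentPdf names from the sketch earlier in this thread, and assumes Batik's DOMUtilities.deepCloneDocument helper for the per-worker copy:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.batik.dom.util.DOMUtilities;
    import org.w3c.dom.Document;

    void generateAll(List<Student> students, final Document template) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // worker count: an assumption
        for (final Student student : students) {
            pool.submit(new Runnable() {
                public void run() {
                    // give each worker its own copy of the template so nothing is shared
                    Document copy = DOMUtilities.deepCloneDocument(template, template.getImplementation());
                    try {
                        renderStudentPdf(copy, student); // steps 3.1-3.2
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();                          // step 4: stop accepting work
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all PDFs before merging
    }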

What I have found is that after step 4, after cleaning up the workers, I am still using 1 GB of memory, which is much more than when I start (around 128 MB). I suspect there is some caching here and there, but I don't have enough knowledge of Batik to fix this problem.

Can anyone help me, or does someone have the answer for me?


Thanks in advance,

Hilbert Mostert
