Re: Images in FOP 0.92beta
[EMAIL PROTECTED] wrote:

> Hi Jeremias,
>
> Well I figure that the thread will just be blocked in queue.remove
> most of the time unless it has something to do. I don't think there
> is much overhead for a thread in the types of systems we are
> targeting (i.e. not small constrained devices). Note that this is one
> thread that is used for all CleanerThread sub objects (so it's not
> like you are likely to spawn lots of threads).

Don't forget that a lot of folk deploy FOP/Batik inside Web containers or Application Servers, where spawning new Threads is considered illegal.

Chris
Re: Images in FOP 0.92beta
Hi Jeremias,

Jeremias Maerki <[EMAIL PROTECTED]> wrote on 07/14/2006 04:26:57 PM:

> At first, I'd have preferred to avoid an extra thread if possible so I
> just added a local ReferenceQueue and used poll() to do house-keeping
> whenever a user agent signs off. I assume you don't have a
> not-too-frequently called method you could do on-demand house-keeping in,
> so the thread is probably ok.

Well, I figure that the thread will just be blocked in queue.remove most of the time unless it has something to do. I don't think there is much overhead for a thread in the types of systems we are targeting (i.e. not small constrained devices). Note that this is one thread that is used for all CleanerThread sub objects (so it's not like you are likely to spawn lots of threads).

Some people put the cleaning in the management calls (so you poll the queue when people add/remove elements from the hash). I'm not fond of that, as it means you are borrowing a stranger's thread to do your work (it just feels ugly).

> And given that we have Batik in memory
> anyway FOP could co-use that thread. But since I'd like to avoid
> dependencies on Batik directly if possible, can we move CleanerThread to
> XML Graphics Commons and rename it to ReferenceCleanerThread to give it
> a more descriptive name?

I was under the impression that most of the stuff in batik.util will find its way into graphics commons. As for renaming, I don't think it's a big deal.

> The SoftReferenceCache is indeed a little odd, especially the method
> names. I think I'll skip that one for now.

It is meant to be subclassed to provide a strongly typed interface (notice that all the '*Impl' methods are protected), so the subclass can provide public versions that take strongly typed parameters.
> Some other interesting things I observed while playing around for those
> interested (ATM, I'm still doing the house-keeping without the thread
> but I might rewrite):

SoftReferences are a very powerful tool in Java; I don't think they get enough attention in general.

> When using weak references (as the current code does but with the fixed
> behavior) FOP takes around 35 sec on my machine to produce that 182
> image PDF. Heap usage is usually around 12MB with peaks to 26MB. The
> house-keeping after the user agent retires removes around 178 references.
>
> After switching to soft references, which are actually the recommended
> type for caches, heap usage goes up to the 64MB maximum and pretty much
> stays there. The whole thing takes 29-30 sec on average. The house-keeping
> after the user agent retires removes between 161 and 170 references. So
> this means the VM actually keeps more references around, only freeing as
> many as it needs not to run into memory problems. And it runs faster this
> way.
>
> I learned a few things today. :-)

I guess that makes it a good day ;)

> On 14.07.2006 14:35:06 thomas.deweese wrote:
> > Hi all,
> >
> > Just a small comment on HashMaps with weak values:
> >
> > Jeremias Maerki <[EMAIL PROTECTED]> wrote on 07/13/2006 04:43:07 PM:
> >
> > > Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> > > WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> > > 258 MB is suddenly produced without exceptions using the VM's default
> > > heap settings, never going beyond 26MB heap usage. *g*
> >
> > There is a potential problem with this approach that Batik ran into.
> > Unless you go a little further those weak values accumulate in the map.
> > In your case this probably isn't a big deal, but for Batik where there
> > are potentially thousands (or tens of thousands, think mouse move
> > events) of entries, these 'dead' entries start to add up.
> >
> > As a result Batik has batik.util.CleanerThread. This class has
> > inner classes that subclass the various SoftReference classes with an
> > additional method 'public void cleared()'. This method is called by
> > the CleanerThread when the object the soft reference is pointing at is
> > cleared from memory (it uses the ReferenceQueue part of soft references).
> >
> > This gives you the hook you need to then de-register the entry from
> > the hash table. This is actually an incredibly useful 'addition' to
> > the standard soft reference classes (for example I will often use
> > it to check if classes I think should go to GC really do go to GC).
> >
> > I should also mention that Batik has a class called
> > 'SoftReferenceCache' which is a thread safe implementation of exactly
> > what you just implemented.
> > The interface may seem a little odd but it is designed to ensure that
> > only one party ever has to decode a resource even if multiple threads
> > request it "at the same time".
> >
> > Anyway just thought I would add my 2 cents...
>
> Jeremias Maerki
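The subclassing pattern Thomas describes for SoftReferenceCache (protected, untyped '*Impl' methods in the base class, public typed wrappers in the subclass) might look roughly like the sketch below. The class and method names here (UntypedSoftCache, lookupImpl, storeImpl, ImageBytesCache) are made up for illustration and are not Batik's actual API. The sketch also leaves out the part of the real class that makes sure only one thread decodes a given resource.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical untyped base class; names are illustrative, not Batik's. */
class UntypedSoftCache {
    private final Map<Object, SoftReference<Object>> map = new HashMap<>();

    /** Returns the cached object, or null if absent or already collected. */
    protected final synchronized Object lookupImpl(Object key) {
        SoftReference<Object> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    /** Stores the object behind a soft reference. */
    protected final synchronized void storeImpl(Object key, Object value) {
        map.put(key, new SoftReference<>(value));
    }
}

/** Typed facade: callers only ever see String keys and byte[] values. */
final class ImageBytesCache extends UntypedSoftCache {
    public byte[] lookup(String url) {
        // The cast is safe because store() is the only way to add entries.
        return (byte[]) lookupImpl(url);
    }

    public void store(String url, byte[] data) {
        storeImpl(url, data);
    }
}
```

Because the untyped methods are protected, code outside the class hierarchy can only go through the typed `lookup` and `store` methods, which is what keeps the cast in `lookup` safe.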
Re: Images in FOP 0.92beta
That was worth more than 2 cents. Thanks, Thomas.

I didn't really care too much about left-over references at first, but in a long-running service they add up unnecessarily even if it's only a Map.Entry, a String and a Reference instance per entry.

At first, I'd have preferred to avoid an extra thread if possible so I just added a local ReferenceQueue and used poll() to do house-keeping whenever a user agent signs off. I assume you don't have a not-too-frequently called method you could do on-demand house-keeping in, so the thread is probably ok. And given that we have Batik in memory anyway, FOP could co-use that thread. But since I'd like to avoid dependencies on Batik directly if possible, can we move CleanerThread to XML Graphics Commons and rename it to ReferenceCleanerThread to give it a more descriptive name? In the beginning, this means we will have two threads doing the same thing, but it is the cleaner design in the long run (when Batik starts using Commons).

The SoftReferenceCache is indeed a little odd, especially the method names. I think I'll skip that one for now.

Some other interesting things I observed while playing around, for those interested (ATM, I'm still doing the house-keeping without the thread but I might rewrite):

When using weak references (as the current code does but with the fixed behaviour) FOP takes around 35 sec on my machine to produce that 182 image PDF. Heap usage is usually around 12MB with peaks to 26MB. The house-keeping after the user agent retires removes around 178 references.

After switching to soft references, which are actually the recommended type for caches, heap usage goes up to the 64MB maximum and pretty much stays there. The whole thing takes 29-30 sec on average. The house-keeping after the user agent retires removes between 161 and 170 references. So this means the VM actually keeps more references around, only freeing as many as it needs not to run into memory problems. And it runs faster this way.
I learned a few things today. :-)

On 14.07.2006 14:35:06 thomas.deweese wrote:
> Hi all,
>
> Just a small comment on HashMaps with weak values:
>
> Jeremias Maerki <[EMAIL PROTECTED]> wrote on 07/13/2006 04:43:07 PM:
>
> > Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> > WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> > 258 MB is suddenly produced without exceptions using the VM's default
> > heap settings, never going beyond 26MB heap usage. *g*
>
> There is a potential problem with this approach that Batik ran into.
> Unless you go a little further those weak values accumulate in the map.
> In your case this probably isn't a big deal, but for Batik where there
> are potentially thousands (or tens of thousands, think mouse move
> events) of entries, these 'dead' entries start to add up.
>
> As a result Batik has batik.util.CleanerThread. This class has
> inner classes that subclass the various SoftReference classes with an
> additional method 'public void cleared()'. This method is called by
> the CleanerThread when the object the soft reference is pointing at is
> cleared from memory (it uses the ReferenceQueue part of soft references).
>
> This gives you the hook you need to then de-register the entry from
> the hash table. This is actually an incredibly useful 'addition' to
> the standard soft reference classes (for example I will often use
> it to check if classes I think should go to GC really do go to GC).
>
> I should also mention that Batik has a class called
> 'SoftReferenceCache' which is a thread safe implementation of exactly
> what you just implemented.
> The interface may seem a little odd but it is designed to ensure that
> only one party ever has to decode a resource even if multiple threads
> request it "at the same time".
>
> Anyway just thought I would add my 2 cents...

Jeremias Maerki
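The thread-free house-keeping Jeremias mentions (a local ReferenceQueue drained with poll() at a convenient moment, such as when a user agent signs off) could be sketched as follows. This is not FOP's actual code: PolledSoftCache, KeyedRef, doHouseKeeping and simulateCollection are hypothetical names, and simulateCollection exists only so the behaviour can be shown without waiting for a real garbage collection.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical soft-value cache that is cleaned by polling, not by a thread. */
final class PolledSoftCache<V> {

    /** Soft reference that remembers its key so the stale entry can be found. */
    private static final class KeyedRef<V> extends SoftReference<V> {
        final String key;

        KeyedRef(String key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<String, KeyedRef<V>> entries = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized void put(String key, V value) {
        entries.put(key, new KeyedRef<>(key, value, queue));
    }

    /** Returns the cached value, or null if absent or already collected. */
    synchronized V get(String key) {
        KeyedRef<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }

    /** Non-blocking clean-up; returns how many dead entries were removed. */
    synchronized int doHouseKeeping() {
        int removed = 0;
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            KeyedRef<?> stale = (KeyedRef<?>) ref;
            // Remove only if the key still maps to this very reference.
            if (entries.remove(stale.key, stale)) {
                removed++;
            }
        }
        return removed;
    }

    /** Demonstration helper only: enqueues the reference as the GC would. */
    synchronized boolean simulateCollection(String key) {
        KeyedRef<V> ref = entries.get(key);
        return ref != null && ref.enqueue();
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Calling `doHouseKeeping()` from an existing life-cycle hook keeps the map from accumulating dead entries without spawning a thread, at the cost of doing the clean-up on the caller's thread.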
Re: Images in FOP 0.92beta
Hi all,

Just a small comment on HashMaps with weak values:

Jeremias Maerki <[EMAIL PROTECTED]> wrote on 07/13/2006 04:43:07 PM:

> Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> 258 MB is suddenly produced without exceptions using the VM's default
> heap settings, never going beyond 26MB heap usage. *g*

There is a potential problem with this approach that Batik ran into. Unless you go a little further, those weak values accumulate in the map. In your case this probably isn't a big deal, but for Batik, where there are potentially thousands (or tens of thousands, think mouse move events) of entries, these 'dead' entries start to add up.

As a result Batik has batik.util.CleanerThread. This class has inner classes that subclass the various SoftReference classes with an additional method 'public void cleared()'. This method is called by the CleanerThread when the object the soft reference is pointing at is cleared from memory (it uses the ReferenceQueue part of soft references).

This gives you the hook you need to then de-register the entry from the hash table. This is actually an incredibly useful 'addition' to the standard soft reference classes (for example, I will often use it to check if classes I think should go to GC really do go to GC).

I should also mention that Batik has a class called 'SoftReferenceCache' which is a thread-safe implementation of exactly what you just implemented. The interface may seem a little odd, but it is designed to ensure that only one party ever has to decode a resource even if multiple threads request it "at the same time".

Anyway, just thought I would add my 2 cents...
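The mechanism described above (reference subclasses with a cleared() callback, driven by one thread blocked on a shared ReferenceQueue) can be sketched like this. It is a simplified stand-in written for this thread, not the source of batik.util.CleanerThread; Clearable, ReferenceCleaner and CacheEntry are hypothetical names.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Map;

/** A reference that wants to be told when its referent has been collected. */
interface Clearable {
    void cleared();
}

/** Hypothetical stand-in for the idea behind CleanerThread; not Batik's code. */
final class ReferenceCleaner {
    /** One queue shared by every reference that wants a callback. */
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    /** Handles one dequeued reference; separate so it can be called directly. */
    static void dispatch(Reference<?> ref) {
        if (ref instanceof Clearable) {
            ((Clearable) ref).cleared();
        }
    }

    /** Starts the single daemon thread that serves all references. */
    static Thread start() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the garbage collector enqueues a reference.
                    dispatch(QUEUE.remove());
                }
            } catch (InterruptedException e) {
                // Interrupted: let the thread end.
            }
        }, "reference-cleaner");
        t.setDaemon(true);
        t.start();
        return t;
    }
}

/** Soft reference that de-registers its own map entry once collected. */
final class CacheEntry<V> extends SoftReference<V> implements Clearable {
    private final String key;
    private final Map<String, CacheEntry<V>> owner;

    CacheEntry(String key, V value, Map<String, CacheEntry<V>> owner) {
        super(value, ReferenceCleaner.QUEUE);
        this.key = key;
        this.owner = owner;
    }

    @Override
    public void cleared() {
        // Remove only if the key still maps to this exact reference.
        owner.remove(key, this);
    }
}
```

Since cleared() runs on the cleaner thread, the owning map has to be safe for concurrent use, for example a ConcurrentHashMap or a synchronized wrapper.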
Re: Images in FOP 0.92beta
Jeremias Maerki wrote:

> remember this thread on fop-users? I've just found out what's wrong.

Great!

> There's absolutely nothing wrong with the PDFRenderer or the PDF library
> concerning reference freeing. It does so as soon as each image is
> written to the PDF which always happens immediately.

Hm. I'm pretty sure in 0.20.5 a PDF object held a pointer, and the object was using some data while writing a dictionary structure into the PDF stream after all the real content was written.

[...]

> I ended up in the image cache and in the Javadocs for WeakHashMap where
> I found that little detail that the weak reference is on the key, not
> the value.

Oops, my fault.

J.Pietschmann
Re: Images in FOP 0.92beta
Jörg,

remember this thread on fop-users? I've just found out what's wrong.

There's absolutely nothing wrong with the PDFRenderer or the PDF library concerning reference freeing. It does so as soon as each image is written to the PDF, which always happens immediately. But I found that org.apache.fop.fo.flow.ExternalGraphic unnecessarily maintains a hard reference on a FopImage. Unnecessarily, because we just need the intrinsic size there. The FopImage is never reset to null after use.

I fixed that and: d'oh, still not good. I ended up in the image cache and in the Javadocs for WeakHashMap, where I found that little detail that the weak reference is on the key, not the value. And the key is the URL (String) which is passed around in FOP.

Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of 258 MB is suddenly produced without exceptions using the VM's default heap settings, never going beyond 26MB heap usage. *g*

Will test some more and then commit later.

On 21.06.2006 23:03:38 J.Pietschmann wrote:
> Jeremias Maerki wrote:
> > Ouch, that could explain it. No, no changes in that area. Actually,
> > images could be written to the file immediately and then released
> > instead of having to wait until the next page-sequence is finished.
>
> While the image data is written as soon as possible, the XObject
> which also points to the image object is kept for the object dictionary
> which is written much later. There have been changes in the way the
> object dictionaries are written to the PDF which I didn't track.
>
> > Should be easy to fix.
>
> Unfortunately, the XObject seems to query some data from the image
> object while writing the dictionary.

Jeremias Maerki
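The fix described above, replacing the WeakHashMap (which holds its keys weakly) with a plain HashMap whose values are wrapped in WeakReferences, amounts to something like the following minimal sketch. It is not the code that was committed to FOP; WeakValueCache is a hypothetical name, and the sketch uses generics for brevity.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache: strong String keys (URLs), weakly referenced values. */
final class WeakValueCache<V> {
    private final Map<String, WeakReference<V>> entries = new HashMap<>();

    /** Stores the value behind a weak reference so the GC may reclaim it. */
    synchronized void put(String url, V image) {
        entries.put(url, new WeakReference<>(image));
    }

    /** Returns the cached value, or null if absent or already collected. */
    synchronized V get(String url) {
        WeakReference<V> ref = entries.get(url);
        return ref == null ? null : ref.get();
    }

    /** Counts map entries, including ones whose value is already gone. */
    synchronized int size() {
        return entries.size();
    }
}
```

With this layout an image can be reclaimed as soon as nothing else holds it, even though the URL string stays reachable. The map entry itself remains until something removes it, so `size()` can exceed the number of values that are still alive.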