On Mon, Jun 17, 2013 at 1:40 PM, Laurent Bourgès <[email protected]> wrote:
> Andrea,
> thanks for your time testing my patch in a real benchmark!
>
> I think that the ratio of pisces rendering / request processing is very
> low (a few percent), that's why the performance gains between L1 and L4 are
> so little.
>
> How many CPU cores does your machine have?
>

It's a Core i7 860: it has 4 physical cores, but the OS sees 8 because of
hyperthreading (the extra 4 HT units can only do integer math as far as I
remember... I may be wrong about this).

>
>
>> As you can see L1 provides most of the benefit, although L4 managed to
>> give another boost when the number of concurrent requests is higher.
>> The benchmarks have been run with the thread local storage option, I did
>> not manage to run them with the concurrent linked queue approach (planning
>> to do that next weekend).
>>
>
> That would be very interesting because CLQ mode is normally a bit slower
> than TL mode but in a web server it will avoid wasting memory ~ 1 MB per
> thread (for 200 threads ~ 200 to 300 MB)!
>
> I still have to finalize some array sizing (initial capacity ...) of the
> renderer context to have a good compromise between performance and memory
> usage.
>

Yes, I see. I'll have a look.

>
>
>>
>> The remaining bottlenecks in the benchmarks are somewhat... funny? ;-)
>> Concurrency wise the major offender is now
>> FreeTypeScaler.getLaboutTableCache() (the map has several labels), CPU wise
>> the CLibPNGImageWriter.write(...) is eating 75% of the overall
>> request time...
>> This class comes with JAI ImageIO native extension, and it's a major
>> speedup compared to the one built into the JDK, if I make GeoServer use
>> that one the top performance goes down to 30req/s, a really major drop.
>> Houston, we really need a faster PNG encoder! :-p
>>
>
> So you implicitly confirm that pisces only represents < 25% so let's say
> 10% of the request processing time.
>

Yes, in the past data loading from the OS file system cache and rendering
were similarly sized, so I guess it's fair to say the renderer is now using
around 10-12% of the overall processing time.
Hard to be more precise since GeoServer is fully based on a streaming
architecture: read a bunch of data, process it, read another bunch, in a way
that makes it rather hard to separate the two elements in a profile.

>
> Maybe you should submit a concurrency issue related to
> FreeTypeScaler.getLaboutTableCache()!
>

I had a look, but all it's doing is wrapping a native method call; it may
well be that the underlying native library is not thread safe and the
synchronization is actually required.

>
> Could you perform a benchmark using another image format (bmp, jpg or any
> faster encoding)?
>

Yes, I can have a look, although it's going to be an academic exercise:
that kind of map (typical road map with buildings and the like) is only ever
requested in PNG; BMP is not compressed, and JPEG ruins it visibly.

> Again it would be interesting to identify the performance bottleneck in
> the C library? Please look at JAI bugs ...
>

Yes, I'm going to spend some time looking at it, maybe oprofile can help?
The issue here is that the CLIB encoder is a native one, but where are the
CLIB sources?

Cheers
Andrea

--
==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more
information.
==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------
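
For readers following the TL versus CLQ discussion in the thread, here is a minimal Java sketch of the two reuse strategies. It is not the patch's code: `RendererContext`, `ContextPools` and all method names are hypothetical, and the 1 MB scratch array only mirrors the rough per-thread figure quoted above.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a renderer context holding large reusable arrays. */
final class RendererContext {
    // 256K ints * 4 bytes = 1 MiB, mirroring the "~1 MB per thread" estimate.
    final int[] scratch = new int[256 * 1024];
}

/** Two ways to reuse contexts; names are illustrative assumptions. */
final class ContextPools {
    // TL mode: one context per thread, never shared,
    // retained for as long as the owning thread is alive.
    private static final ThreadLocal<RendererContext> TL =
            ThreadLocal.withInitial(RendererContext::new);

    // CLQ mode: idle contexts are parked in a lock-free queue
    // and handed to whichever thread asks next.
    private static final ConcurrentLinkedQueue<RendererContext> CLQ =
            new ConcurrentLinkedQueue<>();

    static RendererContext acquireThreadLocal() {
        return TL.get();
    }

    static RendererContext acquireQueued() {
        RendererContext ctx = CLQ.poll();          // null when the queue is empty
        return ctx != null ? ctx : new RendererContext();
    }

    static void releaseQueued(RendererContext ctx) {
        CLQ.offer(ctx);                            // make it available for reuse
    }

    static int idleQueuedContexts() {
        return CLQ.size();
    }
}
```

Under these assumptions, a server with 200 worker threads keeps up to 200 contexts alive in TL mode, while CLQ mode keeps only as many as were rendering at the same time at peak, at the cost of a `poll`/`offer` pair per request.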
