Hi Bob,

a few things come to mind to help narrow this down (if I missed something earlier in this thread, please ignore ;-):

- you mentioned using mmap; did you try another (standard?) malloc as well? Did you make sure libumem isn't adding debugging information?
- how does the rest of the HW (I/O!) compare between the Linux and illumos boxes? Have you done any iostat monitoring?
- how do the file systems/volume management compare?
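In case it helps to see which phase the time goes to, here is a minimal sketch of the workload as I understand it from your description (allocate a large buffer, fread() a file into it, free it), with each phase timed separately. The names `now_sec`, `phase_times` and `time_baseline` are made up for illustration and are not from GraphicsMagick. Touching every page right after malloc() is my own assumption, added so that page-fault cost shows up in the allocation phase and not in the read phase. On older systems clock_gettime() may need -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Per-phase timings for one run (hypothetical helper type). */
struct phase_times {
    double alloc_s;     /* malloc() plus touching every page */
    double read_s;      /* fread() loop */
    double free_s;      /* free() */
    size_t bytes_read;  /* bytes actually read from the file */
};

/*
 * Allocate buf_size bytes, read up to buf_size bytes of the file at
 * path into the buffer, then free it. Returns 0 on success, -1 on error.
 */
static int time_baseline(const char *path, size_t buf_size,
                         struct phase_times *out)
{
    double t0;
    assert(out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Phase 1: allocate, then write one byte per page to force mapping. */
    t0 = now_sec();
    unsigned char *buf = malloc(buf_size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;    /* fallback, an assumption */
    volatile unsigned char *vb = buf;
    for (size_t off = 0; off < buf_size; off += (size_t)page)
        vb[off] = 0;
    out->alloc_s = now_sec() - t0;

    /* Phase 2: read the file linearly into the buffer. */
    t0 = now_sec();
    size_t total = 0, n;
    while (total < buf_size &&
           (n = fread(buf + total, 1, buf_size - total, fp)) > 0)
        total += n;
    int read_failed = ferror(fp);
    out->read_s = now_sec() - t0;
    out->bytes_read = total;

    /* Phase 3: release the memory. */
    t0 = now_sec();
    free(buf);
    out->free_s = now_sec() - t0;

    fclose(fp);
    return read_failed ? -1 : 0;
}
```

Calling `time_baseline(path, 200u * 1024 * 1024, &t)` from a small main() on both boxes, with the same input file and with iostat running alongside, should show whether the gap sits in the allocation phase or in the read phase.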

HTH
Michael

On Sun, Mar 16, 2014 at 6:35 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Sat, 15 Mar 2014, Bob Friesenhahn wrote:
>
>> I am still struggling to get GraphicsMagick running properly fast on an
>> Illumos system (in this case OpenIndiana oi_151a9).
>>
>> Previously, GraphicsMagick was entirely profiled and tuned on a 4-core
>> AMD system running Solaris 10. It still runs well on that system.
>>
>> The OpenIndiana system has 16-cores (32 threads with hyper-threading).
>>
>> GraphicsMagick usually runs 2X faster on a Linux system with prior
>> generation Intel CPUs with 12-cores (a system which should be 1/2 as fast).
>> With the AMD Solaris 10 system and the modern Linux system, I see expected
>> speedups from adding threads but not on the OpenIndiana system.
>>
>
> I should clarify the above. The problematic situation is the case where
> the software should be doing very little actual work. It allocates a large
> buffer (e.g. 200MB) using libumem's 'malloc()' for the data and then reads
> data from a file using fread(), doing a small amount of processing as it
> transfers data linearly from the file to memory. The input data is 1/2 the
> size of the allocated memory. Then the memory is released and the program
> terminates. The reason why this case is important is that this represents
> the baseline cost to do anything further and the baseline cost is 2X more
> on Illumos than Linux.
>
> If actual data processing takes place (i.e. CPU processing becomes the
> bottleneck rather than I/O and initial memory allocation) then the performance
> numbers do reflect the difference in underlying hardware performance and
> all seems good.
>
> The Linux VM system works rather differently than Illumos since Linux VM
> relies on over-commit and Solaris does not. Perhaps Linux is much faster
> to add memory to a process than Solaris is.
>
> If the memory allocation under Linux is reduced by a factor of 2 (memory
> size is the same as input data size), then the run-time decreases by a
> factor of 2 whereas with Illumos, the run-time is only slightly diminished.
> In fact, with the decreased memory use, the difference is more stark (e.g.
> Illumos 0.75s, Linux 0.26s).
>
> One might think that the problem is with Illumos stdio but if the data is
> mmapped with a zero-copy approach, Illumos still exhibits similar balkiness
> but with somewhat more performance.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
>
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175681-9f7ae099
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com

--
Michael Schuster
http://recursiveramblings.wordpress.com/