Hi Bob,

a few things come to mind to help narrow this down (if I missed something
earlier in this thread, please ignore ;-):
- you mentioned using mmap; did you try a different (standard?) malloc as well?
  Did you make sure libumem isn't adding some debugging information?
- how does the rest of the hardware (I/O!) compare between the Linux and
  illumos boxes? Have you done any iostat monitoring?
- how do the file systems/volume management compare?
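
To separate the allocation cost from the I/O cost, something along these
lines could be timed on both boxes, once with libumem and once with the
standard malloc. This is only a rough sketch: the function names, the
SKETCH_MAIN guard and the 4096-byte page size are my own assumptions, not
anything taken from GraphicsMagick.

```c
/* Sketch: time "allocate + first touch" separately from a plain fread(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Allocate `size` bytes and write one byte per `page` bytes so that every
 * page has to be backed by real memory. Returns elapsed seconds, or -1.0
 * if the arguments are unusable or the allocation fails. */
double time_alloc_touch(size_t size, size_t page)
{
    if (size == 0 || page == 0)
        return -1.0;

    double start = now_seconds();
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1.0;

    /* volatile keeps the compiler from discarding the stores */
    volatile unsigned char *p = buf;
    for (size_t off = 0; off < size; off += page)
        p[off] = 1;

    double elapsed = now_seconds() - start;
    free(buf);
    return elapsed;
}

/* Read up to `bufsize` bytes of `path` with a single fread() call. Only the
 * fread() itself is timed, not fopen() or malloc(). Stores the byte count
 * in *nread and returns elapsed seconds, or -1.0 on failure. */
double time_fread(const char *path, size_t bufsize, size_t *nread)
{
    assert(nread != NULL);
    *nread = 0;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1.0;

    unsigned char *buf = malloc(bufsize);
    if (buf == NULL) {
        fclose(fp);
        return -1.0;
    }

    double start = now_seconds();
    *nread = fread(buf, 1, bufsize, fp);
    double elapsed = now_seconds() - start;

    free(buf);
    fclose(fp);
    return elapsed;
}

#ifdef SKETCH_MAIN
int main(int argc, char **argv)
{
    size_t size = (size_t)200 * 1024 * 1024;  /* roughly the 200MB case */

    /* 4096 is an assumed page size; adjust for the platform */
    printf("alloc+touch: %.3f s\n", time_alloc_touch(size, 4096));
    if (argc > 1) {
        size_t n = 0;
        double t = time_fread(argv[1], size, &n);
        printf("fread:       %.3f s (%zu bytes)\n", t, n);
    }
    return 0;
}
#endif
```

Built with -DSKETCH_MAIN and given the input file as its argument, it prints
the two timings separately (older Solaris releases may also need -lrt for
clock_gettime). If the alloc+touch number alone accounts for most of the gap,
that would point at the allocator/VM side rather than at stdio or the disks.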

HTH
Michael


On Sun, Mar 16, 2014 at 6:35 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Sat, 15 Mar 2014, Bob Friesenhahn wrote:
>
>> I am still struggling to get GraphicsMagick running acceptably fast on an
>> Illumos system (in this case OpenIndiana oi_151a9).
>>
>> Previously, GraphicsMagick was entirely profiled and tuned on a 4-core
>> AMD system running Solaris 10.  It still runs well on that system.
>>
>> The OpenIndiana system has 16 cores (32 threads with hyper-threading).
>>
>> GraphicsMagick usually runs 2X faster on a Linux system with prior
>> generation Intel CPUs with 12 cores (a system which should be 1/2 as fast).
>> With the AMD Solaris 10 system and the modern Linux system, I see expected
>> speedups from adding threads but not on the OpenIndiana system.
>>
>
> I should clarify the above.  The problematic situation is the case where
> the software should be doing very little actual work.  It allocates a large
> buffer (e.g. 200MB) using libumem's 'malloc()' for the data and then reads
> data from a file using fread(), doing a small amount of processing as it
> transfers data linearly from the file to memory.  The input data is 1/2 the
> size of the allocated memory. Then the memory is released and the program
> terminates.  The reason why this case is important is that this represents
> the baseline cost to do anything further and the baseline cost is 2X more
> on Illumos than Linux.
>
> If actual data processing takes place (i.e. CPU processing becomes the
> bottleneck rather than I/O and initial memory allocation) then the
> performance numbers do reflect the difference in underlying hardware
> performance and all seems good.
>
> The Linux VM system works rather differently than Illumos since Linux VM
> relies on over-commit and Solaris does not.  Perhaps Linux is much faster
> to add memory to a process than Solaris is.
>
> If the memory allocation under Linux is reduced by a factor of 2 (memory
> size is the same as input data size), then the run-time decreases by a
> factor of 2 whereas with Illumos, the run-time is only slightly diminished.
>  In fact, with the decreased memory use, the difference is more stark (e.g.
> Illumos 0.75s, Linux 0.26s).
>
> One might think that the problem is with Illumos stdio, but if the data is
> mmapped with a zero-copy approach, Illumos still exhibits similar balkiness,
> though with somewhat better performance.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
>
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175681-9f7ae099
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
>



-- 
Michael Schuster
http://recursiveramblings.wordpress.com/



