Hi Alan, David and Judy friends:

We have been very busy working on Judy during 2014.  I have been
promising a 2X version of Judy since about 2008.  But I have been
chasing (what I called) the "green eyed monster" since the Core 2 duo
processor came out.  It appeared that there was a bug in the processor
because a very slight change in (non-executed) code produced very
large changes in performance.  This made it difficult to determine which
change made Judy slower or faster.  I am embarrassed about how much time
I wasted on this.

In 2014 two things happened that brought understanding to what was the
"green eyed monster".  The recent discussion on malloc is related to one
of those things.  A modification to malloc in concert with an
enhancement to the OS alone will improve the performance of Judy (1.0.5)
by 30-40% with large data sets (1GB+).  The OS enhancement was made to
the Ubuntu Kernel (I think it is just an #ifdef in Linux) in spring
2014.  I have not found another OS with the same enhancement yet (Mac
OS-X, BSD, Windows).  It deserves a lot of discussion and I will be
happy to participate, but not in this thread.  The other thing that
happened was the release of the Intel Haswell processor.  This also
deserves a lot of discussion and I will again be happy to participate.

To get to the initial question asked by David, I have some thoughts.
First I do not use swapping anymore.  I find it too slow, while RAM is
so cheap that swapping is not cost effective anymore.  For example, the
last notebook I bought a year ago at WalMart was $250 and had 4GB of
RAM.  I don't think anybody sells a new computer with less than 4GB of
RAM.  The "small" test machine I use for testing Judy performance is a
Haswell (G3258) with 32GB of RAM.  The current Judy design has awesome
performance: perhaps 3-4X the performance of released Judy with
large data sets (4GB+).  Second, the real elephant in the room regarding
malloc is the small change to malloc and the enhanced OS (kernel) that
need to be done by every operating system.

As far as I know, all modern mallocs use anonymous mmap to get memory
from the OS, and if they do not, then they are not modern.  I believe
sbrk is deprecated in Mac OS-X and is not used in Linux malloc.  Mmap is
preferred because of the control over the logical address of the
de/allocated memory which allows logical "holes" without wasting
physical RAM.
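
To make the "holes" point concrete, here is a minimal sketch in C,
assuming a POSIX system where MAP_ANONYMOUS is available.  The function
name punch_hole_demo is made up for illustration; it is not part of Judy
or of any malloc.  It maps three anonymous pages, returns the middle one
to the OS with munmap, and checks that the two outer pages are still
usable at their original logical addresses.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): map three anonymous pages,
 * unmap the middle one to leave a logical hole, and verify the outer
 * pages keep their contents.  Returns 0 on success, -1 on error. */
int punch_hole_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t pg = (size_t)page;

    /* Ask the OS for memory directly, the way an mmap-based malloc would. */
    char *base = mmap(NULL, 3 * pg, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    assert((uintptr_t)base % pg == 0);   /* mmap returns page-aligned memory */

    /* Touch every page so the kernel actually backs it with RAM. */
    base[0] = 1;
    base[pg] = 2;
    base[2 * pg] = 3;

    /* Give the middle page back: its address range becomes a hole and
     * its physical RAM is released, but the neighbours are untouched. */
    if (munmap(base + pg, pg) != 0) {
        munmap(base, 3 * pg);
        return -1;
    }

    int ok = (base[0] == 1 && base[2 * pg] == 3);

    /* Release the two remaining pages separately. */
    munmap(base, pg);
    munmap(base + 2 * pg, pg);
    return ok ? 0 : -1;
}
```

With sbrk the heap is one contiguous range, so only memory at the top
of the heap can be handed back; that is the control I mean.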

Perhaps in the future swapping may have a use for very large data sets
(1TB+), once SSDs get as fast as RAM.

Doug

P.S. I am building a house in Thailand now and have little time to work
on Judy right now.
 Doug Baskins <[email protected]> 

     On Sunday, March 8, 2015 9:16 AM, Alan Silverstein <[email protected]> wrote:
   
 

 David,

> This is a tad rantish, but I can't think of a better way so...

No problem...

> I agree, the problem is telling the system which version of malloc to
> use.  For example:  On my gentoo system I have two choices for malloc
> (there might be more, I just did a quick search in my package manager
> for "malloc").  If I installed libc and jemalloc, which one would Judy
> use?

Been a while, but I think it depends on cc => ld options you specify.
There are many choices, including specifying the library name explicitly
before(?) the caller *.o file (not recommended especially for shared
libs), or various search options that name the library but leave the
suffix off, etc.

> Typically, to use a different library for malloc instead of the
> traditional one you'd need to include that library's header.

Right, very important that the -I path for the *.h file matches the
library location path when you have multiple choices for different
libraries of the same name.  There's some way to inquire which path was
used at link time, not nm, but I forget the command name.

> This can easily become a painstaking and time consuming task, because
> you'd have to ensure that all the programs on the system use the
> "supported" version of the malloc libraries.

Well not exactly, just ensure your one compilation unit ends up with the
one version of any given library that you prefer.  But I agree it's
still tedious and tricky to ensure you get it right at compile and link
time in the makefile or equivalent.  Bugs could result from failure to
compile and link as intended.

> And then you'd need to consider the possibility that there is a malloc
> out there that you're unaware of; perhaps it's better than the other
> choices, perhaps it's worse, and a person out there uses only that
> package for their whole system.

But not the whole system.  Remember, the library is built into each
compilation unit OR is found as a shlib at runtime, and every program
does its own user-level memory management; the OS level is just
sbrk() or mmap(), if I recall right.

> You'd also have to explain to everyone why there is the dependency on
> some weird malloc.

Agreed, that is a cost of optimization.

> And there would be complaints as to why you dare to use a weird
> version of malloc, why the current one is bad, bug reports because x
> version of judy runs slower than y....

Yup, all possible.

> I could go on, but as you can see, the better the program, the harder
> it is to implement it.  People like the easiest choice these days.  It
> is really quite sad, because those of us who are willing to put forth
> the necessary effort are normally jaded into something dumber because
> the best choice is nowhere to be found...

Agreed, but this is a natural and perhaps healthy consequence of
managing excessive complexity.  "Let the computer do the dirty work"
with smarter and smarter hardware, caching, OS algorithms, compiler
optimization, etc.  Focus performance efforts only in the special corner
cases where the payoff (ROI) is positive.

One reason we spent so much time on libJudy and were pleased with it --
as I was in later years USING it as a software writer -- is that it was
possible to write in pure C (no assembly) in generic terms and still do
something that would run exceptionally well (minimum time and space)
across many platforms.  Pushing down the complexity into the library.
Even if you used merely an "average" malloc()!

Cheers,
Alan


 
   
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel
