Once the code is loaded, use the Hash Analysis Tool from the Tools menu.
There's a manual below, and the Fundamentals book also has a somewhat
in-depth discussion of how it works.
ftp://sqrmax.us.to/pub/Smalltalk/Papers/Hash%20Analysis%20Tool.pdf
On 3/2/14 15:06, p...@highoctane.be wrote:
I have found the tools in the Cincom public repo.
Now I have to see how this works in VW; I am not that proficient with it.
Phil
On Wed, Feb 26, 2014 at 8:55 AM, p...@highoctane.be wrote:
Andres,
Thanks for the insights.
Hash quality is indeed a key factor. At least my mind is grasping this
hashing field a bit better now.
I'll have a look at the tools; I haven't used them yet. I'll also take
a shot at the ASM version with NativeBoost in Pharo.
As for what occurs in modern CPUs, well, no. It's surprising that a
mul would be faster than a shr or shl. How come?
I used to be OK with these things when I was writing demoscene code
a loong time ago, but I'd need an extremely serious refresher.
As a side note, there is a huge uptake in the
big data/MapReduce/Hadoop environment, where Smalltalk is sorely
absent. Scala seems to fill the void on the JVM.
Hashing is quite key there, to remap all of the map-phase
results to the reduce nodes. I am surprised that Smalltalk
vendors haven't jumped into that space.
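[For context, the remapping described here is typically plain hash
partitioning. A minimal sketch (the names are invented, purely
illustrative, not from the thread):

| reducerCount partitionFor |
reducerCount := 8.
"Each map-phase key is routed to the reduce node selected by its hash."
partitionFor := [:key | key hash \\ reducerCount].
partitionFor value: 'orderId-42'.	"an index in 0 .. reducerCount - 1"

With a poor hash function, some reduce nodes receive far more keys
than others, which is why hash quality matters so much in that space.]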
Phil
On Wed, Feb 26, 2014 at 2:04 AM, Andres Valloud
<avall...@smalltalk.comcastbiz.net> wrote:
Hello...
On 2/25/14 1:17, p...@highoctane.be wrote:
I am currently reading through the Hashing in Smalltalk book
(http://www.lulu.com/shop/andres-valloud/hashing-in-smalltalk-theory-and-practice/paperback/product-3788892.html)
and, my head hurting notwithstanding, there are indeed a ton
of gems in it. As he mentions, doing the exercises brings a
lot of extra insight :-)
:) thank you.
When going to 64 bits, and with the new ObjectMemory scheme,
I guess a couple of identity hashing functions will come under
scrutiny, e.g.

SmallInteger>>hashMultiply
	| low |
	low := self bitAnd: 16383.
	^(16r260D * low
		+ ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384))
		bitAnd: 16r0FFFFFFF

which will need some more bits.
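[As a sanity check (an editor's sketch, not part of the original mail):
the 14-bit splitting above computes self * 16r19660D \\ (2 raisedTo: 28),
which plain large-integer arithmetic confirms directly:

(1 to: 100000) allSatisfy:
	[:each | each hashMultiply = (each * 16r19660D bitAnd: 16r0FFFFFFF)]	"true"
]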
IMO it's not clear that SmallInteger>>identityHash should be
implemented that way. Finding a permutation of the small
integers that also behaves like a good quality hash function and
evaluates quickly (in significantly less time and complexity
than, say, Bob Jenkins' lookup3) would be a really interesting
research project. I don't know if it's possible. If no such
thing exists, then getting at least some rough idea of the
minimum necessary complexity for such hash functions would
be valuable.
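[One piece of intuition here (an editor's illustration, not Andres's):
multiplication by an odd constant is already a permutation of the
integers modulo 2^28, because odd numbers are invertible modulo a power
of two, so hashMultiply never collides on distinct 28-bit inputs:

| outputs |
outputs := Set new.
1 to: 100000 do: [:i | outputs add: i hashMultiply].
outputs size	"100000: injective on this sample, as expected"

The open question is whether such a cheap permutation can also have
good avalanche behavior, which a bare multiply does not: low output
bits of a product never depend on high input bits.]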
Looking at hashMultiply as a non-identity hash function, one
would start having problems when significantly more than 2^28
objects are stored in a single hashed collection. 2^28 objects
with e.g. 12 bytes per header and a minimum of one instance
variable (so the hash value isn't an instance constant) stored in
a hashed collection require more than 4 GB, so that is clearly a
64-bit image problem. In 64 bits, 2^28 objects with e.g. 16
bytes per header and a minimum of one instance variable each are
already 6 GB, and a minimum of 8 GB with the hashed collection itself.
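[Those figures check out with a little arithmetic (an editor's
restatement of the numbers above, not part of the original mail):

(2 raisedTo: 28) * (12 + 4) // (2 raisedTo: 30).	"4 GB: 12-byte header plus one 4-byte slot, 32 bits"
(2 raisedTo: 28) * (16 + 8) // (2 raisedTo: 30).	"6 GB: 16-byte header plus one 8-byte slot, 64 bits"
(2 raisedTo: 28) * 8 // (2 raisedTo: 30).	"2 GB more for the collection's 2^28 slots"
]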
Because of these figures, I'd think improving the implementation
of hashed collections takes priority over adding more
non-identity hash function bits (as long as the existing hash
values are of good quality).
Did you look at the Hash Analysis Tool I wrote? It's in the
Cincom public Store repository. It comes in two bundles: Hash
Analysis Tool, and Hash Analysis Tool - Extensions. With
everything loaded, the tool comes with 300+ hash functions and
100+ data sets out of the box. The code is MIT licensed.
I had a look at how it was done in VisualWorks;
The implementation of hashMultiply, yes. Note however that
SmallInteger>>hash is ^self.
hashMultiply
	"Multiply the receiver by 16r0019660D mod 2^28 without using
	large integer arithmetic for speed. The constant is a generator
	of the multiplicative subgroup of Z_2^30, see Knuth's TAOCP vol 2."

	<primitive: 1747>
	| low14Bits |
	low14Bits := self bitAnd: 16r3FFF.
	^16384 * (16r260D * (self bitShift: -14) + (16r0065 * low14Bits) bitAnd: 16r3FFF)
		+ (16r260D * low14Bits) bitAnd: 16rFFFFFFF
The hashing book version has:

multiplication
	"Computes self times 1664525 mod 2^28 while avoiding overflow
	into a large integer by breaking the multiplication into two
	14-bit chunks. Do not use any division or modulo operation."

	| lowBits highBits |
	lowBits := self bitAnd: 16r3FFF.
	highBits := self bitShift: -14.
	^(lowBits * 16r260D)
		+ (((lowBits * 16r0065) bitAnd: 16r3FFF) bitShift: 14)
		+ (((highBits * 16r260D) bitAnd: 16r3FFF) bitShift: 14)
		bitAnd: 16rFFFFFFF
So, 16384 * is the same as bitShift: 14, and in the VW version the
mask and shift look like they are done once, which may be better.
Both versions rely on the same identity: with 16r19660D =
(16r65 bitShift: 14) + 16r260D, any partial products past bit 28
can simply be dropped.
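[A trivial spot check of that equivalence (an editor's addition):

(16r2AD7 * 16384) = (16r2AD7 bitShift: 14)	"true"
]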
It should be a primitive (or otherwise optimized somehow). At
some point that hash function was implemented for e.g.
ByteArray in Squeak; I thought the multiplication step was also
implemented as a primitive at that point?
Also, VW marks it as a primitive, which Pharo does not.
In VW it is also a translated primitive, i.e. it's executed
directly in the JIT without calling C.
Keep in mind that the speed at which hash values are calculated
is only part of the story. If the hash function quality is not
great, or the hashed collection implementation is not efficient
and induces collisions or other extra work, improving the
efficiency of the hash functions won't do much. I think it's
mentioned in the hash book (I'd have to check), but once I made
a hash function 5x slower to get better quality, and the
result was that a report that had been taking 30 minutes took 90
seconds instead (and hashing was gone from the profiler output).
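[A toy demonstration of that effect (an editor's sketch, not from the
thread; ConstantHash and its methods are invented for illustration):

Object subclass: #ConstantHash
	instanceVariableNames: 'value'
	classVariableNames: ''
	category: 'HashDemo'

ConstantHash class >> value: anInteger
	^self new setValue: anInteger

ConstantHash >> setValue: anInteger
	value := anInteger

ConstantHash >> value
	^value

ConstantHash >> hash
	"Deliberately terrible: every instance collides."
	^1

ConstantHash >> = other
	^self class = other class and: [value = other value]

Adding a few thousand of these to a Set degenerates into quadratic
probing, no matter how fast #hash itself runs:

[| s |
 s := Set new.
 1 to: 5000 do: [:i | s add: (ConstantHash value: i)]] timeToRun
]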
Would we gain some speed doing that? hashMultiply is used a lot
for identity hashes.
The bytecode has quite some work to do:
37 <70> self
38 <20> pushConstant: 16383
39 <BE> send: bitAnd:
40 <68> popIntoTemp: 0
41 <21> pushConstant: 9741
42 <10> pushTemp: 0
43 <B8> send: *
44 <21> pushConstant: 9741
45 <70> self
46 <22> pushConstant: -14
47 <BC> send: bitShift:
48 <B8> send: *
49 <23> pushConstant: 101
50 <10> pushTemp: 0
51 <B8> send: *
52 <B0> send: +
53 <20> pushConstant: 16383
54 <BE> send: bitAnd:
55 <24> pushConstant: 16384
56 <B8> send: *
57 <B0> send: +
58 <25> pushConstant: 268435455
59 <BE> send: bitAnd:
60 <7C> returnTop
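[For reference, a listing like the one above can be regenerated from
the image itself; in Squeak, and I believe in the Pharo images of that
era, with:

(SmallInteger >> #hashMultiply) symbolic
]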
If this is a primitive instead, then you can also avoid the
overflow into large integers and do the math with (basically)

mov eax, smallInteger
shr eax, numberOfTagBits
imul eax, eax, 1664525	"the multiplication that throws out the high bits"
shl eax, 4	"throw out bits 29-32"
shr eax, 4
lea eax, [eax * 2^numberOfTagBits + smallIntegerTagBits]

Please excuse trivial omissions in the above, it's written only
for the sake of illustration (e.g. it looks like the 3 last
instructions can be combined into two... lea followed by shr).
Also, did you see the latency of integer multiplication
instructions in modern x86 processors?...
I ran some experiments timing things.
It looks like replacing 16384 * with bitShift: 14 leads to a
small gain, bitShift: (primitive 17) being faster than *
(primitive 9).
Keep in mind those operations still have to check for overflow
into large integers. In this case, large integers are not
necessary.
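[Indeed, a quick bound (an editor's arithmetic) shows every
intermediate value fits comfortably in a 31-bit SmallInteger, and even
the final sum before the last mask stays below 2^29, so the overflow
checks never fire; they just cost time:

16r3FFF * 16r260D.	"159586803, the largest partial product in the 14-bit split"
SmallInteger maxVal.	"1073741823 in a 32-bit image"
]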
Andres.
The bytecode is identical, except send: bitShift: instead of
send: *
multiplication3
	| low |
	low := self bitAnd: 16383.
	^(16r260D * low
		+ ((16r260D * (self bitShift: -14) + (16r0065 * low) bitAnd: 16383) bitShift: 14))
		bitAnd: 16r0FFFFFFF
[500000 timesRepeat: [ 15000 hashMultiply ]] timeToRun.	"12"
[500000 timesRepeat: [ 15000 multiplication ]] timeToRun.	"41 (worse)"
[500000 timesRepeat: [ 15000 multiplication3 ]] timeToRun.	"10 (better)"
It looks correct for SmallInteger minVal to: SmallInteger maxVal.
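[A spot check along those lines (an editor's snippet; it assumes
multiplication3 is installed on Integer as above, and samples roughly
32k values across a 32-bit image's SmallInteger range):

(SmallInteger minVal to: SmallInteger maxVal by: 16r10001) allSatisfy:
	[:each | each multiplication3 = each hashMultiply]	"true"
]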
Now, VW gives:

[500000 timesRepeat: [ 15000 hashMultiply ]] timeToRun.	"1.149 milliseconds"
Definitely worth investigating the primitive approach, or some
NativeBoost assembly, as this is used just about everywhere
(Collections etc.).
Thoughts?
Phil