On 6/24/13 5:56 PM, Anders Halager wrote:
On Monday, 24 June 2013 at 20:19:31 UTC, Walter Bright wrote:
On 6/24/2013 1:08 PM, Anders Halager wrote:
Python is one of the slower interpreted languages. It would be more
interesting to look at LuaJIT, which actually does something clever.
If the string is at least 4 chars long, it only hashes the first 4 bytes,
the last 4, the 4 starting at floor(len/2)-2, and the 4 starting at
floor(len/4)-1.
Any of these may overlap, of course, but that isn't a problem.
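
For readers who want to see the sampling scheme described above in code, here is a minimal sketch in C. It is not LuaJIT's actual implementation and not the patch linked later in this thread: the names sampled_hash, load4 and mix are hypothetical, the mixing step (XOR, then multiply by the 32-bit FNV prime) is a stand-in chosen for brevity, and hashing every byte of strings shorter than 4 chars is an assumption, since the description does not cover that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 4 bytes starting at p as one 32-bit value.
   memcpy avoids unaligned loads; the result depends on host byte order,
   which is fine for an in-process hash table. */
static uint32_t load4(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical mixing step (assumption): XOR, then multiply by the FNV prime. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    return (h ^ v) * 16777619u;
}

/* Hash at most 16 bytes of the string, whatever its length. */
uint32_t sampled_hash(const char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = (uint32_t)len;          /* seed with the length */

    if (len < 4) {                       /* assumption: short strings hash every byte */
        for (size_t i = 0; i < len; i++)
            h = mix(h, p[i]);
        return h;
    }

    h = mix(h, load4(p));                /* first 4 bytes */
    h = mix(h, load4(p + len - 4));      /* last 4 bytes */
    h = mix(h, load4(p + len / 2 - 2));  /* 4 bytes starting at floor(len/2)-2 */
    h = mix(h, load4(p + len / 4 - 1));  /* 4 bytes starting at floor(len/4)-1 */
    return h;
}
```

For len >= 4 all four windows stay inside the string, so no extra bounds checks are needed. The cost is constant in the string length, and the price is that two strings of equal length that differ only in bytes outside the four windows get the same hash.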

I used that method back in the 1980s. It was well known then, but
perhaps has drifted into obscurity. In fact, I still use it for
hashing identifiers in DMC++.

I can't imagine all the clever (even if outdated) tricks that have
disappeared with retired old-timers :)

I haven't set up anything for testing but if someone wants to try I've
made a quick patch here: http://dpaste.com/hold/1268958/

This is significantly faster than anything submitted thus far. Compiled alongside Juan Manuel Cabo's submission, the results are as follows:

Times hashing words:

        Unchanged      : 1386 ms
        One switch     : 1338 ms
        Only add       : 1354 ms
        Anders Halager :  933 ms

Times hashing entire lines:

        Unchanged      :  335 ms
        One switch     :  332 ms
        Only add       :  331 ms
        Anders Halager :  125 ms

I wonder how much faster it can get.

--

Andrew Edwards
--------------------
http://www.akeron.co
auto getAddress() {
    string location = "@", period = ".";
    return ("info" ~ location ~ "afidem" ~ period ~ "org");
}
