[jira] [Commented] (LUCENE-5098) Broadword bit selection

Dawid Weiss (JIRA) Fri, 12 Jul 2013 00:31:55 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706750#comment-13706750
 ]


Dawid Weiss commented on LUCENE-5098:
-------------------------------------

bq. I played a bit with it and rank9 was always between 15% and 20% slower than 
bitCount no matter what the input was (which is still impressing since bitCount 
is supposed to be an intrinsic)

I also toyed with this a bit. Interesting because rank9 is essentially Hacker's 
delight implementation, but with a multiplication at the end rather than 
shifts/ additions (which I think originated from the fact that multiplication 
used to be much slower than additions/shifts back in the day). 

I ran a few caliper bechmarks on a million longs, different distributions 
(Intel I7, JDK17) just to see what the output will be.
{code}
   benchmark distribution     us linear runtime
    HdPopCnd        ZEROS   2333 =
    HdPopCnd         FULL   2334 =
    HdPopCnd       RANDOM   2329 =
    HdPopCnd       ONEBIT   2334 =
       Rank9        ZEROS   1651 =
       Rank9         FULL   1652 =
       Rank9       RANDOM   1651 =
       Rank9       ONEBIT   1651 =
LongBitCount        ZEROS    411 =
LongBitCount         FULL    394 =
LongBitCount       RANDOM    391 =
LongBitCount       ONEBIT    404 =
 NaivePopCnt        ZEROS    585 =
 NaivePopCnt         FULL  39019 ======
 NaivePopCnt       RANDOM 171365 ==============================
 NaivePopCnt       ONEBIT  28155 ====
{code}

The naive loop was:
{code}
        int cnt = 0;
        while (x != 0) {
            if (((x >>>= 1) & 1) != 0L) {
                cnt++;
            }
        }
        return cnt;
{code}

and you can see that even for all zeros (when in fact there is no counting at 
all) it's still slower than the intrinsified popcnt. Note full-ones is not the 
worst case (I believe due to constant branch misprediction in a random input).

A zoomed-in benchmark without the naive impl.:
{code}
   benchmark distribution   us linear runtime
    HdPopCnd        ZEROS 2331 =============================
    HdPopCnd         FULL 2329 =============================
    HdPopCnd       RANDOM 2333 =============================
    HdPopCnd       ONEBIT 2333 =============================
       Rank9        ZEROS 1650 =====================
       Rank9         FULL 1650 =====================
       Rank9       RANDOM 1652 =====================
       Rank9       ONEBIT 1650 =====================
LongBitCount        ZEROS  400 =====
LongBitCount         FULL  402 =====
LongBitCount       RANDOM  401 =====
LongBitCount       ONEBIT  391 =====
{code}

You can see when popcnt isn't intrinsified by running with IBM's J9, for 
example:
{code}
   benchmark distribution   ms linear runtime
LongBitCount        ZEROS 2.25 =============================
LongBitCount         FULL 2.22 =============================
LongBitCount       RANDOM 2.25 =============================
LongBitCount       ONEBIT 2.25 =============================
    HdPopCnd        ZEROS 2.25 =============================
    HdPopCnd         FULL 2.25 =============================
    HdPopCnd       RANDOM 2.22 =============================
    HdPopCnd       ONEBIT 2.22 =============================
       Rank9        ZEROS 1.62 =====================
       Rank9         FULL 1.62 =====================
       Rank9       RANDOM 1.62 =====================
       Rank9       ONEBIT 1.62 =====================
{code}

But I think they'll eventually catch up with modern cpus too so I'd stick with 
Long.bitCount.

                
> Broadword bit selection
> -----------------------
>
>                 Key: LUCENE-5098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5098
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5098.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5098) Broadword bit selection

Reply via email to