[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks

GitBox Tue, 08 Dec 2020 07:12:13 -0800


gf2121 edited a comment on pull request #2113:
URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-740674712



   > I wonder if the difference in performance is observable since final long 
values would be inlined at compile time (and easily optimized for hotspot) 
whereas array accesses, even if locally cached, still have to be dynamic (I 
don't think the compiler is smart enough to detect constant array values?).
   
   Hi @dweiss! Over the past few days I ran some more benchmarks on this issue and got some surprising results.
   First, I picked one decode method at random, `decode15`, to find out whether it is slower in the array case. Here is the JMH-based benchmark code:
   ```
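   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.Warmup;
   import org.openjdk.jmh.runner.Runner;
   import org.openjdk.jmh.runner.RunnerException;
   import org.openjdk.jmh.runner.options.Options;
   import org.openjdk.jmh.runner.options.OptionsBuilder;

   @State(Scope.Benchmark)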
   public class MyBenchmark {
       private static final long MASK16_1 = 0x0001000100010001L;
       private static final long[] MASKS16_1 = new long[] {MASK16_1};
       private static final long[] TMP = new long[128];
       private static final long[] ARR = new long[128];
   
       static {
           for (int i=0;i<128;i++) {
               TMP[i] = ARR[i] = i;
           }
       }
   
       public static void main(String[] args) throws RunnerException {
           Options opt = new OptionsBuilder()
                   .include("MyBenchmark")
                   .build();
   
           new Runner(opt).run();
       }
   
       @Benchmark
       @BenchmarkMode({Mode.Throughput})
       @Fork(1)
       @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
       @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
       public static void decode0() {
           for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
               long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14;
               l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13;
               l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12;
               l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11;
               l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10;
               l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9;
               l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8;
               l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7;
               l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6;
               l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5;
               l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4;
               l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3;
               l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2;
               l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1;
               l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0;
               ARR[longsIdx+0] = l0;
           }
       }
   
       @Benchmark
       @BenchmarkMode({Mode.Throughput})
       @Fork(1)
       @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
       @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
       public static void decode1() {
           for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
               long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14;
               l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13;
               l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12;
               l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11;
               l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10;
               l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9;
               l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8;
               l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7;
               l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6;
               l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5;
               l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4;
               l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3;
               l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2;
               l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1;
               l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0;
               ARR[longsIdx+0] = l0;
           }
       }
   
       @Benchmark
       @BenchmarkMode({Mode.Throughput})
       @Fork(1)
       @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
       @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
       public static void decode2() {
           for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
               long l0 = (TMP[tmpIdx+0] & 0x0001000100010001L) << 14;
               l0 |= (TMP[tmpIdx+1] & 0x0001000100010001L) << 13;
               l0 |= (TMP[tmpIdx+2] & 0x0001000100010001L) << 12;
               l0 |= (TMP[tmpIdx+3] & 0x0001000100010001L) << 11;
               l0 |= (TMP[tmpIdx+4] & 0x0001000100010001L) << 10;
               l0 |= (TMP[tmpIdx+5] & 0x0001000100010001L) << 9;
               l0 |= (TMP[tmpIdx+6] & 0x0001000100010001L) << 8;
               l0 |= (TMP[tmpIdx+7] & 0x0001000100010001L) << 7;
               l0 |= (TMP[tmpIdx+8] & 0x0001000100010001L) << 6;
               l0 |= (TMP[tmpIdx+9] & 0x0001000100010001L) << 5;
               l0 |= (TMP[tmpIdx+10] & 0x0001000100010001L) << 4;
               l0 |= (TMP[tmpIdx+11] & 0x0001000100010001L) << 3;
               l0 |= (TMP[tmpIdx+12] & 0x0001000100010001L) << 2;
               l0 |= (TMP[tmpIdx+13] & 0x0001000100010001L) << 1;
               l0 |= (TMP[tmpIdx+14] & 0x0001000100010001L) << 0;
               ARR[longsIdx+0] = l0;
           }
       }
   }
   ```
   **Result:**
   
   method | throughput (ops/s)
   ------------ | -------------
   MyBenchmark.decode0 | 107204513.594 ± 4773882.841
   MyBenchmark.decode1 | 68214347.208 ± 1293940.972
   MyBenchmark.decode2 | 67038804.906 ± 1667315.376
   
   Surprisingly, the `static final` array version is more than 50% faster than the `static final long` version. I am not sure whether something is wrong in the benchmark code, or whether the result holds in a realistic scenario. Most of all, please tell me if you have any clue about the reason :)
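
   To make the reading of the table explicit, here is a minimal sketch of the arithmetic behind the "more than 50%" figure. It is not part of the benchmark: the class and method names (`ThroughputComparison`, `ratio`, `overlaps`) are hypothetical, and it assumes the ± value can be treated as the half-width of a symmetric interval around the reported score.

   ```java
   // Hypothetical helper, not part of the JMH benchmark above.
   final class ThroughputComparison {

       // Relative throughput of a faster variant over a slower one (1.5 means 50% faster).
       static double ratio(double fasterOps, double slowerOps) {
           return fasterOps / slowerOps;
       }

       // True if [a - aErr, a + aErr] and [b - bErr, b + bErr] intersect.
       // Assumption: the reported error is a symmetric interval half-width.
       static boolean overlaps(double a, double aErr, double b, double bErr) {
           return (a - aErr) <= (b + bErr) && (b - bErr) <= (a + aErr);
       }

       public static void main(String[] args) {
           // Scores and errors copied from the result table (ops/s).
           double d0 = 107204513.594, e0 = 4773882.841; // decode0: array mask
           double d1 = 68214347.208,  e1 = 1293940.972; // decode1: static final long
           double d2 = 67038804.906,  e2 = 1667315.376; // decode2: inline literal

           System.out.printf("decode0 / decode1 = %.3f%n", ratio(d0, d1));
           System.out.printf("decode0 / decode2 = %.3f%n", ratio(d0, d2));
           System.out.println("decode1 and decode2 intervals overlap: " + overlaps(d1, e1, d2, e2));
           System.out.println("decode0 and decode1 intervals overlap: " + overlaps(d0, e0, d1, e1));
       }
   }
   ```

   Under that assumption, `decode1` and `decode2` are indistinguishable within their error margins, while `decode0` is clearly separated from both, so the gap is unlikely to be measurement noise alone.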


