jpountz commented on PR #13381:
URL: https://github.com/apache/lucene/pull/13381#issuecomment-2118319218

   <details>
   <summary>I slightly modified the benchmark from #13337</summary>
   
   ```java
   import java.io.IOException;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Random;
   import java.util.concurrent.ThreadLocalRandom;
   
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.IOContext;
   import org.apache.lucene.store.IndexInput;
   import org.apache.lucene.store.IndexOutput;
   import org.apache.lucene.store.MMapDirectory;
   
   public class PrefetchBench {
   
     private static final int NUM_TERMS = 3;
     private static final long FILE_SIZE = 100L * 1024 * 1024 * 1024; // 100GB
     private static final int NUM_BYTES = 16;
     public static int DUMMY;
   
     public static void main(String[] args) throws IOException {
       Path filePath = Paths.get(args[0]);
       Path dirPath = filePath.getParent();
       String fileName = filePath.getFileName().toString();
       Random r = ThreadLocalRandom.current();
   
       try (Directory dir = new MMapDirectory(dirPath)) {
         if (Arrays.asList(dir.listAll()).contains(fileName) == false) {
            try (IndexOutput out = dir.createOutput(fileName, IOContext.DEFAULT)) {
             byte[] buf = new byte[8196];
             for (long i = 0; i < FILE_SIZE; i += buf.length) {
               r.nextBytes(buf);
               out.writeBytes(buf, buf.length);
             }
           }
         }
   
          for (boolean dataFitsInCache : new boolean[] { false, true }) {
            try (IndexInput i0 = dir.openInput(fileName, IOContext.DEFAULT)) {
             byte[][] b = new byte[NUM_TERMS][];
             for (int i = 0; i < NUM_TERMS; ++i) {
               b[i] = new byte[NUM_BYTES];
             }
             IndexInput[] inputs = new IndexInput[NUM_TERMS];
             if (dataFitsInCache) {
               // 16MB slice that should easily fit in the page cache
               inputs[0] = i0.slice("slice", 0, 16 * 1024 * 1024);
             } else {
               inputs[0] = i0;
             }
             for (int i = 1; i < NUM_TERMS; ++i) {
               inputs[i] = inputs[0].clone();
             }
             final long length = inputs[0].length();
             List<Long>[] latencies = new List[2];
             latencies[0] = new ArrayList<>();
             latencies[1] = new ArrayList<>();
             for (int iter = 0; iter < 100_000; ++iter) {
               final boolean prefetch = (iter & 1) == 0;
   
               final long start = System.nanoTime();
   
               for (IndexInput ii : inputs) {
                 final long offset = r.nextLong(length - NUM_BYTES);
                 ii.seek(offset);
                 if (prefetch) {
                   ii.prefetch(offset, 1);
                 }
               }
   
               for (int i = 0; i < NUM_TERMS; ++i) {
                 inputs[i].readBytes(b[i], 0, b[i].length);
               }
   
               final long end = System.nanoTime();
   
               // Prevent the JVM from optimizing away the reads
               DUMMY = Arrays.stream(b).mapToInt(Arrays::hashCode).sum();
   
               latencies[iter & 1].add(end - start);
             }
   
             latencies[0].sort(null);
             latencies[1].sort(null);
   
             System.out.println("Data " + (dataFitsInCache ? "fits" : "does not 
fit") + " in the page cache");
             long prefetchP50 = latencies[0].get(latencies[0].size() / 2);
             long prefetchP90 = latencies[0].get(latencies[0].size() * 9 / 10);
             long prefetchP99 = latencies[0].get(latencies[0].size() * 99 / 
100);
             long noPrefetchP50 = latencies[1].get(latencies[1].size() / 2);
             long noPrefetchP90 = latencies[1].get(latencies[1].size() * 9 / 
10);
             long noPrefetchP99 = latencies[1].get(latencies[1].size() * 99 / 
100);
   
             System.out.println("  With prefetching:    P50=" + prefetchP50 + 
"ns P90=" + prefetchP90 + "ns P99=" + prefetchP99 + "ns");
             System.out.println("  Without prefetching: P50=" + noPrefetchP50 + 
"ns P90=" + noPrefetchP90 + "ns P99=" + noPrefetchP99 + "ns");
           }
         }
       }
     }
   
   }
   
   ```
   </details>
   
   It gives the following results. Before the change:
   
   ```
   Data does not fit in the page cache
     With prefetching:    P50=88080ns P90=122970ns P99=157420ns
     Without prefetching: P50=224040ns P90=242320ns P99=297470ns
   Data fits in the page cache
     With prefetching:    P50=880ns P90=1060ns P99=1370ns
     Without prefetching: P50=190ns P90=280ns P99=580ns
   ```
   
   After the change:
   
   ```
   Data does not fit in the page cache
     With prefetching:    P50=89710ns P90=124780ns P99=159400ns
     Without prefetching: P50=224271ns P90=242940ns P99=297371ns
   Data fits in the page cache
     With prefetching:    P50=210ns P90=300ns P99=630ns
     Without prefetching: P50=200ns P90=290ns P99=580ns
   ```
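
    For reference, the numbers above time the seek/prefetch/read sequence from the benchmark's inner loop. The sketch below condenses that access pattern to a single read against one `MMapDirectory` input; the class name `PrefetchSketch`, the two command-line arguments, and the fixed offset are illustrative only, not part of the benchmark:

    ```java
    import java.nio.file.Paths;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;
    import org.apache.lucene.store.MMapDirectory;

    public class PrefetchSketch {

      public static void main(String[] args) throws Exception {
        // args[0]: directory path, args[1]: name of an existing file in it (illustrative inputs)
        try (Directory dir = new MMapDirectory(Paths.get(args[0]));
             IndexInput in = dir.openInput(args[1], IOContext.DEFAULT)) {
          byte[] buf = new byte[16];
          long offset = 0; // the benchmark draws this at random within the file
          in.seek(offset);
          in.prefetch(offset, 1);           // hint that bytes at this offset are about to be read
          in.readBytes(buf, 0, buf.length); // the read whose latency is reported above
        }
      }
    }
    ```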

