neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-827678981


   I spent some time running real-world benchmarks. `IndexWriter` is faster 
than on the main branch, as expected: total elapsed time (including adding 
docs, building the index, and merging) decreased by about 20%. Looking at 
`flush_time` alone, the speedup is more pronounced, with the time cost dropping 
by about 40% - 50%.
   
   1) Run 
[IndexAndSearchOpenStreetMaps1D.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps1D.java)
 against the two branches and record the 
[log](https://github.com/neoremind/luceneutil/tree/master/log/OpenStreetMaps).
   _Note: I commented out the query stage and modified some of the code to 
adapt it to the latest Lucene main branch._
   
   main branch:
   ```
   # egrep "flush time|sec to build index" open-street-maps.log
   DWPT 0 [2021-04-27T11:33:04.518908Z; main]: flush time 17284.537739 msec
   DWPT 0 [2021-04-27T11:33:37.888449Z; main]: flush time 12039.476885 msec
   72.49147722 sec to build index
   ```
   PR branch:
   ```
   # egrep "flush time|sec to build index" open-street-maps-optimized.log
   DWPT 0 [2021-04-27T11:35:00.619683Z; main]: flush time 9313.007647 msec
   DWPT 0 [2021-04-27T11:35:29.575254Z; main]: flush time 8631.820226 msec
   59.252797133 sec to build index
   ```
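As a quick check on the numbers above, here is a minimal Python sketch (not part of the PR or of luceneutil) that extracts the `flush time` and `sec to build index` values from the two grep outputs and computes the relative reduction. The helper names `summarize` and `reduction_pct` are made up for illustration; the log lines are copied verbatim from the outputs above.

```python
import re

# Patterns for the two kinds of lines selected by the egrep command above.
FLUSH_RE = re.compile(r"flush time ([0-9.]+) msec")
BUILD_RE = re.compile(r"([0-9.]+) sec to build index")


def summarize(log_text):
    """Return (summed flush time in seconds, build time in seconds or None)."""
    flush_ms = [float(value) for value in FLUSH_RE.findall(log_text)]
    build = BUILD_RE.search(log_text)
    build_sec = float(build.group(1)) if build else None
    return sum(flush_ms) / 1000.0, build_sec


def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Log excerpts copied from the OpenStreetMaps runs above.
MAIN_LOG = """\
DWPT 0 [2021-04-27T11:33:04.518908Z; main]: flush time 17284.537739 msec
DWPT 0 [2021-04-27T11:33:37.888449Z; main]: flush time 12039.476885 msec
72.49147722 sec to build index
"""

PR_LOG = """\
DWPT 0 [2021-04-27T11:35:00.619683Z; main]: flush time 9313.007647 msec
DWPT 0 [2021-04-27T11:35:29.575254Z; main]: flush time 8631.820226 msec
59.252797133 sec to build index
"""

main_flush, main_build = summarize(MAIN_LOG)
pr_flush, pr_build = summarize(PR_LOG)

print(f"flush time reduction: {reduction_pct(main_flush, pr_flush):.1f}%")
print(f"build time reduction: {reduction_pct(main_build, pr_build):.1f}%")
```

On these two runs the sketch reports roughly a 39% reduction in summed flush time and roughly an 18% reduction in total build time.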
   
   2) Furthermore, I used the TPC-H LINEITEM table as an additional check. I 
have a 10GB TPC-H dataset and developed a new test case that imports the first 
5 INT fields, which is more typical of real-world use.
   
   Run 
[IndexAndSearchTpcHLineItem.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchTpcHLineItem.java)
 against the two branches and record the 
[log](https://github.com/neoremind/luceneutil/tree/master/log/TPC-H-LINEITEM).
   
   main branch:
   ```
   # egrep "flush time|sec to build index" tpch-lineitem.log
   DWPT 0 [2021-04-27T11:17:25.329006Z; main]: flush time 13850.23328 msec
   DWPT 0 [2021-04-27T11:17:50.289370Z; main]: flush time 12228.723665 msec
   DWPT 0 [2021-04-27T11:18:15.546002Z; main]: flush time 12537.085005 msec
   DWPT 0 [2021-04-27T11:18:40.140413Z; main]: flush time 11819.225223 msec
   DWPT 0 [2021-04-27T11:19:04.850989Z; main]: flush time 12004.380921 msec
   DWPT 0 [2021-04-27T11:19:29.435183Z; main]: flush time 11850.273453 msec
   DWPT 0 [2021-04-27T11:19:54.016951Z; main]: flush time 11882.316067 msec
   DWPT 0 [2021-04-27T11:20:18.932727Z; main]: flush time 12223.151464 msec
   DWPT 0 [2021-04-27T11:20:43.522117Z; main]: flush time 11871.276323 msec
   DWPT 0 [2021-04-27T11:20:52.060300Z; main]: flush time 3422.434221 msec
   271.188917715 sec to build index
   ```
   PR branch:
   ```
   # egrep "flush time|sec to build index" tpch-lineitem-optimized.log
   DWPT 0 [2021-04-27T11:24:00.362128Z; main]: flush time 7573.05091 msec
   DWPT 0 [2021-04-27T11:24:19.498948Z; main]: flush time 7355.376016 msec
   DWPT 0 [2021-04-27T11:24:38.602117Z; main]: flush time 7287.306154 msec
   DWPT 0 [2021-04-27T11:24:57.541930Z; main]: flush time 7227.514396 msec
   DWPT 0 [2021-04-27T11:25:16.474158Z; main]: flush time 7236.208865 msec
   DWPT 0 [2021-04-27T11:25:35.339855Z; main]: flush time 7152.876269 msec
   DWPT 0 [2021-04-27T11:25:54.111110Z; main]: flush time 7080.405571 msec
   DWPT 0 [2021-04-27T11:26:12.985489Z; main]: flush time 7188.012278 msec
   DWPT 0 [2021-04-27T11:26:31.857053Z; main]: flush time 7176.303704 msec
   DWPT 0 [2021-04-27T11:26:38.838771Z; main]: flush time 2185.742347 msec
   213.175509249 sec to build index
   ```
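To see why the total build time drops by less than the flush time, the following Python sketch (illustrative only, not part of the PR) works through the TPC-H numbers above. The flush and build values are copied from the two logs; pairing the i-th flush of one run with the i-th flush of the other assumes both runs flushed the same batches of documents, which the logs suggest but do not prove.

```python
# Flush times (msec) and build times (sec), copied from the TPC-H logs above.
MAIN_FLUSH_MS = [
    13850.23328, 12228.723665, 12537.085005, 11819.225223, 12004.380921,
    11850.273453, 11882.316067, 12223.151464, 11871.276323, 3422.434221,
]
PR_FLUSH_MS = [
    7573.05091, 7355.376016, 7287.306154, 7227.514396, 7236.208865,
    7152.876269, 7080.405571, 7188.012278, 7176.303704, 2185.742347,
]
MAIN_BUILD_SEC = 271.188917715
PR_BUILD_SEC = 213.175509249


def pct(part, whole):
    """Express `part` as a percentage of `whole`."""
    return part / whole * 100.0


main_flush_sec = sum(MAIN_FLUSH_MS) / 1000.0
pr_flush_sec = sum(PR_FLUSH_MS) / 1000.0

# Absolute savings, in seconds.
flush_saved_sec = main_flush_sec - pr_flush_sec
build_saved_sec = MAIN_BUILD_SEC - PR_BUILD_SEC

# Relative reductions, in percent.
flush_reduction = pct(flush_saved_sec, main_flush_sec)
build_reduction = pct(build_saved_sec, MAIN_BUILD_SEC)

# Share of the total build time spent in flush on each branch.
main_flush_share = pct(main_flush_sec, MAIN_BUILD_SEC)
pr_flush_share = pct(pr_flush_sec, PR_BUILD_SEC)

# Per-flush reduction (assumption: the i-th flushes are comparable).
per_flush = [pct(m - p, m) for m, p in zip(MAIN_FLUSH_MS, PR_FLUSH_MS)]

print(f"flush: {main_flush_sec:.1f}s -> {pr_flush_sec:.1f}s ({flush_reduction:.1f}% less)")
print(f"build: {MAIN_BUILD_SEC:.1f}s -> {PR_BUILD_SEC:.1f}s ({build_reduction:.1f}% less)")
print(f"flush share of build: {main_flush_share:.1f}% -> {pr_flush_share:.1f}%")
print(f"per-flush reduction: {min(per_flush):.1f}% to {max(per_flush):.1f}%")
```

With these inputs, flush accounts for roughly 42% of the build time on main and roughly 32% on the PR branch. Summed flush time falls by about 41% while total build time falls by about 21%, and about 46 of the roughly 58 seconds saved come from flushing.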
   
   For the benchmark commands, please refer to [my 
document](https://github.com/neoremind/luceneutil/tree/master/command). 
   
   Test environment:
   ```
   CPU: 
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                32
   On-line CPU(s) list:   0-31
   Thread(s) per core:    2
   Core(s) per socket:    16
   Socket(s):             1
   NUMA node(s):          1
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 85
   Model name:            Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
   Stepping:              4
   CPU MHz:               2500.000
   BogoMIPS:              5000.00
   Hypervisor vendor:     KVM
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              1024K
   L3 cache:              33792K
   NUMA node0 CPU(s):     0-31
   
   Memory: 
   $cat /proc/meminfo
   MemTotal:       65703704 kB
   
   Disk: SATA 
   $fdisk -l | grep Disk
   Disk /dev/vdb: 35184.4 GB, 35184372088832 bytes, 68719476736 sectors
   
   OS: 
   Linux 4.19.57-15.1.al7.x86_64
   
   JDK:
   openjdk version "11.0.11" 2021-04-20 LTS
   OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
   OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)
   ```

