[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-14 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-841074895


   @jpountz It's great to work with you on this optimization :smile: Thanks for 
taking so much time to help me.





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-13 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-840385022


   Comment addressed. 





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-13 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-840373136


   > This looks good to me. Can we better check that the sort is actually 
stable in the tests? E.g. maybe we could verify that the arrays are not only 
equal after sorting with Arrays#sort and StableMSBRadixSorter but also that 
they have the very same instance at every index?
   
   Added a new assertion: `assertSame(points[i].packedValue, reader.points[i].packedValue);`.





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-12 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-840256710


   @jpountz sorry, forgot to push :sweat_smile:





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-12 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-839858327


   Comments addressed.
   1. Reuse an offset array of size `HISTOGRAM_SIZE` in reorder (see the sketch after this list).
   2. Update the CHANGES document.
   3. Remove the benchmark test case.
   4. Add logic to check doc ID ordering in `TestMutablePointsReaderUtils.java`.
   5. Add a new test case for `StableMSBRadixSorter.java` (by copying `TestMSBRadixSorter` and only changing the sorter; is that OK?)
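   A hedged sketch of the reuse pattern in item 1 (not the actual Lucene code): instead of allocating a fresh offsets/histogram array on every reorder call, keep one array of `HISTOGRAM_SIZE` buckets and clear it before reuse. The bucket count and names below are assumptions for illustration only.
   ```java
import java.util.Arrays;

class ReusableOffsetsSketch {
  private static final int HISTOGRAM_SIZE = 256; // assumption for this sketch
  private final int[] offsets = new int[HISTOGRAM_SIZE]; // allocated once, reused

  void reorder(byte[][] keys, int k) {
    Arrays.fill(offsets, 0); // cheap reset instead of a new allocation per call
    for (byte[] key : keys) {
      offsets[key[k] & 0xff]++; // histogram of byte values at position k
    }
    // ... prefix sums over `offsets` and the stable scatter would follow here ...
  }
}
   ```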





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-11 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-839400284


   Thanks for taking the time to work on my branch. I merged your change into this PR; the code looks much better.
   I was wondering which test cases I neglected besides `TestIndexSorting`? I agree that we could work on improving the index sorting scenario later.
   





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-27 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-827678981


   I spent some time running a real-case benchmark. The `IndexWriter` speedup is what we expected: faster than the main branch, with the total elapsed time (including adding docs, building the index and merging) decreasing by about 20%. If we only consider flush time, the speedup is more obvious, with the time cost dropping by about 40%-50%.
   
   1) Run [IndexAndSearchOpenStreetMaps1D.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps1D.java) against the two branches and record the [log](https://github.com/neoremind/luceneutil/tree/master/log/OpenStreetMaps).
   _Note: the query stage is commented out, and some of the code is modified to adapt to the latest Lucene main branch._
   
   main branch:
   ```
   # egrep "flush time|sec to build index" open-street-maps.log
   DWPT 0 [2021-04-27T11:33:04.518908Z; main]: flush time 17284.537739 msec
   DWPT 0 [2021-04-27T11:33:37.888449Z; main]: flush time 12039.476885 msec
   72.49147722 sec to build index
   ```
   PR branch:
   ```
   #egrep "flush time|sec to build index" open-street-maps-optimized.log
   DWPT 0 [2021-04-27T11:35:00.619683Z; main]: flush time 9313.007647 msec
   DWPT 0 [2021-04-27T11:35:29.575254Z; main]: flush time 8631.820226 msec
   59.252797133 sec to build index
   ```
   
   2) Furthermore, I came up with the idea of using TPC-H LINEITEM to verify. I have a 10GB TPC-H dataset and developed a new test case that imports the first 5 INT fields, which is more typical of real use cases.
   
   Run [IndexAndSearchTpcHLineItem.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchTpcHLineItem.java) against the two branches and record the [log](https://github.com/neoremind/luceneutil/tree/master/log/TPC-H-LINEITEM).
   
   main branch:
   ```
   egrep "flush time|sec to build index" tpch-lineitem.log
   DWPT 0 [2021-04-27T11:17:25.329006Z; main]: flush time 13850.23328 msec
   DWPT 0 [2021-04-27T11:17:50.289370Z; main]: flush time 12228.723665 msec
   DWPT 0 [2021-04-27T11:18:15.546002Z; main]: flush time 12537.085005 msec
   DWPT 0 [2021-04-27T11:18:40.140413Z; main]: flush time 11819.225223 msec
   DWPT 0 [2021-04-27T11:19:04.850989Z; main]: flush time 12004.380921 msec
   DWPT 0 [2021-04-27T11:19:29.435183Z; main]: flush time 11850.273453 msec
   DWPT 0 [2021-04-27T11:19:54.016951Z; main]: flush time 11882.316067 msec
   DWPT 0 [2021-04-27T11:20:18.932727Z; main]: flush time 12223.151464 msec
   DWPT 0 [2021-04-27T11:20:43.522117Z; main]: flush time 11871.276323 msec
   DWPT 0 [2021-04-27T11:20:52.060300Z; main]: flush time 3422.434221 msec
   271.188917715 sec to build index
   ```
   PR branch:
   ```
egrep "flush time|sec to build index" tpch-lineitem-optimized.log
   DWPT 0 [2021-04-27T11:24:00.362128Z; main]: flush time 7573.05091 msec
   DWPT 0 [2021-04-27T11:24:19.498948Z; main]: flush time 7355.376016 msec
   DWPT 0 [2021-04-27T11:24:38.602117Z; main]: flush time 7287.306154 msec
   DWPT 0 [2021-04-27T11:24:57.541930Z; main]: flush time 7227.514396 msec
   DWPT 0 [2021-04-27T11:25:16.474158Z; main]: flush time 7236.208865 msec
   DWPT 0 [2021-04-27T11:25:35.339855Z; main]: flush time 7152.876269 msec
   DWPT 0 [2021-04-27T11:25:54.10Z; main]: flush time 7080.405571 msec
   DWPT 0 [2021-04-27T11:26:12.985489Z; main]: flush time 7188.012278 msec
   DWPT 0 [2021-04-27T11:26:31.857053Z; main]: flush time 7176.303704 msec
   DWPT 0 [2021-04-27T11:26:38.838771Z; main]: flush time 2185.742347 msec
   213.175509249 sec to build index
   ```
   
   For the benchmark commands, please refer to [my document](https://github.com/neoremind/luceneutil/tree/master/command).
   
   Test environment:
   ```
   CPU: 
   Architecture:  x86_64
   CPU op-mode(s):32-bit, 64-bit
   Byte Order:Little Endian
   CPU(s):32
   On-line CPU(s) list:   0-31
   Thread(s) per core:2
   Core(s) per socket:16
   Socket(s): 1
   NUMA node(s):  1
   Vendor ID: GenuineIntel
   CPU family:6
   Model: 85
   Model name:Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
   Stepping:  4
   CPU MHz:   2500.000
   BogoMIPS:  5000.00
   Hypervisor vendor: KVM
   Virtualization type:   full
   L1d cache: 32K
   L1i cache: 32K
   L2 cache:  1024K
   L3 cache:  33792K
   NUMA node0 CPU(s): 0-31
   
   Memory: 
   $cat /proc/meminfo
   MemTotal:   65703704 kB
   
   Disk: SATA 
   $fdisk -l | grep Disk
   Disk /dev/vdb: 35184.4 GB, 35184372088832 bytes, 68719476736 sectors
   
   OS: 
   Linux 4.19.57-15.1.al7.x86_64
   
   JDK:
   openjdk version "11.0.11" 2021-04-20 LTS
   OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
   OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)
   ```



[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-23 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-825779387


   @jpountz are there any test cases suitable for verifying the end-to-end performance improvement, e.g. through `IndexWriter`? Maybe I could give it a try.





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-23 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-825776633


   I made `StableMSBRadixSorter` the default sorter and used `InPlaceMergeSorter` as the fallback sorter. Please check the latest commit. (I did not squash the commits, and saved every commit as a separate branch in my own forked repo for future reference.)
   
   Average sort time (unit: us) comparison.
   (1) PR branch
   (2) main branch
   (3) PR branch with doc IDs not in ascending order, i.e. the original `MSBRadixSorter` with stable reorder
   ```
   --------------------------------------------------------------------------------
   | bytesPerDim | isDocIdIncremental |          (1) |          (2) |         (3) |
   --------------------------------------------------------------------------------
   |           1 | N                  |      31132.7 |    1144138.9 |    981732.7 |
   |           1 | Y                  |      33049.8 |     390398.8 |             |
   |           2 | N                  |     273975.1 |    1121301.9 |    941666.1 |
   |           2 | Y                  |     281060.2 |        112.9 |             |
   |           3 | N                  |     844291.7 |    1482451.3 |   1306374.9 |
   |           3 | Y                  |     804839.8 |    1471338.1 |             |
   |           4 | N                  |    1274670.8 |    1424961.6 |   1262810.3 |
   |           4 | Y                  |    1289128.8 |    1423907.1 |             |
   |           8 | N                  |    1357592.3 |    1437768.5 |   1282193.9 |
   |           8 | Y                  |    1286732.0 |    1474001.1 |             |
   |          16 | N                  |    1366177.6 |    1464370.4 |   1269967.4 |
   |          16 | Y                  |    1353213.1 |    1478291.3 |             |
   |          32 | N                  |    1403655.4 |    1500323.9 |   1293686.0 |
   |          32 | Y                  |    1406872.4 |    1508646.1 |             |
   --------------------------------------------------------------------------------
   ```
   
   In conclusion, the PR branch runs faster than the main branch regardless of bytesPerDim and isDocIdIncremental, in some cases up to 38x faster. Compared to the original `MSBRadixSorter` with stable reorder, the PR branch is a little slower in some cases, but that is acceptable. I think doc IDs are usually in ascending order during index building in real use cases, so this could be a noticeable performance improvement.
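   As a small standalone illustration of why ascending doc IDs let the sort key drop the doc ID entirely (made-up data, not the PR code): a stable sort by packed value alone leaves equal values in ascending doc-ID order, which is exactly what sorting by (value, docId) would produce.
   ```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDropsDocIdDemo {
  public static void main(String[] args) {
    // each point is {value, docId}; doc IDs (second column) are already ascending
    int[][] byValueOnly = { {7, 0}, {3, 1}, {7, 2}, {3, 3}, {1, 4} };
    int[][] byValueThenDocId = byValueOnly.clone();

    // Arrays#sort on objects is stable, so sorting by value alone keeps ties
    // in their original (ascending doc ID) order...
    Arrays.sort(byValueOnly, Comparator.comparingInt(p -> p[0]));
    // ...which matches an explicit (value, docId) sort exactly.
    Arrays.sort(byValueThenDocId,
        Comparator.<int[]>comparingInt(p -> p[0]).thenComparingInt(p -> p[1]));

    System.out.println(Arrays.deepEquals(byValueOnly, byValueThenDocId)); // true
  }
}
   ```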
   
   > We could still iterate later, but for now this sounds to me like a good performance-simplicity trade-off. What do you think?
   
   I have no problem with that.





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-22 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-824891023


   I used `TimSorter` instead of `InPlaceMergeSorter`, expecting it to be faster, but it turned out to be slower. @jpountz would you check my latest commit to see if I implemented Tim sort correctly?
   
   Below is the latest benchmark of `MSBRadixSorter` with stable reorder (isDocIdIncremental = N) and `StableMSBRadixSorter` (isDocIdIncremental = Y):
   ```
-
   | bytesPerDim | isDocIdIncremental | avg time(us) |
-
   |  1  | N  |995541.5  |
   |  1  | Y  | 60399.2  |
   |  2  | N  |951085.9  |
   |  2  | Y  |322054.3  |
   |  3  | N  |   1333992.5  |
   |  3  | Y  |756951.4  |
   |  4  | N  |   1340422.4  |
   |  4  | Y  |   1528955.5  |
   |  8  | N  |   1323878.8  |
   |  8  | Y  |   1494004.5  |
   | 16  | N  |   1305548.1  |
   | 16  | Y  |   1480329.4  |
   | 32  | N  |   1326447.5  |
   | 32  | Y  |   1589089.8  |
-
   ```





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-824172567


   > For instance I'd expect users who index integers (4 bytes) between 0 and 
2^24 to notice speedups that are closer to the one that you computed for 
bytesPerDim=3 than for bytesPerDim=4.
   
   @jpountz I added a new benchmark test case `testBenchmarkWithLeadingZeroBytes` to `TestBKDDisableSortDocId`. Your assumption is correct.
   
   The result is shown below (stable reorder is not made the default).
   ```
   bytesPerDim=4, leadingZeroByteNum=[1,2,3]
   ---------------------------------------------------
   | bytesPerDim | isDocIdIncremental | avg time(us) |
   ---------------------------------------------------
   |           4 | N                  |    1476874.0 |  <- leadingZeroByteNum = 1
   |           4 | Y                  |    1363280.3 |  <- leadingZeroByteNum = 1
   |           4 | N                  |    1601885.9 |  <- leadingZeroByteNum = 2
   |           4 | Y                  |     930728.1 |  <- leadingZeroByteNum = 2
   |           4 | N                  |    1210586.8 |  <- leadingZeroByteNum = 3
   |           4 | Y                  |     367141.6 |  <- leadingZeroByteNum = 3
   ---------------------------------------------------
   ```
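   For context on what leadingZeroByteNum means here (a hypothetical sketch, not the actual test code): packing integers below 2^24 into 4 big-endian bytes leaves the most significant byte at zero for every point, so the radix sorter effectively has fewer distinct bytes to work on.
   ```java
import java.util.Random;

class LeadingZeroBytesSketch {
  /** Packs `value` as a 4-byte big-endian key; values below 2^24 leave key[0] == 0. */
  static byte[] packInt(int value) {
    return new byte[] {
      (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
    };
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    // bytesPerDim = 4 but leadingZeroByteNum = 1: every value fits in the low 3 bytes,
    // so the first byte of every packed key is 0 and carries no sorting information
    byte[] key = packInt(random.nextInt(1 << 24));
    System.out.println(key[0]); // always 0
  }
}
   ```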





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-824164245


   > +1 to always use the stable version of the algorithm. This would only use 
transient memory and in reasonable amounts, so I'm not concerned with the 
memory usage.
   
   Per the comment, I made stable reorder the default in MSB radix sort. Things became a little tricky.
   
   The benchmark of this PR branch is shown below. Starting from bytesPerDim=4, the non-stable MSB radix sort runs faster than the stable version.
   ```
 -
   | bytesPerDim | isDocIdIncremental | avg time(us) |
-
   |  1  | N  |981732.7  |
   |  1  | Y  | 56766.1  |
   |  2  | N  |941666.1  |
   |  2  | Y  |304395.9  |
   |  3  | N  |   1306374.9  |
   |  3  | Y  |853744.8  |
   |  4  | N  |   1262810.3  |
   |  4  | Y  |   1345605.8  |
   |  8  | N  |   1282193.9  |
   |  8  | Y  |   1364078.3  |
   | 16  | N  |   1269967.4  |
   | 16  | Y  |   1433378.3  |
   | 32  | N  |   1293686.0  |
   | 32  | Y  |   1468472.1  |
-
   ```
   
   To make it clearer, I also added the new result to the graph I made above.
   
   
![](https://issues.apache.org/jira/secure/attachment/13024398/refined-code-benchmark2.png)
   
   Analyzing the flame graph I generated (see the picture above), sorting includes the following stages:
   - build histogram
   - fallback sort (quick sort for non-stable, in-place merge sort for stable)
   - reorder
   
   Let's illustrate the total time partitioned by the cost per stage. 
   
   For main branch `MSBRadixSorter`, it shows as below.
   ```
   +------------------+------------------------------+---------------------+
   |  build histogram |  fallback sort (quick sort)  |  non-stable reorder |
   +------------------+------------------------------+---------------------+
   ```
   
   For PR branch `StableMSBRadixSorter`, it shows as below.
   ```
   +------------------+--------------------------------------+-----------------+
   |  build histogram |  fallback sort (in-place merge sort) |  stable reorder |
   +------------------+--------------------------------------+-----------------+
   ```
   
   For PR branch `MSBRadixSorter` making stable reorder the default, it shows as below.
   ```
   +------------------+------------------------------+-----------------+
   |  build histogram |  fallback sort (quick sort)  |  stable reorder |
   +------------------+------------------------------+-----------------+
   ```
   
   I think the stable reorder contributes most of the speedup, but the in-place merge sort is slower than the 3-pivot quick sort.
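   To make the reorder difference concrete, below is a minimal standalone sketch (plain arrays, not the actual `StableMSBRadixSorter` code) of a stable, histogram-based reorder: prefix sums over the histogram give each bucket its start offset, and elements are copied into an auxiliary buffer in their original order, so equal bytes never change relative positions. The non-stable variant instead moves elements between buckets in place with swaps, which avoids the extra buffer but loses the original order of ties.
   ```java
class StableReorderSketch {
  /**
   * Stable reorder of `keys` (and the parallel `docIds`) by the byte at position `k`,
   * using a histogram, prefix sums and one auxiliary copy. A sketch of the idea only.
   */
  static void stableReorderByByte(byte[][] keys, int[] docIds, int k) {
    final int HISTOGRAM_SIZE = 256;
    int[] offsets = new int[HISTOGRAM_SIZE];

    // 1. build histogram of byte values at position k
    for (byte[] key : keys) {
      offsets[key[k] & 0xff]++;
    }
    // 2. exclusive prefix sums: offsets[b] becomes the start index of bucket b
    for (int b = 0, sum = 0; b < HISTOGRAM_SIZE; b++) {
      int count = offsets[b];
      offsets[b] = sum;
      sum += count;
    }
    // 3. stable scatter into auxiliary arrays: elements are visited in their original
    //    order, so ties (equal bytes) keep their relative order
    byte[][] keysBuffer = new byte[keys.length][];
    int[] docIdsBuffer = new int[docIds.length];
    for (int i = 0; i < keys.length; i++) {
      int bucket = keys[i][k] & 0xff;
      keysBuffer[offsets[bucket]] = keys[i];
      docIdsBuffer[offsets[bucket]] = docIds[i];
      offsets[bucket]++;
    }
    System.arraycopy(keysBuffer, 0, keys, 0, keys.length);
    System.arraycopy(docIdsBuffer, 0, docIds, 0, docIds.length);
  }
}
   ```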
   
   In any case, regardless of the order of doc IDs, stable reorder is definitely a good choice. If the data are spread evenly with few duplicates, `StableMSBRadixSorter` is a little slower (about 10%) than `MSBRadixSorter`. If not, `StableMSBRadixSorter` performs far better than `MSBRadixSorter`. This leaves me unsure which way we should go; @jpountz, could you give me some advice?





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-823960282


   > For instance I'd expect users who index integers (4 bytes) between 0 and 
2^24 to notice speedups that are closer to the one that you computed for 
bytesPerDim=3 than for bytesPerDim=4.
   I can also construct a real case to verify :smile:





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-823949164


   @jpountz Per your advice, I have updated the code.
   
   In terms of performance, I refined `TestBKDDisableSortDocId` to make it re-runnable as a benchmark, and ran the following benchmark.
   
   doc num = 2,000,000, dim number = 1, bytesPerDim = [1,2,3,4,8,16,32], run times = 10 (warm-up 5 times)
   
   The result is shown below.
   
   ```
 -
   | bytesPerDim | isDocIdIncremental | avg time(us) |
-
   |  1  | N  |   1127688.5  |
   |  1  | Y  | 56464.7  |
   |  2  | N  |   1124137.8  |
   |  2  | Y  |339150.2  |
   |  3  | N  |   1485020.4  |
   |  3  | Y  |878251.1  |
   |  4  | N  |   1436003.9  |
   |  4  | Y  |   1376974.1  |
   |  8  | N  |   1444971.3  |
   |  8  | Y  |   1365877.5  |
   | 16  | N  |   1500235.5  |
   | 16  | Y  |   1385235.8  |
   | 32  | N  |   1516514.0  |
   | 32  | Y  |   1415364.9  |
-
   ```
   
   Meanwhile, I also reset to the main branch and ran the same test case. The result is shown below.
   
   ```
 -
   | bytesPerDim | isDocIdIncremental | avg time(us) |
-
   |  1  | N  |   1144138.9  |
   |  1  | Y  |390398.8  |
   |  2  | N  |   1121301.9  |
   |  2  | Y  |   112.9  |
   |  3  | N  |   1482451.3  |
   |  3  | Y  |   1471338.1  |
   |  4  | N  |   1424961.6  |
   |  4  | Y  |   1423907.1  |
   |  8  | N  |   1437768.5  |
   |  8  | Y  |   1474001.1  |
   | 16  | N  |   1464370.4  |
   | 16  | Y  |   1478291.3  |
   | 32  | N  |   1500323.9  |
   | 32  | Y  |   1508646.1  |
-
   ```
   
   I made a graph so that we can see the results more clearly.
   
   
![](https://issues.apache.org/jira/secure/attachment/13024383/refined-code-benchmark.png)
   
   If doc IDs are increasing, the PR branch out-performs the main branch in all scenarios.
   If doc IDs are not in order, we expect the performance to be the same as the main branch. It is indeed almost the same, but we introduce a small overhead to scan the data beforehand and check whether it is in order, so the PR branch is slightly slower (around 1%).
   
   I made a flame graph; the right-most column is where the order check is spent, which is very small, below 3% of total CPU consumption.
   
![](https://issues.apache.org/jira/secure/attachment/13024385/flame-graph.png)
   





[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-20 Thread GitBox


neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-823172724


   @jpountz Good advice! Before that, I was still struggling with where to propagate this config up to the index builder layer.
   I will give it a try. The first thing that comes to my mind is to introduce a new `prepare` method that scans all doc IDs from i to j to see if they are increasing. I will experiment with this; if the overhead is small enough, then it is worthwhile to sort without the doc ID.
   One more question: are there any places where doc IDs are not added in increasing order? I mean in the source code, not the test cases.
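   A rough sketch of what such a pre-scan might look like (the method name and the accessor are hypothetical, not Lucene's API):
   ```java
import java.util.function.IntUnaryOperator;

class DocIdOrderCheckSketch {
  /** Returns true if the doc IDs in [from, to) are already in ascending order. */
  static boolean docIdsAreAscending(IntUnaryOperator docIdAt, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      if (docIdAt.applyAsInt(i - 1) > docIdAt.applyAsInt(i)) {
        return false; // out of order: keep the doc ID as a tie-breaker in the sort key
      }
    }
    return true; // ascending: a stable sort by packed value alone preserves doc ID order
  }
}
   ```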

