neoremind commented on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-825776633


   I make `StableMSBRadixSorter` as default sorter and use `InPlaceMergeSorter` 
as fallback sorter. Please check the latest commit. (I did not squash commits, 
and save every commit as separate branch in my own forked repo for future 
reference).
   
   Average sort time (unit in us) comparison.
   (1) PR branch
   (2) main branch
   (3) PR branch, verify doc ID not in ascending order, original 
`MSBRadixSorter` with stable reorder.
   ```
    
-------------------------------------------------------------------------------
   | bytesPerDim | isDocIdIncremental |      (1)     |      (2)     |     (3)   
  |
    
-------------------------------------------------------------------------------
   |      1      |         N          |     31132.7  |   1144138.9  |    
981732.7  |
   |      1      |         Y          |     33049.8  |    390398.8  |           
   |
   |      2      |         N          |    273975.1  |   1121301.9  |    
941666.1  |
   |      2      |         Y          |    281060.2  |   1124444.9  |           
   |
   |      3      |         N          |    844291.7  |   1482451.3  |   
1306374.9  |
   |      3      |         Y          |    804839.8  |   1471338.1  |           
   |
   |      4      |         N          |   1274670.8  |   1424961.6  |   
1262810.3  |
   |      4      |         Y          |   1289128.8  |   1423907.1  |           
   |
   |      8      |         N          |   1357592.3  |   1437768.5  |   
1282193.9  |
   |      8      |         Y          |   1286732.0  |   1474001.1  |           
   |
   |     16      |         N          |   1366177.6  |   1464370.4  |   
1269967.4  |
   |     16      |         Y          |   1353213.1  |   1478291.3  |           
   |
   |     32      |         N          |   1403655.4  |   1500323.9  |   
1293686.0  |
   |     32      |         Y          |   1406872.4  |   1508646.1  |           
   |
    
-------------------------------------------------------------------------------
   ```
   
   In conclusion, PR branch runs faster than main branch regardless of 
bytePerDim and isDocIdIncremental, some cases 38x times faster. If we make 
original `MSBRadixSorter` with stable reorder, PR branch is little bit slower 
but acceptable. I think in real case, doc ID is usually in ascending order when 
index building, and that could be a noticeable performance improvement.
   
   > We could still iterate later, but for now this sounds to me like a good 
performance-simplicity trade-off. What do you think?
   I have no problem 😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to