[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772153#comment-16772153
 ] 

Michael McCandless commented on LUCENE-8635:
--------------------------------------------

I ran luceneutil on {{wikimediumall}}, comparing current trunk against the PR 
here. Net/net the differences look like noise (every Pct diff range spans 
zero), which is great. I'll push shortly:
{noformat}
Report after iter 19:

                     Task    QPS base      StdDev    QPS comp      StdDev                 Pct diff
                  Prefix3       37.05     (11.4%)       36.25     (13.0%)   -2.1% ( -23% -   25%)
    BrowseMonthSSDVFacets        5.01      (6.4%)        4.91     (10.4%)   -1.9% ( -17% -   15%)
    BrowseMonthTaxoFacets        1.24      (2.7%)        1.22      (4.8%)   -1.3% (  -8% -    6%)
                 Wildcard      106.53      (8.6%)      105.18      (9.1%)   -1.3% ( -17% -   18%)
    HighTermDayOfYearSort       14.85      (4.2%)       14.70      (4.2%)   -1.0% (  -9% -    7%)
     BrowseDateTaxoFacets        1.11      (3.2%)        1.10      (5.6%)   -0.8% (  -9% -    8%)
BrowseDayOfYearTaxoFacets        1.11      (3.1%)        1.10      (5.6%)   -0.8% (  -9% -    8%)
          MedSloppyPhrase        4.59      (3.4%)        4.56      (2.8%)   -0.5% (  -6% -    5%)
                   Fuzzy2       68.49      (1.0%)       68.12      (1.3%)   -0.5% (  -2% -    1%)
              LowSpanNear       30.34      (1.7%)       30.19      (1.9%)   -0.5% (  -4% -    3%)
                   Fuzzy1       72.43      (0.9%)       72.10      (1.4%)   -0.5% (  -2% -    1%)
                LowPhrase       34.35      (1.1%)       34.22      (2.0%)   -0.4% (  -3% -    2%)
                  Respell       47.66      (1.4%)       47.48      (1.7%)   -0.4% (  -3% -    2%)
          LowSloppyPhrase       10.59      (4.9%)       10.56      (3.6%)   -0.3% (  -8% -    8%)
                 HighTerm     1290.39      (1.8%)     1286.15      (1.4%)   -0.3% (  -3% -    2%)
                  MedTerm     1419.25      (2.0%)     1415.23      (1.5%)   -0.3% (  -3% -    3%)
                   IntNRQ       27.03     (11.0%)       26.96     (10.9%)   -0.3% ( -19% -   24%)
         HighSloppyPhrase        6.73      (4.9%)        6.71      (3.4%)   -0.3% (  -8% -    8%)
            OrNotHighHigh      825.79      (1.9%)      823.77      (1.4%)   -0.2% (  -3% -    3%)
             OrNotHighMed      912.80      (1.3%)      910.96      (1.3%)   -0.2% (  -2% -    2%)
                MedPhrase       29.52      (1.1%)       29.46      (1.9%)   -0.2% (  -3% -    2%)
             OrHighNotLow     1184.54      (3.1%)     1182.86      (1.8%)   -0.1% (  -4% -    4%)
                  LowTerm      974.30      (1.5%)      973.33      (1.4%)   -0.1% (  -2% -    2%)
                OrHighLow      328.39      (1.0%)      328.13      (1.0%)   -0.1% (  -2% -    1%)
              AndHighHigh       21.04      (2.8%)       21.03      (2.6%)   -0.1% (  -5% -    5%)
            OrHighNotHigh      907.78      (1.8%)      907.93      (1.4%)    0.0% (  -3% -    3%)
             OrHighNotMed     1019.49      (2.0%)     1019.67      (1.4%)    0.0% (  -3% -    3%)
               AndHighMed       64.27      (1.1%)       64.33      (1.1%)    0.1% (  -2% -    2%)
             OrNotHighLow      414.78      (1.2%)      415.43      (1.0%)    0.2% (  -2% -    2%)
BrowseDayOfYearSSDVFacets        4.14      (6.9%)        4.15      (8.9%)    0.2% ( -14% -   17%)
               AndHighLow      371.09      (1.7%)      371.84      (1.7%)    0.2% (  -3% -    3%)
                OrHighMed       65.31      (1.8%)       65.45      (1.8%)    0.2% (  -3% -    3%)
                 PKLookup      141.21      (1.6%)      141.63      (1.9%)    0.3% (  -3% -    3%)
             HighSpanNear       25.84      (2.8%)       25.94      (2.6%)    0.4% (  -4% -    5%)
              MedSpanNear       26.39      (2.9%)       26.50      (2.8%)    0.4% (  -5% -    6%)
               HighPhrase       11.72      (2.1%)       11.77      (1.9%)    0.4% (  -3% -    4%)
               OrHighHigh       14.60      (2.2%)       14.69      (1.8%)    0.6% (  -3% -    4%)
        HighTermMonthSort       31.51      (6.0%)       31.90      (6.0%)    1.2% ( -10% -   14%)
{noformat}

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, 
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this would be to lazily load the FST using mmap, so that only the 
> required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded off-heap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of off-heap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields off-heap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor 
> based on the fstOffHeap property
>  
> I created a patch (that loads all fields off-heap), ran some benchmarks using 
> es_rally, and the results look good.
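
For readers skimming the archive, the per-field choice described in the issue above could be sketched roughly as follows. This is a standalone illustration that uses only java.nio; it does not use Lucene's FST, FieldInfo, or FieldReader classes and is not the code in the attached patches. The names FieldFlags, ALL, and open are hypothetical and exist only for this sketch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

public class FstOffHeapSketch {

    /** Hypothetical keyword that selects every field for off-heap loading. */
    static final String ALL = "ALL";

    /** Hypothetical per-field flag holder; a stand-in, not Lucene's FieldInfo. */
    static final class FieldFlags {
        final String name;
        final boolean fstOffHeap;

        FieldFlags(String name, Set<String> offHeapFields) {
            this.name = name;
            // The flag is decided once, when the "index" is opened.
            this.fstOffHeap = offHeapFields.contains(ALL) || offHeapFields.contains(name);
        }
    }

    /** Returns a read-only view of the file: memory-mapped if the flag is set, otherwise on heap. */
    static ByteBuffer open(Path file, FieldFlags field) throws IOException {
        if (field.fstOffHeap) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                // The OS pages data in on first access instead of copying it to the heap.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        }
        // Eager path: the whole file is copied into a heap array up front.
        return ByteBuffer.wrap(Files.readAllBytes(file)).asReadOnlyBuffer();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fst-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40});

        // Only "title" is requested off-heap; "body" falls back to the heap path.
        Set<String> offHeap = Set.of("title");
        ByteBuffer title = open(file, new FieldFlags("title", offHeap));
        ByteBuffer body = open(file, new FieldFlags("body", offHeap));

        System.out.println("title direct=" + title.isDirect() + ", body direct=" + body.isDirect());
        System.out.println("same byte either way: " + (title.get(2) == body.get(2)));
    }
}
```

Both paths return the same bytes; the difference is only where they live. With the mapped buffer the JVM heap holds no copy of the data, which is the property the issue is after for large term dictionaries.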



