[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758431#comment-16758431
 ] 

Michael McCandless commented on LUCENE-8635:
--------------------------------------------

{quote}Better would be an attribute of {{FieldInfo}}, where we have 
{{put/getAttribute}}. Then {{FieldReader}} can inspect the {{FieldInfo}} and 
pass the appropriate {{On/OffHeapStore}} when creating its {{FST}}. What do you 
think?
{quote}
Hmm, that's also an interesting approach to get per-field control. One can set 
these attributes in a custom {{FieldType}} when indexing documents, or maybe in 
a custom codec at write time (e.g. just subclassing {{Lucene80Codec}}), or at 
read time using a real (named) custom codec. So we would pick a specific 
string ({{FST_OFF_HEAP}} or something) and define it as a string constant 
that users could then use to set the attribute?
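
A minimal sketch of the attribute idea, in plain Java so it compiles without Lucene on the classpath. {{FieldAttrs}} is a hypothetical stand-in for {{FieldInfo}}'s {{put/getAttribute}}, and {{FST_OFF_HEAP}} is only the placeholder key floated above, not an existing Lucene constant. The write side records a per-field choice; the read side returns it, or nothing when the attribute is absent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for Lucene's FieldInfo: only the per-field
// attribute map (put/getAttribute) matters for this sketch.
final class FieldAttrs {
  private final String name;
  private final Map<String, String> attributes = new HashMap<>();

  FieldAttrs(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }

  // Returns the previous value, or null if the key was not set.
  String putAttribute(String key, String value) {
    return attributes.put(key, value);
  }

  String getAttribute(String key) {
    return attributes.get(key);
  }
}

final class FstOffHeapAttribute {
  // Placeholder key ("FST_OFF_HEAP or something" in the discussion).
  static final String FST_OFF_HEAP = "FST_OFF_HEAP";

  private FstOffHeapAttribute() {}

  // Write side, e.g. from a custom codec or field type: record the choice.
  static void set(FieldAttrs field, boolean offHeap) {
    field.putAttribute(FST_OFF_HEAP, Boolean.toString(offHeap));
  }

  // Read side, e.g. in FieldReader: the explicit choice, or empty if the
  // attribute is absent so that a default policy can decide instead.
  static Optional<Boolean> get(FieldAttrs field) {
    return Optional.ofNullable(field.getAttribute(FST_OFF_HEAP)).map(Boolean::parseBoolean);
  }
}
```

Returning an empty {{Optional}} instead of a hard-coded default keeps the fallback decision in one place on the reader side.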

So ... maybe we have a default behavior with Adrien's cool idea, but then also 
allow the attribute to give per-field control? We should probably also, by 
default (if the field attribute is not present), not go off-heap when the 
directory is not {{MMapDirectory}}? We haven't tested the other directory impls, 
but I suspect they'd be quite a bit slower with an off-heap FST?
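
One way to express that default, as a sketch only: an explicit per-field attribute wins, and otherwise the FST goes off-heap only when the index is mmap-backed. {{OffHeapPolicy}} is a hypothetical name, the directory check is reduced to a boolean (how {{FieldReader}} would actually detect mmap is left open), and Adrien's idea is not modeled here.

```java
import java.util.Optional;

// Hypothetical policy helper, not a Lucene API.
final class OffHeapPolicy {
  private OffHeapPolicy() {}

  /**
   * Decides whether a field's terms-index FST should be read off-heap.
   *
   * @param attribute  the per-field setting, empty if the attribute is absent
   * @param mmapBacked whether the index was opened through MMapDirectory
   */
  static boolean loadOffHeap(Optional<Boolean> attribute, boolean mmapBacked) {
    // An explicit per-field attribute always wins.
    if (attribute.isPresent()) {
      return attribute.get();
    }
    // Default: off-heap only with mmap; other Directory implementations are
    // untested and suspected to be much slower with an off-heap FST.
    return mmapBacked;
  }
}
```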

 
{quote}Given that reversing the index during write to make it forward-reading 
didn't help performance (in addition to not being backward compatible), 
is the consensus to add exceptions for PK fields and for directories other than 
mmap to the off-heap FST in [^ra.patch]?
{quote}
Yeah, +1 to keep the two changes separate.
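
For the PK exception, a reader needs some way to recognize a primary-key field from its statistics. The heuristic below is an assumption for illustration, not something settled in this thread: if {{docCount == sumDocFreq}}, every document with the field contains exactly one distinct term, which holds for an ID field (and also for other single-valued fields, so it can misfire). The two numbers correspond to what Lucene's {{Terms.getDocCount()}} and {{Terms.getSumDocFreq()}} report; {{PkAwareOffHeap}} itself is hypothetical.

```java
// Hypothetical helper, not a Lucene API. The statistics are plain numbers here;
// in Lucene they would come from the field's Terms (getDocCount(), getSumDocFreq()).
final class PkAwareOffHeap {
  private PkAwareOffHeap() {}

  // Assumed heuristic: sumDocFreq == docCount means every document that has the
  // field contains exactly one distinct term, as a primary-key/ID field does.
  // Other single-valued fields match too, so this is only an approximation.
  static boolean looksLikePrimaryKey(long docCount, long sumDocFreq) {
    return docCount > 0 && docCount == sumDocFreq;
  }

  // Off-heap only for mmap-backed fields that do not look like a primary key.
  static boolean loadOffHeap(boolean mmapBacked, long docCount, long sumDocFreq) {
    return mmapBacked && !looksLikePrimaryKey(docCount, sumDocFreq);
  }
}
```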

 

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the setup below for the es_rally tests:
> a single-node i3.xlarge running ES 6.5
> es_rally running on a separate i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, 
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the terms-index FST is loaded entirely into heap memory when the 
> index is opened. This causes frequent JVM OOM errors when the terms index grows 
> large. A better approach would be to load the FST lazily using mmap, which 
> ensures that only the parts that lookups actually touch get loaded into memory.
>  
> Lucene could expose an API for providing the list of fields whose terms should 
> be loaded off-heap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of off-heap fields to Lucene when opening the index (ALL can be 
> a special keyword for loading all fields off-heap)
>  # Initialize the fstOffHeap property when the Lucene index is opened
>  # FieldReader invokes the default FST constructor or the off-heap constructor 
> based on the fstOffHeap property
>  
> I created a patch (that loads all fields off-heap), ran some benchmarks using 
> es_rally, and the results look good.
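
To make the on-heap vs. mmap trade-off in the description concrete, here is a small JDK-only sketch; it is not how Lucene's FST code reads its data. Reading a file into a {{byte[]}} puts all of it on the heap up front, while {{FileChannel.map}} returns a {{MappedByteBuffer}} whose pages the OS loads only when they are accessed, outside the Java heap. {{MmapVsHeap}} is a hypothetical name.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// JDK-only illustration of heap vs. mmap reads; not Lucene's FST code.
final class MmapVsHeap {
  private MmapVsHeap() {}

  // On-heap: the whole file is copied into a byte[] first, so heap usage
  // grows with the file size even if only one byte is needed.
  static byte readOnHeap(Path file, int offset) throws IOException {
    byte[] all = Files.readAllBytes(file);
    return all[offset];
  }

  // Off-heap: the file is mapped into the process address space; the OS
  // loads only the pages that are actually accessed, outside the Java heap.
  static byte readMapped(Path file, int offset) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // A single mapping is limited to Integer.MAX_VALUE bytes.
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      // The mapping stays valid after the channel is closed.
      return buffer.get(offset);
    }
  }
}
```

Because one {{map}} call cannot exceed {{Integer.MAX_VALUE}} bytes, a real implementation over large index files has to map them in chunks.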

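The four steps in the proposal can be modeled roughly as follows. Everything here is hypothetical ({{OffHeapFieldPlan}}, the {{FstStore}} enum, passing field names as plain lists); it only shows how an {{ALL}} keyword and a per-field {{fstOffHeap}} flag would drive the choice between the on-heap and off-heap FST construction paths.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed steps; none of these names are Lucene APIs.
final class OffHeapFieldPlan {
  // Step 2: special keyword meaning "load every field off-heap".
  static final String ALL = "ALL";

  // Which FST construction path a reader would take (step 4).
  enum FstStore { ON_HEAP, OFF_HEAP }

  private OffHeapFieldPlan() {}

  // Steps 1-3: at index open, turn the requested field list into a
  // per-field fstOffHeap flag.
  static Map<String, Boolean> fstOffHeapFlags(List<String> indexedFields, List<String> requested) {
    Set<String> wanted = new HashSet<>(requested);
    boolean all = wanted.contains(ALL);
    Map<String, Boolean> flags = new LinkedHashMap<>();
    for (String field : indexedFields) {
      flags.put(field, all || wanted.contains(field));
    }
    return flags;
  }

  // Step 4: pick the default or the off-heap FST constructor from the flag.
  static FstStore storeFor(boolean fstOffHeap) {
    return fstOffHeap ? FstStore.OFF_HEAP : FstStore.ON_HEAP;
  }
}
```

Reserving a keyword like {{ALL}} means a field actually named {{ALL}} can no longer be selected on its own; a separate boolean option would avoid that.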


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
