[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750316#comment-16750316 ]

Ankit Jain edited comment on LUCENE-8635 at 1/23/19 6:47 PM:
-------------------------------------------------------------

{quote}Ankit Jain, unfortunately RandomAccessInput doesn't offer readBytes. I'm 
looking into adding it; shouldn't be hard as there aren't that many 
implementations.{quote}
You don't need RandomAccessInput. You can revert to the original 
IndexInputReader and get rid of the reversal logic; IndexInput already provides 
readBytes, so a forward reader can delegate to it directly:
{code:title=ForwardIndexInputReader|borderStyle=solid}
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.fst.FST;

/** Implements forward reads for an FST stored in an index input, starting at startFP. */
final class ForwardIndexInputReader extends FST.BytesReader {
    private final IndexInput in;
    // Absolute file pointer where the FST bytes begin; FST positions are relative to it.
    private final long startFP;

    public ForwardIndexInputReader(IndexInput in, long startFP) {
        this.in = in;
        this.startFP = startFP;
    }

    @Override
    public byte readByte() throws IOException {
        return in.readByte();
    }

    @Override
    public void readBytes(byte[] b, int offset, int len) throws IOException {
        in.readBytes(b, offset, len);
    }

    @Override
    public void skipBytes(long count) {
        setPosition(getPosition() + count);
    }

    @Override
    public long getPosition() {
        return in.getFilePointer() - startFP;
    }

    @Override
    public void setPosition(long pos) {
        // setPosition cannot throw a checked exception, so rethrow unchecked
        // instead of silently continuing to read from the wrong position.
        try {
            in.seek(startFP + pos);
        } catch (IOException ex) {
            throw new UncheckedIOException("Failed to set position to " + pos, ex);
        }
    }

    @Override
    public boolean reversed() {
        return false;
    }
}
{code}
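To make the offset arithmetic concrete without depending on Lucene, here is a minimal plain-Java sketch. ByteArrayInput and OffsetForwardReader are hypothetical stand-ins for IndexInput and the reader above: the underlying input tracks absolute file pointers, while positions reported to the FST are relative to startFP. Unlike the reader above, this sketch seeks to startFP in its constructor, so it does not assume the caller has already positioned the input.

{code:title=OffsetForwardReader (sketch)|borderStyle=solid}
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for IndexInput: a seekable byte source with an absolute file pointer. */
final class ByteArrayInput {
    private final byte[] data;
    private int fp;

    ByteArrayInput(byte[] data) {
        this.data = data;
    }

    long getFilePointer() {
        return fp;
    }

    void seek(long pos) throws IOException {
        if (pos < 0 || pos > data.length) {
            throw new EOFException("seek out of bounds: " + pos);
        }
        fp = (int) pos;
    }

    byte readByte() throws IOException {
        if (fp >= data.length) {
            throw new EOFException("read past EOF");
        }
        return data[fp++];
    }
}

/** Hypothetical forward reader: reads move forward, positions are relative to startFP. */
final class OffsetForwardReader {
    private final ByteArrayInput in;
    private final long startFP;

    OffsetForwardReader(ByteArrayInput in, long startFP) throws IOException {
        this.in = in;
        this.startFP = startFP;
        in.seek(startFP); // begin at the first FST byte
    }

    byte readByte() throws IOException {
        return in.readByte();
    }

    long getPosition() {
        return in.getFilePointer() - startFP;
    }

    void setPosition(long pos) throws IOException {
        in.seek(startFP + pos);
    }

    void skipBytes(long count) throws IOException {
        setPosition(getPosition() + count);
    }
}
{code}

For example, with bytes {9, 9, 10, 11, 12, 13} and startFP = 2, position 0 maps to file pointer 2, so the first readByte returns 10.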

{quote}Furthermore the NIO and Simple FS directories use buffering. I'm 
wondering how bad things would be if every seek would need to reload the 
buffer?{quote}
This can be a serious concern for NIOFSDirectory and SimpleFSDirectory. Given 
that most systems today use mmap, can we limit the off-heap FST to systems where 
mmap is supported, i.e.:
{code:title=isMMapSupported|borderStyle=solid}
Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED
{code}
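The reason mmap sidesteps the buffering concern is that a mapped region supports absolute reads at any offset: jumping around the FST is just index arithmetic, and the OS pages data in lazily on first access, with no per-seek buffer refill. The sketch below uses only JDK APIs and is not Lucene's MMapDirectory; MappedFstBytes is a hypothetical name. Note that a single MappedByteBuffer addresses at most Integer.MAX_VALUE bytes, which is why MMapDirectory maps large files in multiple chunks.

{code:title=MappedFstBytes (sketch)|borderStyle=solid}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only view over a memory-mapped region of a file. */
final class MappedFstBytes {
    private final MappedByteBuffer buffer;

    MappedFstBytes(Path file, long start, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            this.buffer = ch.map(FileChannel.MapMode.READ_ONLY, start, length);
        }
    }

    /** Absolute read relative to the mapped start: no seek and no buffer refill. */
    byte byteAt(int pos) {
        return buffer.get(pos);
    }

    int length() {
        return buffer.capacity();
    }
}
{code}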




> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the following setup for the es_rally tests:
> a single i3.xlarge node running ES 6.5
> es_rally running on a separate i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, offheap.patch, ra.patch, 
> rally_benchmark.xlsx
>
>
> Currently, FSTs load all the terms into heap memory when the index is opened. 
> This causes frequent JVM OOM errors once the terms grow large. A better approach 
> would be to load the FST lazily using mmap, which ensures that only the 
> required terms are loaded into memory.
>  
> Lucene could expose an API for providing the list of fields whose terms are 
> loaded off-heap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of off-heap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields off-heap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the off-heap constructor 
> based on the fstOffHeap property
>  
> I created a patch (which loads all fields off-heap) and ran some benchmarks 
> using es_rally; the results look good (see rally_benchmark.xlsx).
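Steps 2 and 3 of the proposal could look roughly like the plain-Java sketch below. OffHeapFieldSelector is a hypothetical name, not part of Lucene or the attached patches; it only shows how a list of field names, with ALL as a wildcard, might be turned into the per-field fstOffHeap flag that FieldReader would consult when choosing a constructor.

{code:title=OffHeapFieldSelector (sketch)|borderStyle=solid}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: decides, per field, whether its FST should be loaded
 * off-heap. The special name "ALL" selects every field.
 */
final class OffHeapFieldSelector {
    static final String ALL = "ALL";

    private final boolean all;
    private final Set<String> fields;

    OffHeapFieldSelector(String... offHeapFields) {
        Set<String> names = new HashSet<>(Arrays.asList(offHeapFields));
        this.all = names.remove(ALL); // true if the wildcard was present
        this.fields = Collections.unmodifiableSet(names);
    }

    /** Value that would initialize a field's fstOffHeap flag at index open. */
    boolean isOffHeap(String field) {
        return all || fields.contains(field);
    }
}
{code}

One design consequence of a reserved keyword is that a real field literally named ALL could no longer be selected on its own; a separate boolean option would avoid that ambiguity.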


