[ 
https://issues.apache.org/jira/browse/LUCENE-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016360#comment-14016360
 ] 

Robert Muir commented on LUCENE-5729:
-------------------------------------

The problem here is not really overhead; it is the two I/O calls with two sets 
of checks (seek + get). I don't think we need a ByteBuffer; it's fine to expose 
a clean random-access API.

Why make it optional? That just makes code messy. The default implementation 
can just be seek + get and it is no worse than today.
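
A minimal sketch of that idea, not Lucene's actual code: SketchInput is a 
hypothetical stand-in for IndexInput, and ArrayInput and BufferInput are 
made-up names for a generic implementation and a buffer-backed one. The 
absolute methods default to seek + get, and only the buffer-backed class 
overrides them with a single absolute read.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for IndexInput; names and shape are assumptions.
abstract class SketchInput {
    // Relative (sequential) API, as it exists today.
    abstract void seek(long pos);
    abstract byte readByte();
    abstract long readLong();

    // Absolute API. The default is just seek + get, so it is no worse
    // than what callers already do by hand.
    byte readByte(long pos) {
        seek(pos);
        return readByte();
    }

    long readLong(long pos) {
        seek(pos);
        return readLong();
    }
}

// Generic implementation: inherits the seek + get defaults unchanged.
class ArrayInput extends SketchInput {
    private final ByteBuffer buf;

    ArrayInput(byte[] data) {
        buf = ByteBuffer.wrap(data);
    }

    @Override void seek(long pos) { buf.position(Math.toIntExact(pos)); }
    @Override byte readByte() { return buf.get(); }
    @Override long readLong() { return buf.getLong(); }
}

// Buffer-backed implementation: overrides the absolute methods with one
// absolute ByteBuffer read each, leaving the current position untouched.
class BufferInput extends ArrayInput {
    private final ByteBuffer view;

    BufferInput(byte[] data) {
        super(data);
        view = ByteBuffer.wrap(data);
    }

    @Override byte readByte(long pos) { return view.get(Math.toIntExact(pos)); }
    @Override long readLong(long pos) { return view.getLong(Math.toIntExact(pos)); }
}
```

One difference worth noting in this sketch: the fallback moves the current 
position as a side effect, while the override does not, so a real API would 
have to say which behavior callers may rely on.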

> explore random-access methods to IndexInput
> -------------------------------------------
>
>                 Key: LUCENE-5729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5729
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>
> Traditionally, Lucene's I/O access is geared toward sequentially reading 
> lists of postings, but for random-access structures like docvalues this 
> just creates overhead.
> So today we hack around it by doing random access with seek+readXXX, but 
> this is inefficient (additional checks by the JDK that we don't need).
> As a hack, I added the following to IndexInput, changed direct packed ints 
> decode to use them, and implemented in MMapDir:
> {code}
> byte readByte(long pos) --> ByteBuffer.get(pos)
> short readShort(long pos) --> ByteBuffer.getShort(pos)
> int readInt(long pos) --> ByteBuffer.getInt(pos)
> long readLong(long pos) --> ByteBuffer.getLong(pos)
> {code}
> This gives a ~30% performance improvement for docvalues (numerics, sorting 
> strings, etc.).
> We should do a few things first before working on this (LUCENE-5728: use the 
> slice API in decode, pad packed ints so we only ever have one I/O call, etc.), 
> but I think we need to figure out such an API.
> It could either be on IndexInput like my hack (this is similar to the 
> ByteBuffer API, with both relative and absolute methods), or we could have a 
> separate API. But I guess arguably IOContext exists to supply hints too, so I 
> don't know which is the way to go.


