[ https://issues.apache.org/jira/browse/LUCENE-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016311#comment-14016311 ]

Uwe Schindler commented on LUCENE-5729:
---------------------------------------

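An idea would be to add an API to IndexInput that returns a ByteBuffer directly
for a slice. Of course, this would be limited to slices whose length fits in 32
bits, since ByteBuffer positions are ints. The downside is that we would then
allow a SIGSEGV if somebody keeps using the returned ByteBuffer after the
underlying mapping is gone (e.g. after the IndexInput has been closed and
unmapped). On the other hand, we would have no overhead at all.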

The whole thing should be optional: if an IndexInput does not support random
access, or cannot serve the requested slice (i.e., because it spans two
buffers), it may throw UnsupportedOperationException or return null, and the
consumer would need to use the alternative implementation. NIOFSDirectory and
SimpleFSDirectory would never support random access; RAMDirectory could use
ByteBuffer.wrap().
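A minimal Java sketch of the optional slice API described above, assuming a hypothetical `randomAccessSlice(offset, length)` hook that returns `null` when the input declines (the comment equally allows throwing `UnsupportedOperationException`). `SketchInput`, `HeapInput`, `SequentialInput` and `readIntsAt` are illustrative names, not Lucene classes. The heap-backed input mirrors the `ByteBuffer.wrap()` idea for RAMDir; a memory-mapped implementation would instead hand out a view of the mapped region, which is where the SIGSEGV risk comes in, and that is not modeled here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: SketchInput, randomAccessSlice() etc. are illustrative
// names, not Lucene's actual API.
public class SliceSketch {

    /** Minimal input with sequential reads plus an OPTIONAL ByteBuffer-slice hook. */
    static abstract class SketchInput {
        abstract void seek(long pos);
        abstract byte readByte();

        /** Big-endian int assembled from four sequential readByte() calls. */
        int readInt() {
            return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
                    | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
        }

        /** Returns a ByteBuffer view of [offset, offset + length), or null if unsupported. */
        ByteBuffer randomAccessSlice(long offset, long length) {
            return null; // default: no random access
        }
    }

    /** Heap-backed input: can serve slices via ByteBuffer.wrap(), like the RAMDir idea. */
    static final class HeapInput extends SketchInput {
        private final byte[] data;
        private int pos;

        HeapInput(byte[] data) { this.data = data; }
        @Override void seek(long p) { pos = Math.toIntExact(p); }
        @Override byte readByte() { return data[pos++]; }

        @Override
        ByteBuffer randomAccessSlice(long offset, long length) {
            if (offset < 0 || length < 0 || offset + length > data.length) {
                return null; // cannot serve this slice; caller must fall back
            }
            // wrap() shares the array (no copy); slice() makes index 0 == offset.
            return ByteBuffer.wrap(data, (int) offset, (int) length).slice();
        }
    }

    /** Sequential-only input (cf. NIOFS/SimpleFS): keeps the default null. */
    static final class SequentialInput extends SketchInput {
        private final byte[] data;
        private int pos;

        SequentialInput(byte[] data) { this.data = data; }
        @Override void seek(long p) { pos = Math.toIntExact(p); }
        @Override byte readByte() { return data[pos++]; }
    }

    /** Consumer: asks for the fast path once, otherwise falls back to seek + read. */
    static int[] readIntsAt(SketchInput in, long sliceOffset, long sliceLength, int[] indices) {
        ByteBuffer buf = in.randomAccessSlice(sliceOffset, sliceLength);
        int[] out = new int[indices.length];
        for (int i = 0; i < indices.length; i++) {
            long rel = (long) indices[i] * Integer.BYTES;
            if (buf != null) {
                out[i] = buf.getInt(Math.toIntExact(rel)); // absolute read, no seek
            } else {
                in.seek(sliceOffset + rel); // alternative implementation
                out[i] = in.readInt();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 header bytes, then three big-endian ints: 1, 256, 65536
        byte[] file = {9, 9, 9, 9, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0};
        int[] order = {2, 0, 1};
        int[] fast = readIntsAt(new HeapInput(file), 4, 12, order);
        int[] slow = readIntsAt(new SequentialInput(file), 4, 12, order);
        System.out.println(Arrays.toString(fast) + " " + Arrays.toString(slow));
    }
}
```

Both paths yield `[65536, 1, 256]` for the sample in `main`. The consumer requests the buffer once per slice and only pays for seek + read when the input declines, which is the "optional" behavior the comment asks for.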

> explore random-access methods to IndexInput
> -------------------------------------------
>
>                 Key: LUCENE-5729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5729
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>
> Traditionally, Lucene's access pattern is mostly reading lists of postings, and
> the API is geared toward that, but for random-access structures like docvalues
> it just creates overhead.
> So today we hack around it by doing this random access with seek+readXXX, but
> this is inefficient (the JDK performs additional checks that we don't need).
> As a hack, I added the following to IndexInput, changed the direct packed ints
> decoder to use them, and implemented them in MMapDir:
> {code}
> byte readByte(long pos) --> ByteBuffer.get(pos)
> short readShort(long pos) --> ByteBuffer.getShort(pos)
> int readInt(long pos) --> ByteBuffer.getInt(pos)
> long readLong(long pos) --> ByteBuffer.getLong(pos)
> {code}
> This gives a ~30% performance improvement for docvalues (numerics, sorting
> strings, etc.).
> There are a few things we should do before working on this (LUCENE-5728: use
> the slice API in decode, pad packed ints so we only ever need one I/O call,
> etc.), but I think we need to figure out such an API.
> It could either live on IndexInput like my hack (similar to the ByteBuffer API,
> which has both relative and absolute methods), or we could have a separate API.
> But I guess IOContext arguably exists to supply hints too, so I don't know which
> is the way to go.
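The mapping in the quoted `{code}` block can be sketched as follows, assuming a single ByteBuffer backs the whole file (a real memory-mapped input may map a large file as several buffers, which is the "spans two buffers" case in the comment). `ByteBufferInput`, `valueRelative` and `valueAbsolute` are hypothetical names chosen for illustration. Each absolute `readXXX(long pos)` is one call to the matching absolute ByteBuffer getter, whereas the seek+readXXX pattern first updates shared position state and then reads.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Lucene's actual IndexInput): a ByteBuffer-backed input
 * with relative reads (seek + readXXX) and the absolute readXXX(long pos) variants
 * described in the issue, each delegating to ByteBuffer's absolute getters.
 */
public class AbsoluteReadSketch {

    static final class ByteBufferInput {
        private final ByteBuffer buf; // assumption: one buffer covers the whole file

        ByteBufferInput(ByteBuffer buf) {
            this.buf = buf;
        }

        // Relative API: two steps, and the position is shared mutable state.
        void seek(long pos) { buf.position(Math.toIntExact(pos)); }
        int readInt() { return buf.getInt(); }

        // Absolute API: one call each, buffer position left untouched.
        byte readByte(long pos) { return buf.get(Math.toIntExact(pos)); }
        short readShort(long pos) { return buf.getShort(Math.toIntExact(pos)); }
        int readInt(long pos) { return buf.getInt(Math.toIntExact(pos)); }
        long readLong(long pos) { return buf.getLong(Math.toIntExact(pos)); }
    }

    /** Fixed-width numeric lookup via seek + read: value of doc = int at base + doc * 4. */
    static int valueRelative(ByteBufferInput in, long base, int doc) {
        in.seek(base + (long) doc * Integer.BYTES);
        return in.readInt();
    }

    /** Same lookup via a single absolute read. */
    static int valueAbsolute(ByteBufferInput in, long base, int doc) {
        return in.readInt(base + (long) doc * Integer.BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(2 + 3 * Integer.BYTES);
        bb.putShort((short) 7);               // 2-byte header
        bb.putInt(10).putInt(20).putInt(30);  // values for docs 0..2
        ByteBufferInput in = new ByteBufferInput(bb);
        System.out.println(valueRelative(in, 2, 1) + " " + valueAbsolute(in, 2, 2) + " " + in.readShort(0));
    }
}
```

Running `main` prints `20 30 7`. Because the absolute methods never touch the buffer position, a consumer such as a packed-ints decoder can issue them in any order without a preceding seek.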



--
This message was sent by Atlassian JIRA
(v6.2#6252)