Uwe Schindler created LUCENE-10113:
--------------------------------------
Summary: Improve ByteArrayDataInput to read short/int/long
natively using VarHandles
Key: LUCENE-10113
URL: https://issues.apache.org/jira/browse/LUCENE-10113
Project: Lucene - Core
Issue Type: Improvement
Components: core/store
Affects Versions: main (9.0)
Reporter: Uwe Schindler
Assignee: Uwe Schindler
LUCENE-10112 reminded me of something I wanted to do long ago: basically, for
all other IndexInputs/DataInputs we are able to read short, int and long
natively in little endian using single CPU instructions. Only
ByteArrayDataInput still uses the manual code inherited from DataInput, which
reads single bytes one by one and recombines them in little-endian order.
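For contrast with the VarHandle approach, here is a minimal sketch of the byte-by-byte technique described above: each value is assembled from separately read bytes that are masked (to avoid sign extension) and shifted into little-endian position. The class name `ManualLittleEndianReader` and its methods are hypothetical illustrations, not Lucene's actual ByteArrayDataInput or DataInput code.

```java
/**
 * Hypothetical sketch, not Lucene's actual class: reads little-endian values
 * from a byte[] one byte at a time and recombines them with shifts and masks.
 */
final class ManualLittleEndianReader {
  private final byte[] bytes;
  private int pos;

  ManualLittleEndianReader(byte[] bytes, int offset) {
    this.bytes = bytes;
    this.pos = offset;
  }

  byte readByte() {
    return bytes[pos++];
  }

  short readShort() {
    // Little endian: the least significant byte comes first.
    final int b1 = readByte() & 0xFF;
    final int b2 = readByte() & 0xFF;
    return (short) ((b2 << 8) | b1);
  }

  int readInt() {
    // Four separate single-byte loads, each masked, then shifted into place.
    final int b1 = readByte() & 0xFF;
    final int b2 = readByte() & 0xFF;
    final int b3 = readByte() & 0xFF;
    final int b4 = readByte() & 0xFF;
    return (b4 << 24) | (b3 << 16) | (b2 << 8) | b1;
  }

  long readLong() {
    // Two little-endian ints: the low 32 bits come first.
    final long lo = readInt() & 0xFFFFFFFFL;
    final long hi = readInt() & 0xFFFFFFFFL;
    return (hi << 32) | lo;
  }

  int getPosition() {
    return pos;
  }

  public static void main(String[] args) {
    byte[] data = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
    ManualLittleEndianReader in = new ManualLittleEndianReader(data, 0);
    System.out.println(String.format("%08x", in.readInt())); // prints 04030201
  }
}
```

Every readInt() here costs four bounds-checked byte loads plus the shift/or arithmetic, which is the overhead the issue proposes to remove.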
The approach here is to use Java 9+ VarHandles to read int/long/short values
with single CPU instructions instead of recombining the bytes manually. The
trick is to create a "view" VarHandle that accesses the byte array using the
same mechanisms as ByteBuffers or JDK 17 MemorySegments (under the hood it uses
Unsafe to issue the CPU instructions and swaps bytes only if the platform
endianness is big endian).
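A minimal sketch of the "view" VarHandle idea, assuming only the JDK's `MethodHandles.byteArrayViewVarHandle` (available since Java 9). The class name `VarHandleLittleEndianReader` is hypothetical; the actual patch for this issue may be structured differently. Each read is a single VarHandle access at an arbitrary (possibly unaligned) offset instead of several byte loads.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical sketch: reads little-endian values from a byte[] via view VarHandles. */
final class VarHandleLittleEndianReader {
  // View VarHandles that interpret a region of a byte[] as short/int/long in
  // little-endian order. Plain get() is allowed at unaligned offsets.
  private static final VarHandle VH_LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  private static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] bytes;
  private int pos;

  VarHandleLittleEndianReader(byte[] bytes, int offset) {
    this.bytes = bytes;
    this.pos = offset;
  }

  byte readByte() {
    return bytes[pos++];
  }

  short readShort() {
    // The cast fixes the return type of the signature-polymorphic get().
    final short v = (short) VH_LE_SHORT.get(bytes, pos);
    pos += Short.BYTES;
    return v;
  }

  int readInt() {
    final int v = (int) VH_LE_INT.get(bytes, pos);
    pos += Integer.BYTES;
    return v;
  }

  long readLong() {
    final long v = (long) VH_LE_LONG.get(bytes, pos);
    pos += Long.BYTES;
    return v;
  }

  int getPosition() {
    return pos;
  }

  public static void main(String[] args) {
    byte[] data = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
    VarHandleLittleEndianReader in = new VarHandleLittleEndianReader(data, 0);
    System.out.println(String.format("%08x", in.readInt())); // prints 04030201
  }
}
```

Keeping the VarHandles in `static final` fields lets the JIT treat them as constants, which is generally what allows the access to be compiled down to a plain load (plus a byte swap on big-endian hardware). Out-of-range offsets still throw IndexOutOfBoundsException, so the bounds safety of the manual version is preserved.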
In LUCENE-10112 similar work was done for LZ4, and a microbenchmark written
there showed a significant speed improvement when accessing these types through
a VarHandle.
P.S.: The same applies to FST.BytesReader, but I am not sure whether it uses
the int/short/long methods at all. At least it does not override the methods
that read ints, longs and shorts, so there is no optimization there at all.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)