Your test/benchmark indexes purely random floats: Random.nextFloat()

It should be obvious that things like index compression do not help performance on purely random data (and I think purely random data is totally unrealistic anyway).
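(For context, a minimal sketch of roughly what the indexing side of such a benchmark presumably looks like, assuming Lucene 4.x APIs; the field names, directory path, and document count are illustrative, and the real attached test may differ. FloatDocValuesField stores Float.floatToRawIntBits(value) as a NumericDocValue, so random bit patterns leave the default compression nothing to exploit.)

import java.io.File;
import java.util.Random;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RandomFloatDocValuesIndexer {
  static final int NUM_DOCS = 25_000_000;

  public static void main(String[] args) throws Exception {
    Random random = new Random(42);
    try (FSDirectory dir = FSDirectory.open(new File("/tmp/random-dv"));
         IndexWriter writer = new IndexWriter(dir,
             new IndexWriterConfig(Version.LUCENE_45, new WhitespaceAnalyzer(Version.LUCENE_45)))) {
      for (int i = 0; i < NUM_DOCS; i++) {
        Document doc = new Document();
        doc.add(new StoredField("id", i));
        // FloatDocValuesField stores Float.floatToRawIntBits(value) as a NumericDocValue;
        // purely random bit patterns give delta/packed-int compression nothing to work with.
        doc.add(new FloatDocValuesField("x", random.nextFloat()));
        doc.add(new FloatDocValuesField("y", random.nextFloat()));
        writer.addDocument(doc);
      }
    }
  }
}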
If you are *really indexing purely random data* then there is no benefit to things like compression at all, so just make your own codec that simply calls IndexInput.readInt() or whatever?

On Tue, Oct 8, 2013 at 10:27 AM, <[email protected]> wrote:
> Hi David,
>
> We tried that and still didn't come close to DirectBuffer speed. It was only about 20% faster. I've attached updated numbers.
>
> We looked through the Lucene code and determined that very likely the costly part is loading each part of an int out of the byte array. There are much faster (in fact, native) operations available for reading a whole word or float at one time, if we could get access to the DirectBuffer behind the DocValues implementation. But when Lucene loads the byte array into Java heap memory, that ability is lost.
>
> Karl
>
> -----Original Message-----
> From: ext David Smiley (@MITRE.org) [mailto:[email protected]]
> Sent: Tuesday, October 08, 2013 11:52 AM
> To: [email protected]
> Subject: Re: FW: Is there a really performant way to store a full 32-bit int in doc values?
>
> Hi Karl!
>
> I suggest that you put the point data you need in BinaryDocValues; that is, both the x & y into the same byte[] chunk. I've done this for a Solr integration in https://issues.apache.org/jira/browse/SOLR-5170
>
> ~ David
>
> karl.wright-2 wrote
>> Hi All (and especially Robert),
>>
>> Lucene NumericDocValues seems to operate slower than we would expect. In our application, we're using it for storing coordinate values, which we retrieve to compute a distance. While doing timings to determine the impact of including a sqrt in the calculation, we noted that the Lucene overhead itself overwhelmed pretty much anything we did in the ValueSource.
>>
>> One of our engineers did performance testing (code attached, hope it gets through), which shows what we are talking about. Please see the thread below. The question is: why is Lucene 2.5x slower than direct buffer access for this case? And is there anything we can do within the Lucene paradigm to get our performance back closer to the direct buffer case?
>>
>> Karl
>>
>> -----Original Message-----
>> From: Ziech Christian (HERE/Berlin)
>> Sent: Tuesday, October 08, 2013 9:08 AM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> Hi,
>>
>> I have now tested the approach of using the NumericDocValues directly, and it indeed helps: about 20% compared to the original Lucene numbers. Lucene is still 2.5x slower than using a DirectBuffer alone, but it helps.
>> The funny thing is that with Lucene the square root is almost free, which can be explained by the CPU calculating the square root while other things are computed: since the result isn't needed for a while in my micro-benchmark, the CPU can happily do other work in the meantime. Since we also have a lot of other query aspects, we'd get that gain either way, I assume, so estimating about 30-50ms for the square root when scoring 25M documents should be roughly accurate. So what is Lucene doing that causes it to be 3 times slower than the naive approach?
>> And why is that impact so big compared to the one of a simple square root (which slows things down by ~20%, assuming the 30ms holds with more complex actions)? I mean, 20% vs. 200% is an order of magnitude!
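(For reference, a rough reconstruction of what a "direct float buffer" baseline like the one above might look like; the class and method names here are invented for illustration and the actual attached test may differ. The point is that x and y live off-heap and each lookup reads a whole 32-bit word rather than assembling the value byte by byte.)

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferBaseline {
  private final FloatBuffer xs;
  private final FloatBuffer ys;

  public DirectBufferBaseline(int maxDoc) {
    // Off-heap storage: 4 bytes per float per document (add a z buffer analogously for 3D).
    xs = ByteBuffer.allocateDirect(maxDoc * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
    ys = ByteBuffer.allocateDirect(maxDoc * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
  }

  public void set(int doc, float x, float y) {
    xs.put(doc, x);
    ys.put(doc, y);
  }

  // Squared distance to (qx, qy); wrap in Math.sqrt(...) to mimic the "with square root" runs.
  public float distanceSquared(int doc, float qx, float qy) {
    float dx = xs.get(doc) - qx;
    float dy = ys.get(doc) - qy;
    return dx * dx + dy * dy;
  }
}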
>> As a side note: storing the values as an int when using a DirectBuffer doesn't seem helpful; I assume because we have to cast the int to float either way later.
>>
>> BR
>> Christian
>>
>> PS: The new numbers are:
>> Scoring 25000000 documents with direct float buffers (without square root) took 190
>> Scoring 25000000 documents with direct float buffers (without square root) took 171
>> Scoring 25000000 documents with direct float buffers (without square root) took 172
>> Scoring 25000000 documents with direct float buffers (and a square root) took 281
>> Scoring 25000000 documents with direct float buffers (and a square root) took 280
>> Scoring 25000000 documents with direct float buffers (and a square root) took 266
>> Scoring 25000000 documents with a lucene float value source (without square root) took 1045
>> Scoring 25000000 documents with a lucene float value source (without square root) took 625
>> Scoring 25000000 documents with a lucene float value source (without square root) took 630
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 661
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 670
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 665
>> Scoring 25000000 documents with direct int buffers (without square root) took 218
>> Scoring 25000000 documents with direct int buffers (without square root) took 219
>> Scoring 25000000 documents with direct int buffers (without square root) took 204
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
>>
>> ________________________________________
>> From: Wright Karl (HERE/Cambridge)
>> Sent: Monday, October 7, 2013 09:22
>> To: Ziech Christian (HERE/Berlin)
>> Subject: FW: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> -----Original Message-----
>> From: ext Michael McCandless [mailto:lucene@...]
>> Sent: Monday, October 07, 2013 8:28 AM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> Well, it is a micro-benchmark ... so it'd be better to test in the wider/full context of the application?
>>
>> I'm also a little worried that you go through ValueSource instead of interacting directly with the NumericDocValues instance; it's just an additional level of indirection that may confuse hotspot. But it really ought not be so bad ...
>>
>> Under the hood we encode a float to an int using Float.floatToRawIntBits; it could be that this doesn't work well w/ the compression we then do on the ints by default? I'm curious which impl the Lucene45DocValuesConsumer is using in your case. Looks like you are using random floats, so I'd expect it's using DELTA_COMPRESSED.
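(A sketch of the direct, per-segment NumericDocValues access discussed above, as opposed to going through a ValueSource; it assumes Lucene 4.x APIs and the illustrative field names "x" and "y". Float.intBitsToFloat reverses the floatToRawIntBits encoding Mike mentions; null checks for missing fields are omitted.)

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;

public class DirectDocValuesScorer {

  // Sum of squared distances from (qx, qy) over one segment, reading the
  // NumericDocValues instances directly instead of through a ValueSource.
  static double scoreSegment(AtomicReaderContext context, float qx, float qy) throws IOException {
    NumericDocValues xs = context.reader().getNumericDocValues("x");
    NumericDocValues ys = context.reader().getNumericDocValues("y");
    double total = 0;
    int maxDoc = context.reader().maxDoc();
    for (int doc = 0; doc < maxDoc; doc++) {
      // The long returned by get() holds the raw bits written via Float.floatToRawIntBits.
      float x = Float.intBitsToFloat((int) xs.get(doc));
      float y = Float.intBitsToFloat((int) ys.get(doc));
      float dx = x - qx, dy = y - qy;
      total += dx * dx + dy * dy;   // add Math.sqrt(...) here for the "with square root" variant
    }
    return total;
  }
}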
>>
>> It'd be a simple test to just make your own DVFormat using raw 32-bit ints, to see how much that helps.
>>
>> But, yes, I would just email the list and see if there are other ideas?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Mon, Oct 7, 2013 at 7:14 AM, <karl.wright@...> wrote:
>>> Hi Mike,
>>>
>>> Before I post to the general list, do you see any problem with our testing methodology?
>>>
>>> Basically, we conclude that by far the most expensive thing is retrieving the NumericDocValues value. This currently overwhelms any expensive operations we might do in the scoring ourselves, which is why we're looking for potential improvements in that area.
>>>
>>> Do you agree with the assessment?
>>>
>>> Karl
>>>
>>> From: Ziech Christian (HERE/Berlin)
>>> Sent: Friday, October 04, 2013 11:09 PM
>>> To: Wright Karl (HERE/Cambridge)
>>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> Hi,
>>>
>>> maybe it's best if I share where I got my numbers from. I have written a small test (which originally was only meant to measure the Math.sqrt() impact for 10M scorings).
>>>
>>> The output is (I looped over the search invocation to give Lucene a chance to load everything):
>>> Scoring 25000000 documents with direct buffers (without square root) took 203
>>> Scoring 25000000 documents with direct buffers (without square root) took 179
>>> Scoring 25000000 documents with direct buffers (without square root) took 172
>>> Scoring 25000000 documents with direct buffers (and a square root) took 292
>>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>>> Scoring 25000000 documents with a lucene value (without square root) source took 1045
>>> Scoring 25000000 documents with a lucene value (without square root) source took 656
>>> Scoring 25000000 documents with a lucene value (without square root) source took 660
>>> Scoring 25000000 documents with a lucene value (without square root) source took 658
>>> Scoring 25000000 documents with a lucene value (without square root) source took 663
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 710
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 713
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>>>
>>> So the impact of a square root is roughly 110ms, while the impact of using the Lucene function values is far higher (depending on the run, between 300-350ms). Interestingly, the square root impact is not as high on the Lucene function query for some reason (most likely Java or the CPU can simply optimize the very simple scorer best).
>>>
>>> I did measure the values with an FSDirectory and a RAMDirectory, which both essentially yield the same performance. Do you see any problem with the attached code?
>>>
>>> BR
>>> Christian
>>>
>>> ________________________________
>>>
>>> From: Wright Karl (HERE/Cambridge)
>>> Sent: Friday, October 4, 2013 20:56
>>> To: Ziech Christian (HERE/Berlin)
>>> Subject: FW: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> FYI
>>> Karl
>>>
>>> Sent from my Windows Phone
>>>
>>> ________________________________
>>>
>>> From: ext Michael McCandless
>>> Sent: 10/4/2013 4:51 PM
>>> To: Wright Karl (HERE/Cambridge)
>>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> Hmmm, that's interesting that you see the decode cost is too high. Are you sure?
>>>
>>> Can you email the list? I'm sure Rob will have suggestions. The worst case is you make a custom DV format that stores things raw.
>>>
>>> 4.5 has a new default DocValuesFormat with more compression, but with values stored on disk by default (cached by the OS if you have the RAM) ... I wonder how that would compare to what you're using now.
>>>
>>> I think the simplest thing to do is to instantiate the Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5), passing a very high acceptableOverheadRatio? This should cause packed ints to be upgraded to a byte[], short[], int[], or long[]. If this is still not fast enough, then I suspect a custom DVFormat that just uses int[] directly (avoiding the abstractions of packed ints) is your best shot.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Fri, Oct 4, 2013 at 8:46 AM, <karl.wright@...> wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> We're using docvalues to store geocoordinates in meters in X,Y,Z space, and discovering that they are taking more time to unpack than we'd like. I was surprised to find no raw representation available for docvalues right now; otherwise, a fixed 4-byte representation would have been ideal. Would you have any suggestions?
>>>>
>>>> Karl
>>
>> LuceneFloatSourceTest.java (16K)
>> <http://lucene.472066.n3.nabble.com/attachment/4094104/0/LuceneFloatSourceTest.java>
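(To make David's BinaryDocValues suggestion from earlier in the thread concrete: a rough sketch, not taken from SOLR-5170, of packing x, y, z into one fixed-width byte[] per document and decoding it with whole-word ByteBuffer reads, which is the kind of access Karl was asking for. The class and field names are invented, and the BinaryDocValues.get(int, BytesRef) signature shown is the Lucene 4.x one; it changed in later versions.)

import java.nio.ByteBuffer;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

public class PackedPointDocValues {

  // Index-time: encode one (x, y, z) point as a fixed 12-byte value for a single field.
  public static BinaryDocValuesField encode(String field, float x, float y, float z) {
    byte[] packed = ByteBuffer.allocate(12).putFloat(x).putFloat(y).putFloat(z).array();
    return new BinaryDocValuesField(field, new BytesRef(packed));
  }

  // Search-time: decode the point for one document into dest[0..2] = {x, y, z}.
  // Obtain 'values' once per segment, e.g. via AtomicReader.getBinaryDocValues(field).
  public static void decode(BinaryDocValues values, int docID, float[] dest) {
    BytesRef ref = new BytesRef();
    values.get(docID, ref);   // 4.x signature; later versions return the BytesRef instead
    ByteBuffer buf = ByteBuffer.wrap(ref.bytes, ref.offset, ref.length);
    dest[0] = buf.getFloat();
    dest[1] = buf.getFloat();
    dest[2] = buf.getFloat();
  }
}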
