FW: Is there a really performant way to store a full 32-bit int in doc values?

karl.wright Tue, 08 Oct 2013 06:36:45 -0700

Hi All (and especially Robert),

Lucene NumericDocValues seems to run slower than we would expect.  In our 
application, we're using it to store coordinate values, which we retrieve to 
compute a distance.  While timing the impact of including a sqrt in the 
calculation, we noted that the Lucene overhead itself overwhelmed pretty much 
anything we did in the ValueSource.


One of our engineers did performance testing (code attached, hope it gets 
through), which shows what we are talking about.  Please see the thread below.
The question is: why is Lucene 2.5x slower than direct buffer access for 
this case?  And is there anything we can do within the Lucene paradigm to get 
our performance back closer to the direct buffer case?

Karl

-----Original Message-----
From: Ziech Christian (HERE/Berlin) 
Sent: Tuesday, October 08, 2013 9:08 AM
To: Wright Karl (HERE/Cambridge)
Subject: AW: Is there a really performant way to store a full 32-bit int in doc 
values?

Hi,

I have now tested the approach of using the NumericDocValues directly, and it 
indeed helps by about 20% compared to the original Lucene numbers. Lucene is 
still 2.5x slower than using a DirectBuffer alone, but it helps. The funny 
thing is that with Lucene, adding the square root makes almost no difference. 
That can be explained by the CPU calculating the square root while other 
things are being computed: since my micro-benchmark doesn't need the result 
for a while, the CPU can happily do other things in the meantime. Since we 
also have a lot of other query aspects, I assume we'd get that gain either 
way, so estimating about 30-50ms for the square root when scoring 25M 
documents should be about accurate. So what is Lucene doing that causes it to 
be 3 times slower than the naive approach? And why is that impact so big 
compared to that of a simple square root (which slows things down by ~20%, 
assuming the 30ms with more complex actions)? I mean, 20% vs 200% is an order 
of magnitude!
As a side note: storing the values as an int when using a DirectBuffer doesn't 
seem helpful - I assume because we have to cast the int to float later either 
way.

BR
  Christian

PS: The new numbers are:
Scoring 25000000 documents with direct float buffers (without square root) took 190
Scoring 25000000 documents with direct float buffers (without square root) took 171
Scoring 25000000 documents with direct float buffers (without square root) took 172
Scoring 25000000 documents with direct float buffers (and a square root) took 281
Scoring 25000000 documents with direct float buffers (and a square root) took 280
Scoring 25000000 documents with direct float buffers (and a square root) took 266
Scoring 25000000 documents with a lucene float value source (without square root) took 1045
Scoring 25000000 documents with a lucene float value source (without square root) took 625
Scoring 25000000 documents with a lucene float value source (without square root) took 630
Scoring 25000000 documents with a lucene float value source (and a square root) took 661
Scoring 25000000 documents with a lucene float value source (and a square root) took 670
Scoring 25000000 documents with a lucene float value source (and a square root) took 665
Scoring 25000000 documents with direct int buffers (without square root) took 218
Scoring 25000000 documents with direct int buffers (without square root) took 219
Scoring 25000000 documents with direct int buffers (without square root) took 204
Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
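
For readers without the attachment, here is a minimal sketch of what a "direct float buffers" baseline of this kind might look like. It is not the attached LuceneFloatSourceTest.java: the class name, the method names, the three-buffer X/Y/Z layout, and the smaller document count are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

/** Hypothetical stand-in for the direct-buffer baseline; not the attached test. */
final class DirectBufferScoring {

    /** Allocates an off-heap buffer holding docCount random coordinates. */
    static FloatBuffer randomCoordinates(int docCount, long seed) {
        FloatBuffer buf = ByteBuffer.allocateDirect(docCount * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        Random rnd = new Random(seed);
        for (int doc = 0; doc < docCount; doc++) {
            buf.put(doc, rnd.nextFloat());
        }
        return buf;
    }

    /** Sums a per-document score: squared distance to the query point, optionally square-rooted. */
    static double scoreAll(FloatBuffer xs, FloatBuffer ys, FloatBuffer zs,
                           float qx, float qy, float qz, boolean withSqrt) {
        double sum = 0;
        int docCount = xs.capacity();
        for (int doc = 0; doc < docCount; doc++) {
            float dx = xs.get(doc) - qx;
            float dy = ys.get(doc) - qy;
            float dz = zs.get(doc) - qz;
            double d2 = dx * dx + dy * dy + dz * dz;
            sum += withSqrt ? Math.sqrt(d2) : d2;
        }
        return sum; // returned so the JIT cannot discard the loop
    }

    public static void main(String[] args) {
        int docs = 1_000_000; // assumption: smaller than the thread's 25M to keep the sketch quick
        FloatBuffer xs = randomCoordinates(docs, 1L);
        FloatBuffer ys = randomCoordinates(docs, 2L);
        FloatBuffer zs = randomCoordinates(docs, 3L);
        for (int run = 0; run < 3; run++) {
            long start = System.nanoTime();
            double sum = scoreAll(xs, ys, zs, 0.5f, 0.5f, 0.5f, true);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + " took " + ms + " ms (checksum " + sum + ")");
        }
    }
}
```

Repeating the run a few times, as in the numbers above, gives the JIT a chance to compile the loop before the timings are compared.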


________________________________________
From: Wright Karl (HERE/Cambridge)
Sent: Monday, October 7, 2013 09:22
To: Ziech Christian (HERE/Berlin)
Subject: FW: Is there a really performant way to store a full 32-bit int in doc 
values?

-----Original Message-----
From: ext Michael McCandless [mailto:[email protected]]
Sent: Monday, October 07, 2013 8:28 AM
To: Wright Karl (HERE/Cambridge)
Subject: Re: Is there a really performant way to store a full 32-bit int in doc 
values?

Well, it is a micro-benchmark ... so it'd be better to test in the wider/full 
context of the application?

I'm also a little worried that you go through ValueSource instead of 
interacting directly with the NumericDocValues instance; it's just an 
additional level of indirection that may confuse hotspot.  But it really ought 
not be so bad ...

Under the hood we encode a float to an int using Float.floatToRawIntBits; it 
could be that this doesn't work well w/ the compression we then do on the ints 
by default?  I'm curious which impl the Lucene45DocValuesConsumer is using in 
your case.  Looks like you are using random floats, so I'd expect it's using 
DELTA_COMPRESSED.
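
To illustrate the concern about float bits and compression, here is a small sketch. It assumes a simplified model of delta encoding in which each value is stored as an offset from the minimum, using just enough bits for the largest offset; it is not the code of Lucene45DocValuesConsumer. The class name, method names, and the coordinate range are hypothetical.

```java
import java.util.Random;

/** Hypothetical illustration: how many bits a min-offset encoding needs for raw float bits. */
final class FloatBitsSpread {

    /** Encodes floats as described in the thread: raw IEEE 754 bits as an int. */
    static int[] encode(float[] values) {
        int[] bits = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            bits[i] = Float.floatToRawIntBits(values[i]);
        }
        return bits;
    }

    /** Bits per value needed if each value is stored as (value - min). */
    static int bitsRequiredForDeltas(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = max - min;
        // At least one bit, even when all values are equal.
        return range == 0 ? 1 : 64 - Long.numberOfLeadingZeros(range);
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        float[] coords = new float[1_000_000];
        for (int i = 0; i < coords.length; i++) {
            // Assumption: coordinates in meters, roughly within one Earth radius of the origin.
            coords[i] = (rnd.nextFloat() * 2f - 1f) * 6_400_000f;
        }
        System.out.println("bits per value after min-offset encoding: "
                + bitsRequiredForDeltas(encode(coords)));
    }
}
```

Under this model, values of mixed sign spread across the sign bit of the int representation, so the offsets need close to the full 32 bits and the packing saves little space while still adding decode work.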

It'd be a simple test to just make your own DVFormat using raw 32 bit ints, to 
see how much that helps.
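
Before writing a real DocValuesFormat, the cost difference between bit-packed and raw storage can be explored outside Lucene. The sketch below compares a plain int[] with a simplified bit-packed reader. The Packed class is a hypothetical stand-in for the packed-ints abstraction, not Lucene's implementation, and the value count and 20-bit width are arbitrary assumptions.

```java
import java.util.Random;

/** Hypothetical comparison of raw int[] reads against a simplified bit-packed reader. */
final class PackedVsRaw {

    /** Simplified bit-packed store: each value occupies bitsPerValue bits in a long[]. */
    static final class Packed {
        private final long[] blocks;
        private final int bitsPerValue;
        private final long mask;

        /** Values must be non-negative and fit in bitsPerValue bits (1..32). */
        Packed(int[] values, int bitsPerValue) {
            this.bitsPerValue = bitsPerValue;
            this.mask = (1L << bitsPerValue) - 1;
            this.blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
            for (int i = 0; i < values.length; i++) {
                long bitPos = (long) i * bitsPerValue;
                int block = (int) (bitPos >>> 6);
                int shift = (int) (bitPos & 63);
                long v = values[i] & mask;
                blocks[block] |= v << shift;
                if (shift + bitsPerValue > 64) {
                    // The value straddles two longs: spill the high bits into the next block.
                    blocks[block + 1] |= v >>> (64 - shift);
                }
            }
        }

        long get(int index) {
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = blocks[block] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= blocks[block + 1] << (64 - shift);
            }
            return v & mask;
        }
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        int[] raw = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            raw[i] = rnd.nextInt(1 << 20);
        }
        Packed packed = new Packed(raw, 20);
        for (int run = 0; run < 3; run++) {
            long t0 = System.nanoTime();
            long sumRaw = 0;
            for (int i = 0; i < n; i++) sumRaw += raw[i];
            long t1 = System.nanoTime();
            long sumPacked = 0;
            for (int i = 0; i < n; i++) sumPacked += packed.get(i);
            long t2 = System.nanoTime();
            System.out.println("raw " + (t1 - t0) / 1_000_000 + " ms, packed "
                    + (t2 - t1) / 1_000_000 + " ms, sums equal: " + (sumRaw == sumPacked));
        }
    }
}
```

Like the benchmark in this thread, this is a micro-benchmark, so the timings only hint at where the decode cost comes from; the real test is a raw format measured inside the application.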

But, yes, I would just email the list and see if there are other ideas?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Oct 7, 2013 at 7:14 AM,  <[email protected]> wrote:
> Hi Mike,
>
>
>
> Before I post to the general list, do you see any problem with our 
> testing methodology?
>
>
>
> Basically, we conclude that by far the most expensive thing is 
> retrieving the NumericDocValue value.  This currently overwhelms any 
> expensive operations we might do in the scoring ourselves, which is 
> why we're looking for potential improvements in that area.
>
>
>
> Do you agree with the assessment?
>
> Karl
>
>
>
> From: Ziech Christian (HERE/Berlin)
> Sent: Friday, October 04, 2013 11:09 PM
> To: Wright Karl (HERE/Cambridge)
> Subject: AW: Is there a really performant way to store a full 32-bit 
> int in doc values?
>
>
>
> Hi,
>
> maybe it's best if I share where I got my numbers from - I have 
> written a small test (which originally should only test the
> Math.sqrt() impact for 10M scorings).
>
> The output is (I looped over the search invocation to give lucene a 
> chance to load everything):
> Scoring 25000000 documents with direct buffers (without square root) took 203
> Scoring 25000000 documents with direct buffers (without square root) took 179
> Scoring 25000000 documents with direct buffers (without square root) took 172
> Scoring 25000000 documents with direct buffers (and a square root) took 292
> Scoring 25000000 documents with direct buffers (and a square root) took 289
> Scoring 25000000 documents with direct buffers (and a square root) took 289
> Scoring 25000000 documents with a lucene value (without square root) source took 1045
> Scoring 25000000 documents with a lucene value (without square root) source took 656
> Scoring 25000000 documents with a lucene value (without square root) source took 660
> Scoring 25000000 documents with a lucene value (without square root) source took 658
> Scoring 25000000 documents with a lucene value (without square root) source took 663
> Scoring 25000000 documents with a lucene value (and a square root) source took 711
> Scoring 25000000 documents with a lucene value (and a square root) source took 710
> Scoring 25000000 documents with a lucene value (and a square root) source took 713
> Scoring 25000000 documents with a lucene value (and a square root) source took 711
> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>
> So the impact of a square root is roughly 110ms, while the impact of 
> using the Lucene function values is far higher (depending on the run, 
> between 300-350ms). Interestingly, the square root impact is not as high 
> on the Lucene function query for some reason (most likely Java or the 
> CPU can just optimize the very simple scorer best).
>
> I did measure the values with a FSDirectory and a RAMDirectory which 
> both essentially yield the same performance. Do you see any problem 
> with the attached code?
>
> BR
>   Christian
>
> ________________________________
>
> From: Wright Karl (HERE/Cambridge)
> Sent: Friday, October 4, 2013 20:56
> To: Ziech Christian (HERE/Berlin)
> Subject: FW: Is there a really performant way to store a full 32-bit 
> int in doc values?
>
>
> FYI
> Karl
>
> Sent from my Windows Phone
>
> ________________________________
>
> From: ext Michael McCandless
> Sent: 10/4/2013 4:51 PM
> To: Wright Karl (HERE/Cambridge)
> Subject: Re: Is there a really performant way to store a full 32-bit 
> int in doc values?
>
> Hmmm, that's interesting that you see decode cost is too high.  Are 
> you sure?
>
> Can you email the list?  I'm sure Rob will have suggestions.  The 
> worst case is you make a custom DV format that stores things raw.
>
> 4.5 has a new default DocValuesFormat with more compression, but with 
> values stored on disk by default (cached by the OS if you have the
> RAM) ... I wonder how that would compare to what you're using now.
>
> I think the simplest thing to do is to instantiate the 
> Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5), 
> passing a very high acceptableOverheadRatio?  This should cause 
> packed ints to be upgraded to a byte[], short[], int[], or long[].  If 
> this is still not fast enough then I suspect a custom DVFormat that just 
> uses int[] directly (avoiding the abstractions of packed ints) is your 
> best shot.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Oct 4, 2013 at 8:46 AM,  <[email protected]> wrote:
>>
>>
>> Hi Mike,
>>
>>
>>
>> We're using docvalues to store geocoordinates in meters in X,Y,Z 
>> space, and discovering that they are taking more time to unpack than 
>> we'd like.  I was surprised to find no raw representation available 
>> for docvalues right now - otherwise, a fixed 4-byte representation 
>> would have been ideal. Would you have any suggestions?
>>
>>
>>
>> Karl
>>
>>

LuceneFloatSourceTest.java
Description: LuceneFloatSourceTest.java

