Your test/benchmark indexes purely random floats: Random.nextFloat()

It should be obvious that things like index compression do not help performance on purely random data (and I think purely random data is totally unrealistic anyway).
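(For context, a minimal sketch of roughly what the indexing side of such a benchmark presumably looks like, assuming Lucene 4.x APIs; the field names, directory path, and document count are illustrative, and the real attached test may differ. FloatDocValuesField stores Float.floatToRawIntBits(value) as a NumericDocValue, so random bit patterns leave the default compression nothing to exploit.)

import java.io.File;
import java.util.Random;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RandomFloatDocValuesIndexer {
  static final int NUM_DOCS = 25_000_000;

  public static void main(String[] args) throws Exception {
    Random random = new Random(42);
    try (FSDirectory dir = FSDirectory.open(new File("/tmp/random-dv"));
         IndexWriter writer = new IndexWriter(dir,
             new IndexWriterConfig(Version.LUCENE_45, new WhitespaceAnalyzer(Version.LUCENE_45)))) {
      for (int i = 0; i < NUM_DOCS; i++) {
        Document doc = new Document();
        doc.add(new StoredField("id", i));
        // FloatDocValuesField stores Float.floatToRawIntBits(value) as a NumericDocValue;
        // purely random bit patterns give delta/packed-int compression nothing to work with.
        doc.add(new FloatDocValuesField("x", random.nextFloat()));
        doc.add(new FloatDocValuesField("y", random.nextFloat()));
        writer.addDocument(doc);
      }
    }
  }
}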
If you are *really indexing purely random data* then there is no benefit to things like compression at all, so just make your own codec that simply calls IndexInput.readInt() or whatever?

On Tue, Oct 8, 2013 at 10:27 AM, <[email protected]> wrote:
> Hi David,
>
> We tried that and still didn't come close to DirectBuffer speed. It was only about 20% faster. I've attached updated numbers.
>
> We looked through the Lucene code and determined that very likely the costly part is loading each part of an int out of the byte array. There are much faster (in fact, native) operations available for reading a whole word or float at one time, if we could get access to the DirectBuffer behind the DocValues implementation. But when Lucene loads the byte array into Java heap memory, that ability is lost.
>
> Karl
>
> -----Original Message-----
> From: ext David Smiley (@MITRE.org) [mailto:[email protected]]
> Sent: Tuesday, October 08, 2013 11:52 AM
> To: [email protected]
> Subject: Re: FW: Is there a really performant way to store a full 32-bit int in doc values?
>
> Hi Karl!
>
> I suggest that you put the point data you need in BinaryDocValues; that is, both the x & y into the same byte[] chunk. I've done this for a Solr integration in https://issues.apache.org/jira/browse/SOLR-5170
>
> ~ David
>
> karl.wright-2 wrote
>> Hi All (and especially Robert),
>>
>> Lucene NumericDocValues seems to operate slower than we would expect. In our application, we're using it for storing coordinate values, which we retrieve to compute a distance. While doing timings to determine the impact of including a sqrt in the calculation, we noted that the Lucene overhead itself overwhelmed pretty much anything we did in the ValueSource.
>>
>> One of our engineers did performance testing (code attached, hope it gets through), which shows what we are talking about. Please see the thread below. The question is: why is Lucene 2.5x slower than direct buffer access for this case? And is there anything we can do within the Lucene paradigm to get our performance back closer to the direct buffer case?
>>
>> Karl
>>
>> -----Original Message-----
>> From: Ziech Christian (HERE/Berlin)
>> Sent: Tuesday, October 08, 2013 9:08 AM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> Hi,
>>
>> I have now tested the approach of using the NumericDocValues directly, and it indeed helps: about 20% compared to the original Lucene numbers. Lucene is still 2.5x slower than using a DirectBuffer alone, but it helps.
>> The funny thing is that with Lucene the square root is almost free, which can be explained by the CPU calculating the square root while other things are computed: since the result isn't needed for a while in my micro-benchmark, the CPU can happily do other work in the meantime. Since we also have a lot of other query aspects, we'd get that gain either way, I assume, so estimating about 30-50ms for the square root when scoring 25M documents should be roughly accurate. So what is Lucene doing that causes it to be 3 times slower than the naive approach?
>> And why is that impact so big compared to the one of a simple square root (which slows things down by ~20%, assuming the 30ms holds with more complex actions)? I mean, 20% vs. 200% is an order of magnitude!
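(For reference, a rough reconstruction of what a "direct float buffer" baseline like the one above might look like; the class and method names here are invented for illustration and the actual attached test may differ. The point is that x and y live off-heap and each lookup reads a whole 32-bit word rather than assembling the value byte by byte.)

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferBaseline {
  private final FloatBuffer xs;
  private final FloatBuffer ys;

  public DirectBufferBaseline(int maxDoc) {
    // Off-heap storage: 4 bytes per float per document (add a z buffer analogously for 3D).
    xs = ByteBuffer.allocateDirect(maxDoc * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
    ys = ByteBuffer.allocateDirect(maxDoc * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
  }

  public void set(int doc, float x, float y) {
    xs.put(doc, x);
    ys.put(doc, y);
  }

  // Squared distance to (qx, qy); wrap in Math.sqrt(...) to mimic the "with square root" runs.
  public float distanceSquared(int doc, float qx, float qy) {
    float dx = xs.get(doc) - qx;
    float dy = ys.get(doc) - qy;
    return dx * dx + dy * dy;
  }
}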
>> As a side note: storing the values as an int when using a DirectBuffer doesn't seem helpful; I assume because we have to cast the int to float either way later.
>>
>> BR
>> Christian
>>
>> PS: The new numbers are:
>> Scoring 25000000 documents with direct float buffers (without square root) took 190
>> Scoring 25000000 documents with direct float buffers (without square root) took 171
>> Scoring 25000000 documents with direct float buffers (without square root) took 172
>> Scoring 25000000 documents with direct float buffers (and a square root) took 281
>> Scoring 25000000 documents with direct float buffers (and a square root) took 280
>> Scoring 25000000 documents with direct float buffers (and a square root) took 266
>> Scoring 25000000 documents with a lucene float value source (without square root) took 1045
>> Scoring 25000000 documents with a lucene float value source (without square root) took 625
>> Scoring 25000000 documents with a lucene float value source (without square root) took 630
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 661
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 670
>> Scoring 25000000 documents with a lucene float value source (and a square root) took 665
>> Scoring 25000000 documents with direct int buffers (without square root) took 218
>> Scoring 25000000 documents with direct int buffers (without square root) took 219
>> Scoring 25000000 documents with direct int buffers (without square root) took 204
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
>> Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
>> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
>>
>> ________________________________________
>> From: Wright Karl (HERE/Cambridge)
>> Sent: Monday, October 7, 2013 09:22
>> To: Ziech Christian (HERE/Berlin)
>> Subject: FW: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> -----Original Message-----
>> From: ext Michael McCandless [mailto:lucene@...]
>> Sent: Monday, October 07, 2013 8:28 AM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>
>> Well, it is a micro-benchmark ... so it'd be better to test in the wider/full context of the application?
>>
>> I'm also a little worried that you go through ValueSource instead of interacting directly with the NumericDocValues instance; it's just an additional level of indirection that may confuse hotspot. But it really ought not be so bad ...
>>
>> Under the hood we encode a float to an int using Float.floatToRawIntBits; it could be that this doesn't work well w/ the compression we then do on the ints by default? I'm curious which impl the Lucene45DocValuesConsumer is using in your case. Looks like you are using random floats, so I'd expect it's using DELTA_COMPRESSED.
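(A sketch of the direct, per-segment NumericDocValues access discussed above, as opposed to going through a ValueSource; it assumes Lucene 4.x APIs and the illustrative field names "x" and "y". Float.intBitsToFloat reverses the floatToRawIntBits encoding Mike mentions; null checks for missing fields are omitted.)

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;

public class DirectDocValuesScorer {

  // Sum of squared distances from (qx, qy) over one segment, reading the
  // NumericDocValues instances directly instead of through a ValueSource.
  static double scoreSegment(AtomicReaderContext context, float qx, float qy) throws IOException {
    NumericDocValues xs = context.reader().getNumericDocValues("x");
    NumericDocValues ys = context.reader().getNumericDocValues("y");
    double total = 0;
    int maxDoc = context.reader().maxDoc();
    for (int doc = 0; doc < maxDoc; doc++) {
      // The long returned by get() holds the raw bits written via Float.floatToRawIntBits.
      float x = Float.intBitsToFloat((int) xs.get(doc));
      float y = Float.intBitsToFloat((int) ys.get(doc));
      float dx = x - qx, dy = y - qy;
      total += dx * dx + dy * dy;   // add Math.sqrt(...) here for the "with square root" variant
    }
    return total;
  }
}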
>>
>> It'd be a simple test to just make your own DVFormat using raw 32-bit ints, to see how much that helps.
>>
>> But, yes, I would just email the list and see if there are other ideas?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Mon, Oct 7, 2013 at 7:14 AM, <karl.wright@...> wrote:
>>> Hi Mike,
>>>
>>> Before I post to the general list, do you see any problem with our testing methodology?
>>>
>>> Basically, we conclude that by far the most expensive thing is retrieving the NumericDocValues value. This currently overwhelms any expensive operations we might do in the scoring ourselves, which is why we're looking for potential improvements in that area.
>>>
>>> Do you agree with the assessment?
>>>
>>> Karl
>>>
>>> From: Ziech Christian (HERE/Berlin)
>>> Sent: Friday, October 04, 2013 11:09 PM
>>> To: Wright Karl (HERE/Cambridge)
>>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> Hi,
>>>
>>> maybe it's best if I share where I got my numbers from. I have written a small test (which originally was only meant to measure the Math.sqrt() impact for 10M scorings).
>>>
>>> The output is (I looped over the search invocation to give Lucene a chance to load everything):
>>> Scoring 25000000 documents with direct buffers (without square root) took 203
>>> Scoring 25000000 documents with direct buffers (without square root) took 179
>>> Scoring 25000000 documents with direct buffers (without square root) took 172
>>> Scoring 25000000 documents with direct buffers (and a square root) took 292
>>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>>> Scoring 25000000 documents with a lucene value (without square root) source took 1045
>>> Scoring 25000000 documents with a lucene value (without square root) source took 656
>>> Scoring 25000000 documents with a lucene value (without square root) source took 660
>>> Scoring 25000000 documents with a lucene value (without square root) source took 658
>>> Scoring 25000000 documents with a lucene value (without square root) source took 663
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 710
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 713
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>>> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>>>
>>> So the impact of a square root is roughly 110ms, while the impact of using the Lucene function values is far higher (depending on the run, between 300-350ms). Interestingly, the square root impact is not as high on the Lucene function query for some reason (most likely Java or the CPU can simply optimize the very simple scorer best).
>>>
>>> I did measure the values with an FSDirectory and a RAMDirectory, which both essentially yield the same performance. Do you see any problem with the attached code?
>>>
>>> BR
>>> Christian
>>>
>>> ________________________________
>>>
>>> From: Wright Karl (HERE/Cambridge)
>>> Sent: Friday, October 4, 2013 20:56
>>> To: Ziech Christian (HERE/Berlin)
>>> Subject: FW: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> FYI
>>> Karl
>>>
>>> Sent from my Windows Phone
>>>
>>> ________________________________
>>>
>>> From: ext Michael McCandless
>>> Sent: 10/4/2013 4:51 PM
>>> To: Wright Karl (HERE/Cambridge)
>>> Subject: Re: Is there a really performant way to store a full 32-bit int in doc values?
>>>
>>> Hmmm, that's interesting that you see the decode cost is too high. Are you sure?
>>>
>>> Can you email the list? I'm sure Rob will have suggestions. The worst case is you make a custom DV format that stores things raw.
>>>
>>> 4.5 has a new default DocValuesFormat with more compression, but with values stored on disk by default (cached by the OS if you have the RAM) ... I wonder how that would compare to what you're using now.
>>>
>>> I think the simplest thing to do is to instantiate the Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5), passing a very high acceptableOverheadRatio? This should cause packed ints to be upgraded to a byte[], short[], int[], or long[]. If this is still not fast enough, then I suspect a custom DVFormat that just uses int[] directly (avoiding the abstractions of packed ints) is your best shot.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Fri, Oct 4, 2013 at 8:46 AM, <karl.wright@...> wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> We're using docvalues to store geocoordinates in meters in X,Y,Z space, and discovering that they are taking more time to unpack than we'd like. I was surprised to find no raw representation available for docvalues right now; otherwise, a fixed 4-byte representation would have been ideal. Would you have any suggestions?
>>>>
>>>> Karl
>>
>> LuceneFloatSourceTest.java (16K)
>> <http://lucene.472066.n3.nabble.com/attachment/4094104/0/LuceneFloatSourceTest.java>
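(To make David's BinaryDocValues suggestion from earlier in the thread concrete: a rough sketch, not taken from SOLR-5170, of packing x, y, z into one fixed-width byte[] per document and decoding it with whole-word ByteBuffer reads, which is the kind of access Karl was asking for. The class and field names are invented, and the BinaryDocValues.get(int, BytesRef) signature shown is the Lucene 4.x one; it changed in later versions.)

import java.nio.ByteBuffer;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

public class PackedPointDocValues {

  // Index-time: encode one (x, y, z) point as a fixed 12-byte value for a single field.
  public static BinaryDocValuesField encode(String field, float x, float y, float z) {
    byte[] packed = ByteBuffer.allocate(12).putFloat(x).putFloat(y).putFloat(z).array();
    return new BinaryDocValuesField(field, new BytesRef(packed));
  }

  // Search-time: decode the point for one document into dest[0..2] = {x, y, z}.
  // Obtain 'values' once per segment, e.g. via AtomicReader.getBinaryDocValues(field).
  public static void decode(BinaryDocValues values, int docID, float[] dest) {
    BytesRef ref = new BytesRef();
    values.get(docID, ref);   // 4.x signature; later versions return the BytesRef instead
    ByteBuffer buf = ByteBuffer.wrap(ref.bytes, ref.offset, ref.length);
    dest[0] = buf.getFloat();
    dest[1] = buf.getFloat();
    dest[2] = buf.getFloat();
  }
}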
