Re: Iterating BinaryDocValues

Mikhail Khludnev Thu, 09 Jan 2014 08:33:59 -0800

Don't you think it's worth raising a JIRA issue about those 'new byte[]'
allocations?
I'm able to provide a patch if you wish.
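
To illustrate what reusing the incoming buffer would mean, here is a minimal
sketch. It is not the Lucene45DocValuesProducer code: Ref is a hypothetical
stand-in for BytesRef, and ReusingReader, its addresses array and the
ByteBuffer-backed data are assumptions made up so the example runs without
Lucene.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for Lucene's BytesRef (bytes/offset/length).
final class Ref {
    byte[] bytes = new byte[0];
    int offset;
    int length;
}

// Hypothetical variable-length value store: all values are concatenated in
// one buffer, and addresses[i] is the start of value i (addresses[n] = end).
final class ReusingReader {
    private final ByteBuffer data;  // stands in for the mmap'ed data slice
    private final long[] addresses;

    ReusingReader(byte[] data, long[] addresses) {
        this.data = ByteBuffer.wrap(data);
        this.addresses = addresses;
    }

    // Allocating variant: a fresh byte[] for every lookup.
    void getAllocating(int docId, Ref result) {
        int start = (int) addresses[docId];
        int length = (int) (addresses[docId + 1] - start);
        byte[] buffer = new byte[length];
        copy(start, buffer, length);
        result.bytes = buffer;
        result.offset = 0;
        result.length = length;
    }

    // Reusing variant: keep the caller's array and grow it only when needed.
    void getReusing(int docId, Ref result) {
        int start = (int) addresses[docId];
        int length = (int) (addresses[docId + 1] - start);
        if (result.bytes.length < length) {
            result.bytes = new byte[length];
        }
        copy(start, result.bytes, length);
        result.offset = 0;
        result.length = length;
    }

    // Copies length bytes starting at start into dest[0..length).
    private void copy(int start, byte[] dest, int length) {
        ByteBuffer view = data.duplicate(); // independent position, shared content
        view.position(start);
        view.get(dest, 0, length);
    }
}
```

The trade-off of the reusing variant is that the returned bytes are only valid
until the next lookup with the same Ref, so a caller that wants to keep a
value has to copy it.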



On Wed, Jan 8, 2014 at 2:02 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> FWIW,
>
> A micro benchmark shows a 4% gain from reusing the incoming BytesRef.bytes
> for short binary docvalues in Test2BBinaryDocValues.testVariableBinary()
> with an mmap directory.
> I wonder why it doesn't read into the incoming bytes:
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>
>
>
> On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> Going sequentially should help, if the pages are not hot (in the OS's IO
>> cache).
>>
>> You can also use a different DVFormat, e.g. Direct, but this holds all
>> bytes in RAM.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
>> <mkhlud...@griddynamics.com> wrote:
>> > Joel,
>> >
>> > I tried to hack it straightforwardly, but found no free gain there. The
>> > only thing I can suggest is to try to reuse the bytes in
>> > https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>> > Right now it allocates a new byte array on every call, which, besides the
>> > GC cost, can also hurt memory access locality. Could you try fixing that
>> > memory waste and repeat the performance test?
>> >
>> > Have a good hack!
>> >
>> >
>> > On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I'm looking for a faster way to perform large scale docId -> bytesRef
>> >> lookups for BinaryDocValues.
>> >>
>> >> I'm finding that I can't get the performance that I need from the
>> >> random access seek in the BinaryDocValues interface.
>> >>
>> >> I'm wondering if sequentially scanning the docValues would be a faster
>> >> approach. I have a BitSet of matching docs, so if I sequentially moved
>> >> through the docValues I could test each one against that bitset.
>> >>
>> >> Wondering if that approach would be faster for bulk extracts and how
>> >> tricky it would be to add an iterator to the BinaryDocValues interface?
>> >>
>> >> Thanks,
>> >> Joel
>> >
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > Principal Engineer,
>> > Grid Dynamics
>> >
>> >
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
>

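On Joel's original question quoted above (walking the matching docs in
increasing docId order instead of seeking at random), a minimal sketch of the
access pattern could look like this. SequentialScan and ValueSource are
hypothetical names, and ValueSource only stands in for a docId -> bytes
lookup; none of this is Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SequentialScan {
    // Hypothetical lookup interface standing in for a docId -> value source.
    interface ValueSource {
        byte[] get(int docId);
    }

    // Visits the set bits in increasing docId order, so the underlying
    // storage is read front to back rather than in arbitrary order.
    static List<byte[]> collect(BitSet matches, ValueSource values) {
        List<byte[]> out = new ArrayList<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            out.add(values.get(doc));
            if (doc == Integer.MAX_VALUE) {
                break; // doc + 1 would overflow
            }
        }
        return out;
    }
}
```

Whether this is actually faster depends on how cold the pages are, as Mike
notes above; the sketch only shows the iteration order.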


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
