Hi,

I am finding that Lucene slows down a lot as bigger and bigger doc/pos files
are merged... While some slowdown is expected, the worrying part is that all
my data is in RAM. The version is 4.6.1.

Here are some sample statistics taken after instrumenting the
SortingAtomicReader code, as we use a SortingMergePolicy. The times displayed
are just for reading (e.g. in.nextDoc(), in.nextPosition()); they do not
include TimSort or new-segment writing times.

337 sec to merge postings [281655 docs], with
SortingDocsAndPositionsEnum.nextPosition() at [130 sec],
SortingDocsAndPositionsEnum.nextDoc() at [232 sec], and total-num-terms at
[2,058,600]

482 sec to merge postings [475143 docs], with
SortingDocsAndPositionsEnum.nextPosition() at [204 sec],
SortingDocsAndPositionsEnum.nextDoc() at [332 sec], and total-num-terms at
[3,791,065]

898 sec to merge postings [890385 docs], with
SortingDocsAndPositionsEnum.nextPosition() at [343 sec],
SortingDocsAndPositionsEnum.nextDoc() at [609 sec], and total-num-terms at
[5,470,110]

1000 sec to merge postings [950084 docs], with
SortingDocsAndPositionsEnum.nextPosition() at [361 sec],
SortingDocsAndPositionsEnum.nextDoc() at [686 sec], and total-num-terms at
[1,108,744]
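
For reference, the instrumentation is essentially a pass-through wrapper that
accumulates System.nanoTime() around the delegate calls; a minimal sketch
against the 4.6 DocsAndPositionsEnum API (class and field names here are
illustrative, not the exact patch I applied) looks roughly like this:

import java.io.IOException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.BytesRef;

// Illustrative timing wrapper around the enum that the merge iterates.
class TimingDocsAndPositionsEnum extends DocsAndPositionsEnum {
  private final DocsAndPositionsEnum in;
  long nextDocNanos = 0, nextPositionNanos = 0;

  TimingDocsAndPositionsEnum(DocsAndPositionsEnum in) { this.in = in; }

  @Override public int nextDoc() throws IOException {
    long t0 = System.nanoTime();
    int doc = in.nextDoc();
    nextDocNanos += System.nanoTime() - t0;
    return doc;
  }

  @Override public int nextPosition() throws IOException {
    long t0 = System.nanoTime();
    int pos = in.nextPosition();
    nextPositionNanos += System.nanoTime() - t0;
    return pos;
  }

  // Everything else just delegates.
  @Override public int docID() { return in.docID(); }
  @Override public int freq() throws IOException { return in.freq(); }
  @Override public int advance(int target) throws IOException { return in.advance(target); }
  @Override public int startOffset() throws IOException { return in.startOffset(); }
  @Override public int endOffset() throws IOException { return in.endOffset(); }
  @Override public BytesRef getPayload() throws IOException { return in.getPayload(); }
  @Override public long cost() { return in.cost(); }
}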

I went ahead and did an "mlock" on the already-mmapped doc/pos files and then
proceeded with the merge, to take disk I/O out of the picture. So the numbers
shown above come from iterating all terms/docs/positions sequentially from RAM!!
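
In case it helps to see what I mean by mlock'ing from Java: Lucene itself has
nothing for this, so it has to go through native code. A minimal JNA-based
sketch of the idea (not battle-tested; peeking at the package-private
Buffer.address field via reflection is obviously a hack) would be something
like:

import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

// Sketch: mmap a file and pin its pages with mlock(2) via JNA.
public class MlockFile {
  public interface CLib extends Library {
    CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);
    int mlock(Pointer addr, long len); // size_t ~ long on 64-bit Linux
  }

  public static MappedByteBuffer mapAndLock(String path) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(path, "r");
    try {
      FileChannel ch = raf.getChannel();
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      // Grab the native address of the mapping (reflection hack).
      Field addressField = Buffer.class.getDeclaredField("address");
      addressField.setAccessible(true);
      long addr = addressField.getLong(buf);
      if (CLib.INSTANCE.mlock(new Pointer(addr), ch.size()) != 0) {
        throw new RuntimeException("mlock failed for " + path);
      }
      // The caller must keep a reference to the returned buffer; if it is
      // garbage-collected the mapping (and the lock) goes away.
      return buf;
    } finally {
      raf.close();
    }
  }
}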

I understand that there is no bulk merge of postings currently available, but
given that the data is in RAM, doesn't this indicate a slowdown? Is there some
configuration I am missing, etc., to speed this up?
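
For completeness, the writer setup is essentially the stock SortingMergePolicy
wiring over MMapDirectory, i.e. something along these lines (simplified and
illustrative; the sort field name and index path are made up):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.index.sorter.SortingMergePolicy;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.Version;

// Illustrative writer setup: MMapDirectory + SortingMergePolicy wrapping TieredMergePolicy.
public class WriterSetup {
  public static IndexWriter open() throws Exception {
    Sort sort = new Sort(new SortField("timestamp", SortField.Type.LONG)); // field name made up
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
        new StandardAnalyzer(Version.LUCENE_46));
    iwc.setMergePolicy(new SortingMergePolicy(new TieredMergePolicy(), sort));
    return new IndexWriter(new MMapDirectory(new File("/path/to/index")), iwc);
  }
}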

--
Ravi


[P.S.: I have not verified that all pages actually reside in RAM, but "mlock"
doesn't throw any exceptions and returns success...]
