Hi,

While working with htdig I noticed heavy memory use when running htmerge
on large indexes.
A general search turned up a patch by Lorenzo Campedelli for htdig230b2
(msg02780.html).
His patch no longer applies because WordSearchDescription has since been
replaced by WordCursor.

Although I am not a C programmer, I have made an attempt to rewrite his
patch to use the new callback interface introduced somewhere between
htdig230b2 and htdig230b5.
I took a deep breath and dived into the source to try to understand what it
was about. I based my work on some code that generates output to a file by
using the new callback interface, and on Lorenzo's existing patch.
I did some testing (see README in tgz) and the changes seem to check out ok.

The attached file contains 3 files: a README and a patch for htmerge.cc for
each of htdig2.3.0-b5 and htdig2.3.0-b6.

Before the patch, the memory load of htmerge grew with the size of the
index. It peaked while merging the word indexes, which led to swap file
usage and a dramatic performance drop on my Debian system.
After this patch, running htmerge showed a constant memory load (less than 4%
of 500 MB) that seems largely independent of index size.
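
For anyone curious about the general idea, here is a minimal sketch. It is
not the patch itself and does not use the htdig API: WordEntry, walk_index
and both counting functions are made-up names for illustration. I am
assuming the saving comes from handling each entry inside the callback as
it is visited, instead of first collecting every entry in a list.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record type; NOT htdig's real word-entry class.
struct WordEntry {
    std::string word;
    int doc_id;
};

// Hypothetical stand-in for an index walker: generates `count` entries
// one at a time and hands each of them to `callback`.
void walk_index(std::size_t count,
                const std::function<void(const WordEntry&)>& callback) {
    for (std::size_t i = 0; i < count; ++i) {
        WordEntry entry{"word" + std::to_string(i), static_cast<int>(i % 7)};
        callback(entry);
    }
}

// Collecting approach: every entry is stored before any is examined,
// so memory use grows with the number of entries in the index.
std::size_t count_kept_collecting(std::size_t count, int removed_doc) {
    std::vector<WordEntry> all;
    walk_index(count, [&](const WordEntry& e) { all.push_back(e); });

    std::size_t kept = 0;
    for (const WordEntry& e : all) {
        if (e.doc_id != removed_doc) ++kept;
    }
    return kept;
}

// Callback approach: each entry is examined as it is visited and then
// dropped, so only a single counter is kept regardless of index size.
std::size_t count_kept_streaming(std::size_t count, int removed_doc) {
    std::size_t kept = 0;
    walk_index(count, [&](const WordEntry& e) {
        if (e.doc_id != removed_doc) ++kept;
    });
    return kept;
}
```

Both functions return the same count; only the second keeps its memory use
flat as the number of entries grows.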


Best Regards,

  David van de Vliet

Attachment: htmerge-memorypatch.tgz
Description: application/compressed
