Hi,

Working with htdig, I noticed heavy memory use when running htmerge on large indexes. A general search turned up a patch by Lorenzo Campedelli for htdig230b2 (msg02780.html). His patch is now broken because WordSearchDescription was replaced by WordCursor.

Although I am not a C programmer, I have attempted to rewrite his patch to use the new callback interface introduced somewhere between htdig230b2 and htdig230b5. I took a deep breath and dived into the source to try to understand what it was about. I based my work on Lorenzo's existing patch and on some code that writes its output to a file using the new callback interface. I did some testing (see the README in the tgz) and the changes seem to check out OK.

The attached file contains 3 files: a README and a patch for htmerge.cc for each of htdig2.3.0-b5 and htdig2.3.0-b6.

Before the patch, htmerge's memory load grew as the index got larger. It peaked while merging the word indexes, which resulted in swap file usage and a dramatic performance drop on my Debian system. After the patch, htmerge shows a constant memory load (less than 4% of 500 MB) that seems largely independent of index size.

Best Regards,
David van de Vliet
htmerge-memorypatch.tgz
Description: application/compressed

