Greetings Krzysztof,

I think this is indeed a bug, and probably introduced by me :(

To help us track it down, could you please provide some more information?
a) Which was the last version without the bug?
b) Which was the first version with the bug?
c) What platform are you using?
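In the meantime, if you are indexing from scratch, Jim's suggestion (quoted below) to log the dig and analyse it can be partly automated. The following is only a minimal sketch. It assumes you have already pulled the visited URLs out of the verbose htdig output into a plain list, one per line. It does not parse htdig's log format itself. It flags URLs whose path repeats the same run of segments back to back (e.g. `/docs/a/b/a/b/a/b/`), which is a common signature of a symbolic-link loop or a badly formed relative link. The function names and the `min_repeats` threshold are hypothetical, and legitimate paths can occasionally trigger it, so treat hits as leads rather than proof.

```python
import sys
from urllib.parse import urlsplit


def has_repeated_segments(url: str, min_repeats: int = 3) -> bool:
    """Return True if some run of path segments occurs back to back
    at least min_repeats times, e.g. /docs/a/b/a/b/a/b/index.html."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    n = len(segments)
    # Try every run width that could fit min_repeats times in the path.
    for width in range(1, n // min_repeats + 1):
        for start in range(0, n - width * min_repeats + 1):
            block = segments[start:start + width]
            if all(
                segments[start + k * width:start + (k + 1) * width] == block
                for k in range(1, min_repeats)
            ):
                return True
    return False


def suspicious_urls(urls, min_repeats: int = 3):
    """Keep only the URLs that look like they came from a crawl loop."""
    return [u for u in urls if has_repeated_segments(u, min_repeats)]


if __name__ == "__main__":
    # Hypothetical input: one URL per line on stdin, extracted beforehand.
    urls = [line.strip() for line in sys.stdin if line.strip()]
    for u in suspicious_urls(urls):
        print(u)
```

If the list of flagged URLs is long and shares a common prefix, that prefix is a good place to look for the offending link or symlink.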
Thanks,
Lachlan

On Fri, 27 Jun 2003 15:02, Jim Cole wrote:
> On Thursday, June 26, 2003, at 07:20 AM, Krzysztof Gorgolewski wrote:
> > At the beginning of June we noticed that our index is getting too big.
> > It was usually 300-400mb. Now it has swelled to over 3gb and we don't
> > know why. The size of all indexed files (html, pdf, ps, txt) is about
> > 2.3gb. The files we're indexing have not changed, and htdig doesn't hang
> > while indexing. It even takes 8-9 hours longer!!
>
> Are you sure that there were no changes to any of the pages and no
> changes whatsoever to the directory structure? It is possible for a
> symbolic link or poorly formed hyperlink in a document to cause htdig
> to loop through a lot of bogus URLs, indexing some of the same
> documents over and over again. Simply adding a single link to a
> document also has the potential to pull in arbitrarily large portions
> of a site that were not previously indexed.
>
> Are you certain that the start_url and limit_urls_to attributes have
> not changed in any way? Changes to either could allow more
> sites/directories to be indexed.
>
> Are you reindexing from scratch, or performing updates? If the latter,
> it is possible that some sort of database corruption could be causing
> problems.
>
> If you are indexing from scratch and can't think of anything else that
> has changed, you probably need to log the output of the dig and analyze
> it in order to determine where the problem might lie. If you are not
> already doing so, try running with the -s option to see if the number
> of indexed pages seems reasonable. You can also add one or more -v
> options in order to increase the verbosity of the output.
>
> Jim

--
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

