For some reason I did not see any of your emails until a full 10 days after you sent them. I am not sure why; perhaps Apache infrastructure was misbehaving. Either way, I apologize for the late response.
On Sun, May 21, 2023 at 8:59 AM Karl Wright <daddy...@gmail.com> wrote:
> Hi - the big source of bloat for hopcount processing is the delete
> dependencies table, and the options provided allow you to not track those
> at all. The other tables (intrinsiclink and hopcount) are 1:1 with the
> documents themselves, so these were not considered worth optimizing.
>
> It may be possible to introduce a fourth hopcount mode that did not record
> any information in those tables - but since this can be changed on a job,
> very careful analysis would need to be done to figure out what happens when
> someone flips that setting after a crawl has already been run.
>
> Karl
>
>
> On Thu, May 11, 2023 at 2:28 AM Mingchun Zhao <mingchun.zha...@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> Thank you for taking time out of your busy schedule to reply.
>>
>> > There is an option on the "hopcount" tab of your job to disable hopcount
>>
>> You mean setting "Hop count mode" to "keep unreachable documents,
>> forever" in the "Hop Filters" tab?
>> Yes, I did that; however, it seems that records were still inserted
>> into the "intrinsiclink" and "hopcount" tables. Is there a way to tell
>> MCF not to insert data into those tables? Operations on them can
>> become a performance bottleneck when the tables bloat.
>>
>> Regards,
>> Mingchun
>>
>> On Wed, May 10, 2023 at 19:53 Karl Wright <daddy...@gmail.com> wrote:
>> >
>> > There is an option on the "hopcount" tab of your job to disable hopcount
>> > tracking entirely.
>> > Karl
>> >
>> > On Tue, May 9, 2023 at 11:49 PM Mingchun Zhao <mingchun.zha...@gmail.com>
>> > wrote:
>> >
>> > > Hi Karl,
>> > >
>> > > Could you please advise me on tracking hopcount?
>> > > I'm using ManifoldCF 2.24 with PostgreSQL 12.14 as the database for now.
>> > > In my case, I don't need to use the 'Hop Filters' feature so I'd like
>> > > to disable tracking hopcount and reduce the insert/update/delete load
>> > > on the 'intrinsiclink' and 'hopcount' tables. So I have two questions
>> > > about this.
>> > > First, is there an option to disable tracking hopcount?
>> > > Second, if I disable tracking hopcount, can it affect other crawling
>> > > processes?
>> > >
>> > > Thank you in advance.
>> > > Kind regards,
>> > > Mingchun
>> > >
>> >