For some reason I did not see any of your emails until a full 10 days after
you sent them, and I am not sure why.  Perhaps Apache infrastructure was
misbehaving; either way, I apologize for the late response.


On Sun, May 21, 2023 at 8:59 AM Karl Wright <daddy...@gmail.com> wrote:

> Hi - the big source of bloat for hopcount processing is the delete
> dependencies table, and the options provided allow you to not track those
> at all.  The other tables (intrinsiclink and hopcount) are 1:1 with the
> documents themselves, so these were not considered worth optimizing.
>
> It may be possible to introduce a fourth hopcount mode that does not record
> any information in those tables - but since this setting can be changed on
> an existing job, very careful analysis would be needed to figure out what
> happens when someone flips it after a crawl has already been run.
>
> Karl
>
>
> On Thu, May 11, 2023 at 2:28 AM Mingchun Zhao <mingchun.zha...@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> Thank you for taking time out of your busy schedule to reply.
>>
>> > There is an option on the "hopcount" tab of your job to disable hopcount
>>
>> You mean setting "Hop count mode" to "keep unreachable documents,
>> forever" in the "Hop Filters" tab?
>> Yes, I did that; however, it seems that records were still inserted
>> into the "intrinsiclink" and "hopcount" tables. Is there a way to tell
>> MCF not to insert data into those tables? Operations on them can
>> become a performance bottleneck when the tables bloat.
>>
>> Regards,
>> Mingchun
>>
>> On Wed, May 10, 2023 at 7:53 PM Karl Wright <daddy...@gmail.com> wrote:
>> >
>> > There is an option on the "hopcount" tab of your job to disable hopcount
>> > tracking entirely.
>> > Karl
>> >
>> > On Tue, May 9, 2023 at 11:49 PM Mingchun Zhao <mingchun.zha...@gmail.com>
>> > wrote:
>> >
>> > > Hi Karl,
>> > >
>> > > Could you please advise me on hopcount tracking?
>> > > I'm using ManifoldCF 2.24 with PostgreSQL 12.14 as the database for now.
>> > > In my case, I don't need to use the 'Hop Filters' feature so I'd like
>> > > to disable tracking hopcount and reduce the insert/update/delete load
>> > > on the 'intrinsiclink' and 'hopcount' tables. So I have two questions
>> > > about this.
>> > > First, is there an option to disable tracking hopcount?
>> > > Second, if I disable tracking hopcount, can it affect other crawling
>> > > processes?
>> > >
>> > > Thank you in advance.
>> > > Kind regards,
>> > > Mingchun
>> > >
>>
>
