[ https://issues.apache.org/jira/browse/CONNECTORS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815389#comment-16815389 ]
Karl Wright commented on CONNECTORS-1599:
-----------------------------------------

[~goovaertsr], 401 means the document is not accessible. This has nothing to do with being "unreachable", because "unreachable" means there is no path to it from the seeds.

> response code 401 still gets deleted with the setting "keep unreachable documents"
> ----------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1599
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1599
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: roel goovaerts
>            Priority: Major
>
> Even with the "Hop count mode" set to "keep unreachable documents, 'for now' || forever" manifold deletes documents for which it receives a 401 response code.
> The documentation does not specify such a distinction as described above. Is there some information/configuration that I'm missing? Is there a reasoning behind the guaranteed deletion of a 401?
> Ideally, for our use-case, we would want to remove all documents that return 404, but keep everything which is due to the server not responding or the crawler being unauthenticated.
> Is there a way to configure this in a more granular fashion?
> Regards,
> roel

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)