[
https://issues.apache.org/jira/browse/CONNECTORS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866492#comment-13866492
]
Karl Wright commented on CONNECTORS-854:
----------------------------------------
This was added for a reason. I am trying to determine what that reason was.
{code}
r1417060 | kwright | 2012-12-04 12:47:28 -0500 (Tue, 04 Dec 2012) | 1 line
Final checkin of CONNECTORS-120. Port the final connectors to httpcomponents:
livelink (untested), meridio (untested), web connector (tested thoroughly). Also
further fixes for SharePoint.
{code}
I expect the reason is either recorded in ticket CONNECTORS-120, or I may have
to resurrect branches/CONNECTORS-120 to find it. This will take a little time.
> Enable STALE_CONNECTION_CHECK
> -----------------------------
>
> Key: CONNECTORS-854
> URL: https://issues.apache.org/jira/browse/CONNECTORS-854
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Affects Versions: ManifoldCF 1.4.1
> Reporter: Shinichiro Abe
> Priority: Minor
> Fix For: ManifoldCF 1.5
>
>
> When crawling some sites (< 1000 docs), manifoldcf.log sometimes shows the
> following "The target server failed to respond" messages. It seems that
> NoHttpResponseException is thrown in ThrottledFetcher.
> {noformat}
> WARN 2014-01-09 12:39:16,701 (Worker thread '10') - Pre-ingest service
> interruption reported for job 1389238470356 connection '1': Timed out waiting
> for response for 'http://www.rondhuit.com/?p=1890': The target server failed
> to respond
> WARN 2014-01-09 12:39:55,509 (Worker thread '7') - Pre-ingest service
> interruption reported for job 1389238470356 connection '1': Timed out waiting
> for response for 'http://www.rondhuit.com/?p=675': The target server failed
> to respond
> {noformat}
> Fetching the same page after the retry interval (15 minutes) had passed
> succeeded.
> I tried changing the httpclient configuration and confirmed that the message
> was no longer shown.
> {noformat}
> +++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java
> @@ -463,7 +463,7 @@
> BasicHttpParams params = new BasicHttpParams();
> params.setParameter(ClientPNames.DEFAULT_HOST,fetchHost);
> params.setBooleanParameter(CoreConnectionPNames.TCP_NODELAY,true);
> - params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,false);
> + params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,true);
>
> params.setBooleanParameter(ClientPNames.ALLOW_CIRCULAR_REDIRECTS,true);
> {noformat}
> I know two users who are hitting this issue and have resolved it by turning
> on stale connection check.
> The crawl job also finishes more quickly than when the check is false,
> because there are no retry fetches.
> May I switch the stale connection check from false to true as well as
> SolrConnector's httpclient configuration?
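
For readers unfamiliar with what a stale connection check does, here is a minimal sketch using only the JDK. It is not ManifoldCF or HttpClient code. The class name StaleCheckSketch and its methods are hypothetical. It assumes the check amounts to a very short read on the idle socket before reuse: end-of-stream means the server has already closed the connection, while a read timeout means the connection is still open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not part of ManifoldCF or HttpClient.
class StaleCheckSketch {

    // Probes an idle socket with a 1 ms read. EOF or an I/O error means the
    // peer has gone away; a timeout means the connection is still open.
    // Note: this simple probe would consume a byte if data were pending,
    // so it is only suitable for connections expected to be idle.
    static boolean isStale(Socket socket) {
        try {
            int oldTimeout = socket.getSoTimeout();
            try {
                socket.setSoTimeout(1);
                return socket.getInputStream().read() < 0;
            } finally {
                socket.setSoTimeout(oldTimeout);
            }
        } catch (SocketTimeoutException e) {
            return false; // nothing to read, but no EOF either
        } catch (IOException e) {
            return true;  // treat any other I/O failure as a dead connection
        }
    }

    // Opens a loopback connection, probes it, lets the "server" close its
    // side, then probes again. Returns {staleWhileOpen, staleAfterClose}.
    static boolean[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket serverSide = server.accept()) {
            boolean staleWhileOpen = isStale(client);
            serverSide.close(); // the server drops the idle connection
            boolean staleAfterClose = false;
            // The close may take a moment to become visible to the client.
            for (int i = 0; i < 200 && !staleAfterClose; i++) {
                staleAfterClose = isStale(client);
                if (!staleAfterClose) {
                    Thread.sleep(10);
                }
            }
            return new boolean[] { staleWhileOpen, staleAfterClose };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("stale while open:  " + result[0]);
        System.out.println("stale after close: " + result[1]);
    }
}
```

Without such a probe, a client that reuses a connection the server has already closed sends its request into a dead socket and gets no response, which would be consistent with the "The target server failed to respond" warnings above. The cost is a small delay on each connection reuse.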
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)