[ 
https://issues.apache.org/jira/browse/NUTCH-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141334#comment-13141334
 ] 

Zhang JinYan edited comment on NUTCH-1138 at 11/1/11 5:14 PM:
--------------------------------------------------------------

Applied the patch to branch-1.4 and rebuilt with the command "ant clean build".
Configured the following seed URLs to crawl:
{quote}
http://172.16.123.123/bbs/viewthread.php?tid=12345
http://172.16.123.123/bbs/attachment.php?aid=12345
http://www.jettycn.com/
{quote}

The first two sites are not available.
Ran the crawl with this command (platform: Windows):
{quote}
sh.exe ./bin/nutch crawl seedurl -dir crawldev -solr http://localhost:8983/solr/
{quote}

The crawl completed successfully. A query in the Solr admin returns:
{code:xml}
<result name="response" numFound="320" start="0"></result>
{code}
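
The same check could be scripted. The following is a minimal sketch, not part of the test run above, that reads the numFound attribute from a Solr XML response. The function name num_found is hypothetical, and the input is only the fragment quoted above rather than a live query.

```python
import xml.etree.ElementTree as ET


def num_found(solr_xml: str) -> int:
    """Return numFound from the <result name="response"> element."""
    root = ET.fromstring(solr_xml)
    # The input may be the <result> element itself or a document wrapping it.
    if root.tag == "result" and root.get("name") == "response":
        result = root
    else:
        result = root.find(".//result[@name='response']")
    if result is None:
        raise ValueError('no <result name="response"> element found')
    return int(result.get("numFound"))


# Fragment copied from the Solr admin output quoted above.
fragment = '<result name="response" numFound="320" start="0"></result>'
print(num_found(fragment))
```

A non-zero count here only shows that documents reached the index; it says nothing about which of the seed URLs contributed them.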

Searching "hadoop.log" for "ERROR" finds 3 results, all caused by:
{code}
java.net.ConnectException: Connection timed out: connect
{code}

Searching "hadoop.log" for "Exception" finds results like this:
{quote}
2011-11-02 00:39:01,821 INFO  httpclient.HttpMethodDirector - I/O exception 
(org.apache.commons.httpclient.NoHttpResponseException) caught when processing 
request: The server www.jettycn.com failed to respond
2011-11-02 00:39:01,821 INFO  httpclient.HttpMethodDirector - Retrying request
{quote}
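
The two log searches above can also be done in one pass. The following is a minimal sketch under stated assumptions: the line layout (date, time, level, logger, " - ", message) is inferred from the excerpt above, the function name summarize is hypothetical, and the sample lines are the quoted ones joined back onto single lines.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt:
# "<date> <time>,<ms> <LEVEL>  <logger> - <message>"
LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s+(\S+) - ")
# Fully qualified Java class names that end in "Exception".
EXC_RE = re.compile(r"\b(?:[a-z_][\w$]*\.)+[A-Z]\w*Exception\b")


def summarize(lines):
    """Count log levels and exception class names across the given lines."""
    levels = Counter()
    exceptions = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            levels[match.group(1)] += 1
        # Stack-trace lines have no level prefix, so scan every line.
        for name in EXC_RE.findall(line):
            exceptions[name] += 1
    return levels, exceptions


sample = [
    "2011-11-02 00:39:01,821 INFO  httpclient.HttpMethodDirector - I/O exception "
    "(org.apache.commons.httpclient.NoHttpResponseException) caught when "
    "processing request: The server www.jettycn.com failed to respond",
    "2011-11-02 00:39:01,821 INFO  httpclient.HttpMethodDirector - Retrying request",
    "java.net.ConnectException: Connection timed out: connect",
]
levels, exceptions = summarize(sample)
print(levels["INFO"], levels["ERROR"])
print(sorted(exceptions))
```

To run it against a real log, pass an open file object, for example summarize(open("hadoop.log")); the path is an assumption and depends on where the log is written.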

So there is no exception related to your patch in "hadoop.log".
The patch works fine with "branch-1.4" for me.
                
> remove LogUtil from trunk and nutch gora
> ----------------------------------------
>
>                 Key: NUTCH-1138
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1138
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.4, nutchgora
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: nutchgora, 1.5
>
>         Attachments: Document1.txt, NUTCH-1138-trunk-20111023.patch
>
>
> This should move towards the removal of the LogUtil class from both codebases 
> as per comments in NUTCH-1078.
