[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-05-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533844#comment-14533844
 ] 

Hudson commented on NUTCH-1934:
---

SUCCESS: Integrated in Nutch-trunk #3107 (See 
[https://builds.apache.org/job/Nutch-trunk/3107/])
NUTCH-1934 Refactor Fetcher in trunk (lewismc: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1678281)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItem.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItemQueue.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItemQueues.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherThread.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/QueueFeeder.java


> Refactor Fetcher in trunk
> -
>
> Key: NUTCH-1934
> URL: https://issues.apache.org/jira/browse/NUTCH-1934
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.10
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-1934-trunkv2.patch, NUTCH-1934.patch
>
>
> Put simply, 
> [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java]
>  is too big.
> This is kinda strange, as the size of this file is (I think) unlike that of any 
> other class within Nutch. The others are reasonably well modularized and 
> split into constituent classes which make sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-21 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505089#comment-14505089
 ] 

Markus Jelsma commented on NUTCH-1934:
--

Yes, excellent. And don't wait for my + or -, I'm gone for a while :)



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-21 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504941#comment-14504941
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

Tika upgrade, then push an RC, Markus?
Sounds good to me.








[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-21 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504526#comment-14504526
 ] 

Markus Jelsma commented on NUTCH-1934:
--

Agreed, but please commit for 1.11. Let us release 1.10 soon and not bring in 
huge changes last minute.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504006#comment-14504006
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

+1 on that sentiment.
Will commit tomorrow to allow EU folks to wake up.





[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Jorge Luis Betancourt Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503904#comment-14503904
 ] 

Jorge Luis Betancourt Gonzalez commented on NUTCH-1934:
---

+1 to [~chrismattmann]'s comment.

If the tests pass without any problem, I think we can commit and do some more 
testing. The basic test that covers the monolithic fetcher right now is a great 
starting point, and of course we should take it for a spin :) I plan on taking 
some time to prepare a midsize crawl before/after the commit if it helps.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503882#comment-14503882
 ] 

Chris A. Mattmann commented on NUTCH-1934:
--

Well, my point on this is: you can keep this as a patch and spend the effort to 
keep a > 1000 line Java file up to date with trunk, or you can risk that you 
broke something in trunk but make the fixes to that 10x easier by having it 
committed. Your call :)



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503866#comment-14503866
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

This patch really needs to be tested thoroughly.
It's a major refactoring of a >1000 line Java file which we all know as
trunk Fetcher.
Although no existing functionality has changed, I believe I've now
implemented some method calls as static, so we need to make sure this is OK.








[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503795#comment-14503795
 ] 

Chris A. Mattmann commented on NUTCH-1934:
--

+1 to commit if it applies cleanly and tests pass.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503758#comment-14503758
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

Thanks [~mjoyce], this is a big help in determining whether this applies 
against trunk. 
If it is ripe for testing and eval, then hopefully more people can chime in 
before too many patches make it into trunk Fetcher and I need to rebase again.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Michael Joyce (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503746#comment-14503746
 ] 

Michael Joyce commented on NUTCH-1934:
--

Hey [~lewismc], 

Patch applied cleanly to trunk for me, and a simple crawl over one site worked 
just fine. Couldn't run the tests, unfortunately, since I seem to have some 
config problem locally, but hopefully that's a start at least.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Michael Joyce (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503727#comment-14503727
 ] 

Michael Joyce commented on NUTCH-1934:
--

One sec, Lewis, and I'll take a quick look.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503663#comment-14503663
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

Is anyone able to take this for a spin, or even to verify whether it still 
applies against trunk? It is a non-trivial patch, but one which makes the 
Fetcher much easier for us all to work with if we get the refactoring correct. 
Thanks.



[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-07 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483995#comment-14483995
 ] 

Lewis John McGibbney commented on NUTCH-1934:
-

I have a patch running locally with this. The most recent patch still applies 
against trunk; I've just synced my local copy against trunk and all looks 
good. A review from anyone able to do one would make me very happy.
