[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533844#comment-14533844 ]

Hudson commented on NUTCH-1934:
---

SUCCESS: Integrated in Nutch-trunk #3107 (See [https://builds.apache.org/job/Nutch-trunk/3107/])
NUTCH-1934 Refactor Fetcher in trunk (lewismc: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1678281)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItem.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItemQueue.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetchItemQueues.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherThread.java
* /nutch/trunk/src/java/org/apache/nutch/fetcher/QueueFeeder.java

> Refactor Fetcher in trunk
> -------------------------
>
> Key: NUTCH-1934
> URL: https://issues.apache.org/jira/browse/NUTCH-1934
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.10
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-1934-trunkv2.patch, NUTCH-1934.patch
>
>
> Put simply
> [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java]
> is too big.
> This is kinda strange as the size of this file is unique (I think) from every
> other class within Nutch. The others are reasonably well modularized and
> split into constituent classes which make sense.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505089#comment-14505089 ]

Markus Jelsma commented on NUTCH-1934:
--

Yes, excellent. And don't wait for my + or -, I'm gone for a while :)
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504941#comment-14504941 ]

Lewis John McGibbney commented on NUTCH-1934:
-

Tika upgrade, then push an RC, Markus? Sounds good to me.

--
*Lewis*
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504526#comment-14504526 ]

Markus Jelsma commented on NUTCH-1934:
--

Agreed, but please commit for 1.11. Let us release 1.10 soon and not bring in huge changes last minute.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504006#comment-14504006 ]

Lewis John McGibbney commented on NUTCH-1934:
-

+1 on that sentiment. Will commit tomorrow to allow EU folks to wake up.

On Monday, April 20, 2015, Jorge Luis Betancourt Gonzalez (JIRA) <

--
*Lewis*
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503904#comment-14503904 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-1934:
---

+1 to [~chrismattmann]'s comment. If the tests pass without any problem, I think we can commit and do some more testing. The basic test that covers the monolithic fetcher right now is a great starting point, and of course we can take it for a spin :) I plan on taking some time to prepare a midsize crawl before/after the commit, if it helps.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503882#comment-14503882 ]

Chris A. Mattmann commented on NUTCH-1934:
--

Well, my point on this is: you can keep this as a patch and spend the effort to keep a >1000-line Java file up to date with trunk, or you can risk breaking something in trunk but make the fixes 10x easier by having it committed. Your call :)
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503866#comment-14503866 ]

Lewis John McGibbney commented on NUTCH-1934:
-

This patch really needs to be tested thoroughly. It's a major refactoring of a >1000-line Java file which we all know as the trunk Fetcher. Although no existing functionality has changed, I've now implemented some method calls as static, so we need to make sure this is OK.

--
*Lewis*
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503795#comment-14503795 ]

Chris A. Mattmann commented on NUTCH-1934:
--

+1 to commit if it applies cleanly and tests pass.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503758#comment-14503758 ]

Lewis John McGibbney commented on NUTCH-1934:
-

Thanks [~mjoyce], this is a big help in determining whether this applies against trunk. If it is ripe for testing and eval, then hopefully more people can chime in before too many patches make it into the trunk Fetcher and I need to rebase again.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503746#comment-14503746 ]

Michael Joyce commented on NUTCH-1934:
--

Hey [~lewismc],

The patch applied cleanly to trunk for me, and a simple crawl over one site worked just fine. Unfortunately I couldn't run the tests, since I seem to have some config problem locally, but hopefully that's a start at least.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503727#comment-14503727 ]

Michael Joyce commented on NUTCH-1934:
--

One sec, Lewis, and I'll take a quick look.
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503663#comment-14503663 ]

Lewis John McGibbney commented on NUTCH-1934:
-

Is anyone able to take this for a spin, or even verify whether it still applies against trunk? It is a non-trivial patch, but one which makes the Fetcher much easier for us all to work with if we get the refactoring right. Thanks
[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483995#comment-14483995 ]

Lewis John McGibbney commented on NUTCH-1934:
-

I have a patch running locally with this. The most recent patch still applies against trunk: I've just synced my local copy against trunk and all looks good. Anyone able to review this would make me very happy.