[jira] [Resolved] (NUTCH-2155) Create a "crawl completeness" utility
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2155. -- Resolution: Fixed Latest patch committed in r1713885 > Create a "crawl completeness" utility > - > > Key: NUTCH-2155 > URL: https://issues.apache.org/jira/browse/NUTCH-2155 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Labels: memex > Fix For: 1.11 > > Attachments: NUTCH-2155_joyce_9Nov2015.patch > > > I've found it useful to have a tool for dumping some "completeness" > information from a crawl similar to how domainstats does but including > fetched and unfetched counts per domain/host. This is especially nice when > doing vertical crawls over a few domains or just to see how much of a > host/domain you've covered with your crawl so far. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
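For readers wondering what "fetched and unfetched counts per domain/host" boils down to, here is a minimal Java sketch of the tally. It assumes each crawl db record has already been reduced to a URL plus a fetched/unfetched flag. The Entry class and method names are hypothetical illustrations. They are not Nutch's CrawlDatum API and not the committed CrawlCompletionStats code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a per-host "completeness" tally. Entry is an
// illustrative stand-in for a crawl db record, not Nutch's CrawlDatum API.
public class CrawlCompletenessSketch {

  /** One crawl db record: a URL and whether it has been fetched yet. */
  public static final class Entry {
    final String url;
    final boolean fetched;

    public Entry(String url, boolean fetched) {
      this.url = url;
      this.fetched = fetched;
    }
  }

  /** Returns host -> {fetchedCount, unfetchedCount}, sorted by host name. */
  public static Map<String, long[]> countByHost(List<Entry> entries) {
    Map<String, long[]> counts = new TreeMap<>();
    for (Entry e : entries) {
      String host = URI.create(e.url).getHost();
      if (host == null) {
        continue; // skip URLs without a parsable host
      }
      long[] c = counts.computeIfAbsent(host, h -> new long[2]);
      c[e.fetched ? 0 : 1]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Entry> entries = new ArrayList<>();
    entries.add(new Entry("http://a.example.com/1", true));
    entries.add(new Entry("http://a.example.com/2", false));
    entries.add(new Entry("http://b.example.com/", true));
    for (Map.Entry<String, long[]> e : countByHost(entries).entrySet()) {
      long fetched = e.getValue()[0];
      long unfetched = e.getValue()[1];
      double pct = 100.0 * fetched / (fetched + unfetched);
      System.out.printf("%s\tfetched=%d\tunfetched=%d\t%.1f%%%n",
          e.getKey(), fetched, unfetched, pct);
    }
  }
}
```

Grouping by domain instead of host would only change the key-extraction step.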
[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000658#comment-15000658 ] Lewis John McGibbney commented on NUTCH-2165: - It means that the data in the remaining part-# folders is not dumped. > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce > Fix For: 2.4, 1.11 > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
[jira] [Resolved] (NUTCH-2150) Add ProtocolStatus Utility
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2150. -- Resolution: Fixed Resolved in r1713892 > Add ProtocolStatus Utility > -- > > Key: NUTCH-2150 > URL: https://issues.apache.org/jira/browse/NUTCH-2150 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > Attachments: NUTCH-2015_joyce_9Nov2015.patch > > > It would be nice to have a utility for dumping protocol status code > information for a crawl database. This will be a utility for getting a dump > of the protocol status codes that builds off of NUTCH-2129
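A dump of protocol status codes is essentially a histogram. Below is a minimal Java sketch of that step. It assumes the protocol status of each crawl db record has already been pulled out as a name string (for example "SUCCESS" or "NOTFOUND"). Reading those values from the crawldb is left out, and the class and method names are hypothetical, not the ProtocolStatusStatistics API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: histogram of protocol status names for a crawl db.
// Status names are plain strings here; a real tool would read them from
// each record's stored protocol status (not shown).
public class ProtocolStatusCountSketch {

  /** Counts how often each status name occurs, sorted by name. */
  public static Map<String, Long> countStatuses(List<String> statuses) {
    return statuses.stream()
        .collect(Collectors.groupingBy(
            Function.identity(), TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    List<String> statuses = Arrays.asList("SUCCESS", "NOTFOUND", "SUCCESS", "GONE");
    countStatuses(statuses).forEach((status, n) -> System.out.println(status + "\t" + n));
  }
}
```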
[jira] [Commented] (NUTCH-2150) Add ProtocolStatus Utility
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000787#comment-15000787 ] Hudson commented on NUTCH-2150: --- SUCCESS: Integrated in Nutch-trunk #3305 (See [https://builds.apache.org/job/Nutch-trunk/3305/]) NUTCH-2150 - Update help text and remove 'current' folder requirements (joyce: [http://svn.apache.org/viewvc/nutch/trunk/?view=rev=1713892]) * trunk/src/java/org/apache/nutch/util/ProtocolStatusStatistics.java > Add ProtocolStatus Utility > -- > > Key: NUTCH-2150 > URL: https://issues.apache.org/jira/browse/NUTCH-2150 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > Attachments: NUTCH-2015_joyce_9Nov2015.patch > > > It would be nice to have a utility for dumping protocol status code > information for a crawl database. This will be a utility for getting a dump > of the protocol status codes that builds off of NUTCH-2129
[jira] [Commented] (NUTCH-1911) Improve DomainStatistics tool command line parsing
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000788#comment-15000788 ] Hudson commented on NUTCH-1911: --- SUCCESS: Integrated in Nutch-trunk #3305 (See [https://builds.apache.org/job/Nutch-trunk/3305/]) NUTCH-1911 - Recommit help fixes and remove 'current' folder requirement (joyce: [http://svn.apache.org/viewvc/nutch/trunk/?view=rev=1713890]) * trunk/CHANGES.txt * trunk/src/java/org/apache/nutch/util/domain/DomainStatistics.java > Improve DomainStatistics tool command line parsing > -- > > Key: NUTCH-1911 > URL: https://issues.apache.org/jira/browse/NUTCH-1911 > Project: Nutch > Issue Type: Bug > Components: util >Affects Versions: 1.9, 2.2.1 >Reporter: Lewis John McGibbney >Assignee: Michael Joyce >Priority: Trivial > Fix For: 1.10, 1.11 > > Attachments: NUTCH-1911_joyce_9Nov2015.patch, > NUTCH-1911_joyce_9Nov2015.patch > > > The DomainStatistics tool could be improved based on the comments raised > in [this mail > thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html] > For convenience, I've also pasted them below > {quote} > You cannot just tell it where the crawldb is, you need to tell it where the > directory is, so specifying current is ok, but not part-* > {quote} > Patch should be trivial work
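One way a tool could drop the 'current' folder requirement is to accept either the crawldb directory or its current subdirectory and then normalize to the latter. The Java sketch below assumes the usual crawldb/current/part-* layout and uses java.nio.file.Path so that it stays self-contained. A real Nutch tool would work with Hadoop's Path, and the committed change to DomainStatistics may take a different approach. The class name is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: let a tool accept either the crawldb directory or its
// "current" subdirectory, instead of requiring the user to type "current".
public class CrawlDbPathSketch {

  /** Returns the "current" directory for a crawldb given either form. */
  public static Path resolveCurrent(Path input) {
    Path name = input.getFileName();
    if (name != null && name.toString().equals("current")) {
      return input; // user already pointed at crawldb/current
    }
    return input.resolve("current"); // user pointed at the crawldb root
  }

  public static void main(String[] args) {
    System.out.println(resolveCurrent(Paths.get("crawl", "crawldb")));
    System.out.println(resolveCurrent(Paths.get("crawl", "crawldb", "current")));
  }
}
```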
[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000841#comment-15000841 ] Michael Joyce commented on NUTCH-2167: -- Hi folks, all looks good and tests run fine after moving this over for testing. I'm going to svn cp them over if no one has any objections. > Backport TableUtil from 2.x for URL reversing > - > > Key: NUTCH-2167 > URL: https://issues.apache.org/jira/browse/NUTCH-2167 > Project: Nutch > Issue Type: Sub-task > Components: tool >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > > The > [TableUtil|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/util/TableUtil.java] > file provides a number of helpful utility functions for URL reversing that > would be useful to have in 1.x
[jira] [Work started] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2167 started by Michael Joyce. > Backport TableUtil from 2.x for URL reversing > - > > Key: NUTCH-2167 > URL: https://issues.apache.org/jira/browse/NUTCH-2167 > Project: Nutch > Issue Type: Sub-task > Components: tool >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > > The > [TableUtil|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/util/TableUtil.java] > file provides a number of helpful utility functions for URL reversing that > would be useful to have in 1.x
[jira] [Commented] (NUTCH-2155) Create a "crawl completeness" utility
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000667#comment-15000667 ] Hudson commented on NUTCH-2155: --- SUCCESS: Integrated in Nutch-trunk #3304 (See [https://builds.apache.org/job/Nutch-trunk/3304/]) NUTCH-2155 - Update crawlcomplete help and drop 'current' folder requirements (joyce: [http://svn.apache.org/viewvc/nutch/trunk/?view=rev=1713885]) * trunk/src/java/org/apache/nutch/util/CrawlCompletionStats.java > Create a "crawl completeness" utility > - > > Key: NUTCH-2155 > URL: https://issues.apache.org/jira/browse/NUTCH-2155 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Labels: memex > Fix For: 1.11 > > Attachments: NUTCH-2155_joyce_9Nov2015.patch > > > I've found it useful to have a tool for dumping some "completeness" > information from a crawl similar to how domainstats does but including > fetched and unfetched counts per domain/host. This is especially nice when > doing vertical crawls over a few domains or just to see how much of a > host/domain you've covered with your crawl so far.
[jira] [Work started] (NUTCH-1911) Improve DomainStatistics tool command line parsing
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1911 started by Michael Joyce. > Improve DomainStatistics tool command line parsing > -- > > Key: NUTCH-1911 > URL: https://issues.apache.org/jira/browse/NUTCH-1911 > Project: Nutch > Issue Type: Bug > Components: util >Affects Versions: 1.9, 2.2.1 >Reporter: Lewis John McGibbney >Assignee: Michael Joyce >Priority: Trivial > Fix For: 1.10, 1.11 > > Attachments: NUTCH-1911_joyce_9Nov2015.patch, > NUTCH-1911_joyce_9Nov2015.patch > > > The DomainStatistics tool could be improved based on the comments raised > in [this mail > thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html] > For convenience, I've also pasted them below > {quote} > You cannot just tell it where the crawldb is, you need to tell it where the > directory is, so specifying current is ok, but not part-* > {quote} > Patch should be trivial work
[jira] [Resolved] (NUTCH-1911) Improve DomainStatistics tool command line parsing
[ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-1911. -- Resolution: Fixed Resolved in r1713890 > Improve DomainStatistics tool command line parsing > -- > > Key: NUTCH-1911 > URL: https://issues.apache.org/jira/browse/NUTCH-1911 > Project: Nutch > Issue Type: Bug > Components: util >Affects Versions: 1.9, 2.2.1 >Reporter: Lewis John McGibbney >Assignee: Michael Joyce >Priority: Trivial > Fix For: 1.11, 1.10 > > Attachments: NUTCH-1911_joyce_9Nov2015.patch, > NUTCH-1911_joyce_9Nov2015.patch > > > The DomainStatistics tool could be improved based on the comments raised > in [this mail > thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html] > For convenience, I've also pasted them below > {quote} > You cannot just tell it where the crawldb is, you need to tell it where the > directory is, so specifying current is ok, but not part-* > {quote} > Patch should be trivial work
[jira] [Work started] (NUTCH-2150) Add ProtocolStatus Utility
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2150 started by Michael Joyce. > Add ProtocolStatus Utility > -- > > Key: NUTCH-2150 > URL: https://issues.apache.org/jira/browse/NUTCH-2150 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > Attachments: NUTCH-2015_joyce_9Nov2015.patch > > > It would be nice to have a utility for dumping protocol status code > information for a crawl database. This will be a utility for getting a dump > of the protocol status codes that builds off of NUTCH-2129
[jira] [Work started] (NUTCH-2155) Create a "crawl completeness" utility
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2155 started by Michael Joyce. > Create a "crawl completeness" utility > - > > Key: NUTCH-2155 > URL: https://issues.apache.org/jira/browse/NUTCH-2155 > Project: Nutch > Issue Type: Improvement > Components: util >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Labels: memex > Fix For: 1.11 > > Attachments: NUTCH-2155_joyce_9Nov2015.patch > > > I've found it useful to have a tool for dumping some "completeness" > information from a crawl similar to how domainstats does but including > fetched and unfetched counts per domain/host. This is especially nice when > doing vertical crawls over a few domains or just to see how much of a > host/domain you've covered with your crawl so far.
[jira] [Created] (NUTCH-2166) Add reverse URL format to dump tool
Michael Joyce created NUTCH-2166: Summary: Add reverse URL format to dump tool Key: NUTCH-2166 URL: https://issues.apache.org/jira/browse/NUTCH-2166 Project: Nutch Issue Type: Improvement Components: tool Affects Versions: 1.10, 2.3 Reporter: Michael Joyce Assignee: Michael Joyce Fix For: 2.4, 1.11 Update the FileDumper tool with an option for dumping files to the output directory in reverse URL format. For example, the file for http://bar.foo.com:8983/to/index.html?a=b would be dumped to /com/foo/bar/8983/http/to/index.html?a=b
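The mapping in the example above can be sketched in Java: reverse the host labels, then add the port (if any), then the scheme, then the original path and query. This is an illustrative assumption about the layout, built on java.net.URI. It is not the FileDumper implementation or the TableUtil API, and the class name is hypothetical.

```java
import java.net.URI;

// Hypothetical sketch of the reverse-URL output layout described in
// NUTCH-2166; not the committed FileDumper code.
public class ReverseUrlPath {

  /** Maps scheme://host[:port]/path?query to /reversed/host/labels/[port/]scheme/path?query. */
  public static String toPath(String url) {
    URI uri = URI.create(url);
    String host = uri.getHost();
    if (host == null) {
      throw new IllegalArgumentException("URL has no host: " + url);
    }
    StringBuilder sb = new StringBuilder();
    // Host labels in reverse order: bar.foo.com -> /com/foo/bar
    String[] labels = host.split("\\.");
    for (int i = labels.length - 1; i >= 0; i--) {
      sb.append('/').append(labels[i]);
    }
    // Port only if explicitly present in the URL (getPort() returns -1 otherwise)
    if (uri.getPort() != -1) {
      sb.append('/').append(uri.getPort());
    }
    sb.append('/').append(uri.getScheme());
    // Path (defaulting to "/" when empty), followed by any raw query string
    String path = uri.getRawPath();
    sb.append(path == null || path.isEmpty() ? "/" : path);
    if (uri.getRawQuery() != null) {
      sb.append('?').append(uri.getRawQuery());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints /com/foo/bar/8983/http/to/index.html?a=b
    System.out.println(toPath("http://bar.foo.com:8983/to/index.html?a=b"));
  }
}
```

Note that '?' is not a legal file-name character on some file systems (Windows, for instance), so a real dumper may need to escape or encode the query part.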
[jira] [Work started] (NUTCH-2166) Add reverse URL format to dump tool
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2166 started by Michael Joyce. > Add reverse URL format to dump tool > --- > > Key: NUTCH-2166 > URL: https://issues.apache.org/jira/browse/NUTCH-2166 > Project: Nutch > Issue Type: Improvement > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > > Update the FileDumper tool with an option for dumping files to the output > directory in reverse URL format. > For example, the file for > http://bar.foo.com:8983/to/index.html?a=b > would be dumped to > /com/foo/bar/8983/http/to/index.html?a=b
[jira] [Created] (NUTCH-2165) FileDumper Util hard codes part-# folder name
Michael Joyce created NUTCH-2165: Summary: FileDumper Util hard codes part-# folder name Key: NUTCH-2165 URL: https://issues.apache.org/jira/browse/NUTCH-2165 Project: Nutch Issue Type: Bug Components: tool Affects Versions: 1.10, 2.3 Reporter: Michael Joyce Fix For: 2.4, 1.11 Hi folks, [~lewismc] and I were just discussing this off list. It seems that the part-# folders are hard coded to part-0 in the [FileDumper utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] which could prove problematic.
Updates to CHANGES.txt on commit
Hi folks, It seems like our usual workflow is to update CHANGES on commit (correct me if I'm wrong here). What do we think about pulling the CHANGES updates from JIRA as part of our release prep instead? Seems like it would be a bit less error prone, although I do understand people's desire to have CHANGES up to date all the time. Thoughts? -- Jimmy
[jira] [Created] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing
Michael Joyce created NUTCH-2167: Summary: Backport TableUtil from 2.x for URL reversing Key: NUTCH-2167 URL: https://issues.apache.org/jira/browse/NUTCH-2167 Project: Nutch Issue Type: Sub-task Components: tool Affects Versions: 1.10 Reporter: Michael Joyce Assignee: Michael Joyce Fix For: 1.11 The [TableUtil|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/util/TableUtil.java] file provides a number of helpful utility functions for URL reversing that would be useful to have in 1.x
Re: Updates to CHANGES.txt on commit
Mike, I honestly prefer just having it as a text file. If you search way back in the logs, Doug talked about this long ago, but I generally agree. JIRA would be nice, but I just like to keep it up to date in text and in JIRA. Sorry for the dupe work, but it pays off. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Mike Joyce on behalf of Michael Joyce Reply-To: "dev@nutch.apache.org" Date: Wednesday, November 11, 2015 at 11:21 AM To: "dev@nutch.apache.org" Subject: Updates to CHANGES.txt on commit >Hi folks, > > >It seems like our usual workflow is to update CHANGES on commit (correct >me if I'm wrong here). What do we think about pulling the CHANGES updates >from JIRA as part of our release prep instead? Seems like it would be a >bit less error prone, although I > do understand people's desire to have CHANGES up to date all the time. > > >Thoughts? > > >-- Jimmy
[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000912#comment-15000912 ] Lewis John McGibbney commented on NUTCH-2167: - Yes, one place this would be useful is the FileDumper. For example, if we can reverse URLs, then raw content can be sent to S3 for archived storage and also retrieved with minimal effort, as we can then just re-reverse the URL. > Backport TableUtil from 2.x for URL reversing > - > > Key: NUTCH-2167 > URL: https://issues.apache.org/jira/browse/NUTCH-2167 > Project: Nutch > Issue Type: Sub-task > Components: tool >Affects Versions: 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 1.11 > > > The > [TableUtil|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/util/TableUtil.java] > file provides a number of helpful utility functions for URL reversing that > would be useful to have in 1.x
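Re-reversing is the inverse walk over the path segments. The Java sketch below assumes the layout from NUTCH-2166 (/reversed/host/labels/[port/]scheme/path?query). It also assumes that the scheme is one of http, https or ftp and never appears as a host label. Both are assumptions of this sketch, not properties of TableUtil. The layout is ambiguous for bare IPv4 hosts without a port, because the first octet ends up just before the scheme and looks like a port number. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: turn a reversed path such as
// /com/foo/bar/8983/http/to/index.html?a=b back into the original URL.
public class UnreverseUrlPath {

  // Assumption: the scheme segment is one of these and no host label equals one.
  private static final Set<String> SCHEMES = new HashSet<>(Arrays.asList("http", "https", "ftp"));

  public static String fromPath(String reversed) {
    String query = null;
    int q = reversed.indexOf('?');
    if (q >= 0) {
      query = reversed.substring(q + 1);
      reversed = reversed.substring(0, q);
    }
    if (!reversed.startsWith("/")) {
      throw new IllegalArgumentException("expected a leading '/': " + reversed);
    }
    String[] segs = reversed.substring(1).split("/", -1);

    // Locate the scheme; everything before it is host labels plus an optional port.
    int schemeIdx = -1;
    for (int i = 0; i < segs.length; i++) {
      if (SCHEMES.contains(segs[i])) {
        schemeIdx = i;
        break;
      }
    }
    if (schemeIdx < 1) {
      throw new IllegalArgumentException("no host/scheme found in: " + reversed);
    }

    // An all-digit segment just before the scheme is treated as the port.
    int hostEnd = schemeIdx;
    String port = null;
    String beforeScheme = segs[schemeIdx - 1];
    if (!beforeScheme.isEmpty() && beforeScheme.chars().allMatch(Character::isDigit)) {
      port = beforeScheme;
      hostEnd = schemeIdx - 1;
    }
    if (hostEnd < 1) {
      throw new IllegalArgumentException("no host labels in: " + reversed);
    }

    // Re-reverse the host labels: com/foo/bar -> bar.foo.com
    StringBuilder url = new StringBuilder(segs[schemeIdx]).append("://");
    for (int i = hostEnd - 1; i >= 0; i--) {
      url.append(segs[i]);
      if (i > 0) {
        url.append('.');
      }
    }
    if (port != null) {
      url.append(':').append(port);
    }
    // Remaining segments are the original path.
    for (int i = schemeIdx + 1; i < segs.length; i++) {
      url.append('/').append(segs[i]);
    }
    if (query != null) {
      url.append('?').append(query);
    }
    return url.toString();
  }

  public static void main(String[] args) {
    // prints http://bar.foo.com:8983/to/index.html?a=b
    System.out.println(fromPath("/com/foo/bar/8983/http/to/index.html?a=b"));
  }
}
```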
[jira] [Work started] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2165 started by Michael Joyce. > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
[jira] [Assigned] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce reassigned NUTCH-2165: Assignee: Michael Joyce > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000910#comment-15000910 ] Michael Joyce commented on NUTCH-2165: -- Oh aye > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
[jira] [Updated] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce updated NUTCH-2165: - Attachment: NUTCH-2165_joyce_11Nov2015.patch Patch attached > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > Attachments: NUTCH-2165_joyce_11Nov2015.patch > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000923#comment-15000923 ] Michael Joyce commented on NUTCH-2165: -- Note: the diff looks massive here, but it really just adds an extra loop over the part-# directories in each segment directory. The tool could probably use a bit of cleanup love, but we can address that in a later patch. > FileDumper Util hard codes part-# folder name > - > > Key: NUTCH-2165 > URL: https://issues.apache.org/jira/browse/NUTCH-2165 > Project: Nutch > Issue Type: Bug > Components: tool >Affects Versions: 2.3, 1.10 >Reporter: Michael Joyce >Assignee: Michael Joyce > Fix For: 2.4, 1.11 > > Attachments: NUTCH-2165_joyce_11Nov2015.patch > > > Hi folks, [~lewismc] and I were just discussing this off list. It seems that > the part-# folders are hard coded to part-0 in the [FileDumper > utility|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/tools/FileDumper.java#L166-L167] > which could prove problematic.
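The "extra loop over the part-# directories" can be pictured with a small Java sketch. It assumes a segments directory laid out as <segment>/content/part-NNNNN and uses java.nio.file so that it stays self-contained. The class is hypothetical and is not the patch itself. A Hadoop-based tool might instead use FileSystem.globStatus with a part-* pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: instead of hard-coding part-00000, visit every
// part-* entry under <segment>/<subdir> for each segment.
public class PartDirsSketch {

  /** Returns all <segment>/<subdir>/part-* entries under segmentsDir, sorted. */
  public static List<Path> findPartDirs(Path segmentsDir, String subdir) throws IOException {
    List<Path> parts = new ArrayList<>();
    // Outer loop: one iteration per segment directory.
    try (DirectoryStream<Path> segments = Files.newDirectoryStream(segmentsDir, Files::isDirectory)) {
      for (Path segment : segments) {
        Path data = segment.resolve(subdir);
        if (!Files.isDirectory(data)) {
          continue; // segment without this subdirectory
        }
        // Inner loop (the "extra loop"): every part-* entry, not just the first.
        try (DirectoryStream<Path> partDirs = Files.newDirectoryStream(data, "part-*")) {
          for (Path part : partDirs) {
            parts.add(part);
          }
        }
      }
    }
    Collections.sort(parts);
    return parts;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : "crawl/segments");
    for (Path part : findPartDirs(root, "content")) {
      System.out.println(part);
    }
  }
}
```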
[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Issue Type: Task (was: Bug) > Remove MapWritable from trunk codebase > -- > > Key: NUTCH-2120 > URL: https://issues.apache.org/jira/browse/NUTCH-2120 > Project: Nutch > Issue Type: Task >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Trivial > Fix For: 1.12 > > Attachments: NUTCH-2120.patch > > > [MapWritable|http://nutch.apache.org/apidocs/apidocs-1.10/index.html?org/apache/nutch/crawl/MapWritable.htm] > has been deprecated for a good while. > We should remove it from the codebase and make sure we are not using it > anywhere (I don't think we are).
[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Flags: Patch Patch Info: Patch Available > Remove MapWritable from trunk codebase > -- > > Key: NUTCH-2120 > URL: https://issues.apache.org/jira/browse/NUTCH-2120 > Project: Nutch > Issue Type: Task >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Trivial > Fix For: 1.12 > > Attachments: NUTCH-2120.patch > > > [MapWritable|http://nutch.apache.org/apidocs/apidocs-1.10/index.html?org/apache/nutch/crawl/MapWritable.htm] > has been deprecated for a good while. > We should remove it from the codebase and make sure we are not using it > anywhere (I don't think we are).
[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001105#comment-15001105 ] Lewis John McGibbney commented on NUTCH-2160: - Will commit by EoB today unless there are objections > Upgrade Selenium Java to 2.48.2 > --- > > Key: NUTCH-2160 > URL: https://issues.apache.org/jira/browse/NUTCH-2160 > Project: Nutch > Issue Type: Bug > Components: plugin, protocol >Affects Versions: 1.11 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 1.11 > > Attachments: NUTCH-2160.patch > > > Current Selenium support is pegged at a very old version of Firefox. The > attached patch, running with the most recent version of Selenium Java, works > with Firefox 38.4.0 very well. The remainder of the lib-selenium dependencies > have also been updated. > Thanks > [~kwhitehall] can you please scope if you get a wee minute?
[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2120: Attachment: NUTCH-2120.patch Patch which removes this class from Trunk. > Remove MapWritable from trunk codebase > -- > > Key: NUTCH-2120 > URL: https://issues.apache.org/jira/browse/NUTCH-2120 > Project: Nutch > Issue Type: Bug >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Trivial > Fix For: 1.12 > > Attachments: NUTCH-2120.patch > > > [MapWritable|http://nutch.apache.org/apidocs/apidocs-1.10/index.html?org/apache/nutch/crawl/MapWritable.htm] > has been deprecated for a good while. > We should remove it from the codebase and make sure we are not using it > anywhere (I don't think we are).