[Nutch-dev] [jira] Resolved: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-443. - Resolution: Fixed Patch tested and contributed by Dogacan. This update is a fix and

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505607 ] Chris A. Mattmann commented on NUTCH-444: - Hi Nutch Newbie: I will take a look at this today, and take an

[Nutch-dev] [jira] Work started: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-444 started by Chris A. Mattmann. Possibly use a different library to parse RSS feed for improved performance and compatibility

[Nutch-dev] [jira] Updated: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-444: Attachment: NUTCH-444.Mattmann.061707.patch.txt Hi Folks, Here is a patch that brings

[Nutch-dev] [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505501 ] Chris A. Mattmann commented on NUTCH-443: - Doğacan, Whoops :) This one kind of fell off the radar screen.

[Nutch-dev] [jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

2007-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505502 ] Chris A. Mattmann commented on NUTCH-485: - Doğacan, +1. As for your question, IMO, these types of minor

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500133 ] Chris A. Mattmann commented on NUTCH-444: - Hi Guys, Okay, here is the way that I currently see this issue,

[Nutch-dev] [jira] Reopened: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-05-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reopened NUTCH-443: - Assignee: Chris A. Mattmann (was: Andrzej Bialecki ) Per Doğacan's comment, we need to

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495381 ] Chris A. Mattmann commented on NUTCH-444: - Doğacan -- I will check this out tomorrow (Monday) night, latest

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494764 ] Chris A. Mattmann commented on NUTCH-444: - Hi Doğacan, Well I must say, with all the discussion that's

[Nutch-dev] [jira] Resolved: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-384. - Resolution: Fixed Fix Version/s: 0.9.0 Fixed tested in local crawl, works

[Nutch-dev] [jira] Closed: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann closed NUTCH-384. --- Patch applied, with whitespace changes, and unit test (contributed by yours truly):

[Nutch-dev] [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479430 ] Chris A. Mattmann commented on NUTCH-384: - Thanks for your patch Heiko! I am looking at this right now. If

[Nutch-dev] [jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-443: Attachment: NUTCH-443.022507.patch.txt Hi Folks, Attached is a candidate patch for

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475794 ] Chris A. Mattmann commented on NUTCH-444: - Hi Nick, Thanks for your insightful comments on this issue. I

[Nutch-dev] [jira] Work started: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-443 started by Chris A. Mattmann. allow parsers to return multiple Parse object, this will speed up the rss parser

[Nutch-dev] [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472692 ] Chris A. Mattmann commented on NUTCH-443: - Hi Nutch Newbie: I've already contacted Doğacan off-list and am

[Nutch-dev] [jira] Assigned: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-444: --- Assignee: Chris A. Mattmann Possibly use a different library to parse RSS feed for

[Nutch-dev] [jira] Closed: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann closed NUTCH-258. --- Bug not reproducible in current system, and no active users experiencing bug. Once Nutch logs a

[Nutch-dev] [jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-258. - Resolution: Cannot Reproduce With recent API changes to Hadoop, and with the note from

[Nutch-dev] [jira] Assigned: (NUTCH-309) Uses commons logging Code Guards

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-309: --- Assignee: Chris A. Mattmann (was: Jerome Charron) Uses commons logging Code Guards

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472005 ] Chris A. Mattmann commented on NUTCH-444: - Nutch Newbie: From the commons-feedparser site:

[Nutch-dev] [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471780 ] Chris A. Mattmann commented on NUTCH-443: - Nutch Newbie, What exactly do you mean when you mention Apache

[Nutch-dev] [jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471955 ] Chris A. Mattmann commented on NUTCH-444: - Hi Renaud, In fact, Rome does appear to be quite easy to use,

[Nutch-dev] [jira] Assigned: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-443: --- Assignee: Chris A. Mattmann allow parsers to return multiple Parse object, this will

[Nutch-dev] [jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471956 ] Chris A. Mattmann commented on NUTCH-443: - I'll take the lead on evaluating these patches, and getting them

[Nutch-dev] [jira] Work started: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-390 started by Chris A. Mattmann. Javadoc warnings Key: NUTCH-390 URL:

[Nutch-dev] [jira] Resolved: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-390. - Resolution: Fixed Fix Version/s: 0.9.0 I've fixed this issue in the trunk. I had

[Nutch-dev] [jira] Closed: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann closed NUTCH-390. --- Fixed in the trunk: http://svn.apache.org/viewvc?view=rev&revision=501315 Javadoc warnings

[Nutch-dev] [jira] Work started: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-384 started by Chris A. Mattmann. Protocol-file plugin does not allow the parse plugins framework to operate properly

[Nutch-dev] [jira] Assigned: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins

2007-01-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-431: --- Assignee: Chris A. Mattmann Move plugin specific properties out of nutch-site.xml

[Nutch-dev] [jira] Created: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins

2007-01-20 Thread Chris A. Mattmann (JIRA)
Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins -- Key: NUTCH-431 URL: https://issues.apache.org/jira/browse/NUTCH-431

[Nutch-dev] [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time

2007-01-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466285 ] Chris A. Mattmann commented on NUTCH-353: - Doug, Let's see what you got. I'd be happy to take a look at

[Nutch-dev] [jira] Assigned: (NUTCH-390) Javadoc warnings

2006-11-24 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-390?page=all ] Chris A. Mattmann reassigned NUTCH-390: --- Assignee: Chris A. Mattmann Javadoc warnings Key: NUTCH-390 URL:

[Nutch-dev] [jira] Assigned: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-11-24 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ] Chris A. Mattmann reassigned NUTCH-185: --- Assignee: Chris A. Mattmann XMLParser is configurable xml parser plugin. Key:

[Nutch-dev] [jira] Work started: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Work on NUTCH-406 started by Chris A. Mattmann. Metadata tries to write null values --- Key: NUTCH-406 URL: http://issues.apache.org/jira/browse/NUTCH-406

[Nutch-dev] [jira] Updated: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Chris A. Mattmann updated NUTCH-406: Assignee: Chris A. Mattmann Metadata tries to write null values --- Key: NUTCH-406

[Nutch-dev] [jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452275 ] Chris A. Mattmann commented on NUTCH-406: - Hi Andrzej, Doğacan, +1. I think it makes a lot of sense to just not include the null key in the Met

[Nutch-dev] [jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452285 ] Chris A. Mattmann commented on NUTCH-406: - Hi Doğacan, Looking at your latest patch, I'm not sure that it completely does the right behavior. For

[Nutch-dev] [jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452286 ] Chris A. Mattmann commented on NUTCH-406: - Hi Andrzej, Yup, you caught the same thing as me. +1 for your solution. I will extend my above patch by

[Nutch-dev] [jira] Resolved: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Chris A. Mattmann resolved NUTCH-406. - Fix Version/s: 0.9.0 Resolution: Fixed Fix applied and tested in trunk. Metadata tries to write null values

[Nutch-dev] [jira] Closed: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Chris A. Mattmann closed NUTCH-406. --- Patch applied to trunk: http://svn.apache.org/viewvc?view=rev&revision=478619 Metadata tries to write null values ---

[Nutch-dev] [jira] Updated: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2006-10-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-384?page=all ] Chris A. Mattmann updated NUTCH-384: Summary: Protocol-file plugin does not allow the parse plugins framework to operate properly (was: When using the file protocol one can not map a

[Nutch-dev] [jira] Assigned: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2006-10-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-384?page=all ] Chris A. Mattmann reassigned NUTCH-384: --- Assignee: Chris A. Mattmann Protocol-file plugin does not allow the parse plugins framework to operate properly

[Nutch-dev] [jira] Work started: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ] Work on NUTCH-379 started by Chris A. Mattmann. ParseUtil does not pass through the content's URL to the ParserFactory -- Key: NUTCH-379

[Nutch-dev] [jira] Created: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
ParseUtil does not pass through the content's URL to the ParserFactory -- Key: NUTCH-379 URL: http://issues.apache.org/jira/browse/NUTCH-379 Project: Nutch Issue Type: Bug

[Nutch-dev] [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ] Chris A. Mattmann updated NUTCH-379: Attachment: NUTCH-379.Mattmann.100406.patch.txt Small patch that at least gets started on fixing the larger issue of content urls and parser mapping,

[Nutch-dev] [jira] Commented: (NUTCH-356) Plugin repository cache can lead to memory leak

2006-08-21 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-356?page=comments#action_12429548 ] Chris A. Mattmann commented on NUTCH-356: - -1 for closing this issue. If there is a demonstrable memory leak in the plugin system, then I think it should

[Nutch-dev] [jira] Commented: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429033 ] Chris A. Mattmann commented on NUTCH-338: - Hi Andrzej, A patch is available that you can apply quickly to remove the text parser as an option for pdf.

[Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12429035 ] Chris A. Mattmann commented on NUTCH-258: - Hi Folks, A patch is available on this issue. Has anyone who was experiencing the original problem tried out

[Nutch-dev] [jira] Commented: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429042 ] Chris A. Mattmann commented on NUTCH-338: - Hi Sami, Thanks much. It's weird that it was broken seeing as it was a one line patch, however, I tried it

[Nutch-dev] [jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-08-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Attachment: NUTCH-258.Mattmann.080406.patch.txt Hi Folks, Sorry I'm a little later than I expected on this one. Attached is a patch that implements

[Nutch-dev] [jira] Updated: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ] Chris A. Mattmann updated NUTCH-338: Attachment: NUTCH-338.Mattmann.patch.txt simple patch for removing the parse-text plugin from being mapped to PDF content type in parse-plugins.xml.

[Nutch-dev] [jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-07-26 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Fix Version/s: 0.8-dev Once Nutch logs a SEVERE log item, Nutch fails forevermore --

[Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-07-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12422962 ] Chris A. Mattmann commented on NUTCH-258: - Guys, This issue slipped off my radar for a bit, but I'll have some free time this week to work on it. If there

[Nutch-dev] [jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-09 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Attachment: NUTCH-258.Mattmann.060906.patch.txt Hi Folks, Attached is a patch that implements the suggested two fixes to this issue. I had to go

[Nutch-dev] [jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-08 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann updated NUTCH-236: Attachment: NUTCH-236.Mattmann.060806.patch.txt Okay a bit late, but as usual with me :-) This patch implements Jason's suggestion for the following

[Nutch-dev] [jira] Created: (NUTCH-304) Change JIRA email address for nutch issues from apache incubator

2006-06-08 Thread Chris A. Mattmann (JIRA)
Change JIRA email address for nutch issues from apache incubator Key: NUTCH-304 URL: http://issues.apache.org/jira/browse/NUTCH-304 Project: Nutch Type: Task Environment: Dell Pentium M mobile 1.4

[Nutch-dev] [jira] Reopened: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann reopened NUTCH-258: - Assign To: Chris A. Mattmann Issue found to in fact be a real issue with the Fetcher: here's the proposed solution: * add flag field

[Nutch-dev] [jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann resolved NUTCH-258: - Resolution: Won't Fix The use of LOG.severe in the fetcher indicates an unrecoverable error: thus, this issue is not a bug, and in fact

[Nutch-dev] [jira] Closed: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann closed NUTCH-258: --- Won't fix: issue describes intended behavior of system (fetcher component). Once Nutch logs a SEVERE log item, Nutch fails forevermore

[Nutch-dev] [jira] Commented: (NUTCH-294) Topic-maps of related searchwords

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414597 ] Chris A. Mattmann commented on NUTCH-294: - Hi Stefan, I'm wondering if this issue is in any way related to the existing clustering-carrot plugin submitted by D.

[Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414598 ] Chris A. Mattmann commented on NUTCH-258: - Hi there, I believe that the fetcher halting on a LOG.severe is the intended behavior of the system. The use of this

[Nutch-dev] [jira] Assigned: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann reassigned NUTCH-236: --- Assign To: Chris A. Mattmann PdfParser and RSSParser Log4j appender redirection --

[Nutch-dev] [jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12414599 ] Chris A. Mattmann commented on NUTCH-236: - Hi Jason, I'll have a patch prepared for this issue shortly, and I'll attach it to JIRA by this Sunday night. Thanks,

[Nutch-dev] [jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann updated NUTCH-236: Due Date: 05/Jun/06 PdfParser and RSSParser Log4j appender redirection -- Key: NUTCH-236

[Nutch-dev] [jira] Updated: (NUTCH-187) Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ] Chris A. Mattmann updated NUTCH-187: Summary: Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF (was: Run Nutch on Windows without Cygwin) Update

[Nutch-dev] [jira] Updated: (NUTCH-245) DTD for plugin.xml configuration files

2006-04-12 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Summary: DTD for plugin.xml configuration files (was: DTD Schemas for plugin.xml configuration files in conf directory) update title to reflect core

[Nutch-dev] [jira] Commented: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-12 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=comments#action_12374217 ] Chris A. Mattmann commented on NUTCH-245: - Hey Doug, I'm fine with that, I think it makes sense. Want an updated patch or just, the person who commits it can move it

[Nutch-dev] [jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Description: Currently, the plugin.xml file does not have a DTD or XML Schema associated with it, and most people just go look at an existing plugin's

[Nutch-dev] [jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Attachment: NUTCH-245.Mattmann.patch.txt Here's the patch for the plugin DTD file. I got a lot of info from:

[Nutch-dev] [jira] Created: (NUTCH-245) XML Schemas for xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
XML Schemas for xml configuration files in conf directory - Key: NUTCH-245 URL: http://issues.apache.org/jira/browse/NUTCH-245 Project: Nutch Type: New Feature Components: fetcher, indexer, ndfs, searcher,

[Nutch-dev] [jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Summary: DTD Schemas for plugin.xml configuration files in conf directory (was: XML Schemas for xml configuration files in conf directory) DTD

[Nutch-dev] [jira] Commented: (NUTCH-210) Context.xml file for Nutch web application

2006-03-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-210?page=comments#action_12371849 ] Chris A. Mattmann commented on NUTCH-210: - Hi Jerome, The updates look fine. No objections from my end. I hope people find the patch useful. Cheers, Chris

[Nutch-dev] [jira] Resolved: (NUTCH-23) content text/xml parser

2006-03-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-23?page=all ] Chris A. Mattmann resolved NUTCH-23: Resolution: Duplicate Duplicate of NUTCH-185. content text/xml parser --- Key: NUTCH-23 URL:

[Nutch-dev] [jira] Closed: (NUTCH-23) content text/xml parser

2006-03-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-23?page=all ] Chris A. Mattmann closed NUTCH-23: -- Duplicate of NUTCH-185. content text/xml parser --- Key: NUTCH-23 URL:

[Nutch-dev] [jira] Updated: (NUTCH-210) Context.xml file for Nutch web application

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-210?page=all ] Chris A. Mattmann updated NUTCH-210: Attachment: NUTCH-210.Mattmann.patch.txt Initial NUTCH-210 patch. Uses an XSL stylesheet to read searcher., plugin., extension.clustering and

[Nutch-dev] [jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12371664 ] Chris A. Mattmann commented on NUTCH-236: - I'd be happy to make these changes and submit a patch, but I wanted to know if the change would be welcome first. I think

[Nutch-dev] [jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-220?page=comments#action_12371669 ] Chris A. Mattmann commented on NUTCH-220: - Could you provide some more detail on this issue? For instance, a stack trace here would be quite helpful in trying to debug

[Nutch-dev] [jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12371671 ] Chris A. Mattmann commented on NUTCH-185: - I propose that either this issue be closed and the patch files moved to NUTCH-23, or that NUTCH-23 be closed, as the two are

[Nutch-dev] [jira] Resolved: (NUTCH-34) Parsing different content formats

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=all ] Chris A. Mattmann resolved NUTCH-34: Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed This issue was addressed via the application of NUTCH-88 applied to Nutch

[Nutch-dev] [jira] Closed: (NUTCH-34) Parsing different content formats

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=all ] Chris A. Mattmann closed NUTCH-34: -- Issue addressed by NUTCH-88. Parsing different content formats - Key: NUTCH-34 URL:

[Nutch-dev] [jira] Resolved: (NUTCH-24) Cannot handle incorrectly cased Content-Type

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-24?page=all ] Chris A. Mattmann resolved NUTCH-24: Fix Version: 0.8-dev Resolution: Fixed This issue was addressed by NUTCH-139, the fault tolerant Metadata container. Cannot handle

[Nutch-dev] [jira] Closed: (NUTCH-24) Cannot handle incorrectly cased Content-Type

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-24?page=all ] Chris A. Mattmann closed NUTCH-24: -- issue addressed by NUTCH-139. Cannot handle incorrectly cased Content-Type Key: NUTCH-24

[Nutch-dev] [jira] Updated: (NUTCH-218) need DOAP file for Nutch

2006-02-28 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-218?page=all ] Chris A. Mattmann updated NUTCH-218: Attachment: doap_Nutch.rdf I generated this off the DOAP generator page. Feel free to use it, or not. need DOAP file for Nutch

[Nutch-dev] [jira] Updated: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2006-02-15 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=all ] Chris A. Mattmann updated NUTCH-140: Attachment: NUTCH-140.20051502.patch.txt An initial patch for NUTCH-140 for everyone's review. Add alias capability in parse-plugins.xml file that

[Nutch-dev] [jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2006-02-14 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=comments#action_12366376 ] Chris A. Mattmann commented on NUTCH-140: - Hi Folks, I've gone ahead and created an initial patch for this issue. I'll be attaching it to JIRA within the next day

[Nutch-dev] [jira] Created: (NUTCH-210) Context.xml file for Nutch web application

2006-02-14 Thread Chris A. Mattmann (JIRA)
Context.xml file for Nutch web application -- Key: NUTCH-210 URL: http://issues.apache.org/jira/browse/NUTCH-210 Project: Nutch Type: Improvement Components: web gui Versions: 0.7.1, 0.7, 0.6, 0.7.2-dev, 0.8-dev

[Nutch-dev] [jira] Resolved: (NUTCH-149) outlinks not shown properly in cached.jsp

2006-02-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ] Chris A. Mattmann resolved NUTCH-149: - Resolution: Invalid Closed at request of the reporter: not a bug. outlinks not shown properly in cached.jsp

[Nutch-dev] [jira] Closed: (NUTCH-149) outlinks not shown properly in cached.jsp

2006-02-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ] Chris A. Mattmann closed NUTCH-149: --- Closed at request of reporter: not a bug outlinks not shown properly in cached.jsp - Key: NUTCH-149

[Nutch-dev] [jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse

2006-01-26 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364151 ] Chris A. Mattmann commented on NUTCH-190: - +1 I think that this is a needed patch. ParseUtil drops reason for failed parse ---

[Nutch-dev] [jira] Commented: (NUTCH-183) MapReduce has a series of problems concerning task-allocation to worker nodes

2006-01-21 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-183?page=comments#action_12363557 ] Chris A. Mattmann commented on NUTCH-183: - Guys, Greg Barish and the folks who worked on the Theseus planning system for information agents at USC did a lot of work

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-19 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363352 ] Chris A. Mattmann commented on NUTCH-139: - Hi Jerome, org.apache.nutch.parse.ParseData * The constructor becomes ParseData(ParseStatus, String, Outlink[],

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361923 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361924 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361925 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361926 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361927 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ] Chris A. Mattmann commented on NUTCH-139: - Hi Andrzej, I have an objection, in fact I think the patches miss the main point of using prefixed property names.

[Nutch-dev] [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ] Chris A. Mattmann commented on NUTCH-139: - Hmm, Okay, I just finished reading the rest of the comments :-) Sorry, just woke up out here in Los Angeles. Okay, I
