[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-11-15 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542819 ] Renaud Richardet commented on NUTCH-444: hi, i am travelling and will be offline until january 2008. thanks

[jira] Updated: (NUTCH-540) some problem about the Nutch cache

2007-08-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-540: --- Priority: Major (was: Blocker) could you please attach log files and error messages? thanks

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-25 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475795 ] Renaud Richardet commented on NUTCH-444: +1 for the transparency interface thanks, Renaud > Possibly us

[jira] Updated: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-02-24 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-369: --- Attachment: remover.diff just FYI, you can further filter which element neko should keep and

[jira] Updated: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-02-24 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-369: --- Priority: Minor (was: Major) Affects Version/s: (was: 0.8

[jira] Updated: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-02-24 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-369: --- Attachment: patch.diff unified diff against head. - fixes encoding, as described by King

Re: Apache Droids - standalone crawl framework

2007-02-20 Thread Renaud Richardet
rubdabadub wrote: On 2/20/07, Renaud Richardet <[EMAIL PROTECTED]> wrote: Hi Thorsten, I have quickly looked at the Droid code, and was wondering why you don't want to completely reuse the Nutch plugin API in Droid. This way, you could reuse the Nutch parse-* plugins without mo

Re: Apache Droids - standalone crawl framework

2007-02-20 Thread Renaud Richardet
sted in such a plugin? Does it make sense? Please test and report feedback to [EMAIL PROTECTED] I will happily answer all mails there. salu2 -- Renaud Richardet +1 617 230 9112 my email is my first name at apache.org http://www.oslutions.com

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-13 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472733 ] Renaud Richardet commented on NUTCH-443: hi All, Glad to see that this patch is moving forward :-) I have

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471880 ] Renaud Richardet commented on NUTCH-444: Gal, Would you be able to share your code with Stax? What license

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471878 ] Renaud Richardet commented on NUTCH-443: Nutch Newbie, Gal, Chris It's great that you discuss altern

[jira] Created: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-09 Thread Renaud Richardet (JIRA)
Project: Nutch Issue Type: Improvement Components: fetcher Affects Versions: 0.9.0 Reporter: Renaud Richardet Priority: Minor Fix For: 0.9.0 As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current library

[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-443: --- Attachment: NUTCH-443-draft-v4.patch Hi Dogacan, Thanks for merging the patches, good

Re: FW: RSS-fecter and index individul-how can i realize this function

2007-02-08 Thread Renaud Richardet
ly. Could something like that work? Doug -- Renaud Richardet +1 617 230 9112 my email is my first name at apache.org http://www.oslutions.com

[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-08 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-443: --- Attachment: parsers.diff Great, here's my work-in-progress(not finished, not tested

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Renaud Richardet
Chris Mattmann wrote: Guys, Sorry to be so thick-headed, but could someone explain to me in really simple language what this change is requesting that is different from the current Nutch API? I still don't get it, sorry... Currently, the RSS parser returns a single Parse object that aggregat

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Renaud Richardet
Doug Cutting wrote: Renaud Richardet wrote: I see. I was thinking that I could index the feed items without having to fetch them individually. Okay, so if Parser#parse returned a Map, then the URL for each parse should be that of its link, since you don't want to fetch that separ

[jira] Created: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-07 Thread Renaud Richardet (JIRA)
Issue Type: New Feature Components: fetcher Affects Versions: 0.9.0 Reporter: Renaud Richardet Priority: Minor Fix For: 0.9.0 allow Parser#parse to return a Map. This way, the RSS parser can return multiple parse objects, that will all be
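
A minimal sketch of the idea behind NUTCH-443, assuming the returned Map is keyed by each feed item's link URL (as suggested in the thread on 2007-02-07). The class `FeedParseSketch`, the `Item` type, and the use of plain strings as parse results are hypothetical stand-ins for illustration; they are not the Nutch `Parser` or `Parse` API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; NOT the real Nutch Parser/Parse classes.
final class FeedParseSketch {

    /** One feed entry: the URL it links to and its extracted text. */
    static final class Item {
        final String link;
        final String text;

        Item(String link, String text) {
            this.link = link;
            this.text = text;
        }
    }

    /**
     * Returns one parse result per feed item instead of a single
     * aggregated result. Keys are the item link URLs (an assumption).
     */
    static Map<String, String> parse(List<Item> items) {
        Map<String, String> parses = new LinkedHashMap<>();
        for (Item item : items) {
            parses.put(item.link, item.text);
        }
        return parses;
    }

    public static void main(String[] args) {
        Map<String, String> parses = parse(Arrays.asList(
                new Item("http://example.com/a", "first entry"),
                new Item("http://example.com/b", "second entry")));
        // Each entry could then be indexed as its own document.
        parses.forEach((url, text) -> System.out.println(url + " -> " + text));
    }
}
```

With one map entry per item, each feed entry can be indexed as a separate document without fetching its link individually.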

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Renaud Richardet
Doug Cutting wrote: Renaud Richardet wrote: The use case is that you index RSS-feeds, but your users can search each feed-entry as a single document. Does it make sense? But each feed item also contains a link whose content will be indexed and that's generally a superset of the

Re: api.RegexURLFilterBase - Configuration Resources

2007-02-06 Thread Renaud Richardet
don't know how to handle that Configuration-Objects (setConf() etc.) What should I do to avoid that error? Where does the Configuration-Object come from? TIA Tobias Zahn -- Renaud Richardet +1 617 230 9112 my email is my first name at

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Renaud Richardet
Mailstop: 171-246 ___ Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology. -- Renaud Richardet +1 617 230 9112 my email is my

Re: RSS-fecter and index individul-how can i realize this function

2007-02-02 Thread Renaud Richardet
___ Jet Propulsion Laboratory Pasadena, CA Office: 171-266B Mailstop: 171-246 ___ Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology. -- renaud richardet +1 617 230 9112 renaud oslutions.com http://www.oslutions.com

[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-03 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-412?page=all ] Renaud Richardet updated NUTCH-412: --- Attachment: plugin_parse-feedUrl2.diff > plugin to parse the feed-url (rss/atom) of a b

[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-412?page=all ] Renaud Richardet updated NUTCH-412: --- Attachment: plugin_parse-feedUrl.diff unified diff against head (Rev: 481445) > plugin to parse the feed-url (rss/atom) of a b

[jira] Created: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
Reporter: Renaud Richardet Priority: Minor A plugin that extracts the feed-url (rss/atom) of a blog by retrieving the href from the element (if found), and stores it in metadata. The meta can be accessed with parse.getData().getMeta("feedUrl"); you can test this p
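
A rough sketch of feed-URL discovery as described in NUTCH-412. It assumes the element in question is an HTML `<link>` tag whose `type` is `application/rss+xml` or `application/atom+xml`; the snippet above does not name the element, so that is a guess. It also uses regular expressions on the raw HTML rather than whatever DOM handling the attached patch uses. `FeedUrlSketch` and `findFeedUrl` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not taken from the attached patch.
final class FeedUrlSketch {

    // Any <link ...> tag.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // A type attribute announcing an RSS or Atom feed (assumed convention).
    private static final Pattern FEED_TYPE = Pattern.compile(
            "type\\s*=\\s*[\"']application/(rss|atom)\\+xml[\"']",
            Pattern.CASE_INSENSITIVE);

    // The href attribute value.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the href of the first feed link tag, if one is found. */
    static Optional<String> findFeedUrl(String html) {
        Matcher tags = LINK_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            if (FEED_TYPE.matcher(tag).find()) {
                Matcher href = HREF.matcher(tag);
                if (href.find()) {
                    return Optional.of(href.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<head><link rel=\"alternate\" "
                + "type=\"application/rss+xml\" "
                + "href=\"http://example.com/feed.xml\"></head>";
        System.out.println(findFeedUrl(html).orElse("(no feed link)"));
    }
}
```

In the plugin, the value found this way would be stored in the parse metadata under the `feedUrl` key mentioned above.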

Re: [Fwd: Re: [Nutch Wiki] Update of "RenaudRichardet" by RenaudRichardet]

2006-08-24 Thread Renaud Richardet
i category on "Nutch Wiki" for change notification. The following page has been changed by RenaudRichardet: http://wiki.apache.org/nutch/RenaudRichardet New page: {{{ Renaud Richardet COO America Wyona Inc. - Open Source Content Management - Apache Lenya office +1 857 776-3195

[jira] Created: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed

2006-08-23 Thread Renaud Richardet (JIRA)
Issue Type: Bug Components: fetcher Affects Versions: 0.8 Environment: Ubuntu Dapper Reporter: Renaud Richardet Priority: Minor Attachments: outlink.diff When Nutch parses the outlinks of a fetched page, the process will fail if a single
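
The behaviour NUTCH-359 asks for can be sketched as follows, assuming the failure comes from an exception thrown while building a URL for one link. The sketch catches that exception per link so the remaining links are still extracted. `OutlinkSketch` and `toOutlinks` are hypothetical names, and `java.net.URL` stands in for Nutch's own outlink type; this is not the content of outlink.diff.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the attached outlink.diff.
final class OutlinkSketch {

    /**
     * Converts raw link strings to URLs, skipping any link that cannot
     * be parsed instead of aborting extraction for the whole page.
     */
    static List<URL> toOutlinks(List<String> rawLinks) {
        List<URL> outlinks = new ArrayList<>();
        for (String raw : rawLinks) {
            try {
                outlinks.add(new URL(raw));
            } catch (MalformedURLException e) {
                // One bad link: drop it and continue with the rest.
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        List<URL> links = toOutlinks(Arrays.asList(
                "http://example.com/a", "not a url", "http://example.com/b"));
        System.out.println(links);
    }
}
```

The key design point is that the try/catch sits inside the loop, so a failure is scoped to the single link that caused it.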

[jira] Updated: (NUTCH-346) Improve readability of logs/hadoop.log

2006-08-21 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-346?page=all ] Renaud Richardet updated NUTCH-346: --- Attachment: log4j_plugins.diff OK, here we go. This patch should be good for 0.8 and trunk. > Improve readability of logs/hadoop.

[jira] Created: (NUTCH-346) Improve readability of logs/hadoop.log

2006-08-09 Thread Renaud Richardet (JIRA)
dapper Reporter: Renaud Richardet Priority: Minor adding log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN to conf/log4j.properties dramatically improves the readability of the logs in logs/hadoop.log (removes all INFO) -- This message is automatically

[jira] Commented: (NUTCH-330) command line tool to search a Lucene index

2006-08-08 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=comments#action_12426629 ] Renaud Richardet commented on NUTCH-330: This bug is obsolete; I just found out that Nutch already allows searching from the command line via bin/nutch

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-08 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426579 ] Renaud Richardet commented on NUTCH-266: KuroSaka, yes you can download the hadoop jar, release 0.5.0 from the project website: http://lucene.apache.org

[jira] Updated: (NUTCH-266) hadoop bug when doing updatedb

2006-08-07 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Renaud Richardet updated NUTCH-266: --- Attachment: patch_hadoop-0.5.0.diff Now that Hadoop 0.5 has been released, here's the patch to use hadoop-0.5.0.jar in Nutch-0.8.x HTH, Renaud >

[jira] Updated: (NUTCH-266) hadoop bug when doing updatedb

2006-08-02 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Renaud Richardet updated NUTCH-266: --- Attachment: patch.diff Thank you Sami, We had a similar problem with Win XP and were able to fix it by using hadoop-nightly.jar. However, because of

[jira] Updated: (NUTCH-208) http: proxy exception list:

2006-07-31 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-208?page=all ] Renaud Richardet updated NUTCH-208: --- Attachment: proxy_exception_list-0.8.diff I updated the patch to 0.8 and corrected a small typo (if (!"".equals(input[i].trim())){ ). The proxy
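
A minimal sketch of how a proxy exception list might be parsed, built around the blank-entry check quoted in the comment above. It assumes the list is a comma-separated string of host names; the delimiter, the class `ProxyExceptionSketch`, and both method names are assumptions for illustration and are not taken from the attached patch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the attached proxy_exception_list-0.8.diff.
final class ProxyExceptionSketch {

    /** Parses a comma-separated host list, ignoring blank entries. */
    static Set<String> parseExceptions(String configured) {
        Set<String> hosts = new HashSet<>();
        if (configured == null) {
            return hosts;
        }
        String[] input = configured.split(",");
        for (int i = 0; i < input.length; i++) {
            // Same check as quoted in the comment: skip empty entries.
            if (!"".equals(input[i].trim())) {
                hosts.add(input[i].trim().toLowerCase(Locale.ROOT));
            }
        }
        return hosts;
    }

    /** True when the host is not on the exception list. */
    static boolean useProxy(String host, Set<String> exceptions) {
        return !exceptions.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> exceptions =
                parseExceptions("localhost, , intranet.example.com,");
        System.out.println(useProxy("localhost", exceptions));
        System.out.println(useProxy("example.org", exceptions));
    }
}
```

Trimming before the emptiness check means entries that contain only whitespace are dropped rather than stored as bogus host names.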

[jira] Updated: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=all ] Renaud Richardet updated NUTCH-330: --- Attachment: clSearch.diff forgot the "echo" in sh... > command line tool to search a

[jira] Updated: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=all ] Renaud Richardet updated NUTCH-330: --- Attachment: clSearch.diff unified diff against head > command line tool to search a Lucene in

[jira] Created: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
Versions: 0.8-dev Environment: ubuntu Reporter: Renaud Richardet Priority: Minor Attachments: clSearch.diff Tool to search a Lucene index from the command line; makes development and testing faster. Usage: bin/nutch searchindex [index dir