Re: What's the status of Nutch-GUI?

2006-12-02 Thread Stefan Groschupf

Hi Sami,

I guess you are referring to these:
•  LocalJobRunner:
  •  Run as a kind of singleton
  •  Have a kind of jobQueue
  •  Implement the JobSubmissionProtocol status-report methods
  •  Implement the killJob method

Right!
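
To make the four points above concrete, here is a rough sketch in plain Java. It is not Hadoop's LocalJobRunner and does not implement the real JobSubmissionProtocol; the class QueuedLocalRunner, its State enum, and the methods submitJob, getJobStatus and killJob are hypothetical names chosen for illustration only. It assumes a job can be modelled as a Runnable that reacts to thread interruption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not part of Hadoop or Nutch. */
final class QueuedLocalRunner {
    enum State { QUEUED, RUNNING, SUCCEEDED, FAILED, KILLED }

    // "Run as a kind of singleton": one shared instance per JVM.
    private static final QueuedLocalRunner INSTANCE = new QueuedLocalRunner();

    // "Have a kind of jobQueue": a single worker thread runs jobs in FIFO order.
    private final ExecutorService queue = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "queued-local-runner");
        t.setDaemon(true); // do not keep the JVM alive for queued jobs
        return t;
    });
    private final Map<String, State> states = new ConcurrentHashMap<>();
    private final Map<String, Future<?>> handles = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    private QueuedLocalRunner() {}

    static QueuedLocalRunner getInstance() {
        return INSTANCE;
    }

    /** Queues a job and returns its (hypothetical) job id. */
    synchronized String submitJob(Runnable job) {
        String id = "job_" + nextId.incrementAndGet();
        states.put(id, State.QUEUED);
        Future<?> handle = queue.submit(() -> {
            // Skip jobs that were killed while still waiting in the queue.
            if (!states.replace(id, State.QUEUED, State.RUNNING)) {
                return;
            }
            try {
                job.run();
                // Has no effect if the job was marked KILLED in the meantime.
                states.replace(id, State.RUNNING, State.SUCCEEDED);
            } catch (RuntimeException e) {
                states.replace(id, State.RUNNING, State.FAILED);
            }
        });
        handles.put(id, handle);
        return id;
    }

    /** Status report: the current state, or null for an unknown id. */
    State getJobStatus(String id) {
        return states.get(id);
    }

    /** Marks a queued or running job as KILLED and interrupts it. */
    synchronized boolean killJob(String id) {
        boolean marked = states.replace(id, State.QUEUED, State.KILLED)
                || states.replace(id, State.RUNNING, State.KILLED);
        if (!marked) {
            return false; // unknown id, or the job already finished
        }
        Future<?> handle = handles.get(id);
        if (handle != null) {
            handle.cancel(true); // interrupts the worker if the job is running
        }
        return true;
    }
}
```

A caller would use QueuedLocalRunner.getInstance().submitJob(...) and then poll getJobStatus(id). Killing relies on the job checking its interrupt flag, so a job that ignores interruption would keep the worker thread busy even though its state already reads KILLED.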



-how about writing a nutchrunner that just extends the  
functionality of localjobrunner?
That would be one solution; however, I still hope that the Hadoop
developers will understand that improving the local job runner would
be of general benefit.
Since it would mean somewhat duplicated code it does not feel right, but
I also think it is better this way than never getting this issue solved.




-scheduling (jobQueue) could be completely outside of jobrunner?


We solved that with Quartz and a file-based JobStore that we implemented
back then.


Stefan 

Phrase query analysis-fr

2006-12-02 Thread Rida Benjelloun

Hi,
When I use analysis-fr for indexing and searching, I'm not able to search by
phrase query. I'm using nutch-0.8.1.

Could someone help?
Best regards


[jira] Created: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
plugin to parse the feed-url (rss/atom) of a blog
-

 Key: NUTCH-412
 URL: http://issues.apache.org/jira/browse/NUTCH-412
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Renaud Richardet
Priority: Minor


A plugin that extracts the feed-url (rss/atom) of a blog by retrieving the href
from the link element in the page head (if found), and stores it in metadata.

The meta can be accessed with
parse.getData().getMeta(feedUrl);
You can test this plugin with the main method of HtmlParser.
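
For readers without the attached diff at hand, the extraction step might look roughly like the sketch below. It is not the plugin's code: the class FeedUrlSketch and its methods are hypothetical, it uses only the JDK's DOM parser rather than Nutch's parsing classes, and it assumes the page is well-formed XHTML and that the feed is announced with the usual rel="alternate" link carrying an RSS or Atom media type.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch only; not the code from plugin_parse-feedUrl.diff. */
final class FeedUrlSketch {
    private FeedUrlSketch() {}

    /** Parses a well-formed XHTML string (assumption: no tag-soup cleanup needed). */
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    /** Returns the href of the first rss/atom link in head, or null if none is found. */
    static String findFeedUrl(Document doc) {
        NodeList heads = doc.getElementsByTagName("head");
        if (heads.getLength() == 0) {
            return null;
        }
        NodeList links = ((Element) heads.item(0)).getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            // getAttribute returns "" when the attribute is missing.
            String rel = link.getAttribute("rel").toLowerCase(Locale.ROOT);
            String type = link.getAttribute("type").toLowerCase(Locale.ROOT);
            boolean isFeedType = type.equals("application/rss+xml")
                    || type.equals("application/atom+xml");
            if (rel.contains("alternate") && isFeedType) {
                String href = link.getAttribute("href");
                if (!href.isEmpty()) {
                    return href;
                }
            }
        }
        return null;
    }
}
```

A parse filter could then store the returned value in the parse metadata under whatever key the plugin defines. Relative hrefs are returned unchanged here; resolving them against the page URL is left out of the sketch.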

Thanks for any feedback.


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-412?page=all ]

Renaud Richardet updated NUTCH-412:
---

Attachment: plugin_parse-feedUrl.diff

unified diff against head (Rev: 481445)

