Hi Eyal,

Did you also modify parse-plugins.xml to add an alias for parse-exe that points to the actual extension id (the extension-id attribute)? I'm guessing that's your problem. The aliases section at the bottom of parse-plugins.xml has examples of this.
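For reference, here is a minimal sketch of what such an alias might look like. It assumes the id you want the alias to resolve to is the implementation id from your quoted plugin.xml (org.apache.nutch.parse.exe.ExeParse); adjust it if that id changes. Note also that your Java file declares the class as ExeParser while plugin.xml names ExeParse, so the class attribute there probably needs to match the real class name as well.

```xml
<!-- Goes inside the existing <aliases> element near the bottom of
     $NUTCH_HOME/conf/parse-plugins.xml. -->
<!-- Assumption: extension-id must equal the <implementation id="...">
     value declared in parse-exe/plugin.xml. -->
<aliases>
  <!-- ... existing aliases stay as they are ... -->
  <alias name="parse-exe"
         extension-id="org.apache.nutch.parse.exe.ExeParse" />
</aliases>
```

The mimeType entry you already added maps application/x-dosexec to the plugin id parse-exe; the alias is what ties that plugin id to a concrete extension.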
Let me know if you still need more help and we'll go from there.

Thanks,
Chris

On 10/17/07 6:53 AM, "eyal edri" <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm trying to write a new plugin that will download pages with contentType:
> x-dosexec (EXE) files.
> i've followed the "write your own plugin tutorial" in the wiki and done the
> following actions: (some actions are not mentioned in the tutorial)
>
> 1. Created a new dir under $NUTCH_HOME/src/plugins/parse-exe
> 2. Created new $NUTCH_HOME/src/plugins/parse-exe/plugin.xml [displayed
>    below]
> 3. Created new $NUTCH_HOME/src/plugins/parse-exe/build.xml [displayed
>    below]
> 4. Written the java code
>    $NUTCH_HOME/src/plugin/parse-exe/src/java/org/apache/nutch/parse/exe/ExeParser.java
> 5. Add "nutch-extensionpoints" & "parse-exe" to the 'plugins-include'
>    property in $NUTCH_HOME/conf/nutch-site.xml
> 6. Add code to the $NUTCH_HOME/conf/parse-plugins.xml [written below]
> 7. Added code to the $NUTCH_HOME/src/plugins/build.xml [written
>    below]
> 8. copied $NUTCH_HOME/build/plugins/parse-exe/parse-exe.jar to
>    $NUTCH_HOME/plugins/parse-exe
> 9. run ant (build successful)
>
> the log shows that nutch identifies the plugin:
>
> 2007-10-17 15:15:55,657 INFO plugin.PluginRepository - Registered Plugins:
> 2007-10-17 15:15:55,657 INFO plugin.PluginRepository - the nutch
> core extension points (nutch-extensionpoints)
> 2007-10-17 15:15:55,657 INFO plugin.PluginRepository - Html Parse
> Plug-in (parse-html)
> 2007-10-17 15:15:55,657 INFO plugin.PluginRepository - Exe Parse
> Plug-in (parse-exe)
>
> but when the fetcher encounters a x-dosexec file it throws an exception:
>
> 2007-10-17 15:17:16,146 WARN parse.ParseUtil - No suitable parser found
> when trying to parse content http://www.foo.com/yyy/foo.exe of type
> application/x-dosexec
> 2007-10-17 15:17:16,146 WARN fetcher.Fetcher - Error parsing:
> http://www.foo.com/yyy/foo.exe: failed(2,200):
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=application/x-dosexec url=http://www.foo.com/yyy/movie30.exe
>
> (sorry, but the url has been masked for security reasons)
>
> Am i missing something??
>
> thanks !!
>
>
> [$NUTCH_HOME/src/plugins/build.xml]
>
> <ant dir="parse-exe" target="deploy"/>
>
> [parse-plugins.xml]
>
> <mimeType name="application/x-dosexec">
>   <plugin id="parse-exe" />
> </mimeType>
>
>
> [plugin.xml] // copied and changed from parse-pdf
>
> <?xml version="1.0" encoding="UTF-8"?>
> <plugin
>    id="parse-exe"
>    name="Exe Parse Plug-in"
>    version="1.0.0"
>    provider-name="nutch.org">
>
>    <runtime>
>       <library name="parse-exe.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
>
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>       <import plugin="lib-log4j"/>
>    </requires>
>
>    <extension id="org.apache.nutch.parse.exe"
>               name="ExeParse"
>               point="org.apache.nutch.parse.Parser">
>
>       <implementation id="org.apache.nutch.parse.exe.ExeParse"
>                       class="org.apache.nutch.parse.exe.ExeParse">
>          <parameter name="contentType" value="application/x-dosexec"/>
>          <parameter name="pathSuffix" value=""/>
>       </implementation>
>    </extension>
>
> </plugin>
>
> ------------------------------------------------------------------------------
> -----------------------------------
>
> [build.xml]
>
> <?xml version="1.0"?>
>
> <project name="parse-exe" default="jar-core">
>
>   <import file="../build-plugin.xml"/>
>
> </project>
>
> ------------------------------------------------------------------------
> [ExeParser.java]
>
> public class ExeParser implements Parser {
>   public static final Log LOG = LogFactory.getLog("
> org.apache.nutch.parse.exe");
>   private Configuration conf;
>
>   public Parse getParse(Content content) {
>
>     try {
>
>       byte[] raw = content.getContent();
>
>       // enter here my code ( i will replace this with real code)
>       LOG.info ("EDRI:: you have reached the parse-exe plugin!");
>       System.out.println("EDRI:: system.out.print... parse-exe");
>
>
>       String contentLength = content.getMetadata().get(
> Response.CONTENT_LENGTH);
>       if (contentLength != null && raw.length !=
> Integer.parseInt(contentLength))
>       {
>         return new ParseStatus(ParseStatus.FAILED,
> ParseStatus.FAILED_TRUNCATED,
>                  "Content truncated at "+raw.length
>                  +" bytes. Parser can't handle incomplete exe
> file.").getEmptyParse(getConf());
>       }
>
>     } catch (Exception e) { // run time exception
>       if (LOG.isWarnEnabled()) {
>         LOG.warn("General exception in EXE parser: "+e.getMessage());
>         e.printStackTrace(LogUtil.getWarnStream(LOG));
>       }
>       return new ParseStatus(ParseStatus.FAILED,
>                  "Can't be handled as exe document. " +
> e).getEmptyParse(getConf());
>     }
>
>     /// i'm not sure what to return here if i only need to d/l the file
>
>     ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS, "",null,
> null, null);
>     parseData.setConf(this.conf);
>     return new ParseImpl("", parseData);
>   }
>
>   public void setConf(Configuration conf) {
>     this.conf = conf;
>   }
>
>   public Configuration getConf() {
>     return this.conf;
>   }
>
>

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.