hadoop-site.xml - absolute Path
Hello out there,

sorry for mailing this list yet again. I'm not sure whether I'm just not working carefully enough, but I'm facing even more problems.

I put a new property in conf/hadoop-site.xml, following the examples in hadoop-default.xml. The new property contains the path to a configuration file for a plugin. With that entry, this message occurs:

2007-02-12 22:38:00,246 FATAL api.RegexURLFilterBase - Can't find resource: $CORRECT-AND-EXISTING-PATH

Now I wonder whether:
1) I can't extend api.RegexURLFilterBase and use another config file or something similar
2) I can't use an absolute path for my properties.

It would be great if anyone interested in that plugin would like to help me find my errors. Please contact me and I'll mail you the source (around 100 lines). [The plugin will make it possible to index only some files, according to a regex file, similar to urlfilter-regex.]

Best regards,
Tobias Zahn
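
A possible explanation for point 2, offered as an assumption and not checked against the Nutch 0.8 source: if the rule file is looked up as a classpath resource (the way files under conf/ are normally found), an absolute file-system path will not resolve even though the file exists. The sketch below uses only the JDK to show the difference. ResourceLookupSketch, fromClasspath, and open are hypothetical names, and the file-system fallback is an illustration, not something RegexURLFilterBase is known to do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ResourceLookupSketch {

    /** Looks a name up on the classpath only; returns null when it is not found there. */
    static URL fromClasspath(String name) {
        return ResourceLookupSketch.class.getClassLoader().getResource(name);
    }

    /** Tries the classpath first, then falls back to the file system (hypothetical behaviour). */
    static InputStream open(String name) throws IOException {
        URL url = fromClasspath(name);
        if (url != null) {
            return url.openStream();
        }
        Path path = Paths.get(name);
        if (Files.isReadable(path)) {
            return Files.newInputStream(path);
        }
        throw new IOException("Can't find resource: " + name);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway rule file so the example does not depend on any real path.
        Path rules = Files.createTempFile("regex-rules", ".txt");
        Files.write(rules, "+\\.pdf$\n".getBytes());
        String absolute = rules.toAbsolutePath().toString();

        // The file exists, yet a pure classpath lookup does not see it.
        System.out.println("classpath lookup: " + fromClasspath(absolute));

        // The fallback reads the same path directly from the file system.
        try (InputStream in = open(absolute)) {
            System.out.println("first byte via fallback: " + (char) in.read());
        }
        Files.delete(rules);
    }
}
```

If this assumption holds, placing the file in the conf/ directory and referring to it by its bare file name would be the simpler route than an absolute path.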
Re: api.RegexURLFilterBase - Configuration Resources
Again, thank you for your help. In the end, I had slightly wrong configs for my plugin, but now it seems to work. However, since nutch no longer prints any output on the command line, I can't check whether everything is actually correct (readdb -stats). I don't know why that is; I haven't changed anything. It would be great if someone had an idea what to do now! My nutch version is 0.8.

Best regards,
Tobias Zahn
api.RegexURLFilterBase - Configuration Resources
Hello!

I have written a new plugin extending the IndexingFilter and using the RegexURLFilterBase class. In the log there is this message:

FATAL api.RegexURLFilterBase - Can't find resource: null

I don't know how to handle these Configuration objects (setConf() etc.). What should I do to avoid that error? Where does the Configuration object come from?

TIA
Tobias Zahn
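
A "null" in that message suggests that the name of the rule file was itself never found, for example because the property holding it is not set under the key the plugin asks for. The following is a minimal sketch of the usual "configurable" pattern, in which the framework creates the plugin and then hands it the configuration through setConf(). It uses java.util.Properties as a stand-in for the real Configuration class; ConfigurableSketch, RegexRulesPluginSketch, and the property key example.regex.rules.file are all hypothetical and not part of Nutch or Hadoop.

```java
import java.util.Properties;

/** Hypothetical stand-in for a framework's "configurable" contract. */
interface ConfigurableSketch {
    void setConf(Properties conf);
    Properties getConf();
}

class RegexRulesPluginSketch implements ConfigurableSketch {

    /** Hypothetical property key, used only for illustration. */
    static final String RULES_KEY = "example.regex.rules.file";

    private Properties conf;
    private String rulesFile;

    /** Called by the framework after construction; the plugin never creates the configuration itself. */
    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        String value = conf.getProperty(RULES_KEY);
        if (value == null) {
            // Fail early with the key name, instead of passing null on to the resource lookup.
            throw new IllegalStateException("Property " + RULES_KEY + " is not set");
        }
        this.rulesFile = value;
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    String getRulesFile() {
        return rulesFile;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RULES_KEY, "regex-indexfilter.txt"); // hypothetical file name
        RegexRulesPluginSketch plugin = new RegexRulesPluginSketch();
        plugin.setConf(conf);
        System.out.println("rules file: " + plugin.getRulesFile());
    }
}
```

Checking the property inside setConf() makes a misspelled or missing key show up with its name rather than as "resource: null" further down the line.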
Can't Compile Revision 501954
Hello!

Does anybody know why I get the following error when running ant on the revision I checked out from svn? Maybe it's a dumb question, but... Thank you for your help!

compile:
     [echo] Compiling plugin: parse-html
    [javac] Compiling 5 source files to nutch/trunk/build/parse-html/classes
    [javac] nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java:102: cannot access org.apache.nutch.parse.HtmlParseFilters
    [javac] bad class file: nutch/trunk/build/classes/org/apache/nutch/parse/HtmlParseFilters.class
    [javac] illegal start of class file
    [javac] Please remove or make sure it appears in the correct subdirectory of the classpath.
    [javac]   private HtmlParseFilters htmlParseFilters;
    [javac]           ^
    [javac] 1 error
'RegexIndexingFilter'
Good evening!

I have found out that it is impossible to index only some specific file types with nutch. Since I need this feature, I thought of implementing a 'RegexIndexingFilter', if that would be the right way to do it. I have read some of the source code, but I couldn't find out how to tell the indexer that it shouldn't index a file.

Hoping that I am on the right track, I look forward to your opinions, ideas and help.

TIA,
Tobias Zahn
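
The matching half of such a filter can be sketched independently of Nutch. The example below assumes the rule format used by urlfilter-regex: one rule per line, prefixed with + (accept) or - (reject), lines starting with # are comments, the first matching rule decides, and a URL that matches no rule is rejected. RegexRuleSetSketch and its methods are hypothetical names; how to tell the indexer to skip a document is deliberately left out, since that depends on the IndexingFilter contract of the Nutch version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexRuleSetSketch {

    /** One parsed rule: a sign and the pattern it applies to. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    static RegexRuleSetSketch parse(List<String> lines) {
        RegexRuleSetSketch set = new RegexRuleSetSketch();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            set.rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return set;
    }

    /** The first rule whose pattern is found in the URL decides; no match means reject. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RegexRuleSetSketch set = parse(Arrays.asList(
                "# index PDF and Word files only",
                "+\\.(pdf|doc)$",
                "-."));
        System.out.println(set.accepts("http://example.org/report.pdf"));
        System.out.println(set.accepts("http://example.org/index.html"));
    }
}
```

Keeping the rule matching in a small class of its own would also allow the same rule file to be tested without running a crawl.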