hadoop-site.xml - absolute Path

2007-02-12 Thread Tobias Zahn
Hello out there,
sorry for mailing this list yet again. I'm not sure whether I'm just not
working carefully enough, but I'm running into even more problems.

I put a new property in conf/hadoop-site.xml, according to the examples
in hadoop-default.xml. The new property contains the path to a
configuration file for a plugin.
With that entry in place, the following appears in the log:
2007-02-12 22:38:00,246 FATAL api.RegexURLFilterBase - Can't find resource: $CORRECT-AND-EXISTING-PATH

Now I wonder if:
1) I can't extend api.RegexURLFilterBase and use another config file or
something similar
2) I can't use an absolute path for my properties.
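
A possible explanation for the message above, offered as an assumption
rather than a confirmed diagnosis: if the rules file is loaded as a
classpath resource (which is how Hadoop's Configuration resolves named
resources), a path that exists on disk can still be reported as missing,
because the name is resolved against the classpath roots and not against
the filesystem root. The sketch below uses only the JDK;
ResourceLookupSketch and its method names are hypothetical and are not part
of Nutch or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not a Nutch or Hadoop class.
class ResourceLookupSketch {

    // Classpath-style lookup: the name is resolved relative to the
    // classpath roots (directories and jars), not the filesystem root.
    static boolean foundOnClasspath(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceLookupSketch.class.getClassLoader();
        }
        return loader.getResource(name) != null;
    }

    // Plain filesystem lookup: the path is taken literally.
    static boolean foundOnFilesystem(String path) {
        return Files.exists(Paths.get(path));
    }

    public static void main(String[] args) throws IOException {
        // Create a real file so that the absolute path certainly exists.
        Path rules = Files.createTempFile("my-regex-rules", ".txt");
        try {
            String absolute = rules.toAbsolutePath().toString();
            System.out.println("on filesystem: " + foundOnFilesystem(absolute));
            System.out.println("on classpath:  " + foundOnClasspath(absolute));
        } finally {
            Files.delete(rules);
        }
    }
}
```

If this assumption holds, putting the file into the conf directory and
setting the property to the bare file name would be the thing to try.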

It would be great if anyone interested in this plugin would like to help
me find my errors. Please contact me and I'll mail you the source
(around 100 lines).

[The plugin will make it possible to index only some files, according to
a regex file, similar to urlfilter-regex.]

Best regards,
Tobias Zahn


Re: api.RegexURLFilterBase - Configuration Resources

2007-02-11 Thread Tobias Zahn
Again, thank you for your help.
In the end, I had slightly wrong configs for my plugin, but now it seems
to work. But since Nutch no longer produces any output on the command
line, I can't check whether everything is correct in the end
(readdb -stats).

I don't know why it behaves that way. I haven't changed anything.
It would be great if someone had an idea what to do now!

My Nutch version is 0.8.


Best regards,
Tobias Zahn


api.RegexURLFilterBase - Configuration Resources

2007-02-06 Thread Tobias Zahn
Hello!
I have written a new plugin extending the IndexingFilter and using the
RegexURLFilterBase class.
In the log there is this message:

FATAL api.RegexURLFilterBase - Can't find resource: null

I don't know how to handle those Configuration objects (setConf() etc.).
What should I do to avoid that error? Where does the
Configuration object come from?
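
The "null" in the message suggests that the file name itself was never
resolved, for example because the property naming the rules file is not
defined in the configuration. A minimal sketch of the pattern, assuming the
framework creates the configuration and hands it to the plugin through
setConf(): java.util.Properties stands in for the real Configuration class,
and ConfigurableSketch, RegexPluginSketch and the property name
indexingfilter.regex.file are all hypothetical.

```java
import java.util.Properties;

// Stand-in for a "configurable" contract; NOT the Hadoop interface itself.
interface ConfigurableSketch {
    void setConf(Properties conf);
    Properties getConf();
}

// Hypothetical plugin that reads the name of its rules file from the
// configuration it is given.
class RegexPluginSketch implements ConfigurableSketch {

    // Hypothetical property name, chosen for this example only.
    static final String RULES_KEY = "indexingfilter.regex.file";

    private Properties conf;
    private String rulesFile;

    // Called by the framework, not by the plugin itself.
    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        // getProperty returns null when the key is not defined anywhere.
        this.rulesFile = conf.getProperty(RULES_KEY);
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    // Null here means the property is missing from the configuration.
    String getRulesFile() {
        return rulesFile;
    }
}
```

With an empty configuration getRulesFile() returns null, so the first
thing to check under this assumption is whether the property is defined
under exactly the name the plugin asks for.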

TIA
Tobias Zahn


Can't Compile Revision 501954

2007-01-31 Thread Tobias Zahn
Hello!
Does anybody know why I get the following error running ant on the
revision I have checked out from svn? Maybe it's a dumb question but...

Thank you for your help!

compile:
 [echo] Compiling plugin: parse-html
[javac] Compiling 5 source files to nutch/trunk/build/parse-html/classes
[javac] nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java:102: cannot access org.apache.nutch.parse.HtmlParseFilters
[javac] bad class file: nutch/trunk/build/classes/org/apache/nutch/parse/HtmlParseFilters.class
[javac] illegal start of class file
[javac] Please remove or make sure it appears in the correct subdirectory of the classpath.
[javac]   private HtmlParseFilters htmlParseFilters;
[javac]   ^
[javac] 1 error
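
The message "illegal start of class file" suggests that the compiler found
a file whose first bytes are not what it expects of a class file. A valid
class file begins with the four-byte magic number 0xCAFEBABE, so one way to
check the suspect HtmlParseFilters.class is to look at its header. A
minimal sketch using only the JDK; ClassFileMagicSketch is a hypothetical
helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Nutch or Ant.
class ClassFileMagicSketch {

    // True if the data starts with the class-file magic number 0xCAFEBABE.
    static boolean startsWithMagic(byte[] bytes) {
        return bytes.length >= 4
                && (bytes[0] & 0xFF) == 0xCA
                && (bytes[1] & 0xFF) == 0xFE
                && (bytes[2] & 0xFF) == 0xBA
                && (bytes[3] & 0xFF) == 0xBE;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileMagicSketch <file.class>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (startsWithMagic(bytes)
                ? " starts with 0xCAFEBABE"
                : " does not start with 0xCAFEBABE"));
    }
}
```

If the check fails, deleting the build directory and recompiling from
scratch is the usual remedy, though whether that applies to this checkout
is an assumption.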


'RegexIndexingFilter'

2007-01-29 Thread Tobias Zahn
Good evening!
I have found out that Nutch cannot restrict indexing to specific file
types. Needing this feature, I thought of implementing a
'RegexIndexingFilter', if that is the right way to do it.
I have read some of the source code, but I couldn't find out how to tell
the indexer that it shouldn't index a file.
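
The rule format of urlfilter-regex, which such a filter could mirror, can
be sketched as follows: each non-comment line is a regular expression
prefixed with + (accept) or - (reject), the first matching rule decides,
and a URL that matches no rule is rejected. This is a standalone sketch
using only the JDK; RegexRuleSketch and shouldIndex are hypothetical names,
and how the decision is passed on to the indexer is deliberately left out
because it depends on the IndexingFilter API of the Nutch version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule matcher for illustration; not a Nutch class.
class RegexRuleSketch {

    // One parsed rule: a sign (accept or reject) plus a compiled pattern.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses lines such as "+\.pdf$" or "-."; blank lines and lines
    // starting with '#' are skipped.
    RegexRuleSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException(
                        "Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose pattern is found in the URL decides;
    // a URL that matches no rule is rejected.
    boolean shouldIndex(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }
}
```

With the rules "+\.pdf$" and "-." only URLs ending in .pdf would be
accepted for indexing.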

Hoping that I am on the right track, I look forward to your opinions,
ideas and help.

TIA,
Tobias Zahn