Hi Stefan,
> Hi Chris,
> thanks for the clarification.
No probs.
> Do you think we can somehow cache it in the nutchConf instance,
> since this is the way we do it in other places as well?
Yeah I think we can. Here is a small patch to the ParserFactory that should
do the trick. Give it a test and let me know if it works. If it does, I
would say +1 to the committers to get this into the sources ASAP, no?
Index: src/java/org/apache/nutch/parse/ParserFactory.java
===================================================================
--- src/java/org/apache/nutch/parse/ParserFactory.java (revision 383463)
+++ src/java/org/apache/nutch/parse/ParserFactory.java (working copy)
@@ -55,7 +55,13 @@
     this.conf = conf;
     this.extensionPoint = PluginRepository.get(conf).getExtensionPoint(
         Parser.X_POINT_ID);
-    this.parsePluginList = new ParsePluginsReader().parse(conf);
+
+    if(conf.getObject("parsePluginList") != null){
+      this.parsePluginList = (ParsePluginList)conf.getObject("parsePluginList");
+    }
+    else{
+      this.parsePluginList = new ParsePluginsReader().parse(conf);
+    }
     if (this.extensionPoint == null) {
       throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found.");
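
One thing to double-check when testing: the lookup in the patch only helps if
something actually stores the list in the conf after the first parse. Below is
a minimal, self-contained sketch of that idea. All classes in it (SketchConf,
SketchPluginList, SketchReader, SketchParserFactory) are made-up stand-ins, not
Nutch classes, and it assumes the conf offers a setObject(name, value)
counterpart to getObject, which I have not verified against the sources.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the conf; only the object cache is modelled. */
class SketchConf {
  private final Map<String, Object> objects = new HashMap<String, Object>();
  Object getObject(String name) { return objects.get(name); }
  // Assumption: a setter that mirrors getObject.
  void setObject(String name, Object value) { objects.put(name, value); }
}

/** Hypothetical stand-in for ParsePluginList. */
class SketchPluginList { }

/** Hypothetical stand-in for ParsePluginsReader; counts simulated file reads. */
class SketchReader {
  static int parseCalls = 0;
  SketchPluginList parse(SketchConf conf) {
    parseCalls++;                 // one "read" of parse-plugins.xml
    return new SketchPluginList();
  }
}

/** Hypothetical stand-in for ParserFactory showing the cache-on-first-use step. */
class SketchParserFactory {
  private static final String CACHE_KEY = "parsePluginList";
  final SketchPluginList parsePluginList;

  SketchParserFactory(SketchConf conf) {
    SketchPluginList cached = (SketchPluginList) conf.getObject(CACHE_KEY);
    if (cached == null) {
      // First factory for this conf: parse once and remember the result.
      cached = new SketchReader().parse(conf);
      conf.setObject(CACHE_KEY, cached);
    }
    this.parsePluginList = cached;
  }
}

class CacheSketch {
  public static void main(String[] args) {
    SketchConf conf = new SketchConf();
    for (int i = 0; i < 1602; i++) {
      new SketchParserFactory(conf);
    }
    System.out.println("parse calls: " + SketchReader.parseCalls);
  }
}
```

The sketch is not synchronized, so two threads constructing a factory at the
same moment could each parse once; if that matters, the check-and-store step
would need a lock.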
Cheers,
Chris
> Cheers,
> Stefan
>
> On 07.03.2006 at 04:38, Chris Mattmann wrote:
>
> > Hi Stefan,
> >
> >> after a short time I already had this line 1602 times in my
> >> tasktracker log files:
> >> 060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at
> >> file:/home/joa/nutch/conf/parse-plugins.xml
> >>
> >> Sounds like this file is loaded 1602 times (after, let's say, 3 minutes). I
> >> guess that wasn't the goal, or am I overlooking something?
> >
> > It certainly wasn't the goal at all. After NUTCH-88, Jerome and I had the
> > following line in the ParserFactory.java class:
> >
> > /** List of parser plugins. */
> > private static final ParsePluginList PARSE_PLUGIN_LIST =
> > new ParsePluginsReader().parse();
> >
> >
> > (see revision 326889)
> >
> > Looking at the revision history for the ParserFactory file, after the
> > application of NUTCH-169, the above changes to:
> >
> >
> > private ParsePluginList parsePluginList;
> >
> > //... code here
> >
> > public ParserFactory(NutchConf nutchConf) {
> >   this.nutchConf = nutchConf;
> >   this.extensionPoint = nutchConf.getPluginRepository().getExtensionPoint(
> >       Parser.X_POINT_ID);
> >   this.parsePluginList = new ParsePluginsReader().parse(nutchConf);
> >
> >   if (this.extensionPoint == null) {
> >     throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found.");
> >   }
> >   if (this.parsePluginList == null) {
> >     throw new RuntimeException(
> >         "Parse Plugins preferences could not be loaded.");
> >   }
> > }
> >
> >
> > Thus, every time the ParserFactory is constructed, the parse-plugins.xml
> > file is read (it's the result of the call to
> > ParsePluginsReader().parse(nutchConf)). So, if the file is loaded 1602 times,
> > I'd guess that the ParserFactory is constructed 1602 times? Additionally, I'm
> > wondering why the parse-plugins.xml configuration parameters aren't declared
> > as final static anymore?
> >
> >> Loading this file just once could be a serious performance
> >> improvement.
> >
> > Yup, I think that's the reason we made it final static. If there is no
> > reason not to have it final static, I would suggest that it be put back to
> > final static. There may be a problem, however: since NUTCH-169, the loading
> > requires an existing Configuration object, I believe. So, we may need
> > a static Configuration object as well. Thoughts?
> >
> >> I was not able to find the code that logs this statement. Does
> >> anyone have an idea where this happens?
> >
> > The statement gets logged within the ParsePluginsReader.java class, line 98:
> >
> > ppInputStream = conf.getConfResourceAsInputStream(
> > conf.get(PP_FILE_PROP));
> >
> > HTH,
> > Chris
> >
> >
> >>
> >> Thanks.
> >> Stefan
> >> ---------------------------------------------
> >> blog: http://www.find23.org
> >> company: http://www.media-style.com
> >
> >
> >
>
> ---------------------------------------------
> blog: http://www.find23.org
> company: http://www.media-style.com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers