Hi Stefan,

> Hi Chris,
> thanks for the clarification.

No probs. 

> Do you think we can somehow cache it in the nutchConf instance,
> since this is the way we are doing this in other places as well?

Yeah I think we can. Here is a small patch to the ParserFactory that should
do the trick. Give it a test and let me know if it works. If it does, I
would say +1 to the committers to get this into the sources ASAP, no?

Index: src/java/org/apache/nutch/parse/ParserFactory.java
===================================================================
--- src/java/org/apache/nutch/parse/ParserFactory.java  (revision 383463)
+++ src/java/org/apache/nutch/parse/ParserFactory.java  (working copy)
@@ -55,7 +55,13 @@
     this.conf = conf;
     this.extensionPoint = PluginRepository.get(conf).getExtensionPoint(
         Parser.X_POINT_ID);
-    this.parsePluginList = new ParsePluginsReader().parse(conf);
+    
+    if(conf.getObject("parsePluginList") != null){
+       this.parsePluginList = (ParsePluginList)conf.getObject("parsePluginList");
+    }
+    else{
+        this.parsePluginList = new ParsePluginsReader().parse(conf);
+    }
 
     if (this.extensionPoint == null) {
       throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found.");
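
As posted, the hunk above only reads from conf. For the cached branch to ever be taken, something has to store the parsed list in conf after the first parse, unless that already happens elsewhere. Below is a minimal, self-contained sketch of that read-then-store pattern. The Conf, ParsePluginList, Reader and Factory classes are hypothetical stand-ins written for this example, not the real Nutch classes, and the setObject call assumes the configuration class offers a counterpart to getObject.

```java
import java.util.HashMap;
import java.util.Map;

public class ParsePluginCacheSketch {

  /** Hypothetical stand-in for a configuration object with a per-instance object cache. */
  static class Conf {
    private final Map<String, Object> objects = new HashMap<>();
    Object getObject(String name) { return objects.get(name); }
    // Assumed counterpart to getObject; not shown in the patch above.
    void setObject(String name, Object value) { objects.put(name, value); }
  }

  /** Hypothetical stand-in for the parsed parse-plugins.xml preferences. */
  static class ParsePluginList {}

  /** Hypothetical stand-in for ParsePluginsReader; counts how often a parse happens. */
  static class Reader {
    static int parseCalls = 0;
    ParsePluginList parse(Conf conf) {
      parseCalls++;
      return new ParsePluginList();
    }
  }

  /** Hypothetical stand-in for ParserFactory, reduced to the caching logic. */
  static class Factory {
    static final String KEY = "parsePluginList";
    final ParsePluginList parsePluginList;

    Factory(Conf conf) {
      ParsePluginList cached = (ParsePluginList) conf.getObject(KEY);
      if (cached == null) {
        // Cache miss: parse once, then store the result for later factories.
        cached = new Reader().parse(conf);
        conf.setObject(KEY, cached);
      }
      this.parsePluginList = cached;
    }
  }

  public static void main(String[] args) {
    Conf conf = new Conf();
    for (int i = 0; i < 1602; i++) {
      new Factory(conf);
    }
    System.out.println("parse calls: " + Reader.parseCalls); // prints "parse calls: 1"
  }
}
```

The sketch is not synchronized, so two threads constructing factories at the same moment could each parse once. That costs an extra read rather than correctness, but it is worth keeping in mind.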


Cheers,
  Chris

> Cheers,
> Stefan
> 
> On 07.03.2006 at 04:38, Chris Mattmann wrote:
> 
> > Hi Stefan,
> >
> >> after a short time I already had this line 1602 times in my
> >> tasktracker log files.
> >> 060307 022707 task_m_2bu9o4  found resource parse-plugins.xml at
> >> file:/home/joa/nutch/conf/parse-plugins.xml
> >>
> >> Sounds like this file is loaded 1602 times (after, let's say, 3
> >> minutes). I guess that wasn't the goal, or am I overlooking something?
> >
> > It certainly wasn't the goal at all. After NUTCH-88, Jerome and I had
> > the following line in the ParserFactory.java class:
> >
> >   /** List of parser plugins. */
> >   private static final ParsePluginList PARSE_PLUGIN_LIST =
> >           new ParsePluginsReader().parse();
> >
> >
> > (see revision 326889)
> >
> > Looking at the revision history for the ParserFactory file, after the
> > application of NUTCH-169, the above changed to:
> >
> >
> >   private ParsePluginList parsePluginList;
> >
> > //... code here
> >
> > public ParserFactory(NutchConf nutchConf) {
> >     this.nutchConf = nutchConf;
> >     this.extensionPoint = nutchConf.getPluginRepository().getExtensionPoint(
> >         Parser.X_POINT_ID);
> >     this.parsePluginList = new ParsePluginsReader().parse(nutchConf);
> >
> >     if (this.extensionPoint == null) {
> >       throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found.");
> >     }
> >     if (this.parsePluginList == null) {
> >       throw new RuntimeException(
> >           "Parse Plugins preferences could not be loaded.");
> >     }
> >   }
> >
> >
> > Thus, every time a ParserFactory is constructed, the parse-plugins.xml
> > file is read (as a result of the call to
> > ParsePluginsReader().parse(nutchConf)). So, if the file is loaded 1602
> > times, I'd guess that the ParserFactory is constructed 1602 times?
> > Additionally, I'm wondering why the parse-plugins.xml configuration
> > parameters aren't declared as final static anymore?
> >
> >> That could be a serious performance improvement to just load this
> >> file once.
> >
> > Yup, I think that's the reason we made it final static. If there is no
> > reason not to have it final static, I would suggest that it be put back
> > to final static. There may be a problem, however: since NUTCH-169, the
> > loading requires an existing Configuration object, I believe. So, we may
> > need a static Configuration object as well. Thoughts?
> >
> >> I was not able to find the code that is logging this statement. Does
> >> anyone have an idea where this happens?
> >
> > The statement gets logged within the ParsePluginsReader.java class, line 98:
> >
> > ppInputStream = conf.getConfResourceAsInputStream(
> >                           conf.get(PP_FILE_PROP));
> >
> > HTH,
> >   Chris
> >
> >
> >>
> >> Thanks.
> >> Stefan
> >> ---------------------------------------------
> >> blog: http://www.find23.org
> >> company: http://www.media-style.com
> >
> >
> >




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers