On 5/30/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> On 5/30/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > Doğacan Güney wrote:
> >
> > > My patch is just a draft to see if we can create a better caching
> > > mechanism. There are definitely some rough edges there :)
> >
> > One important piece of information: in future versions of Hadoop the
> > method Configuration.setObject() is deprecated and will then be removed,
> > so we have to grow our own caching mechanism anyway: either use a
> > singleton cache, or change nearly all APIs to pass around a
> > user/job/task context.
> >
> > So, we will face this problem pretty soon, with the next upgrade of Hadoop.
>
> Hmm, well, that sucks, but this is not really a problem for
> PluginRepository: PluginRepository already has its own cache
> mechanism.
>
> >
> > > You are right about per-plugin parameters, but I think it will be very
> > > difficult to keep the PluginProperty class in sync with plugin
> > > parameters. I mean, if a plugin defines a new parameter, we have to
> > > remember to update PluginProperty. Perhaps we can force plugins to
> > > define the configuration options they will use in, say, their
> > > plugin.xml file, but that will be very error-prone too. I don't want
> > > to compare entire configuration objects, because changing irrelevant
> > > options, like fetcher.store.content, shouldn't force loading plugins
> > > again, though it seems it may be inevitable...
> >
> > Let me see if I understand this... In my opinion this is a non-issue.
> >
> > Child tasks are started in separate JVMs, so the only "context"
> > information that they have is what they can read from job.xml (which is
> > a superset of all properties from config files + job-specific data +
> > task-specific data). This context is currently instantiated as a
> > Configuration object, and we (ab)use it also as a local per-JVM cache
> > for plugin instances and other objects.
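The "singleton cache" alternative to Configuration.setObject() can be sketched as a static, JVM-wide map from a key to a lazily created object. Since each child task runs in its own JVM, a static map lives exactly as long as the task, which is the per-JVM cache lifetime described here. This is a minimal sketch in plain Java under that assumption; the class name JvmObjectCache and the string keys are hypothetical and are not Nutch or Hadoop API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical JVM-wide cache, a possible stand-in for storing objects on the
// Configuration via setObject(). Not part of Nutch or Hadoop.
final class JvmObjectCache {
    // One map per JVM; a child task's JVM lives as long as the task itself.
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private JvmObjectCache() {}

    // Returns the object cached under `key`, creating it with `factory` on first use.
    // Callers must always use the same type for a given key, or the cast fails.
    @SuppressWarnings("unchecked")
    static <T> T get(String key, Supplier<? extends T> factory) {
        return (T) CACHE.computeIfAbsent(key, k -> factory.get());
    }
}

class JvmObjectCacheDemo {
    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        Supplier<Object> factory = () -> {
            created.incrementAndGet(); // count how often the "expensive" setup runs
            return new Object();
        };
        // Hypothetical key; in practice it would identify e.g. a plugin extension point.
        Object first = JvmObjectCache.get("indexing-filters", factory);
        Object second = JvmObjectCache.get("indexing-filters", factory);
        System.out.println((first == second) + " " + created.get()); // prints "true 1"
    }
}
```

The catch, and the reason the thread worries about it, is that a purely JVM-wide key ignores which configuration the object was built from.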
> >
> > Once we instantiate the plugins, they exist unchanged throughout the
> > lifecycle of the JVM (== lifecycle of a single task), so we don't have
> > to worry about having different sets of plugins with different
> > parameters for different jobs (or even tasks).
> >
> > In other words, it seems to me that there is no situation in which
> > we have to reload plugins within the same JVM but with different
> > parameters.
>
> The problem is that someone might get a little too smart. For example,
> one may write a new job with two IndexingFilters, each created from a
> completely different configuration object, and then filter some
> documents with the first filter and others with the second. I agree
> that this is a bit of a reach, but it is possible.
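The "two configuration objects in one JVM" case can still be handled without rebuilding the repository on every call by caching one repository per configuration object, keyed by identity in a WeakHashMap. This is only a sketch of that general idea, not a reproduction of the patch under discussion: Conf and Repository are hypothetical stand-ins for Hadoop's Configuration/JobConf and Nutch's PluginRepository, and the real classes may behave differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for Configuration/JobConf. It does not override
// equals()/hashCode(), so the cache below looks it up by object identity.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical stand-in for PluginRepository; construction is assumed to be expensive.
class Repository {
    // Weak keys: once a Conf is unreachable, its entry can be garbage collected.
    private static final Map<Conf, Repository> CACHE = new WeakHashMap<>();
    static int instancesCreated = 0;

    // Copy what is needed from the Conf instead of keeping a reference to it:
    // a value that strongly references its own WeakHashMap key keeps the entry alive forever.
    final String pluginIncludes;

    private Repository(Conf conf) {
        this.pluginIncludes = conf.get("plugin.includes");
        instancesCreated++;
    }

    // One repository per distinct Conf object, created on first request.
    static synchronized Repository get(Conf conf) {
        return CACHE.computeIfAbsent(conf, Repository::new);
    }
}

class RepositoryCacheDemo {
    public static void main(String[] args) {
        Conf jobConf = new Conf().set("plugin.includes", "index-basic");
        Conf otherConf = new Conf().set("plugin.includes", "index-more");
        Repository r1 = Repository.get(jobConf);
        Repository r2 = Repository.get(jobConf);   // same object: cached instance
        Repository r3 = Repository.get(otherConf); // different object: its own instance
        System.out.println((r1 == r2) + " " + (r1 == r3) + " " + Repository.instancesCreated);
        // prints "true false 2"
    }
}
```

Keying by identity sidesteps the question of which properties matter (the fetcher.store.content problem), at the cost that two configurations with identical contents still get separate repositories.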
Actually, thinking a bit further about this, I kind of agree with you. I
initially thought that the best approach would be to change
PluginRepository.get(Configuration) to PluginRepository.get(), where get()
just creates a configuration internally and initializes itself with it.
But then we wouldn't be passing the JobConf to PluginRepository; instead,
PluginRepository would do something like NutchConfiguration.create()
itself, which is probably wrong. So, all in all, I've come to believe that
my (and Nicolas') patch is a not-so-bad way of fixing this. It allows us
to pass the JobConf to PluginRepository and stops us from creating new
PluginRepository instances again and again... What do you think?

> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> > Information Retrieval, Semantic Web
> > Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
>
> --
> Doğacan Güney

--
Doğacan Güney

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers