----- Original Message -----
From: "Doğacan Güney" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, June 08, 2007 8:27 PM
Subject: Re: Loading mechanism of plugin classes and singleton objects
[...]

>> This is strange, because, as you can see below, the strings that
>> make up the keys and values of conf appear unchanged. Perhaps we
>> should override the equals() method in
>> org.apache.hadoop.conf.Configuration (invoked by CACHE.get(),
>> according to the specs of the java.util.Map interface), so that the
>> hashCode()s of the keys get ignored, and conf1.equals(conf2) returns
>> true if and only if:
>>
>> 1. conf1.size() == conf2.size(), and
>>
>> 2. for each key k1 of conf1 there is a key k2 in conf2 such that:
>>    2.1 k1.equals(k2)
>>    2.2 conf1.get(k1).equals(conf2.get(k2))
>
> This has been suggested before and I have to say I don't like this
> one, because it means that each call to PluginRepository.get(conf)
> will end up comparing all key/value pairs, which, IMO, is excessive
> (because, if I am not mistaken, we don't need this when running Nutch
> in a distributed environment). Unfortunately, this may be the only
> way to fix this leak.

Well, after all, how often would this overhead actually be incurred? I logged just a few calls (admittedly, on short runs), but even thousands of occurrences would consume only a negligible amount of CPU time.

A more serious problem is that an implementation of equals() that returns true when the two hashCodes differ violates the specification of Object.hashCode():

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()

"If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."

> This is probably not a good idea, but here it goes: perhaps we can
> change the LocalJobRunner.Job.run() method. First, it clones the
> JobConf object (clonedConf). It then runs a MapTask with the original
> JobConf. Upon completion, it copies everything from clonedConf back
> to the original JobConf. This way, the original JobConf's hashCode
> won't change, so there should be no leak.
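For what it's worth, the two rules quoted above can be sketched as a small standalone method. This is only an illustration, not a patch: a plain java.util.Map<String, String> stands in for org.apache.hadoop.conf.Configuration, and the class name DeepConfEquals is made up.

```java
import java.util.HashMap;
import java.util.Map;

public class DeepConfEquals {

    // "Deep" equality per the two rules above, ignoring the keys'
    // hashCodes entirely. Map<String, String> is a stand-in here for
    // the real Configuration object (an assumption for the sketch).
    public static boolean deepEquals(Map<String, String> conf1,
                                     Map<String, String> conf2) {
        // Rule 1: same number of entries.
        if (conf1.size() != conf2.size()) {
            return false;
        }
        // Rule 2: every key k1 of conf1 has a counterpart k2 in conf2
        // with k1.equals(k2) (2.1, via conf2.get()) mapping to an
        // equal value (2.2).
        for (Map.Entry<String, String> e : conf1.entrySet()) {
            String other = conf2.get(e.getKey());
            if (other == null || !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("plugin.folders", "plugins");
        Map<String, String> b = new HashMap<>();
        b.put("plugin.folders", "plugins");
        System.out.println(deepEquals(a, b)); // true
        b.put("plugin.folders", "other");
        System.out.println(deepEquals(a, b)); // false
    }
}
```

Note that this is exactly the comparison Doğacan objects to: it is O(n) in the number of entries on every cache lookup, which is the trade-off being discussed.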
What if we simply rework the PluginRepository.get() method, checking first whether CACHE contains a key "deeply" equal to conf (i.e., with the same key/value pairs but possibly a different hashCode)? If such a key exists and its hashCode differs, we use that key, in place of conf, as the parameter of the get() method. (All of this within a synchronized block, of course.)

Probably the cleanest way to do that is to encapsulate this behavior in a "hashCodeObliviousGet()" method of a subclass of Hashtable, and use that subclass for CACHE. That way we stay "specs-ly correct", because we won't override common methods (get() or equals()) with implementations that don't comply with their specifications.

Also: do you agree with me that we should avoid caches based on weak references (like the current WeakHashMap) if we want to be protected against multiple instances of plugins? I can't see a compelling reason for a garbage-collected CACHE, considering that the number of items stored in it can't grow very large (how many different configurations can users ever create?).

Enzo
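To make the idea concrete, here is a rough sketch of such a subclass. The class name, the generic key type, and the demo in main() are all hypothetical: a mutable HashMap plays the role of a Configuration whose hashCode changes after it has been used as a cache key.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of the proposed "hashCodeObliviousGet()" idea.
// K stands in for org.apache.hadoop.conf.Configuration.
public class HashCodeObliviousCache<K, V> extends Hashtable<K, V> {

    // Like get(), but tolerant of keys whose hashCode changed after
    // insertion: if the ordinary hash lookup misses, fall back to a
    // linear scan driven by equals() alone, ignoring hashCode.
    public synchronized V hashCodeObliviousGet(K key) {
        V hit = get(key); // fast path: ordinary hash lookup
        if (hit != null) {
            return hit;
        }
        for (Map.Entry<K, V> e : entrySet()) {
            if (e.getKey().equals(key)) { // hashCode played no part
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HashCodeObliviousCache<Map<String, String>, String> cache =
            new HashCodeObliviousCache<>();
        Map<String, String> conf = new java.util.HashMap<>();
        conf.put("plugin.folders", "plugins");
        cache.put(conf, "repository");

        // Mutating the key changes its hashCode, so the entry is now
        // stranded in the wrong bucket -- the leak scenario.
        conf.put("some.new.property", "value");

        // The oblivious lookup still finds it via the linear scan.
        System.out.println(cache.hashCodeObliviousGet(conf)); // repository
    }
}
```

The linear scan only runs on a hash miss, so the common case keeps the usual Hashtable cost; and since get() and equals() are untouched, the Object.hashCode() contract discussed above is never violated.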
