Re: Caching CommonHyphenation (was: Re: The effect of the property cache ...)

2007-07-20 Thread Jeremias Maerki

On 20.07.2007 06:56:21 Andreas L Delmelle wrote:
 On Jul 19, 2007, at 00:36, Andreas L Delmelle wrote:
 
 
  On Jul 18, 2007, at 23:18, Jeremias Maerki wrote:
  <snip />
  - One of the easiest candidates for another flyweight is probably
  CommonHyphenation (56K instances, 2.3MB in my example). The few member
  variables could probably just be concatenated to a String (to be used as
  the key).
 
  Interesting idea, will look into that asap.
 
 FWIW:
 Looked a bit closer at this, and it suddenly struck me that all the  
 base Property types, apart from CharacterProperty which I overlooked  
 as a possible candidate, were already cached:
 
 StringProperty - language, country, script
 NumberProperty - hyphenation-push/remain-character-count
 EnumProperty - hyphenate
 
 CharacterProperty(*) - hyphenation-character
 
 (*) now also added, see http://svn.apache.org/viewvc?view=rev&rev=557814
 
 This means we currently end up in the strange situation where  
 different/separate CommonHyphenation instances are generated from  
 identical sets of base Property instances.

This raises the question of whether, for properties without dynamic
context-based evaluation, property evaluation could be streamlined to
return the primitive values directly instead of simple container objects
like NumberProperty.
throw new NotEnoughTimeRightNowException();

 Maybe the CommonHyphenation bundle could store references to the  
 original properties themselves instead of duplicating their content/ 
 value and storing them as primitives? By itself, this should be  
 roughly the same in terms of overall memory consumption: replacement  
 of some primitives with references.
 
 In that case, one of the additional benefits of the individual
 Property caching is that you can now actually avoid calls to
 StringProperty.equals() in the rest of the code. Identity means the
 same as equality here, so the fastest possible implementation of
 CommonHyphenation.equals() would then look like:
 
 public final class CommonHyphenation {
   ...
   public final StringProperty language;
   public final StringProperty script;
   public final StringProperty country;
   public final EnumProperty hyphenate;
   ...
   public boolean equals(Object obj) {
     if (obj == this) {
       return true;
     }
     if (obj instanceof CommonHyphenation) {
       CommonHyphenation ch = (CommonHyphenation) obj;
       // The base properties are cached, so reference comparison suffices.
       return (ch.language == this.language
           && ch.script == this.script
           && ch.country == this.country
           && ch.hyphenate == this.hyphenate
           && ...);
     }
     return false;
   }
 
 One thing that cannot be avoided is the multiple calls to  
 PropertyList.get() to get to the properties that are needed to  
 perform the check for a flyweight bundle. Maybe the initial  
 assignments can be moved into the getInstance() method, so they  
 become part of the static code. getInstance() would get a  
 PropertyList as argument, while the private constructor signature is  
 altered to accept all the base properties as parameters.
 
 The key in the Map could be a composite String, but could also again  
 be the CommonHyphenation itself, if a decent hashCode()  
 implementation is added.
 The benefit of using the instance itself is that the key in a  
 WeakHashMap is automatically released after the last object referring  
 to it has been cleared. Using a key other than the instance itself  
 would make WeakHashMap unusable, since the keys are in that case not  
 referenced directly by any object. The key cannot be embedded in the  
 instance itself, since that would prevent the entire entry from ever  
 being released...
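
As a rough illustration of the self-keyed WeakHashMap idea above, here is a minimal sketch. It is not the FOP code: StringProp and HyphenationBundle are hypothetical stand-ins, getInstance() takes the properties directly rather than a PropertyList, and storing the value as a WeakReference is an assumption made so that the map's value does not keep its own key strongly reachable.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a cached, immutable base property (not FOP's StringProperty). */
final class StringProp {
    private final String value;

    StringProp(String value) {
        this.value = value;
    }

    String getString() {
        return value;
    }
}

/** Hypothetical flyweight bundle that serves as its own key in a WeakHashMap. */
final class HyphenationBundle {

    // Assumption: the value is only a WeakReference, so the entry can be
    // released once no other object refers to the bundle.
    private static final Map<HyphenationBundle, WeakReference<HyphenationBundle>> CACHE =
            Collections.synchronizedMap(
                    new WeakHashMap<HyphenationBundle, WeakReference<HyphenationBundle>>());

    private final StringProp language;
    private final StringProp country;

    private HyphenationBundle(StringProp language, StringProp country) {
        this.language = language;
        this.country = country;
    }

    /** Returns the shared instance for this combination of property instances. */
    static HyphenationBundle getInstance(StringProp language, StringProp country) {
        HyphenationBundle candidate = new HyphenationBundle(language, country);
        // Lock the wrapper so that lookup and insertion form one atomic step.
        synchronized (CACHE) {
            WeakReference<HyphenationBundle> ref = CACHE.get(candidate);
            HyphenationBundle cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                return cached;
            }
            CACHE.put(candidate, new WeakReference<HyphenationBundle>(candidate));
            return candidate;
        }
    }

    /** Bean-style accessor that hides the property object from callers. */
    String getLanguage() {
        return language.getString();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof HyphenationBundle)) {
            return false;
        }
        HyphenationBundle other = (HyphenationBundle) obj;
        // Reference comparison: valid only because the properties are cached.
        return other.language == this.language && other.country == this.country;
    }

    @Override
    public int hashCode() {
        // Must agree with the identity-based equals() above.
        return 31 * System.identityHashCode(language) + System.identityHashCode(country);
    }
}
```

Calling getInstance() twice with the same property instances returns the same bundle. Passing an equal-valued but distinct StringProp returns a different bundle, which is why the identity shortcut only pays off when the base properties are themselves cached.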
 
 The properties themselves being immutable and final, I guess it does  
 no harm to expose them as public members. Only a handful of places in  
 TextLM and LineLM would need a slight adjustment to compensate for  
 the lost getString() and getEnum() conversions. Maybe for  
 convenience, if really needed, accessors could be added like:
 
 public String language() {
   return language.getString();
 }
 ...
 public boolean hyphenate() {
   return (hyphenate.getEnum() == EN_TRUE);

Well, I'd prefer Bean-style getters, i.e. getLanguage(),
isHyphenationEnabled()

 
 Opinions?
 For the interested parties: full CommonHyphenation below, following  
 roughly the same principles as the Property caching.

hash should probably be transient here because it's a cached value.

 Cheers
 
 Andreas
 
 --- Sample code ---
 public final class CommonHyphenation {
 
  private static final Map cache =  
 java.util.Collections.synchronizedMap(
  new java.util.WeakHashMap());
 
  private int hash = 0;
 
  /** The language property */
  private final StringProperty language;
 
  /** The country property */
  private final StringProperty country;
 
  /** The script property */
  private final StringProperty script;
 
  /** The hyphenate property */
  private final 

Re: Caching CommonHyphenation (was: Re: The effect of the property cache ...)

2007-07-20 Thread Andreas L Delmelle

On Jul 20, 2007, at 09:19, Jeremias Maerki wrote:


<snip />
This means we currently end up in the strange situation where
different/separate CommonHyphenation instances are generated from
identical sets of base Property instances.


This raises the question of whether, for properties without dynamic
context-based evaluation, property evaluation could be streamlined to
return the primitive values directly instead of simple container objects
like NumberProperty.
throw new NotEnoughTimeRightNowException();


The 'evaluation' here is precisely triggered by the calls to  
PropertyList.get().
Before those calls in the CommonHyphenation constructor, the base  
properties might not even exist yet (if they were not specified on  
the FO that is bound to the CommonHyphenation).


Trading the NumberProperty for an int... No idea if that's feasible  
without a thorough revision. The entire property resolution mechanism  
currently depends on the generic Property return type. That design  
would obviously have to be abandoned for this handful of cases...


<snip />

public boolean hyphenate() {
   return (hyphenate.getEnum() == EN_TRUE);


Well, I'd prefer Bean-style getters, i.e. getLanguage(),
isHyphenationEnabled()


No problem. That was only by means of an example.





Opinions?
For the interested parties: full CommonHyphenation below, following
roughly the same principles as the Property caching.


hash should probably be transient here because it's a cached value.


Checked this out, and transient seems to be only applicable in a  
serialization context. If the object is never serialized, adding that  
keyword would seem to have zero effect... unless I'm missing something.
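
To illustrate the point, here is a minimal sketch of what transient does to a cached hash. The class name, its single field and the roundTrip() helper are hypothetical and not taken from FOP. The transient field is skipped by default Java serialization and comes back as 0, so it is simply recomputed on the next hashCode() call. If the object is never serialized, the keyword changes nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical class with a lazily cached hash, used only to demonstrate transient. */
final class CachedHashHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String language;

    // Skipped by default serialization; reads as 0 after deserialization.
    private transient int hash;

    CachedHashHolder(String language) {
        this.language = language;
    }

    /** Exposes the raw cached value so the effect of transient is visible. */
    int cachedHash() {
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CachedHashHolder
                && ((CachedHashHolder) obj).language.equals(language);
    }

    @Override
    public int hashCode() {
        if (hash == 0) {
            // Computed on first use, then reused.
            hash = 37 * 17 + language.hashCode();
        }
        return hash;
    }

    /** Serializes the object to a byte array and reads it back. */
    static CachedHashHolder roundTrip(CachedHashHolder original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Object copy = in.readObject();
            in.close();
            return (CachedHashHolder) copy;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the copy starts with an empty cache but still produces the same hash code as the original, since the value is derived entirely from the serialized field.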




Cheers

Andreas