Overall, I suspect that, in most cases, any work we can do exactly once would be a net performance gain over our current behavior (which repeats the work for every usage). That holds even for schemas that define many unused types, since in many of those cases a lot of generated schema would also use some of those types many times.
Switching to at-most-once semantics would probably be a performance improvement in most cases. All of the fields of GlobalSimpleTypeDefFactory are lazy, and I was able to move the requiredEvaluations off of the factory classes without breaking any tests, which seems to have solved some of the problems.

The only remaining issue we need to decide about is related to the actual typeCalculator implementation. In theory, it is possible for someone to write silly expressions, such as { dfdl:typeInputCalcInt(../functionName, ../functionValue) }, which would require us to load the type calculator for every globalSimpleType that defines one (and therefore to evaluate enough of every globalSimpleType to determine whether it has one). Currently, this should be what we are doing, as nothing else evaluates unused simple types, but I suspect the process of determining whether a type defines a typeCalc involves unnecessarily computing much of what the typeCalc would be (which should be fixable, but it is likely we would accidentally reintroduce a data dependency without noticing at some point in the future).

We could also, as Mike has suggested previously, not allow cases like the above, and insist that the function name parameter is always a constant. This would be a fair bit more work to implement in the compiler, which I would prefer to avoid.

I think it is not unreasonable for a compiler to spend time looking at dead code. If there is a schema where that is a significant issue, its authors can add a pre-compilation step to strip out unused types. But I suspect the time spent partially analyzing all global types is not going to be significant. I would prefer to see profiling data to the contrary before spending time worrying about it.
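
To make the caching concern from the thread below concrete, here is a minimal sketch of a lookup table whose entries are evaluated only on first use and then cached. It is an illustration under assumptions, not Daffodil code: TypeDefFactory, Cached, build, registry, lookup and the evaluations counter are all hypothetical names, and the String keys stand in for QNames.

```scala
object LazyRegistrySketch {
  // Hypothetical stand-in for GlobalSimpleTypeDefFactory; not Daffodil's real class.
  final case class TypeDefFactory(name: String)

  // Wraps a by-name argument so that it is evaluated on first access only,
  // and the successful result is cached by the lazy val.
  final class Cached[A](thunk: => A) {
    lazy val value: A = thunk
  }

  // Counts how many times each factory body actually runs (for illustration).
  val evaluations: scala.collection.mutable.Map[String, Int] =
    scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

  // Hypothetical "compile this type" step; a broken type raises an error,
  // standing in for an SDE.
  private def build(name: String, broken: Boolean): TypeDefFactory = {
    evaluations(name) += 1
    if (broken) throw new IllegalStateException(s"SDE in $name")
    TypeDefFactory(name)
  }

  // Building the map does not run any of the thunks.
  val registry: Map[String, Cached[TypeDefFactory]] = Map(
    "enum_st1" -> new Cached(build("enum_st1", broken = false)),
    "enum_st4" -> new Cached(build("enum_st4", broken = true))
  )

  // Forces (and caches) only the entry that is actually requested.
  def lookup(qname: String): TypeDefFactory = registry(qname).value
}
```

With this shape, looking up enum_st1 repeatedly runs its body once, and enum_st4 is never touched unless something asks for it. One caveat: a Scala lazy val whose initializer throws is not marked as initialized, so a broken entry would be re-evaluated on each access rather than cached.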
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Tuesday, April 9, 2019 7:42:10 AM
To: dev@daffodil.apache.org; Sloane, Brandon
Subject: Re: Exposing latent SDEs

I personally don't have any problem with detecting errors even if those elements aren't used. It is a backwards-incompatible change, but no one should complain about improved error detection.

I do have a concern that this is potentially a big compile-time performance hit. I can easily imagine cases where someone generates a large set of enumerations based on some specification, but only some small set is actually used in someone's use case. In this case, we probably want to avoid compiling/checking every single enumeration if we're never going to use it. Doing things lazily should avoid that.

It might make sense to have an option to allow checking everything, even things not used, but I would prefer that option to default to off.

- Steve

On 4/8/19 6:29 PM, Sloane, Brandon wrote:
> It appears that hacking around this is not as simple as I would like. The
> problem is being triggered from SchemaSet, when we evaluate the line:
>
> * lazy val globalSimpleTypeDefs: Seq[GlobalSimpleTypeDefFactory] =
>   schemas.flatMap(_.globalSimpleTypeDefs)
>
> To suppress errors from unused SimpleTypes as suggested, we would want the
> resulting type to be:
>
> * Seq[ => GlobalSimpleTypeDefFactory]
>
> Which does not appear to be something that we can do (Scala does not
> recognize this signature as syntactically valid).
>
> We could be even more explicit about it, and use the type:
>
> * Seq[ () => GlobalSimpleTypeDefFactory]
>
> Which should work, but seems even more hacky.
> (In particular, we would need to be careful that we actually cache the
> values if we want to maintain the at-most-once semantics we expect from
> lazy values)
>
> If you are curious, the actual issue (at least in the case I am looking at
> now) is being triggered by the "requiredEvaluations(defaultPropertySources)"
> line of AnnotatedSchemaComponent, which is a trait of
> GlobalSimpleTypeDefFactory (now that we are actually computing things on the
> factory, it needs access to some of the annotations).
>
> I don't really understand what the purpose of requiredEvaluations is, so I
> don't want to remove it.
>
> Again, the only time this would be an issue is when we have a schema which A)
> contains an error but B) happens to work if we ignore the error.
>
> Given A), I would like to once again ask if it is acceptable to change our
> behavior to reject such schemas. This will involve refactoring a number of
> tests which deliberately include broken schemas to test for error messages.
>
> ________________________________
> From: Sloane, Brandon <bslo...@tresys.com>
> Sent: Friday, April 5, 2019 6:19:46 PM
> To: dev@daffodil.apache.org
> Subject: Re: Exposing latent SDEs
>
> The issue is that we need to compile the map of GlobalSimpleTypeFactories, as
> that is the data structure that the compiler uses whenever it needs to look
> up a type by qname.
>
> I suppose we could change the type of that data structure from (guessing at
> what the original structure looks like) Map[QName, GlobalSimpleTypeFactory]
> to Map[QName, => GlobalSimpleTypeFactory], which probably will do what we
> want, but we are then relying on laziness for our program to be correct,
> which always makes me a bit nervous.
>
> The only thing this gets us is the ability to compile a broken schema so long
> as the broken part is not being used. Apart from backwards-compatibility
> concerns, I am not sure we are doing anyone any favors by allowing this.
>
> ________________________________
> From: Beckerle, Mike <mbecke...@tresys.com>
> Sent: Friday, April 5, 2019 5:59:12 PM
> To: dev@daffodil.apache.org
> Subject: Re: Exposing latent SDEs
>
> Do we have to compile simple types even if unused? Can't we compile them
> lazily if used?
>
> I am very happy to restrict expressions that use simple type qnames so that
> the qnames have to be literal constants. Then compiling the expressions would
> provide the qnames of the types actually being used.
>
> ________________________________
> From: Sloane, Brandon <bslo...@tresys.com>
> Sent: Friday, April 5, 2019 5:12:26 PM
> To: dev@daffodil.apache.org
> Subject: Exposing latent SDEs
>
> This is related to the previous thread with the subject "Further design
> difficulties with TypeValueCalculators". I believe I have solved the main
> issue of that thread by computing attributes that do not depend on the
> context in the SimpleTypeDefFactory instead of the instance class [0].
>
> However, there is still an issue where I am changing the behaviour of
> Daffodil to compile aspects of simpleTypes regardless of whether they are
> used or not. We avoid the previous problem by making these aspects only those
> whose correctness does not depend on the local context. However, there is
> still an issue where if an unused simpleType is just plain broken, it will
> now emit an SDE.
>
> For instance, in section05/facets/Facets.tdml we have the following schema:
>
> <xs:simpleType name="enum_st1">
>   <xs:restriction base="xs:string">
>     <xs:enumeration value="Trout" />
>     <xs:enumeration value="Bass" />
>     <xs:enumeration value="Catfish" />
>   </xs:restriction>
> </xs:simpleType>
>
> <xs:simpleType name="enum_st4">
>   <xs:restriction base="ex:enum_st1">
>     <xs:enumeration value="Trout" />
>     <xs:enumeration value="Bass" />
>     <xs:enumeration value="Carp" />
>   </xs:restriction>
> </xs:simpleType>
>
> As test case facetEnum06 verifies, enum_st4 is broken because "Local
> enumerations must be a subset of base enumerations".
>
> The issue I am now running into is that all tests that use that schema are
> now failing due to this, even if they do not actually use enum_st4.
>
> Abstractly, I don't mind calling this acceptable behaviour, as there is an
> SDE in any schema containing enum_st4, even if the original implementation
> ignored it; and I don't mind updating the relevant test files to isolate
> these broken types in their own schema, but I wanted to verify that it is
> okay to make this sort of backwards-incompatible change.
>
> [0] This involved a fair amount of refactoring. There is more refactoring
> that can be done along these lines (which I believe will help with our
> performance issue), but I only did what was needed to support the
> functionality I am adding.
>
> Regards,
>
> Brandon T. Sloane
> Associate, Services
> bslo...@tresys.com | tresys.com