Re: Exposing latent SDEs

Sloane, Brandon Tue, 09 Apr 2019 19:21:09 -0700

Overall, I suspect that any exactly-once work we can do would be a net 
performance increase over our current behavior (which will repeat the work for 
every usage) in most cases, even for schemas that define many unused types 
(since, in many cases, there would also be a lot of generated schema that uses 
some of those types many times).



Switching to at-most-once semantics would probably be a performance improvement 
in most cases. All of the fields of GlobalSimpleTypeDefFactory are lazy, and I 
was able to move the requiredEvaluations off of the factory classes without 
breaking any tests, which seems to have solved some of the problems.
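
As a rough illustration of the at-most-once behaviour (and of the explicit caching that the Seq[() => GlobalSimpleTypeDefFactory] idea in the quoted message below would need), here is a minimal sketch. Memo, FactoryStub, build, and the builds counter are hypothetical names for illustration only and are not Daffodil code.

```scala
object AtMostOnceSketch {
  // Hypothetical stand-in for GlobalSimpleTypeDefFactory; not the real class.
  final case class FactoryStub(name: String)

  // Wraps a by-name computation and caches it, so the body runs at most once.
  final class Memo[A](compute: => A) {
    lazy val value: A = compute
  }

  // Counts how many times a factory is actually built.
  var builds = 0

  def build(name: String): FactoryStub = {
    builds += 1
    FactoryStub(name)
  }

  // Plain thunks: every call rebuilds the factory.
  val thunks: Seq[() => FactoryStub] =
    Seq("enum_st1", "enum_st4").map(n => () => build(n))

  // Memoized thunks: nothing is built until .value is first read,
  // and later reads reuse the cached result.
  val memos: Seq[Memo[FactoryStub]] =
    Seq("enum_st1", "enum_st4").map(n => new Memo(build(n)))
}
```

With plain thunks, an entry that is never called is never built, but one that is called twice is built twice; the Memo wrapper keeps the first property and removes the second.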


The only remaining issue we need to decide about is related to the actual 
typeCalculator implementation.


In theory, it is possible for someone to write silly expressions, such as

{ dfdl:typeInputCalcInt(../functionName, ../functionValue) }

which would require us to load the type calculator for all globalSimpleTypes 
which define one (and therefore to evaluate enough of every globalSimpleType to 
determine whether it has one). Currently, this should be what we are doing, as 
nothing else evaluates unused simple types, but I suspect the process of 
determining whether a type defines a typeCalc involves unnecessarily computing 
much of what the typeCalc would be (which should be fixable, but it is likely 
we would accidentally reintroduce a data dependency without noticing at some 
point in the future).
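
To make the difference concrete, here is a minimal sketch of why a non-constant name forces us to look at every global simple type, while a constant name only touches one. Everything in it (Entry, registry, the "ex:a" style names, modelling a type calculator as Int => Int) is a simplifying assumption for illustration and not how Daffodil represents these things.

```scala
object TypeCalcLookupSketch {
  // Hypothetical, simplified model: a type calculator maps an Int to an Int.
  type TypeCalc = Int => Int

  // Names of the simple types whose definitions have been examined so far.
  val examined = scala.collection.mutable.Set.empty[String]

  // A type's calculator (if any) is only worked out when first asked for.
  final class Entry(name: String, calc: Option[TypeCalc]) {
    lazy val typeCalc: Option[TypeCalc] = { examined += name; calc }
  }

  val registry: Map[String, Entry] = Map(
    "ex:a" -> new Entry("ex:a", Some((i: Int) => i + 1)),
    "ex:b" -> new Entry("ex:b", None),
    "ex:c" -> new Entry("ex:c", Some((i: Int) => i * 2))
  )

  // Constant name: known while compiling the expression, so one entry is examined.
  def resolveConstant(name: String): Option[TypeCalc] =
    registry.get(name).flatMap(_.typeCalc)

  // Dynamic name: only known at runtime, so every type must be examined
  // in order to keep the ones that define a calculator.
  def resolveAllForDynamic(): Map[String, TypeCalc] =
    registry.collect { case (n, e) if e.typeCalc.isDefined => n -> e.typeCalc.get }
}
```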


We could also, as Mike has suggested previously, not allow cases like the 
above, and insist that the function name parameter is always a constant. This 
would be a fair bit more work to implement in the compiler that I would prefer 
to avoid doing.


I think it is not unreasonable for a compiler to spend time looking at dead 
code. If there is a schema where that is a significant issue, they can add a 
pre-compilation step to strip out unused types. But I suspect the time spent 
partially analyzing all global types is not going to be significant. I would 
prefer to see profiling data to the contrary before spending time worrying 
about it.

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Tuesday, April 9, 2019 7:42:10 AM
To: dev@daffodil.apache.org; Sloane, Brandon
Subject: Re: Exposing latent SDEs

I personally don't have any problem with detecting errors even if those
elements aren't used. It is a backwards incompatible change, but no one
should complain about improved error detection.

I do have a concern that this is potentially a big compile time
performance hit. I can easily imagine cases where someone generates a
large set of enumerations based on some specification but only some
small set are actually used in someone's use-case. In this case, we
probably want to avoid compiling/checking every single enumeration if
we're never going to use it. Doing things lazily should avoid that. It
might make sense to have an option to allow checking everything, even
things not used, but I would prefer that option to default to off.

- Steve
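
A minimal sketch of what such an opt-in option could look like, using the enum_st1/enum_st4 example from the original message quoted below. The checkUnused flag, EnumType, and diagnostics are hypothetical names; no such option is known to exist in Daffodil, and type names are written without their "ex:" prefix for simplicity.

```scala
object UnusedTypeCheckSketch {
  // Simplified stand-in for a global simple type with enumeration facets.
  final case class EnumType(name: String, base: Option[String], values: Set[String])

  // Returns an error if the local enumerations are not a subset of the base's.
  def check(t: EnumType, all: Map[String, EnumType]): Option[String] =
    for {
      baseName <- t.base
      baseType <- all.get(baseName)
      if !t.values.subsetOf(baseType.values)
    } yield s"${t.name}: Local enumerations must be a subset of base enumerations"

  // checkUnused is a hypothetical option; it defaults to off, so only
  // the types named in `used` are checked unless the caller opts in.
  def diagnostics(
      all: Map[String, EnumType],
      used: Set[String],
      checkUnused: Boolean = false
  ): Seq[String] = {
    val toCheck =
      if (checkUnused) all.values else all.values.filter(t => used(t.name))
    toCheck.flatMap(t => check(t, all)).toSeq
  }

  val types: Map[String, EnumType] = Seq(
    EnumType("enum_st1", None, Set("Trout", "Bass", "Catfish")),
    EnumType("enum_st4", Some("enum_st1"), Set("Trout", "Bass", "Carp"))
  ).map(t => t.name -> t).toMap
}
```

With the default, a schema that only uses enum_st1 reports nothing; passing checkUnused = true surfaces the enum_st4 error even though nothing refers to it.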



On 4/8/19 6:29 PM, Sloane, Brandon wrote:
> It appears that hacking around this is not as simple as I would like. The 
> problem is being triggered from SchemaSet, when we evaluate the line:
>
>
>   *   lazy val globalSimpleTypeDefs: Seq[GlobalSimpleTypeDefFactory] = 
> schemas.flatMap(_.globalSimpleTypeDefs)
>
>
> To suppress errors from unused SimpleTypes as suggested, we would want the 
> resulting type to be:
>
>   *   Seq[ => GlobalSimpleTypeDefFactory]
>
> Which does not appear to be something that we can do (Scala does not 
> recognize this signature as syntactically valid).
>
> We could be even more explicit about it, and use the type:
>
>   *   Seq[ () => GlobalSimpleTypeDefFactory]
>
> Which should work, but seems even more hacky. (In particular, we would need 
> to be careful that we actually cache the values if we want to maintain the 
> at-most-once semantics we expect from lazy values.)
>
>
> If you are curious, the actual issue (at least in the case I am looking at 
> now) is being triggered by the "requiredEvaluations(defaultPropertySources)" 
> line of AnnotatedSchemaComponent, which is a trait of 
> GlobalSimpleTypeDefFactory (now that we are actually computing things on the 
> factory, it needs access to some of the annotations).
>
>
> I don't really understand what the purpose of requiredEvaluations is, so I 
> don't want to remove it.
>
>
> Again, the only time this would be an issue is when we have schema which A) 
> contains an error but B) happens to work if we ignore the error.
>
>
> Given A), I would like to once again ask if it is acceptable to change our 
> behavior to reject such schemas. This will involve refactoring a number of 
> tests which deliberately include broken schema to test for error messages.
>
> ________________________________
> From: Sloane, Brandon <bslo...@tresys.com>
> Sent: Friday, April 5, 2019 6:19:46 PM
> To: dev@daffodil.apache.org
> Subject: Re: Exposing latent SDEs
>
> The issue is that we need to compile the map of GlobalSimpleTypeFactories, as 
> that is the data structure that the compiler uses whenever it needs to look 
> up a type by qname.
>
>
> I suppose we could change the type of that data structure from (guessing at 
> what the original structure looks like) Map[QName, GlobalSimpleTypeFactory] 
> to Map[QName, => GlobalSimpleTypeFactory], which probably will do what we 
> want, but we are then relying on laziness for our program to be correct, 
> which always makes me a bit nervous.
>
>
> The only thing this gets us is the ability to compile broken schema so long 
> as the broken part is not being used. Apart from backwards compatibility 
> concerns, I am not sure we are doing anyone any favors by allowing this.
>
> ________________________________
> From: Beckerle, Mike <mbecke...@tresys.com>
> Sent: Friday, April 5, 2019 5:59:12 PM
> To: dev@daffodil.apache.org
> Subject: Re: Exposing latent SDEs
>
> Do we have to compile simple types even if unused? Can't we compile them 
> lazily if used?
>
> I am very happy to restrict expressions that use simple type qnames for them 
> to have to be literal constants. Then compiling the expressions would provide 
> the qnames of the types actually being used.
>
> ________________________________
> From: Sloane, Brandon <bslo...@tresys.com>
> Sent: Friday, April 5, 2019 5:12:26 PM
> To: dev@daffodil.apache.org
> Subject: Exposing latent SDEs
>
> This is related to the previous thread with the subject "Further design 
> difficulties with TypeValueCalculators". I believe I have solved the main 
> issue of that thread by computing attributes that do not depend on the 
> context in the SimpleTypeDefFactory instead of the instance class [0].
>
>
> However, there is still an issue where I am changing the behaviour of 
> Daffodil to compile aspects of simpleTypes regardless of whether they are 
> used. We avoid the previous problem by making these aspects only those whose 
> correctness does not depend on the local context. However, there is still an 
> issue where if an unused simpleType is just plain broken, it will now emit an 
> SDE.
>
>
> For instance, in section05/facets/Facets.tdml we have the following schema:
>
>     <xs:simpleType name="enum_st1">
>       <xs:restriction base="xs:string">
>         <xs:enumeration value="Trout" />
>         <xs:enumeration value="Bass" />
>         <xs:enumeration value="Catfish" />
>       </xs:restriction>
>     </xs:simpleType>
>
>
>     <xs:simpleType name="enum_st4">
>       <xs:restriction base="ex:enum_st1">
>         <xs:enumeration value="Trout" />
>         <xs:enumeration value="Bass" />
>         <xs:enumeration value="Carp" />
>       </xs:restriction>
>     </xs:simpleType>
>
> As test case facetEnum06 verifies, enum_st4 is broken because "Local 
> enumerations must be a subset of base enumerations"
>
> The issue I am now running into is that all tests that use that schema are 
> now failing due to this, even if they do not actually use enum_st4.
>
> Abstractly, I don't mind calling this acceptable behaviour, as there is an 
> SDE in any schema containing enum_st4, even if the original implementation 
> ignored it; and I don't mind updating the relevant test files to isolate 
> these broken types in their own schema, but I wanted to verify that it is 
> okay to make this sort of backwards incompatible change.
>
>
> [0] This involved a fair amount of refactoring. There is more refactoring 
> that can be done along these lines (which I believe will help with our 
> performance issue), but I only did what was needed to support the 
> functionality I am adding.
>
>
> Regards,
>
>
> Brandon T. Sloane
>
> Associate, Services
>
> bslo...@tresys.com | tresys.com
>
