PR updated. This 2.11 hack seems significant enough that it warrants a re-review. I'll take Steve's +1 from this thread as part of that, but would like to get a new +1 from Mike before merging in. ________________________________ From: Steve Lawrence <slawre...@apache.org> Sent: Tuesday, December 17, 2019 2:38 PM To: Sloane, Brandon <bslo...@tresys.com>; dev@daffodil.apache.org <dev@daffodil.apache.org> Subject: Re: DataValue compiler bug under 2.11
Seems reasonable to me. +1 On 12/17/19 2:36 PM, Sloane, Brandon wrote: > This is the option I think I am settling on. > > SBT allows us to provide a seperate implementation of a file based on compiler > version, so we can incur the related overhead on only 2.11 builds. > > The only caveat is that in 2.11, we would need to incur a cost on every call > to > getAnyRef, as that function needs to change to: > > @inline def getAnyRef = { > if(v.isInstanceOf[BoxedByteArray]){ > v.asInstanceOf[BoxedByteArray].v > }else{ > v.asInstanceOf[AnyRef] > } > > We could avoid this by going through all of our code that expects a ByteArray > and handling BoxedByteArray, but I would like to keep the hackiness contained. > > Since we can contain this to only 2.11 builds, I think the performance hit > would > be acceptable. > > This would still impose a bit of a maintance burden, as we now have 2 versions > of the file to maintain, but hopefully this is not a file that requires much > modifications. > -------------------------------------------------------------------------------- > *From:* Steve Lawrence <slawre...@apache.org> > *Sent:* Tuesday, December 17, 2019 2:11 PM > *To:* Sloane, Brandon <bslo...@tresys.com>; dev@daffodil.apache.org > <dev@daffodil.apache.org> > *Subject:* Re: DataValue compiler bug under 2.11 > So this seems a bit hacky, but doing some simple testing I think > wrapping the byte array works, e.g.: > > class WrappedByteArray(val ba: Array[Byte]) > > > object DataValue { > ... > > type DataValueByteArray = DataValue[WrappedByteArray, ...] > > @inline > implicit def toDataValue(v: Array[Byte]): DataValueByteArray = > { > val w = new WrappedByteArray(v) > new DataValueByteArray(w) > } > > ... > > } > > class DataValue ... { > > @inline > def getByteArray = v.asInstanceOf[WrappedByteArray].array > > } > > We're now effectively boxing byte arrays, but those are only used for > hexBinary data, which aren't all that common so might not be too big of > a deal. The byte code looks like this: > > public WrappedByteArray toDataValue(byte[]); > Code: > 0: new #20 // class WrappedByteArray > 3: dup > 4: aload_1 > 5: invokespecial #23 // Method WrappedByteArray."<init>":([B)V > 8: astore_2 > 9: aload_2 > 10: areturn > > Which is a bit more overhead, but is at least what we would actually > expect, and it works on scala 2.11. > > The other option is probably to drop support for 2.11. The only use case > I know of needing 2.11 is for using Daffodil in Apache Spark, which now > has experimental support for 2.12. So it's not unreasonable to drop > support, but would be nice to maintain, IMO. > > - Steve > > On 12/17/19 1:40 PM, Sloane, Brandon wrote: >> That certainly looks like what I am seeing. >> >> If I am reading that issue correctly, it looks like it was fixed in >> 2.12.0-M1. >> -------------------------------------------------------------------------------- >> *From:* Steve Lawrence <slawre...@apache.org> >> *Sent:* Tuesday, December 17, 2019 1:32 PM >> *To:* dev@daffodil.apache.org <dev@daffodil.apache.org> >> *Subject:* Re: DataValue compiler bug under 2.11 >> I believe this is scala bug 7521: >> >> https://github.com/scala/bug/issues/7521 >> >> Skimming the comments and other referenced issues, I don't see anyone >> mentioning a workaround. Appears the fix was added in 2.12, which makes >> sense based on the behavior you're seeing. >> >> On 12/17/19 1:19 PM, Sloane, Brandon wrote: >>> While debugging my latest DataValue PR under 2.11, I came across what I >>> believe to be a compiler bug. >>> >>> Recall that we have source code along the lines of: >>> >>> ... >>> @inline implicit def toDataValue(v: DFDLTime): DataValueTime = new >>> DataValue(v) >>> @inline implicit def toDataValue(v: Array[Byte]): DataValueByteArray = >>> new DataValue(v) >>> @inline implicit def toDataValue(v: JBoolean): DataValueBool = new >>> DataValue(v) >>> @inline implicit def toDataValue(v: JNumber): DataValueNumber = new >>> DataValue(v) >>> ... >>> >>> Under 2.11, this gets compiled to: >>> >>> ... >>> public org.apache.daffodil.calendar.DFDLDate >>> toDataValue(org.apache.daffodil.calendar.DFDLDate); >>> Code: >>> 0: aload_1 >>> 1: areturn >>> >>> public org.apache.daffodil.calendar.DFDLTime >>> toDataValue(org.apache.daffodil.calendar.DFDLTime); >>> Code: >>> 0: aload_1 >>> 1: areturn >>> >>> public byte[] toDataValue(byte[]); >>> Code: >>> 0: getstatic #35 // Field >>> scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$; >>> 3: aload_1 >>> 4: invokevirtual #39 // Method >>> scala/runtime/ScalaRunTime$.toObjectArray:(Ljava/lang/Object;)[Ljava/lang/Object; >>> 7: checkcast #41 // class "[Ljava/lang/Byte;" >>> 10: areturn >>> >>> public java.lang.Boolean toDataValue(java.lang.Boolean); >>> Code: >>> 0: aload_1 >>> 1: areturn >>> >>> public java.lang.Number toDataValue(java.lang.Number); >>> Code: >>> 0: aload_1 >>> 1: areturn >>> ... >>> >>> Note that for every instant accept byte[], the function compiles to a >>> simple identity function. >>> >>> When compiled under 2.12, the byte[] instance gets compiled to identity as >>> well: >>> >>> public byte[] toDataValue(byte[]); >>> Code: >>> 0: aload_1 >>> 1: areturn >>> >>> The 2.11 version is not just inefficient; it is also wrong. It first >>> converts to an array of Objects (which implies boxing), before attempting >>> to cast said array to an array of (boxed) Bytes. This case fails , as >>> Object[] does not extends Byte[]. >>> >>> Even if this cast were to succeed, the function would then be attempting to >>> return an array of boxed java.lang.Byte, while its signature indicates it >>> should be returning an array of unboxed bytes. >>> >>> (As an asside, at the callpoint I looked at in 2.12 optimizes this function >>> out entirely; the 2.11 version makes a standard function call to >>> toDataValue) >>> >>> I'm at a bit of a loss as to how to proceed. This bug seems fatal to the >>> idea of having an AnyVal DataValue class, and we do not have much >>> wiggle-room to tweak our source code to avoid triggering it. >>> >>> Thoughts? >>> >>> >>> Brandon T. Sloane >>> >>> Associate, Services >>> >>> bslo...@tresys.com | tresys.com >>> >> >