PR updated.

This 2.11 hack seems significant enough that it warrants a re-review. I'll take 
Steve's +1 from this thread as part of that, but would like to get a new +1 
from Mike before merging in.
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Tuesday, December 17, 2019 2:38 PM
To: Sloane, Brandon <bslo...@tresys.com>; dev@daffodil.apache.org 
<dev@daffodil.apache.org>
Subject: Re: DataValue compiler bug under 2.11

Seems reasonable to me. +1

On 12/17/19 2:36 PM, Sloane, Brandon wrote:
> This is the option I think I am settling on.
>
> SBT allows us to provide a seperate implementation of a file based on compiler
> version, so we can incur the related overhead on only 2.11 builds.
>
> The only caveat is that in 2.11, we would need to incur a cost on every call 
> to
> getAnyRef, as that function needs to change to:
>
>    @inline def getAnyRef = {
>      if(v.isInstanceOf[BoxedByteArray]){
>        v.asInstanceOf[BoxedByteArray].v
>      }else{
>        v.asInstanceOf[AnyRef]
>    }
>
> We could avoid this by going through all of our code that expects a ByteArray
> and handling BoxedByteArray, but I would like to keep the hackiness contained.
>
> Since we can contain this to only 2.11 builds, I think the performance hit 
> would
> be acceptable.
>
> This would still impose a bit of a maintance burden, as we now have 2 versions
> of the file to maintain, but hopefully this is not a file that requires much
> modifications.
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <slawre...@apache.org>
> *Sent:* Tuesday, December 17, 2019 2:11 PM
> *To:* Sloane, Brandon <bslo...@tresys.com>; dev@daffodil.apache.org
> <dev@daffodil.apache.org>
> *Subject:* Re: DataValue compiler bug under 2.11
> So this seems a bit hacky, but doing some simple testing I think
> wrapping the byte array works, e.g.:
>
> class WrappedByteArray(val ba: Array[Byte])
>
>
> object DataValue {
>    ...
>
>    type DataValueByteArray = DataValue[WrappedByteArray, ...]
>
>    @inline
>    implicit def toDataValue(v: Array[Byte]): DataValueByteArray =
>    {
>      val w = new WrappedByteArray(v)
>      new DataValueByteArray(w)
>    }
>
>    ...
>
> }
>
> class DataValue ... {
>
>    @inline
>    def getByteArray = v.asInstanceOf[WrappedByteArray].array
>
> }
>
> We're now effectively boxing byte arrays, but those are only used for
> hexBinary data, which aren't all that common so might not be too big of
> a deal. The byte code looks like this:
>
>    public WrappedByteArray toDataValue(byte[]);
>      Code:
>         0: new           #20   // class WrappedByteArray
>         3: dup
>         4: aload_1
>         5: invokespecial #23   // Method WrappedByteArray."<init>":([B)V
>         8: astore_2
>         9: aload_2
>        10: areturn
>
> Which is a bit more overhead, but is at least what we would actually
> expect, and it works on scala 2.11.
>
> The other option is probably to drop support for 2.11. The only use case
> I know of needing 2.11 is for using Daffodil in Apache Spark, which now
> has experimental support for 2.12. So it's not unreasonable to drop
> support, but would be nice to maintain, IMO.
>
> - Steve
>
> On 12/17/19 1:40 PM, Sloane, Brandon wrote:
>> That certainly looks like what I am seeing.
>>
>> If I am reading that issue correctly, it looks like it was fixed in 
>> 2.12.0-M1.
>> --------------------------------------------------------------------------------
>> *From:* Steve Lawrence <slawre...@apache.org>
>> *Sent:* Tuesday, December 17, 2019 1:32 PM
>> *To:* dev@daffodil.apache.org <dev@daffodil.apache.org>
>> *Subject:* Re: DataValue compiler bug under 2.11
>> I believe this is scala bug 7521:
>>
>> https://github.com/scala/bug/issues/7521
>>
>> Skimming the comments and other referenced issues, I don't see anyone
>> mentioning a workaround. Appears the fix was added in 2.12, which makes
>> sense based on the behavior you're seeing.
>>
>> On 12/17/19 1:19 PM, Sloane, Brandon wrote:
>>> While debugging my latest DataValue PR under 2.11, I came across what I 
>>> believe to be a compiler bug.
>>>
>>> Recall that we have source code along the lines of:
>>>
>>> ...
>>>   @inline implicit def toDataValue(v: DFDLTime): DataValueTime = new 
>>> DataValue(v)
>>>   @inline implicit def toDataValue(v: Array[Byte]): DataValueByteArray = 
>>> new DataValue(v)
>>>   @inline implicit def toDataValue(v: JBoolean): DataValueBool = new 
>>> DataValue(v)
>>>   @inline implicit def toDataValue(v: JNumber): DataValueNumber = new 
>>> DataValue(v)
>>> ...
>>>
>>> Under 2.11, this gets compiled to:
>>>
>>> ...
>>>   public org.apache.daffodil.calendar.DFDLDate 
>>> toDataValue(org.apache.daffodil.calendar.DFDLDate);
>>>     Code:
>>>        0: aload_1
>>>        1: areturn
>>>
>>>   public org.apache.daffodil.calendar.DFDLTime 
>>> toDataValue(org.apache.daffodil.calendar.DFDLTime);
>>>     Code:
>>>        0: aload_1
>>>        1: areturn
>>>
>>>   public byte[] toDataValue(byte[]);
>>>     Code:
>>>        0: getstatic     #35                 // Field 
>>> scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
>>>        3: aload_1
>>>        4: invokevirtual #39                 // Method 
>>> scala/runtime/ScalaRunTime$.toObjectArray:(Ljava/lang/Object;)[Ljava/lang/Object;
>>>        7: checkcast     #41                 // class "[Ljava/lang/Byte;"
>>>       10: areturn
>>>
>>>   public java.lang.Boolean toDataValue(java.lang.Boolean);
>>>     Code:
>>>        0: aload_1
>>>        1: areturn
>>>
>>>   public java.lang.Number toDataValue(java.lang.Number);
>>>     Code:
>>>        0: aload_1
>>>        1: areturn
>>> ...
>>>
>>> Note that for every instant accept byte[], the function compiles to a 
>>> simple identity function.
>>>
>>> When compiled under 2.12, the byte[] instance gets compiled to identity as 
>>> well:
>>>
>>>   public byte[] toDataValue(byte[]);
>>>     Code:
>>>        0: aload_1
>>>        1: areturn
>>>
>>> The 2.11 version is not just inefficient; it is also wrong. It first 
>>> converts to an array of Objects (which implies boxing), before attempting 
>>> to cast said array to an array of (boxed) Bytes. This case fails , as 
>>> Object[] does not extends Byte[].
>>>
>>> Even if this cast were to succeed, the function would then be attempting to 
>>> return an array of boxed java.lang.Byte, while its signature indicates it 
>>> should be returning an array of unboxed bytes.
>>>
>>> (As an asside, at the callpoint I looked at in 2.12 optimizes this function 
>>> out entirely; the 2.11 version makes a standard function call to 
>>> toDataValue)
>>>
>>> I'm at a bit of a loss as to how to proceed. This bug seems fatal to the 
>>> idea of having an AnyVal DataValue class, and we do not have much 
>>> wiggle-room to tweak our source code to avoid triggering it.
>>>
>>> Thoughts?
>>>
>>>
>>> Brandon T. Sloane
>>>
>>> Associate, Services
>>>
>>> bslo...@tresys.com | tresys.com
>>>
>>
>

Reply via email to