GitHub user drewrobb opened a pull request:

    https://github.com/apache/spark/pull/23062

    [SPARK-8288][SQL] ScalaReflection can use companion object constructor

    ## What changes were proposed in this pull request?
    
    This change fixes a particular scenario where default Spark SQL can't encode (thrift) types that are generated by Twitter Scrooge. These types are a trait that extends `scala.ProductX` with a constructor defined only in a companion object, rather than an actual case class. The actual case class used is a child class, but that type is almost never referred to in code. The type has no corresponding constructor symbol, which causes an exception. For all other purposes, these classes act just like case classes, so it is unfortunate that Spark SQL can't serialize them as nicely as it can actual case classes. For a full example of a Scrooge codegen class, see https://gist.github.com/anonymous/ba13d4b612396ca72725eaa989900314.
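
The shape of such a type can be sketched as follows (the names here are illustrative, not actual Scrooge output): a trait extending `ProductX` whose only public way to construct it is an `apply` on the companion object, with the concrete class hidden as an implementation detail.

```scala
// Illustrative sketch of a scrooge-style generated type (hypothetical names,
// not real scrooge codegen): the trait itself has no constructor.
trait Person extends scala.Product2[String, Int] with Serializable {
  def name: String
  def age: Int
  def _1: String = name
  def _2: Int = age
}

object Person {
  // The companion apply acts as the constructor; user code never names
  // the concrete implementation class.
  def apply(name: String, age: Int): Person = new Immutable(name, age)

  private class Immutable(val name: String, val age: Int) extends Person {
    def canEqual(that: Any): Boolean = that.isInstanceOf[Person]
    override def equals(other: Any): Boolean = other match {
      case p: Person => p.name == name && p.age == age
      case _         => false
    }
    override def hashCode: Int = (name, age).##
  }
}
```

`Person("Alice", 30)` behaves like a case class instance for all `Product` purposes, but `classOf[Person]` has no usable constructor, which is exactly what trips up `ScalaReflection`.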
    
    This change catches the case where the type has no constructor but does have an `apply` method on the type's companion object. This allows thrift types to be serialized/deserialized with implicit encoders in the same way as normal case classes. The fix had to be made in three places where the constructor is assumed to be an actual constructor:
    
    1) When serializing, determining the schema for the dataframe relies on inspecting the type's constructor (`ScalaReflection.constructParams`). Here we fall back to using the companion constructor's arguments.
    2) When deserializing or evaluating, in the Java codegen (`NewInstance.doGenCode`), the type couldn't be constructed with the `new` keyword. If there is no constructor, the constructor call is changed to try the companion constructor.
    3) When deserializing or evaluating without codegen, the constructor is invoked directly (`NewInstance.constructor`). This was fixed with Scala reflection to get the actual companion `apply` method.
    
    The return type of `findConstructor` was changed because the companion apply method constructor can't be represented as a `java.lang.reflect.Constructor`.
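
Concretely, a function value can wrap either path, which a bare `java.lang.reflect.Constructor` cannot. A rough sketch of such a signature (illustrative only, not necessarily Spark's exact code; it uses plain Java reflection for brevity):

```scala
// Hypothetical sketch: return a function that constructs an instance, backed
// by either a real constructor or the companion object's apply method.
def findConstructor[T](cls: Class[T], paramTypes: Seq[Class[_]]): Option[Seq[AnyRef] => T] = {
  try {
    // Preferred path: a declared constructor matching the parameter types.
    val ctor = cls.getDeclaredConstructor(paramTypes: _*)
    Some((args: Seq[AnyRef]) => ctor.newInstance(args: _*))
  } catch {
    case _: NoSuchMethodException =>
      try {
        // Fallback: the companion module's apply. By Scala's encoding, the
        // companion class is named `<ClassName>$` with a static MODULE$ field.
        val companion = Class.forName(cls.getName + "$")
        val module = companion.getField("MODULE$").get(null)
        val apply = companion.getMethod("apply", paramTypes: _*)
        Some((args: Seq[AnyRef]) => apply.invoke(module, args: _*).asInstanceOf[T])
      } catch {
        case _: ReflectiveOperationException => None
      }
  }
}
```

Callers then just invoke the returned function with the argument list, without caring which path produced it.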
    
    There might be situations in which this approach would fail in some new way, but at a minimum it works for the specific Scrooge example and will not impact cases that were already succeeding prior to this change.
    
    Note: this fix does not enable using Scrooge thrift enums; additional work is necessary for that. With this patch, it seems like you could patch `com.twitter.scrooge.ThriftEnum` to extend `_root_.scala.Product1[Int]` with `def _1 = value` to get Spark's implicit encoders to handle enums, but I have yet to try this method myself.
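
The speculative enum workaround mentioned above would amount to something like the following (a hypothetical sketch only; as noted, this has not been tried against real Scrooge enums, and `ThriftEnumLike` is an illustrative stand-in, not the real `com.twitter.scrooge.ThriftEnum`):

```scala
// Hypothetical shape of a patched enum trait: exposing the underlying Int
// value through Product1 so implicit encoders could treat an enum like a
// one-field case class.
trait ThriftEnumLike extends scala.Product1[Int] with Serializable {
  def value: Int
  def _1: Int = value
}

// An illustrative concrete enum member.
object StatusActive extends ThriftEnumLike {
  def value: Int = 1
  def canEqual(that: Any): Boolean = that.isInstanceOf[ThriftEnumLike]
}
```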
    
    Note: I previously opened a PR for this issue, but was only able to fix case 1) there: https://github.com/apache/spark/pull/18766
    
    ## How was this patch tested?
    
    I've fixed all 3 cases and added two tests that use a case class similar to a Scrooge-generated one. The test in ScalaReflectionSuite checks 1), and the additional assertions in ObjectExpressionsSuite check 2) and 3).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/drewrobb/spark SPARK-8288

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23062.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23062
    
----
commit e27f933710c289d5bc1ddcad38eecde0e555ac60
Author: Drew Robb <drewrobb@...>
Date:   2017-07-29T01:45:00Z

    ScalaReflection can use companion object constructor

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
