GitHub user viirya reopened a pull request:

    https://github.com/apache/spark/pull/22749

    [SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag

    ## What changes were proposed in this pull request?
    
    This came up while implementing #21732. Currently `ScalaReflection` 
needs to consider how `ExpressionEncoder` uses generated serializers and 
deserializers, and `ExpressionEncoder` carries an odd `flat` flag. After 
discussion with @cloud-fan, it seems better to refactor 
`ExpressionEncoder`. This should also make SPARK-24762 easier to do.
    
    To summarize the proposed changes:
    
    1. `serializerFor` and `deserializerFor` return expressions for 
serializing/deserializing an input expression for a given type. They are 
private and should not be called directly.
    2. `serializerForType` and `deserializerForType` return an expression for 
serializing/deserializing an object of type T to/from its Spark SQL 
representation. They assume the input object/Spark SQL representation is 
located at ordinal 0 of a row.
    
    In other words, `serializerForType` and `deserializerForType` return 
expressions for atomically serializing/deserializing a JVM object to/from a 
Spark SQL value.
    
    A serializer returned by `serializerForType` will serialize an object at 
`row(0)` to the corresponding Spark SQL representation, e.g. a primitive type, 
array, map, or struct.
    
    A deserializer returned by `deserializerForType` will deserialize an input 
field at `row(0)` to an object of the given type.
    
    3. The constructor of `ExpressionEncoder` takes a serializer/deserializer 
pair for type `T`. It uses them to create the serializer and deserializer 
for T <-> row serialization. `ExpressionEncoder` no longer needs to remember 
whether the serializer is flat. When we need to construct a new 
`ExpressionEncoder` based on existing ones, we only need to change the input 
location in the atomic serializer and deserializer.
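
    The idea behind points 2 and 3 can be sketched with a toy model. This is 
not Spark code: `Expr`, `InputAt`, `Transform`, `AtomicEncoder`, `PairEncoder`, 
`relocate` and `pair` are all hypothetical names invented for illustration, and 
the "row" is just a `Vector[Any]`. It only shows how a serializer/deserializer 
pair that always reads from ordinal 0 can be reused in a wider encoder by 
rewriting its input location, without any `flat` flag.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Toy model only: none of these types are Spark's Catalyst classes.
object EncoderSketch {
  type Row = Vector[Any]

  // A tiny expression tree evaluated against an input row.
  sealed trait Expr { def eval(row: Row): Any }

  // Reads the value stored at a given ordinal of the input row.
  final case class InputAt(ordinal: Int) extends Expr {
    def eval(row: Row): Any = row(ordinal)
  }

  // Applies a conversion function to the result of a child expression.
  final case class Transform(child: Expr, f: Any => Any) extends Expr {
    def eval(row: Row): Any = f(child.eval(row))
  }

  // Hypothetical "atomic" pair for one type: both sides read from ordinal 0.
  final case class AtomicEncoder(serializer: Expr, deserializer: Expr)

  // Int is stored as-is; String is stored as UTF-8 bytes (an arbitrary choice).
  val intEncoder: AtomicEncoder = AtomicEncoder(InputAt(0), InputAt(0))
  val stringEncoder: AtomicEncoder = AtomicEncoder(
    Transform(InputAt(0), v => v.asInstanceOf[String].getBytes(UTF_8).toVector),
    Transform(InputAt(0), v => new String(v.asInstanceOf[Vector[Byte]].toArray, UTF_8)))

  // Points every input reference in the expression at a new ordinal.
  def relocate(e: Expr, ordinal: Int): Expr = e match {
    case InputAt(_)      => InputAt(ordinal)
    case Transform(c, f) => Transform(relocate(c, ordinal), f)
  }

  // A two-column encoder: one serializer and one deserializer per column.
  final case class PairEncoder(serializers: Vector[Expr], deserializers: Vector[Expr]) {
    def toRow(objects: Row): Row = serializers.map(_.eval(objects))
    def fromRow(row: Row): Row = deserializers.map(_.eval(row))
  }

  // Composition changes only the input location of each atomic expression.
  def pair(a: AtomicEncoder, b: AtomicEncoder): PairEncoder =
    PairEncoder(
      Vector(relocate(a.serializer, 0), relocate(b.serializer, 1)),
      Vector(relocate(a.deserializer, 0), relocate(b.deserializer, 1)))

  def main(args: Array[String]): Unit = {
    val enc = pair(intEncoder, stringEncoder)
    val row = enc.toRow(Vector(1, "a"))
    println(enc.fromRow(row))
  }
}
```

    In this sketch the composed encoder never asks whether a child encoder is 
"flat"; it only moves the child's input reference from ordinal 0 to the 
column it occupies in the wider row.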
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24762-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22749
    
----
commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-07-09T03:42:04Z

    Aggregator should be able to use Option of Product encoder.

commit 80506f4e98184ccd66dbaac14ec52d69c358020d
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-07-13T04:40:55Z

    Enable top-level Option of Product encoders.

commit ed3d5cb697b10af2e2cf4c78ab521d4d0b2f3c9b
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-24T04:26:28Z

    Remove topLevel parameter.

commit 9fc3f6165156051142a8366a32726badaaa16bb7
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-24T04:37:39Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 5f95bd0cf1bd308c7df55c41caef7a9f19368f5d
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-24T04:42:33Z

    Remove useless change.

commit a4f04055b2ba22f371663565710328791942855a
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-24T14:38:16Z

    Add more tests.

commit c1f798f7e9cba0d04223eed06f1b1f547ec29dc5
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-25T01:52:01Z

    Add test.

commit 80e11d289d7775863cb9c28b2c1d4364292048a4
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-06T04:06:57Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 0f029b0a28700334dc6334f1ad89b3124f235a51
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-06T04:40:07Z

    Improve code comments.

commit 84f3ce07f2f6a9236bd27f927fbb877e937f6917
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-15T09:55:03Z

    Refactoring ExpressionEncoder.

commit 6a6fa454e22728cc2ad8e5515cd587fe0be84b26
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-17T02:07:40Z

    Fix Malformed class name.

commit 25a616286075ca4f0a7d528095b387172b05c6c3
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-17T05:11:10Z

    Fix error message.

commit 295ecde8103c26dda169d931f939f8a2fe641c4c
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-18T15:58:03Z

    Fix test.

commit 85a91220ec4eb00bd9d5020ecf980eac0301f716
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-18T16:05:22Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762-refactor

commit 35700f4a0f36fb397ac028a68011a2753c5c2c75
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-19T00:07:29Z

    Fix rebase error.

commit b211ed069dceb33c45cf6caf12c19527334d4ad8
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-19T00:16:24Z

    Fix unintentional style change.

commit 0c78b73e5abce2a51763c860e43aab214c8634d9
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-19T00:51:52Z

    Address comments.

commit 5b9abb67907dfdb0c0c64751db3525564f832422
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-20T02:26:07Z

    Address ComplexTypeMergingExpression issue.

commit 7432344143fb4889ed3d5cbde21872c8fdd6d3f1
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-20T12:47:37Z

    Try more reasonable solution.

commit 400f87817183640006140e2db1839f8d78a13856
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-10-22T02:56:20Z

    Address comment.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org