[ 
https://issues.apache.org/jira/browse/SPARK-27388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taoufik DACHRAOUI updated SPARK-27388:
--------------------------------------
    Description: 
*What changes were proposed in this pull request?*

The PR adds support for bean objects, java.util.List, java.util.Map, 
java.nio.ByteBuffer and java enums to ScalaReflection; unlike the existing 
javaBean Encoder, properties can be named without the set/get prefix (this is 
one of the key points that allows the encoding of Avro Fixed types. I believe, 
the other key point is that the addition must be in ScalaReflection).

Reminder of Avro types:
 * primitive types: null, boolean, int, long, float, double, bytes, string
 * complex types: Records, Enums, Arrays, Maps, Unions, Fixed

This PR supports simple unions (having a null type and a non-null type) but not 
complex unions for the simple reason that the Avro compiler will generate java 
code with type Object for all complex unions, and fields with simple unions 
will be typed as the non-null type of the union.

 

*How was this patch tested?*

currently only 1 encodeDecodeTest was added to ExpressionEncoderSuite; the test 
uses the following avro schema:
{code:java}
{"namespace": "org.apache.spark.sql.catalyst.encoders", "type": "record", 
"name": "AvroExample1",
 "fields": [
 
{"name":"mymoney","type":["null",{"type":"record","name":"Money","namespace":"org.apache.spark.sql.catalyst.encoders","fields":[
 {"name":"amount","type":"float","default":0},
 
{"name":"currency","type":{"type":"enum","name":"Currency","symbols":["EUR","USD","BRL"]},"default":"EUR"}]}],
 "default":null},
 {"name": "myfloat", "type": "float"},
 {"name": "mylong", "type": "long"},
 {"name": "myint", "type": "int"},
 {"name": "mydouble", "type": "double"},
 {"name": "myboolean", "type": "boolean"},
 {"name": "mystring", "type": "string"},
 {"name": "mybytes", "type": "bytes"},
 {"name": "myfixed", "type": {"type": "fixed", "name": "Magic", "size": 4}},
 {"name": "myarray", "type": {"type": "array", "items": "string"}},
 {"name": "mymap", "type": {"type": "map", "values": "int"}}
 ] }{code}
 

 

  was:
*What changes were proposed in this pull request?*

This PR adds expression encoders for beans, java.util.List, java.util.Map and 
java enum.

The Beans are objects defined by properties; A property is defined by a setter 
and a getter functions where the getter return type is equal to the setter 
unique parameter type and the getter and setter functions have the same name; 
if the getter name is prefixed by "get" then the setter name must be prefixed 
by "set"; see tests for bean examples.

Avro objects are beans and thus we can create an expression encoder for avro 
objects as follows:
{code:java}
implicit val exprEncoder = ExpressionEncoder[Foo]()
{code}
All avro types, including fixed types, and excluding complex union types, are 
suppported by this addition.

The avro fixed types are beans with exactly one property: bytes.

Currently complex avro unions are not supported because a complex union is 
declared as Object and there cannot be an expression encoder for Object type 
(need to use a custom serializer like kryo for example)

*How was this patch tested?*

currently only 1 encodeDecodeTest was added to ExpressionEncoderSuite; the test 
uses the following avro schema:
{code:java}
{"namespace": "org.apache.spark.sql.catalyst.encoders", "type": "record", 
"name": "AvroExample1",
 "fields": [
 
{"name":"mymoney","type":["null",{"type":"record","name":"Money","namespace":"org.apache.spark.sql.catalyst.encoders","fields":[
 {"name":"amount","type":"float","default":0},
 
{"name":"currency","type":{"type":"enum","name":"Currency","symbols":["EUR","USD","BRL"]},"default":"EUR"}]}],
 "default":null},
 {"name": "myfloat", "type": "float"},
 {"name": "mylong", "type": "long"},
 {"name": "myint", "type": "int"},
 {"name": "mydouble", "type": "double"},
 {"name": "myboolean", "type": "boolean"},
 {"name": "mystring", "type": "string"},
 {"name": "mybytes", "type": "bytes"},
 {"name": "myfixed", "type": {"type": "fixed", "name": "Magic", "size": 4}},
 {"name": "myarray", "type": {"type": "array", "items": "string"}},
 {"name": "mymap", "type": {"type": "map", "values": "int"}}
 ] }{code}
 

 


> expression encoder for avro like objects
> ----------------------------------------
>
>                 Key: SPARK-27388
>                 URL: https://issues.apache.org/jira/browse/SPARK-27388
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Taoufik DACHRAOUI
>            Priority: Major
>
> *What changes were proposed in this pull request?*
> The PR adds support for bean objects, java.util.List, java.util.Map, 
> java.nio.ByteBuffer and java enums to ScalaReflection; unlike the existing 
> javaBean Encoder, properties can be named without the set/get prefix (this is 
> one of the key points that allows the encoding of Avro Fixed types. I 
> believe, the other key point is that the addition must be in ScalaReflection).
> Reminder of Avro types:
>  * primitive types: null, boolean, int, long, float, double, bytes, string
>  * complex types: Records, Enums, Arrays, Maps, Unions, Fixed
> This PR supports simple unions (having a null type and a non-null type) but 
> not complex unions for the simple reason that the Avro compiler will generate 
> java code with type Object for all complex unions, and fields with simple 
> unions will be typed as the non-null type of the union.
>  
> *How was this patch tested?*
> currently only 1 encodeDecodeTest was added to ExpressionEncoderSuite; the 
> test uses the following avro schema:
> {code:java}
> {"namespace": "org.apache.spark.sql.catalyst.encoders", "type": "record", 
> "name": "AvroExample1",
>  "fields": [
>  
> {"name":"mymoney","type":["null",{"type":"record","name":"Money","namespace":"org.apache.spark.sql.catalyst.encoders","fields":[
>  {"name":"amount","type":"float","default":0},
>  
> {"name":"currency","type":{"type":"enum","name":"Currency","symbols":["EUR","USD","BRL"]},"default":"EUR"}]}],
>  "default":null},
>  {"name": "myfloat", "type": "float"},
>  {"name": "mylong", "type": "long"},
>  {"name": "myint", "type": "int"},
>  {"name": "mydouble", "type": "double"},
>  {"name": "myboolean", "type": "boolean"},
>  {"name": "mystring", "type": "string"},
>  {"name": "mybytes", "type": "bytes"},
>  {"name": "myfixed", "type": {"type": "fixed", "name": "Magic", "size": 4}},
>  {"name": "myarray", "type": {"type": "array", "items": "string"}},
>  {"name": "mymap", "type": {"type": "map", "values": "int"}}
>  ] }{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to