Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21847#discussion_r208841399
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
    @@ -87,10 +87,18 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
             (getter, ordinal) => getter.getDouble(ordinal)
           case d: DecimalType =>
             (getter, ordinal) => getter.getDecimal(ordinal, d.precision, d.scale).toString
    -      case StringType =>
    -        (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
    -      case BinaryType =>
    -        (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
    +      case StringType => avroType.getType match {
    +        case Type.ENUM =>
    +          (getter, ordinal) => new EnumSymbol(avroType, getter.getUTF8String(ordinal).toString)
    +        case _ =>
    +          (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
    +      }
    +      case BinaryType => avroType.getType match {
    +        case Type.FIXED =>
    --- End diff --
    
    FIXED has a "size" attribute; shall we take it into account when preparing the 
    bytes? E.g. should we throw an exception if the bytes from Spark exceed the size, 
    and should we pad the bytes when their length is smaller than the size?
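    One possible shape of the check being suggested (a sketch only, not code from this PR; `toAvroFixed` is a hypothetical helper, and whether to zero-pad or fail on short input is exactly the open question):
    
    ```scala
    import org.apache.avro.{Schema, SchemaBuilder}
    import org.apache.avro.generic.GenericData
    
    // Hypothetical helper: reject byte arrays longer than the FIXED size,
    // and zero-pad (on the right) arrays that are shorter than it.
    def toAvroFixed(avroType: Schema, bytes: Array[Byte]): GenericData.Fixed = {
      val size = avroType.getFixedSize
      if (bytes.length > size) {
        throw new IllegalArgumentException(
          s"Cannot write ${bytes.length} bytes of binary data into FIXED type with size $size")
      }
      val padded = if (bytes.length < size) bytes.padTo(size, 0.toByte) else bytes
      new GenericData.Fixed(avroType, padded)
    }
    
    val fixedSchema = SchemaBuilder.fixed("f").size(4)
    val result = toAvroFixed(fixedSchema, Array[Byte](1, 2))
    ```
    
    Silently zero-padding may hide data problems, so throwing in both directions (too long and too short) is arguably the safer default.
    
    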


---
