Thanks for the answer.

As for the next step, I'm thinking of writing the dfKV DataFrame out to
disk and then using the Avro APIs to read the data.

This smells like a bug somewhere.

Cheers,

Hien

On Thu, Feb 28, 2019 at 4:02 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> No, just take a look at the schema of dfStruct, since its value column is
> what you converted with to_avro. The name field there is nullable:
>
> scala> dfStruct.printSchema
> root
>  |-- id: integer (nullable = false)
>  |-- name: string (nullable = true)
>  |-- age: integer (nullable = false)
>  |-- value: struct (nullable = false)
>  |    |-- name: string (nullable = true)
>  |    |-- age: integer (nullable = false)
>
>
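To make the nullability point concrete, here is a small, self-contained Scala sketch. It hand-rolls the encoding for illustration and does not use the Avro library; the object and method names are hypothetical. It assumes to_avro writes a bare Avro binary datum with no header, laid out as in the Avro spec: the nullable name field becomes the union ["string","null"] (as in the toAvroType output in this thread), which is encoded as the branch index followed by the length-prefixed UTF-8 bytes, and ints and lengths are zig-zag varints. A reader that uses a non-nullable "string" schema takes the branch index 0 as the string length and the real string length as age.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of Avro binary encoding for the record
// {"name": ["string","null"], "age": "int"}; not the Avro library itself.
object AvroUnionDemo {

  // Avro encodes int/long values as zig-zag varints.
  def writeVarint(n: Long, out: ArrayBuffer[Byte]): Unit = {
    var z = (n << 1) ^ (n >> 63) // zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, ...
    while ((z & ~0x7FL) != 0) {
      out += ((z & 0x7F) | 0x80).toByte // low 7 bits plus continuation bit
      z = z >>> 7
    }
    out += z.toByte
  }

  // Reads a zig-zag varint starting at pos; returns (value, next position).
  def readVarint(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var shift = 0
    var z = 0L
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xFF
      z |= (b & 0x7FL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((z >>> 1) ^ -(z & 1), i)
  }

  // Writer side: a nullable string is a union, so the branch index comes first.
  def encode(name: String, age: Int): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    writeVarint(0, out)                  // union branch 0 = "string"
    val utf8 = name.getBytes(UTF_8)
    writeVarint(utf8.length.toLong, out) // string length
    out ++= utf8                         // string bytes
    writeVarint(age.toLong, out)         // age
    out.toArray
  }

  // Reader whose schema matches the writer: consume the union index first.
  def decodeNullable(bytes: Array[Byte]): (String, Int) = {
    val (branch, p0) = readVarint(bytes, 0)
    require(branch == 0, s"expected the string branch, got $branch")
    val (len, p1) = readVarint(bytes, p0)
    val name = new String(bytes, p1, len.toInt, UTF_8)
    val (age, _) = readVarint(bytes, p1 + len.toInt)
    (name, age.toInt)
  }

  // Reader with a non-nullable string: the union index (0) is read as the
  // string length, and the real string length is then read as the age.
  def decodeNonNullable(bytes: Array[Byte]): (String, Int) = {
    val (len, p1) = readVarint(bytes, 0)
    val name = new String(bytes, p1, len.toInt, UTF_8)
    val (age, _) = readVarint(bytes, p1 + len.toInt)
    (name, age.toInt)
  }
}
```

Under these assumptions, decodeNonNullable(encode("Mary Jane", 25)) returns ("", 9), which matches the [, 9] rows in the original output, while decodeNullable recovers ("Mary Jane", 25). The sketch says nothing about why every row shows the same person; that is the part still under investigation in this thread.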
> On Wed, Feb 27, 2019 at 6:51 PM Hien Luu <hien...@gmail.com> wrote:
>
>> Thanks for looking into this. Does this mean string fields should always
>> be nullable?
>>
>> You are right that the result is not yet correct and further digging is
>> needed :(
>>
>> On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been dealing with Avro lately, and most of the time the problem has
>>> something to do with the schema.
>>> One thing I pinpointed quickly (I struggled with it too) is that the
>>> name field should be nullable, but the result is still not correct, so
>>> further digging is needed...
>>>
>>> scala> val expectedSchema = StructType(Seq(StructField("name", StringType, true), StructField("age", IntegerType, false)))
>>> expectedSchema: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(name,StringType,true),
>>> StructField(age,IntegerType,false))
>>>
>>> scala> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>> avroTypeStruct: String =
>>> {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}
>>>
>>> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
>>> +---------------------------------------------+
>>> |from_avro(value, struct<name:string,age:int>)|
>>> +---------------------------------------------+
>>> |                              [Mary Jane, 25]|
>>> |                              [Mary Jane, 25]|
>>> +---------------------------------------------+
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hien...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I ran into a pretty weird issue with to_avro and from_avro, where it was
>>>> not able to parse the data in a struct correctly. Please see the simple,
>>>> self-contained example below. I am using Spark 2.4. I am not sure if I
>>>> missed something.
>>>>
>>>> This is how I start the spark-shell on my Mac:
>>>>
>>>> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>>>>
>>>> import org.apache.spark.sql.types._
>>>> import org.apache.spark.sql.avro._
>>>> import org.apache.spark.sql.functions._
>>>>
>>>>
>>>> spark.version
>>>>
>>>> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>>>>
>>>> val dfStruct = df.withColumn("value", struct("name","age"))
>>>>
>>>> dfStruct.show
>>>> dfStruct.printSchema
>>>>
>>>> val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))
>>>>
>>>> val expectedSchema = StructType(Seq(StructField("name", StringType, false), StructField("age", IntegerType, false)))
>>>>
>>>> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>>>
>>>> val avroTypeStr = s"""
>>>>       |{
>>>>       |  "type": "int",
>>>>       |  "name": "key"
>>>>       |}
>>>>     """.stripMargin
>>>>
>>>>
>>>> dfKV.select(from_avro('key, avroTypeStr)).show
>>>>
>>>> // output
>>>> +-------------------+
>>>> |from_avro(key, int)|
>>>> +-------------------+
>>>> |                  1|
>>>> |                  2|
>>>> +-------------------+
>>>>
>>>> dfKV.select(from_avro('value, avroTypeStruct)).show
>>>>
>>>> // output
>>>> +---------------------------------------------+
>>>> |from_avro(value, struct<name:string,age:int>)|
>>>> +---------------------------------------------+
>>>> |                                        [, 9]|
>>>> |                                        [, 9]|
>>>> +---------------------------------------------+
>>>>
>>>> Please help and thanks in advance.
>>>>
>>
>> --
>> Regards,
>>
>

-- 
Regards,
