No, just take a look at the schema of dfStruct since you've converted its value column with to_avro:

scala> dfStruct.printSchema
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- value: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- age: integer (nullable = false)

On Wed, Feb 27, 2019 at 6:51 PM Hien Luu <hien...@gmail.com> wrote:

> Thanks for looking into this. Does this mean string fields should always
> be nullable?
>
> You are right that the result is not yet correct and further digging is
> needed :(
>
> On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I was dealing with avro stuff lately and most of the time it has
>> something to do with the schema.
>> One thing I've pinpointed quickly (where I was struggling also) is the
>> name field should be nullable but the result is not yet correct so further
>> digging needed...
>>
>> scala> val expectedSchema = StructType(Seq(StructField("name", StringType,true),StructField("age", IntegerType, false)))
>> expectedSchema: org.apache.spark.sql.types.StructType = StructType(StructField(name,StringType,true), StructField(age,IntegerType,false))
>>
>> scala> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>> avroTypeStruct: String = {"type":"record","name":"topLevelRecord","fields":[{"name":"name","type":["string","null"]},{"name":"age","type":"int"}]}
>>
>> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
>> +---------------------------------------------+
>> |from_avro(value, struct<name:string,age:int>)|
>> +---------------------------------------------+
>> |                              [Mary Jane, 25]|
>> |                              [Mary Jane, 25]|
>> +---------------------------------------------+
>>
>> BR,
>> G
>>
>>
>> On Wed, Feb 27, 2019 at 7:43 AM Hien Luu <hien...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I ran into a pretty weird issue with to_avro and from_avro where it was
>>> not able to parse the data in a struct correctly. Please see the simple
>>> and self contained example below. I am using Spark 2.4. I am not sure if
>>> I missed something.
>>>
>>> This is how I start the spark-shell on my Mac:
>>>
>>> ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
>>>
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.avro._
>>> import org.apache.spark.sql.functions._
>>>
>>> spark.version
>>>
>>> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
>>>
>>> val dfStruct = df.withColumn("value", struct("name","age"))
>>>
>>> dfStruct.show
>>> dfStruct.printSchema
>>>
>>> val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))
>>>
>>> val expectedSchema = StructType(Seq(StructField("name", StringType, false),StructField("age", IntegerType, false)))
>>>
>>> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
>>>
>>> val avroTypeStr = s"""
>>>   |{
>>>   |  "type": "int",
>>>   |  "name": "key"
>>>   |}
>>> """.stripMargin
>>>
>>> dfKV.select(from_avro('key, avroTypeStr)).show
>>>
>>> // output
>>> +-------------------+
>>> |from_avro(key, int)|
>>> +-------------------+
>>> |                  1|
>>> |                  2|
>>> +-------------------+
>>>
>>> dfKV.select(from_avro('value, avroTypeStruct)).show
>>>
>>> // output
>>> +---------------------------------------------+
>>> |from_avro(value, struct<name:string,age:int>)|
>>> +---------------------------------------------+
>>> |                                        [, 9]|
>>> |                                        [, 9]|
>>> +---------------------------------------------+
>>>
>>> Please help and thanks in advance.
>>>
>
> --
> Regards,
>
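
To make the schema point above concrete, here is a minimal spark-shell sketch that takes the reader schema from the column that was actually passed to to_avro, rather than retyping it by hand. It assumes the same Spark 2.4 / spark-avro 2.4.0 setup as in the thread; the name valueField is only illustrative. It addresses the nullability mismatch only: as noted in the thread, the decoded rows were still not correct even with a nullable name field, so treat this as a starting point for further digging, not a confirmed fix.

```scala
// Run inside spark-shell started with:
//   ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25)).toDF("id", "name", "age")
val dfStruct = df.withColumn("value", struct("name", "age"))
val dfKV = dfStruct.select(to_avro(col("id")).as("key"), to_avro(col("value")).as("value"))

// Look up the struct column that was encoded; its field nullability
// (name: nullable, age: not nullable) is what printSchema reports.
val valueField = dfStruct.schema("value")

// Build the Avro schema from that exact type and nullability, so that
// nullable fields become ["type","null"] unions and the rest stay plain.
val avroTypeStruct =
  SchemaConverters.toAvroType(valueField.dataType, valueField.nullable).toString
println(avroTypeStruct)

// Decode with the derived schema and inspect the resulting column type.
val decoded = dfKV.select(from_avro(col("value"), avroTypeStruct).as("value"))
decoded.printSchema
```

Deriving the schema this way keeps the reader schema in step with the DataFrame, so a String column coming from toDF (nullable by default) does not have to be remembered and declared nullable manually.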