Re: to_avro and from_avro not working with struct type in spark 2.4

2019-03-01 Thread Gabor Somogyi
> I am thinking of writing out the dfKV dataframe to disk and then use Avro apis to read the data.

Ping me if you have something, I'm planning similar things...

On Thu, Feb 28, 2019 at 5:27 PM Hien Luu wrote:
> Thanks for the answer.
>
> As far as the next step goes, I am thinking of writing

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-28 Thread Hien Luu
Thanks for the answer.

As far as the next step goes, I am thinking of writing out the dfKV dataframe to disk and then use Avro apis to read the data. This smells like a bug somewhere.

Cheers,
Hien

On Thu, Feb 28, 2019 at 4:02 AM Gabor Somogyi wrote:
> No, just take a look at the schema of

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-28 Thread Gabor Somogyi
No, just take a look at the schema of dfStruct since you've converted its value column with to_avro:

scala> dfStruct.printSchema
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- value: struct (nullable = false)
 |    |-- name:

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-27 Thread Hien Luu
Thanks for looking into this. Does this mean string fields should always be nullable? You are right that the result is not yet correct and further digging is needed :(

On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi wrote:
> Hi,
>
> I was dealing with avro stuff lately and most of the time it

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-27 Thread Gabor Somogyi
Hi,

I was dealing with avro stuff lately and most of the time it has something to do with the schema. One thing I've pinpointed quickly (where I was struggling also) is that the name field should be nullable, but the result is not yet correct so further digging is needed...

scala> val expectedSchema =
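
For readers following the nullability point, here is a minimal spark-shell sketch of a to_avro / from_avro round trip on a struct column. It is not the code from this thread: the sample row, the record name "value", and the hand-written schema string are assumptions made for illustration, and the names dfStruct and dfKV are only borrowed from the messages here. The idea it tries to show is that the Avro schema passed to from_avro should describe a nullable Spark string as a union with null rather than as a plain string.

```scala
// Assumes spark-shell was started with the Avro module on the classpath, e.g.
//   spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
// (these coordinates are an assumption about the setup, not from the thread).
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical sample data; spark-shell already imports spark.implicits._ for toDF.
val df = Seq((1, "John Doe", 30)).toDF("id", "name", "age")

// Pack name and age into a struct column; name stays nullable, age does not.
val dfStruct = df.withColumn("value", struct(col("name"), col("age")))
dfStruct.printSchema()

// Serialize key and value columns to Avro binary.
val dfKV = dfStruct.select(
  to_avro(col("id")).as("key"),
  to_avro(col("value")).as("value"))

// Hand-written schema (an assumption): the nullable string is declared as a
// union with null, the non-nullable int as a plain int.
val valueSchema =
  """{"type":"record","name":"value","fields":[
    |  {"name":"name","type":["string","null"]},
    |  {"name":"age","type":"int"}
    |]}""".stripMargin

// Decode the value column using the schema above and inspect the result.
dfKV.select(from_avro(col("value"), valueSchema).as("value")).show(false)
```

If the decoded struct still looks wrong, comparing the printSchema output field by field against the schema string (nullability and field order in particular) is probably the first thing to check.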

to_avro and from_avro not working with struct type in spark 2.4

2019-02-26 Thread Hien Luu
Hi,

I ran into a pretty weird issue with to_avro and from_avro where it was not able to parse the data in a struct correctly. Please see the simple and self-contained example below. I am using Spark 2.4. I am not sure if I missed something.

This is how I start the spark-shell on my Mac: