Whether it writes the data as garbage or as a string representation, it cannot be loaded back. So I'd say both behaviours are wrong and are bugs.
I think it'd be great if we could write and read back CSV in its own format, but I guess we can't for now.

2016-08-20 2:54 GMT+09:00 Efe Selcuk <efema...@gmail.com>:

> Okay, so this is partially PEBKAC. I just noticed that there's a debugging
> field at the end that's another case class with its own simple fields -
> *that's* the struct that was showing up in the error, not the entry
> itself.
>
> This raises a different question. What has changed so that this is no
> longer possible? The pull request said that it prints garbage. Was that
> some regression in 2.0? The same code prints fine in 1.6.1. The field
> prints as an array of the values of its fields.
>
> On Thu, Aug 18, 2016 at 5:56 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Ah, BTW, there is an issue, SPARK-16216, about printing dates and
>> timestamps here. So please ignore the integer values for the dates.
>>
>> 2016-08-19 9:54 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>
>>> Ah, sorry, I should have read this carefully. Do you mind if I ask for
>>> your code so I can test it?
>>>
>>> I would like to reproduce this.
>>>
>>> I just tested this myself but I couldn't reproduce it, as below (this
>>> is what you are doing, right?):
>>>
>>> case class ClassData(a: String, b: Date)
>>>
>>> val ds: Dataset[ClassData] = Seq(
>>>   ("a", Date.valueOf("1990-12-13")),
>>>   ("a", Date.valueOf("1990-12-13")),
>>>   ("a", Date.valueOf("1990-12-13"))
>>> ).toDF("a", "b").as[ClassData]
>>> ds.write.csv("/tmp/data.csv")
>>> spark.read.csv("/tmp/data.csv").show()
>>>
>>> prints as below:
>>>
>>> +---+----+
>>> |_c0| _c1|
>>> +---+----+
>>> |  a|7651|
>>> |  a|7651|
>>> |  a|7651|
>>> +---+----+
>>>
>>> 2016-08-19 9:27 GMT+09:00 Efe Selcuk <efema...@gmail.com>:
>>>
>>>> Thanks for the response. The problem with that thought is that I don't
>>>> think I'm dealing with a complex nested type. It's just a dataset where
>>>> every record is a case class with only simple types as fields: strings
>>>> and dates. There's no nesting.
>>>>
>>>> That's what confuses me about how it's interpreting the schema. The
>>>> schema seems to be one complex field rather than a bunch of simple
>>>> fields.
>>>>
>>>> On Thu, Aug 18, 2016, 5:07 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>
>>>>> Hi Efe,
>>>>>
>>>>> If my understanding is correct, writing/reading complex types is not
>>>>> supported because the CSV format can't represent nested types in its
>>>>> own format.
>>>>>
>>>>> I guess supporting them when writing to external CSV is rather a bug.
>>>>>
>>>>> I think it'd be great if we could write and read back CSV in its own
>>>>> format, but I guess we can't.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On 19 Aug 2016 6:33 a.m., "Efe Selcuk" <efema...@gmail.com> wrote:
>>>>>
>>>>>> We have an application working in Spark 1.6. It uses the databricks
>>>>>> csv library for the output format when writing out.
>>>>>>
>>>>>> I'm attempting an upgrade to Spark 2. When writing with both the
>>>>>> native DataFrameWriter#csv() method and with first specifying the
>>>>>> "com.databricks.spark.csv" format (I suspect the underlying format
>>>>>> is the same, but I don't know how to verify), I get the following
>>>>>> error:
>>>>>>
>>>>>> java.lang.UnsupportedOperationException: CSV data source does not
>>>>>> support struct<[bunch of field names and types]> data type
>>>>>>
>>>>>> There are 20 fields, mostly plain strings with a couple of dates.
>>>>>> The source object is a Dataset[T] where T is a case class with
>>>>>> various fields. The write is just: someDataset.write.csv(outputPath)
>>>>>>
>>>>>> Googling returned this fairly recent pull request:
>>>>>> https://mail-archives.apache.org/mod_mbox/spark-commits/201605.mbox/%3c65d35a72bd05483392857098a2635...@git.apache.org%3E
>>>>>>
>>>>>> If I'm reading that correctly, the schema shows that each record has
>>>>>> one field of this complex struct type? And the validation thinks it's
>>>>>> something that it can't serialize. I would expect the schema to have
>>>>>> a bunch of fields in it matching the case class, so maybe there's
>>>>>> something I'm misunderstanding.
>>>>>>
>>>>>> Efe
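[Editor's note, not part of the original thread] For readers hitting the same error, a minimal sketch of the workaround implied above: flatten the nested case-class field into top-level columns before writing CSV, so the writer never sees a struct type. The case classes, column names, and output path here are hypothetical; this assumes Spark 2.0's built-in CSV writer and a local SparkSession.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession

// Hypothetical shapes mirroring the thread: an entry with simple fields
// plus a nested "debug" case class, which the Spark 2.0 CSV writer rejects.
case class Debug(note: String, count: Int)
case class Entry(a: String, b: Date, debug: Debug)

object FlattenBeforeCsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-before-csv")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Entry("a", Date.valueOf("1990-12-13"), Debug("ok", 1))).toDS()

    // ds.write.csv(...) fails at this point with:
    //   java.lang.UnsupportedOperationException:
    //   CSV data source does not support struct<...> data type
    // Selecting the struct's leaf fields as top-level columns avoids it:
    ds.select(
        $"a",
        $"b",
        $"debug.note".as("debug_note"),
        $"debug.count".as("debug_count"))
      .write.csv("/tmp/entries_flat.csv")

    spark.stop()
  }
}
```

Note that this only sidesteps the struct check; in the Spark 2.0 timeframe of this thread, dates still round-trip as integer day counts (SPARK-16216), so reading the CSV back remains lossy either way.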