Hi,

We can add these new SQL types by adding support to the file formats first. But the most important and immediate goal is reserving these types for their desired meaning, and that can already be done without such support.

Of course, eventually the new types need to be implemented as well, and for that we would need support from the file format components. I have already contacted the Avro, ORC, Parquet, Arrow, Kudu, Iceberg and CarbonData communities to let them know of this new requirement. Parquet, Arrow and Iceberg already have semantics metadata that supports LocalDateTime and Instant semantics. We plan to actively drive the addition of such metadata to Avro and would also be happy to contribute to ORC. Regarding the OffsetDateTime semantics, I don't know of any file format that already supports it natively.

Alternatively, we could also implement the new types without such support, in which case the semantics metadata could not be deduced from the files themselves but would have to come directly from the user (at least initially). This will be the case for text files, for example, where no metadata can be stored in the files. I think we should reserve this approach for file formats where having proper metadata in the files is impossible (text files) or where the developers of a file format component prefer not to add new types for this purpose (unlikely but possible).

Br,

Zoltan

On Thu, Feb 21, 2019 at 8:32 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> I think this is the right direction to go, but I'm wondering how Spark can
> support these new types if the underlying data sources (like Parquet files)
> do not support them yet.
>
> I took a quick look at the new doc for file formats, but I'm not sure what
> the proposal is. Are we going to implement these new types in Parquet/ORC
> first? Or are we going to use low-level physical types directly and add
> Spark-specific metadata to Parquet/ORC files?
>
> On Wed, Feb 20, 2019 at 10:57 PM Zoltan Ivanfi <z...@cloudera.com.invalid>
> wrote:
>
> > Hi,
> >
> > Last December we shared a timestamp harmonization proposal
> > <https://goo.gl/VV88c5> with the Hive, Spark and Impala communities.
> > This was followed by an extensive discussion in January that led to
> > various updates and improvements to the proposal, as well as the creation
> > of a new document for file format components. February has been quiet
> > regarding this topic and the latest revision of the proposal has been
> > stable in recent weeks.
> >
> > In short, the following is being proposed (please see the document for
> > details):
> >
> >    - The TIMESTAMP WITHOUT TIME ZONE type should have LocalDateTime
> >      semantics.
> >    - The TIMESTAMP WITH LOCAL TIME ZONE type should have Instant
> >      semantics.
> >    - The TIMESTAMP WITH TIME ZONE type should have OffsetDateTime
> >      semantics.
> >
> > This proposal is in accordance with the SQL standard and many major DB
> > engines.
> >
> > Based on the feedback we got, I believe that the latest revision of the
> > proposal addresses the needs of all affected components. Therefore I
> > would like to move forward and create JIRAs and/or roadmap documentation
> > pages for the desired semantics of the different SQL types according to
> > the proposal.
> >
> > Please let me know if you have any remaining concerns about the proposal
> > or about the course of action outlined above.
> >
> > Thanks,
> >
> > Zoltan
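
To make the three semantics listed in the quoted proposal concrete, here is a minimal Python sketch. It is only an illustration and not how Hive, Spark or Impala implement these types. The helper names are made up for the example, and the two session time zones are modeled as fixed UTC offsets (an assumption that ignores named zones and DST rules).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session time zones, modeled as fixed UTC offsets for simplicity.
WRITER_ZONE = timezone(timedelta(hours=1))    # a session at UTC+01:00
READER_ZONE = timezone(timedelta(hours=-8))   # a session at UTC-08:00

# The wall-clock reading that the writer enters.
wall_clock = datetime(2019, 2, 21, 8, 32, 0)


def local_date_time(value):
    """TIMESTAMP WITHOUT TIME ZONE: keep the wall-clock reading as-is."""
    return value


def to_instant(value, session_zone):
    """TIMESTAMP WITH LOCAL TIME ZONE: normalize to a point on the UTC timeline."""
    return value.replace(tzinfo=session_zone).astimezone(timezone.utc)


def render_instant(instant, session_zone):
    """Display an instant as wall-clock time in the reader's session zone."""
    return instant.astimezone(session_zone).replace(tzinfo=None)


def to_offset_date_time(value, session_zone):
    """TIMESTAMP WITH TIME ZONE: keep the reading together with its UTC offset."""
    return value.replace(tzinfo=session_zone)


instant = to_instant(wall_clock, WRITER_ZONE)
offset_value = to_offset_date_time(wall_clock, WRITER_ZONE)

# LocalDateTime: every reader sees the same wall-clock value.
print(local_date_time(wall_clock).isoformat())           # 2019-02-21T08:32:00
# Instant: the displayed value depends on the reader's session zone.
print(render_instant(instant, WRITER_ZONE).isoformat())  # 2019-02-21T08:32:00
print(render_instant(instant, READER_ZONE).isoformat())  # 2019-02-20T23:32:00
# OffsetDateTime: the original offset travels with the value.
print(offset_value.isoformat())                          # 2019-02-21T08:32:00+01:00
```

The sketch also shows why the metadata matters: the same stored number reads back differently depending on which of the three semantics the reader assumes, so the semantics has to come either from the file or from the user.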