Shubham, DataSourceV2 passes Spark's internal representation to your source and expects Spark's internal representation back from the source. That's why you consume and produce InternalRow: "internal" indicates that Spark doesn't need to convert the values.
Spark's internal representation for a date is the number of days since the Unix epoch: 1970-01-01 is day 0.

rb

On Tue, Feb 5, 2019 at 4:46 AM Shubham Chaurasia <shubh.chaura...@gmail.com> wrote:

> Hi All,
>
> I am using a custom DataSourceV2 implementation (*Spark version 2.3.2*).
>
> Here is how I am trying to pass in a *date type* from spark-shell:
>
> scala> val df = sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype", col("datetype").cast("date"))
> scala> df.write.format("com.shubham.MyDataSource").save
>
> Below is the minimal write() method of my DataWriter implementation:
>
> @Override
> public void write(InternalRow record) throws IOException {
>   ByteArrayOutputStream format = streamingRecordFormatter.format(record);
>   System.out.println("MyDataWriter.write: " + record.get(0, DataTypes.DateType));
> }
>
> It prints an integer as output:
>
> MyDataWriter.write: 17039
>
> Is this a bug? Or am I doing something wrong?
>
> Thanks,
> Shubham

--
Ryan Blue
Software Engineer
Netflix
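[Editor's note: a minimal sketch of the epoch-day encoding described above, using only the plain JDK (no Spark dependency) so it can be run standalone. The class name is illustrative, not from the thread.]

```java
import java.time.LocalDate;

public class EpochDayDemo {
    public static void main(String[] args) {
        // Spark stores a DateType value internally as the number of
        // days since the Unix epoch (1970-01-01 = day 0).
        LocalDate date = LocalDate.parse("2019-02-05");

        // The integer your DataWriter sees for this date:
        long days = date.toEpochDay();

        // Converting the integer back to a calendar date:
        LocalDate back = LocalDate.ofEpochDay(days);

        System.out.println(days + " -> " + back);
    }
}
```

Inside a Spark job, the value can be read directly with `record.getInt(0)`; Spark also ships an internal helper, `org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate`, for converting the day count to a `java.sql.Date`, though as an internal API it carries no compatibility guarantees.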