Hi Vinoth,

Thank you for looking into this. I am planning to try this out over the 
weekend if possible. I have just downloaded version 0.4.6 of Hudi.
I think we can start with a very simple schema like the one below (copied from 
Hudi's own example)?

val EXAMPLE_SCHEMA = "{\"type\": \"record\"," + "\"name\": \"hudirec\"," + 
"\"fields\": [ " +
"{\"name\": \"timestamp\", \"type\": \"double\"}," +
"{\"name\": \"_row_key\", \"type\": \"string\"}," +
"{\"name\": \"trade_date\", \"type\": \"string\"}," +
"{\"name\": \"bats\", \"type\": \"int\"}]}";

And sample data could be (cricket bats):
kabeer,2018-11-15T07:35:54.387Z,7
vinoth,2018-11-16T09:35:54.387Z,9

I tried several ways of declaring a timestamp logicalType on the timestamp 
field above, but without success. I was building with the Hive 1.1 compile-time 
flag, but I was not concerned about reading the data through Hive. In the 
generated parquet file I could see that the timestamp field was NOT in the INT96 
timestamp format that parquet expects.
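
For reference, the Avro specification defines timestamp-millis as a logical 
type that annotates a long (milliseconds since the epoch), so one variant worth 
trying is sketched below. This is only a sketch: TimestampSchemaSketch, 
TIMESTAMP_SCHEMA and toEpochMillis are names made up for illustration, and I 
have not confirmed that Hudi 0.4.6 or its parquet writer honours the logical 
type.

```scala
import java.time.Instant

object TimestampSchemaSketch {
  // Hypothetical variant of EXAMPLE_SCHEMA: the timestamp field is declared as
  // a long annotated with the Avro logical type timestamp-millis, not a double.
  val TIMESTAMP_SCHEMA: String = "{\"type\": \"record\"," + "\"name\": \"hudirec\"," +
    "\"fields\": [ " +
    "{\"name\": \"timestamp\", \"type\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}," +
    "{\"name\": \"_row_key\", \"type\": \"string\"}," +
    "{\"name\": \"trade_date\", \"type\": \"string\"}," +
    "{\"name\": \"bats\", \"type\": \"int\"}]}"

  // Converts an ISO-8601 instant string (as in the sample CSV rows) into
  // epoch milliseconds, the value a timestamp-millis field carries.
  def toEpochMillis(iso: String): Long = Instant.parse(iso).toEpochMilli

  def main(args: Array[String]): Unit = {
    println(TIMESTAMP_SCHEMA)
    println(toEpochMillis("2018-11-15T07:35:54.387Z"))
  }
}
```

With this variant, a CSV value such as 2018-11-15T07:35:54.387Z would be 
converted to a long before being put into the record, rather than being passed 
as a double or a string.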

Keep me posted on how you get along with this, and I shall let you know if 
I find any joy sooner than you do.
Thanks
Kabeer.

On Mar 23 2019, at 12:05 am, Vinoth Chandar <[email protected]> wrote:
> Hi Kabeer,
>
> I spent time looking at the issue and its other linked issues as well.
> At a high level, it seems like we need to change the data type mappings for
> these date/timestamp types.
> It does seem doable, given that Avro also supports date/timestamp types.
>
> Do you have some sample schema/data generation that we can start with?
> Thanks
> Vinoth
>
> On Fri, Mar 15, 2019 at 11:19 AM Vinoth Chandar <[email protected]> wrote:
> > Hi Kabeer,
> > Thanks for bringing this up. I don't think we have actually hit this
> > before :)
> >
> > Let me spend some time understanding the issue and get back to you
> > Thanks
> > Vinoth
> >
> > On Thu, Mar 14, 2019 at 10:46 PM Kabeer Ahmed <[email protected]>
> > wrote:
> >
> > > Hi,
> > > https://github.com/apache/incubator-hudi/issues/547
> > > has resulted in the jira https://issues.apache.org/jira/browse/HUDI-12.
> > > The requirement is to be able to interpret a timestamp from CSV and store
> > > it in a parquet table. Does anyone have a working example along these
> > > lines?
> > > Going by the Hudi example on GitHub, the timestamp is encoded in Avro as a
> > > double:
> > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/test/java/com/uber/hoodie/common/HoodieTestDataGenerator.java#L69
> > >
> > > The end result is that the parquet field for the timestamp is not of
> > > timestamp type (INT96).
> > >
> > > My best guess is that this would have been a requirement at Uber
> > > (tracking trips in minutes and seconds), so I wonder how it is being
> > > handled there.
> > >
> > > If anyone else has handled this and has an example that can be shared, it
> > > will be much appreciated.
> > > Kabeer Ahmed, http://www.linkedin.com/in/kabeerahmed
> >
>
>
