Will give this a shot as well. Between this and the S3 thing, what's
blocking progress? Both? ;)
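For anyone else trying this out: the standard Avro way to mark a field as a timestamp is the `logicalType` annotation on a `long` (per the Avro spec), which is what Kabeer describes attempting below. This is only an illustrative sketch of that schema variant, not a confirmed fix; whether Hudi 0.4.6 actually maps it to a Parquet timestamp on write is exactly the open question in this thread.

```python
import json

# Hypothetical variant of the EXAMPLE_SCHEMA from the thread: the timestamp
# field is a long annotated with the Avro "timestamp-millis" logicalType
# instead of a plain double. Whether Hudi's write path honours this
# annotation is the unresolved question being discussed here.
SCHEMA_WITH_LOGICAL_TYPE = json.dumps({
    "type": "record",
    "name": "hudirec",
    "fields": [
        {"name": "timestamp",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "_row_key", "type": "string"},
        {"name": "trade_date", "type": "string"},
        {"name": "bats", "type": "int"},
    ],
})

# Sanity-check the annotation round-trips through JSON.
parsed = json.loads(SCHEMA_WITH_LOGICAL_TYPE)
ts_field = parsed["fields"][0]
print(ts_field["type"]["logicalType"])  # timestamp-millis
```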

On Sat, Mar 23, 2019 at 7:10 PM Kabeer Ahmed <[email protected]> wrote:

> Hi Vinoth,
>
> Thank you for looking into this. I am planning to try out this over this
> weekend if it is possible. Just downloaded the 0.4.6 version of Hudi.
> I think we can start with a very simple schema as below (copied from the
> Hudi's own example)?
>
> val EXAMPLE_SCHEMA = "{\"type\": \"record\"," + "\"name\": \"hudirec\"," +
> "\"fields\": [ " +
> "{\"name\": \"timestamp\", \"type\": \"double\"}," +
> "{\"name\": \"_row_key\", \"type\": \"string\"}," +
> "{\"name\": \"trade_date\", \"type\": \"string\"}," +
> "{\"name\": \"bats\", \"type\": \"int\"}]}";
>
> And sample data could be (cricket bats):
> kabeer,2018-11-15T07:35:54.387Z,7
> vinoth,2018-11-16T09:35:54.387Z,9
> I did try passing several combinations in the timestamp field above to
> indicate the logicalType as timestamp, but with no success. I was using the
> Hive 1.1 compile-time flag, though I was not worried about reading data
> through Hive. I could see in the generated Parquet that the timestamp field
> was NOT in the INT96 timestamp format that Parquet expects.
>
> Keep me posted on how you get along with this, and I shall keep you
> posted if I find any joy sooner than you do.
> Thanks
> Kabeer.
>
> On Mar 23 2019, at 12:05 am, Vinoth Chandar <[email protected]> wrote:
> > Hi Kabeer,
> >
> > I spent time looking at the issue and its other linked issues as well.
> > At a high level, it seems like we need to change the data type mappings
> > for these date/timestamp types.
> > It does seem doable, given that Avro also supports date/timestamp types.
> >
> > Do you have some sample schema/data generation that we can start with?
> > Thanks
> > Vinoth
> >
> > On Fri, Mar 15, 2019 at 11:19 AM Vinoth Chandar <[email protected]>
> > wrote:
> > > Hi Kabeer,
> > > Thanks for bringing this up. I don't think we have actually hit this
> > > before :)
> > >
> > > Let me spend some time understanding the issue and get back to you.
> > > Thanks
> > > Vinoth
> > >
> > > On Thu, Mar 14, 2019 at 10:46 PM Kabeer Ahmed <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > > https://github.com/apache/incubator-hudi/issues/547 has resulted in
> > > > the jira https://issues.apache.org/jira/browse/HUDI-12.
> > > > The requirement is to be able to interpret a timestamp from CSV and
> > > > store it in the Parquet table. Does anyone have a working example
> > > > along these lines?
> > > > Going by the Hudi example on GitHub, the timestamp is being encoded
> > > > in Avro as a double:
> > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/test/java/com/uber/hoodie/common/HoodieTestDataGenerator.java#L69
> > > >
> > > > The end result is that the Parquet field for the timestamp is not of
> > > > the timestamp (INT96) type.
> > > >
> > > > My best guess is that this would have been a requirement at Uber
> > > > (tracking trips down to minutes and seconds), so how is it being
> > > > handled there?
> > > >
> > > > If anyone else has handled this and has an example that can be
> > > > shared, it will be much appreciated.
> > > > Kabeer Ahmed, http://www.linkedin.com/in/kabeerahmed
> > >
> >
> >
>
>
