I will give this a shot as well. Between this and the S3 issue, which one is blocking your progress? Both? ;)
On Sat, Mar 23, 2019 at 7:10 PM Kabeer Ahmed <[email protected]> wrote:
> Hi Vinoth,
>
> Thank you for looking into this. I am planning to try this out over the
> weekend if possible. I have just downloaded the 0.4.6 version of Hudi.
> I think we can start with a very simple schema like the one below (copied
> from Hudi's own example):
>
>   val EXAMPLE_SCHEMA = "{\"type\": \"record\"," + "\"name\": \"hudirec\"," +
>     "\"fields\": [ " +
>     "{\"name\": \"timestamp\", \"type\": \"double\"}," +
>     "{\"name\": \"_row_key\", \"type\": \"string\"}," +
>     "{\"name\": \"trade_date\", \"type\": \"string\"}," +
>     "{\"name\": \"bats\", \"type\": \"int\"}]}";
>
> The sample data could be (cricket bats):
>
>   kabeer,2018-11-15T07:35:54.387Z,7
>   vinoth,2018-11-16T09:35:54.387Z,9
>
> I did try several combinations for the timestamp field above to set its
> logicalType to timestamp, but with no success. I was using the Hive 1.1
> compile-time flag, but I was not concerned with reading the data through
> Hive. I could see in the generated Parquet file that the timestamp field
> was NOT in the INT96 timestamp format that Parquet expects.
>
> Keep me posted on how you get along with this, and I shall keep you
> posted if I find any joy sooner than you.
> Thanks
> Kabeer.
>
> On Mar 23 2019, at 12:05 am, Vinoth Chandar <[email protected]> wrote:
> > Hi Kabeer,
> >
> > I spent time looking at the issue and its other linked issues as well.
> > At a high level, it seems like we need to change the data type mappings
> > for these date/timestamp types. It does seem doable, given that Avro also
> > supports date/timestamp types.
> >
> > Do you have some sample schema/data generation that we can start with?
> > Thanks
> > Vinoth
> >
> > On Fri, Mar 15, 2019 at 11:19 AM Vinoth Chandar <[email protected]> wrote:
> > > Hi Kabeer,
> > > Thanks for bringing this up. I don't think we have actually hit this
> > > before :)
> > >
> > > Let me spend some time understanding the issue and get back to you.
> > > Thanks
> > > Vinoth
> > >
> > > On Thu, Mar 14, 2019 at 10:46 PM Kabeer Ahmed <[email protected]> wrote:
> > > > Hi,
> > > > https://github.com/apache/incubator-hudi/issues/547 has resulted in
> > > > the JIRA https://issues.apache.org/jira/browse/HUDI-12.
> > > > The requirement is to be able to interpret a timestamp from CSV and
> > > > store it in the Parquet table. Does anyone have a working example
> > > > along these lines?
> > > > Going by the Hudi example on GitHub, the timestamp is being encoded
> > > > in Avro as a double:
> > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/test/java/com/uber/hoodie/common/HoodieTestDataGenerator.java#L69
> > > >
> > > > The end result is that the Parquet field for the timestamp is not of
> > > > the timestamp type (INT96).
> > > >
> > > > My guess is that this would have been a requirement at Uber
> > > > (tracking trips in minutes and seconds); how is it being handled there?
> > > >
> > > > If anyone else has handled this and has an example that can be
> > > > shared, it would be much appreciated.
> > > > Kabeer Ahmed, http://www.linkedin.com/in/kabeerahmed
