@JulienLD or @Jacques could you add your perspective on the JIRA from the Java side? Would be good to reach a consensus on this so we can make appropriate changes. We will want to push towards integration tests for the various date and time types in the 0.3 release
If JulienH could also offer some SQL perspective that would be useful On Fri, Mar 10, 2017 at 8:00 PM, Julien Le Dem <jul...@dremio.com> wrote: > The first option makes it easier for compatibility with existing code > I opened a JIRA to discuss: > https://issues.apache.org/jira/browse/ARROW-617 > > On Fri, Mar 10, 2017 at 4:42 PM, Wes McKinney <wesmck...@gmail.com> wrote: > >> There's two routes, I guess: >> >> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units >> - Use 64 bits for everything >> >> The latter is simpler to implement, the former saves space. I am not >> sure which is the better solution. Another situation where this will >> occur is with decimals, where the storage type may be a function of >> the precision and scale. Thoughts? >> >> On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <jul...@dremio.com> wrote: >> > It sounds like we need to specify a different bit width depending on the >> > unit? >> > millisecond time fits in 32 bits but neither do micros nor nanos. >> > the java TimeVector uses 32 bit for now (and supports millis only): >> > https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60 >> > >> > >> > On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <wesmck...@gmail.com> >> wrote: >> > >> >> Sorry to be a little slow to respond on this. >> >> >> >> Since we support nanosecond time unit, we need to use 64 bits. So it >> >> sounds like the bug is on the Java side >> >> >> >> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <cutl...@gmail.com> >> wrote: >> >> > Thanks for the info Julien. I'll open a JIRA for fixing the type >> layout >> >> > for TIME, and I'll give the documentation a shot. >> >> > >> >> > Regards, >> >> > Bryan >> >> > >> >> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <jul...@dremio.com> >> wrote: >> >> > >> >> >> Hi Bryan, >> >> >> In the JSON representation we should use the integer representation >> of >> >> the >> >> >> Timestamp. We should not depend on joda for this. >> >> >> >> >> >> DATE is on 8 bytes => 64bits: >> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> >> >> 9ed3385a8d/format/Message.fbs#L79 >> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/ >> ValueVectorTypes.tdd#L73 >> >> >> >> >> >> Time in on 4 bytes => 32 bits and has a unit: >> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> >> >> 9ed3385a8d/format/Message.fbs#L85 >> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/ >> ValueVectorTypes.tdd#L60 >> >> >> It should the time in {unit} since midnight stored in a 32 bit >> integer. >> >> >> It should not have a default unit IMO >> >> >> >> >> >> So as you pointed out it looks like a bug both on the C++ and java >> side >> >> for >> >> >> Time >> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b >> >> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/ >> >> >> vector/schema/TypeLayout.java#L163 >> >> >> tests TODO here: >> >> >> https://issues.apache.org/jira/browse/ARROW-510 >> >> >> >> >> >> We need to add the Date, Time, Timestamp description in the doc: >> >> >> https://github.com/apache/arrow/blob/master/format/Metadata.md >> >> >> You are welcome to take a stab at it and send a Pull request if you >> feel >> >> >> like it. >> >> >> Otherwise I'll update it. >> >> >> >> >> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <cutl...@gmail.com> >> wrote: >> >> >> >> >> >> > I guess it would make sense to just store the time of day value in >> >> >> > milliseconds to go along with the DATE type that contains days >> since >> >> >> epoch, >> >> >> > which would fit into a 4 byte value. Only I see conflicting code >> in >> >> >> > TypeLayout.java that defines the schema as 64 bit width >> >> >> > >> >> >> > public TypeLayout visit(Time type) { >> >> >> > return newFixedWidthTypeLayout(dataVector(64)); >> >> >> > } >> >> >> > >> >> >> > And in C++ there is this comment >> >> >> > // Exact time encoded with int64, default unit millisecond >> >> >> > TIME, >> >> >> > >> >> >> > Does the TIME type still need to go through some discussion to get >> >> pinned >> >> >> > down? >> >> >> > >> >> >> > Thanks, >> >> >> > Bryan >> >> >> > >> >> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <cutl...@gmail.com> >> >> wrote: >> >> >> > >> >> >> > > Hello All, >> >> >> > > >> >> >> > > I've started work on ARROW-582 to add Date/Time support for Java >> >> JSON >> >> >> > > files and would just like to clear up a few things. I believe >> the >> >> Java >> >> >> > > Time type is supposed to represent milliseconds since epoch, it >> is >> >> >> stored >> >> >> > > as a FixedValueVector with a width of 4 bytes (equivalent to Java >> >> >> 'int') >> >> >> > > and it retrieved by constructing a org.joda.time.DateTime with >> that >> >> >> > value. >> >> >> > > Shouldn't this be an 8 byte width, equivalent to Java 'long'? >> >> >> > > >> >> >> > > <#elseif minor.class == "Time"> >> >> >> > > @Override >> >> >> > > public DateTime getObject(int index) { >> >> >> > > >> >> >> > > org.joda.time.DateTime time = new >> >> org.joda.time.DateTime(get( >> >> >> > index), >> >> >> > > org.joda.time.DateTimeZone.UTC); >> >> >> > > time = time.withZoneRetainFields(org. >> >> joda.time.DateTimeZone. >> >> >> > > getDefault()); >> >> >> > > return time; >> >> >> > > } >> >> >> > > >> >> >> > > Thanks, >> >> >> > > Bryan >> >> >> > > >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Julien >> >> >> >> >> >> > >> > >> > >> > -- >> > Julien >> > > > > -- > Julien