The first option makes it easier for compatibility with existing code I opened a JIRA to discuss: https://issues.apache.org/jira/browse/ARROW-617
On Fri, Mar 10, 2017 at 4:42 PM, Wes McKinney <[email protected]> wrote: > There's two routes, I guess: > > - Use 64 bits for microseconds and nanoseconds, 32 bits for other units > - Use 64 bits for everything > > The latter is simpler to implement, the former saves space. I am not > sure which is the better solution. Another situation where this will > occur is with decimals, where the storage type may be a function of > the precision and scale. Thoughts? > > On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <[email protected]> wrote: > > It sounds like we need to specify a different bit width depending on the > > unit? > > millisecond time fits in 32 bits but neither do micros nor nanos. > > the java TimeVector uses 32 bit for now (and supports millis only): > > https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60 > > > > > > On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <[email protected]> > wrote: > > > >> Sorry to be a little slow to respond on this. > >> > >> Since we support nanosecond time unit, we need to use 64 bits. So it > >> sounds like the bug is on the Java side > >> > >> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <[email protected]> > wrote: > >> > Thanks for the info Julien. I'll open a JIRA for fixing the type > layout > >> > for TIME, and I'll give the documentation a shot. > >> > > >> > Regards, > >> > Bryan > >> > > >> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <[email protected]> > wrote: > >> > > >> >> Hi Bryan, > >> >> In the JSON representation we should use the integer representation > of > >> the > >> >> Timestamp. We should not depend on joda for this. > >> >> > >> >> DATE is on 8 bytes => 64bits: > >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > >> >> 9ed3385a8d/format/Message.fbs#L79 > >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > >> >> 9ed3385a8d/java/vector/src/main/codegen/data/ > ValueVectorTypes.tdd#L73 > >> >> > >> >> Time in on 4 bytes => 32 bits and has a unit: > >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > >> >> 9ed3385a8d/format/Message.fbs#L85 > >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > >> >> 9ed3385a8d/java/vector/src/main/codegen/data/ > ValueVectorTypes.tdd#L60 > >> >> It should the time in {unit} since midnight stored in a 32 bit > integer. > >> >> It should not have a default unit IMO > >> >> > >> >> So as you pointed out it looks like a bug both on the C++ and java > side > >> for > >> >> Time > >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b > >> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/ > >> >> vector/schema/TypeLayout.java#L163 > >> >> tests TODO here: > >> >> https://issues.apache.org/jira/browse/ARROW-510 > >> >> > >> >> We need to add the Date, Time, Timestamp description in the doc: > >> >> https://github.com/apache/arrow/blob/master/format/Metadata.md > >> >> You are welcome to take a stab at it and send a Pull request if you > feel > >> >> like it. > >> >> Otherwise I'll update it. > >> >> > >> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <[email protected]> > wrote: > >> >> > >> >> > I guess it would make sense to just store the time of day value in > >> >> > milliseconds to go along with the DATE type that contains days > since > >> >> epoch, > >> >> > which would fit into a 4 byte value. Only I see conflicting code > in > >> >> > TypeLayout.java that defines the schema as 64 bit width > >> >> > > >> >> > public TypeLayout visit(Time type) { > >> >> > return newFixedWidthTypeLayout(dataVector(64)); > >> >> > } > >> >> > > >> >> > And in C++ there is this comment > >> >> > // Exact time encoded with int64, default unit millisecond > >> >> > TIME, > >> >> > > >> >> > Does the TIME type still need to go through some discussion to get > >> pinned > >> >> > down? > >> >> > > >> >> > Thanks, > >> >> > Bryan > >> >> > > >> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <[email protected]> > >> wrote: > >> >> > > >> >> > > Hello All, > >> >> > > > >> >> > > I've started work on ARROW-582 to add Date/Time support for Java > >> JSON > >> >> > > files and would just like to clear up a few things. I believe > the > >> Java > >> >> > > Time type is supposed to represent milliseconds since epoch, it > is > >> >> stored > >> >> > > as a FixedValueVector with a width of 4 bytes (equivalent to Java > >> >> 'int') > >> >> > > and it retrieved by constructing a org.joda.time.DateTime with > that > >> >> > value. > >> >> > > Shouldn't this be an 8 byte width, equivalent to Java 'long'? > >> >> > > > >> >> > > <#elseif minor.class == "Time"> > >> >> > > @Override > >> >> > > public DateTime getObject(int index) { > >> >> > > > >> >> > > org.joda.time.DateTime time = new > >> org.joda.time.DateTime(get( > >> >> > index), > >> >> > > org.joda.time.DateTimeZone.UTC); > >> >> > > time = time.withZoneRetainFields(org. > >> joda.time.DateTimeZone. > >> >> > > getDefault()); > >> >> > > return time; > >> >> > > } > >> >> > > > >> >> > > Thanks, > >> >> > > Bryan > >> >> > > > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> Julien > >> >> > >> > > > > > > > > -- > > Julien > -- Julien
