The first option makes it easier for compatibility with existing code
I opened a JIRA to discuss:
https://issues.apache.org/jira/browse/ARROW-617

On Fri, Mar 10, 2017 at 4:42 PM, Wes McKinney <[email protected]> wrote:

> There's two routes, I guess:
>
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
>
> The latter is simpler to implement, the former saves space. I am not
> sure which is the better solution. Another situation where this will
> occur is with decimals, where the storage type may be a function of
> the precision and scale. Thoughts?
>
> On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <[email protected]> wrote:
> > It sounds like we need to specify a different bit width depending on the
> > unit?
> > millisecond time fits in 32 bits but neither do micros nor nanos.
> > the java TimeVector uses 32 bit for now (and supports millis only):
> > https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
> >
> >
> > On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <[email protected]>
> wrote:
> >
> >> Sorry to be a little slow to respond on this.
> >>
> >> Since we support nanosecond time unit, we need to use 64 bits. So it
> >> sounds like the bug is on the Java side
> >>
> >> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <[email protected]>
> wrote:
> >> > Thanks for the info Julien.  I'll open a JIRA for fixing the type
> layout
> >> > for TIME, and I'll give the documentation a shot.
> >> >
> >> > Regards,
> >> > Bryan
> >> >
> >> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <[email protected]>
> wrote:
> >> >
> >> >> Hi Bryan,
> >> >> In the JSON representation we should use the integer representation
> of
> >> the
> >> >> Timestamp. We should not depend on joda for this.
> >> >>
> >> >> DATE is on 8 bytes => 64bits:
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/format/Message.fbs#L79
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
> ValueVectorTypes.tdd#L73
> >> >>
> >> >> Time in on 4 bytes => 32 bits and has a unit:
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/format/Message.fbs#L85
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
> ValueVectorTypes.tdd#L60
> >> >> It should the time in {unit} since midnight stored in a 32 bit
> integer.
> >> >> It should not have a default unit IMO
> >> >>
> >> >> So as you pointed out it looks like a bug both on the C++ and java
> side
> >> for
> >> >> Time
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/
> >> >> vector/schema/TypeLayout.java#L163
> >> >> tests TODO here:
> >> >> https://issues.apache.org/jira/browse/ARROW-510
> >> >>
> >> >> We need to add the Date, Time, Timestamp description in the doc:
> >> >> https://github.com/apache/arrow/blob/master/format/Metadata.md
> >> >> You are welcome to take a stab at it and send a Pull request if you
> feel
> >> >> like it.
> >> >> Otherwise I'll update it.
> >> >>
> >> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <[email protected]>
> wrote:
> >> >>
> >> >> > I guess it would make sense to just store the time of day value in
> >> >> > milliseconds to go along with the DATE type that contains days
> since
> >> >> epoch,
> >> >> > which would fit into a 4 byte value.  Only I see conflicting code
> in
> >> >> > TypeLayout.java that defines the schema as 64 bit width
> >> >> >
> >> >> > public TypeLayout visit(Time type) {
> >> >> >     return newFixedWidthTypeLayout(dataVector(64));
> >> >> > }
> >> >> >
> >> >> > And in C++ there is this comment
> >> >> >   // Exact time encoded with int64, default unit millisecond
> >> >> >   TIME,
> >> >> >
> >> >> > Does the TIME type still need to go through some discussion to get
> >> pinned
> >> >> > down?
> >> >> >
> >> >> > Thanks,
> >> >> > Bryan
> >> >> >
> >> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <[email protected]>
> >> wrote:
> >> >> >
> >> >> > > Hello All,
> >> >> > >
> >> >> > > I've started work on ARROW-582 to add Date/Time support for Java
> >> JSON
> >> >> > > files and would just like to clear up a few things.  I believe
> the
> >> Java
> >> >> > > Time type is supposed to represent milliseconds since epoch, it
> is
> >> >> stored
> >> >> > > as a FixedValueVector with a width of 4 bytes (equivalent to Java
> >> >> 'int')
> >> >> > > and it retrieved by constructing a org.joda.time.DateTime with
> that
> >> >> > value.
> >> >> > > Shouldn't this be an 8 byte width, equivalent to Java 'long'?
> >> >> > >
> >> >> > >     <#elseif minor.class == "Time">
> >> >> > >     @Override
> >> >> > >     public DateTime getObject(int index) {
> >> >> > >
> >> >> > >         org.joda.time.DateTime time = new
> >> org.joda.time.DateTime(get(
> >> >> > index),
> >> >> > > org.joda.time.DateTimeZone.UTC);
> >> >> > >         time = time.withZoneRetainFields(org.
> >> joda.time.DateTimeZone.
> >> >> > > getDefault());
> >> >> > >         return time;
> >> >> > >     }
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Bryan
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Julien
> >> >>
> >>
> >
> >
> >
> > --
> > Julien
>



-- 
Julien

Reply via email to