There's two routes, I guess:

- Use 64 bits for microseconds and nanoseconds, 32 bits for other units
- Use 64 bits for everything

The latter is simpler to implement, the former saves space. I am not
sure which is the better solution. Another situation where this will
occur is with decimals, where the storage type may be a function of
the precision and scale. Thoughts?

On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <jul...@dremio.com> wrote:
> It sounds like we need to specify a different bit width depending on the
> unit?
> millisecond time fits in 32 bits but neither do micros nor nanos.
> the java TimeVector uses 32 bit for now (and supports millis only):
> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
>
>
> On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Sorry to be a little slow to respond on this.
>>
>> Since we support nanosecond time unit, we need to use 64 bits. So it
>> sounds like the bug is on the Java side
>>
>> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>> > Thanks for the info Julien.  I'll open a JIRA for fixing the type layout
>> > for TIME, and I'll give the documentation a shot.
>> >
>> > Regards,
>> > Bryan
>> >
>> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <jul...@dremio.com> wrote:
>> >
>> >> Hi Bryan,
>> >> In the JSON representation we should use the integer representation of
>> the
>> >> Timestamp. We should not depend on joda for this.
>> >>
>> >> DATE is on 8 bytes => 64bits:
>> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>> >> 9ed3385a8d/format/Message.fbs#L79
>> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>> >> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L73
>> >>
>> >> Time in on 4 bytes => 32 bits and has a unit:
>> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>> >> 9ed3385a8d/format/Message.fbs#L85
>> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>> >> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
>> >> It should the time in {unit} since midnight stored in a 32 bit integer.
>> >> It should not have a default unit IMO
>> >>
>> >> So as you pointed out it looks like a bug both on the C++ and java side
>> for
>> >> Time
>> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/
>> >> vector/schema/TypeLayout.java#L163
>> >> tests TODO here:
>> >> https://issues.apache.org/jira/browse/ARROW-510
>> >>
>> >> We need to add the Date, Time, Timestamp description in the doc:
>> >> https://github.com/apache/arrow/blob/master/format/Metadata.md
>> >> You are welcome to take a stab at it and send a Pull request if you feel
>> >> like it.
>> >> Otherwise I'll update it.
>> >>
>> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>> >>
>> >> > I guess it would make sense to just store the time of day value in
>> >> > milliseconds to go along with the DATE type that contains days since
>> >> epoch,
>> >> > which would fit into a 4 byte value.  Only I see conflicting code in
>> >> > TypeLayout.java that defines the schema as 64 bit width
>> >> >
>> >> > public TypeLayout visit(Time type) {
>> >> >     return newFixedWidthTypeLayout(dataVector(64));
>> >> > }
>> >> >
>> >> > And in C++ there is this comment
>> >> >   // Exact time encoded with int64, default unit millisecond
>> >> >   TIME,
>> >> >
>> >> > Does the TIME type still need to go through some discussion to get
>> pinned
>> >> > down?
>> >> >
>> >> > Thanks,
>> >> > Bryan
>> >> >
>> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <cutl...@gmail.com>
>> wrote:
>> >> >
>> >> > > Hello All,
>> >> > >
>> >> > > I've started work on ARROW-582 to add Date/Time support for Java
>> JSON
>> >> > > files and would just like to clear up a few things.  I believe the
>> Java
>> >> > > Time type is supposed to represent milliseconds since epoch, it is
>> >> stored
>> >> > > as a FixedValueVector with a width of 4 bytes (equivalent to Java
>> >> 'int')
>> >> > > and it retrieved by constructing a org.joda.time.DateTime with that
>> >> > value.
>> >> > > Shouldn't this be an 8 byte width, equivalent to Java 'long'?
>> >> > >
>> >> > >     <#elseif minor.class == "Time">
>> >> > >     @Override
>> >> > >     public DateTime getObject(int index) {
>> >> > >
>> >> > >         org.joda.time.DateTime time = new
>> org.joda.time.DateTime(get(
>> >> > index),
>> >> > > org.joda.time.DateTimeZone.UTC);
>> >> > >         time = time.withZoneRetainFields(org.
>> joda.time.DateTimeZone.
>> >> > > getDefault());
>> >> > >         return time;
>> >> > >     }
>> >> > >
>> >> > > Thanks,
>> >> > > Bryan
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Julien
>> >>
>>
>
>
>
> --
> Julien

Reply via email to