Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Micah Kornfield
Sgtm, I think a PMC member needs to kick it off?

On Wednesday, April 3, 2019, Wes McKinney  wrote:

> Agreed
>
> On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau  wrote:
> >
> > Option 1 sounds good to me. Let's take to a vote.

Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Wes McKinney
Agreed

On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau  wrote:
>
> Option 1 sounds good to me. Let's take to a vote.

Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Jacques Nadeau
Option 1 sounds good to me. Let's take to a vote.

On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield 
wrote:

> Based on the discussion so far, my attempt at concrete schema proposals is
> below. Jacques, I think this summarizes what we've discussed; apologies if
> I've misunderstood. Wes, would Option 1 work to support the pandas
> timedelta use case? I'm leaning towards Option 1 if it satisfies everyone
> (but happy to implement whatever we come to a consensus on).
>
> ** Option 1:  New Type: **
> /// An absolute length of time unrelated to any calendar artifacts.
> ///
> /// For the purposes of Arrow implementations, adding this value to a
> /// Timestamp ("t1") naively (i.e. simply summing the two numbers) is
> /// acceptable even though in some cases the resulting Timestamp ("t2")
> /// would not account for leap seconds during the elapsed time between
> /// "t1" and "t2".  Similarly, representing the difference between two
> /// Unix timestamps is acceptable, but would yield a value that is
> /// possibly a few seconds off from the true elapsed time.
> ///
> /// The resolution defaults to millisecond, but can be any of the other
> /// supported TimeUnit values as with Timestamp and Time types.  This
> /// type is always represented as an 8-byte integer.
> table DurationInterval {
>    unit: TimeUnit = MILLISECOND;
> }
>
> ** Option 2: New TimeDelta enum on Interval Unit (strong definition around
> leap-seconds): **
>
> enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, TIME_DELTA }
> // A "calendar" interval which models types that don't necessarily
> // have a precise duration without the context of a base timestamp (e.g.
> // days can differ in length during daylight saving time transitions).
> // In the case of TIME_DELTA, a precise definition may not be possible if
> // the base timestamp occurs at an instant when a leap second was added
> // (but would only differ by at most 1 second).
> // YEAR_MONTH - Indicates the number of elapsed whole months, stored as
> //   4-byte integers.
> // DAY_TIME - Indicates the number of elapsed days and milliseconds,
> //   stored as 2 contiguous 32-bit integers (8 bytes in total).  Support
> //   of this IntervalUnit is not required for full Arrow compatibility.
> // TIME_DELTA - Indicates the absolute time difference between Unix
> //   timestamps (i.e. excluding leap seconds).  This value is always
> //   represented as an 8-byte integer.
> table Interval {
>   unit: IntervalUnit;
>   resolution: TimeUnit;  // Only relevant for TIME_DELTA
> }

Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Micah Kornfield
Based on the discussion so far, my attempt at concrete schema proposals is
below. Jacques, I think this summarizes what we've discussed; apologies if
I've misunderstood. Wes, would Option 1 work to support the pandas
timedelta use case? I'm leaning towards Option 1 if it satisfies everyone
(but happy to implement whatever we come to a consensus on).

** Option 1:  New Type: **
/// An absolute length of time unrelated to any calendar artifacts.
///
/// For the purposes of Arrow implementations, adding this value to a
/// Timestamp ("t1") naively (i.e. simply summing the two numbers) is
/// acceptable even though in some cases the resulting Timestamp ("t2")
/// would not account for leap seconds during the elapsed time between
/// "t1" and "t2".  Similarly, representing the difference between two
/// Unix timestamps is acceptable, but would yield a value that is
/// possibly a few seconds off from the true elapsed time.
///
/// The resolution defaults to millisecond, but can be any of the other
/// supported TimeUnit values as with Timestamp and Time types.  This
/// type is always represented as an 8-byte integer.
table DurationInterval {
   unit: TimeUnit = MILLISECOND;
}
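
As a rough illustration of the naive addition that the Option 1 comment
permits, here is a minimal Python sketch. The enum member names follow the
TimeUnit values mentioned in this thread; the numeric values (ticks per
second) and the helpers rescale and add_duration are assumptions made for
this example, not part of the proposal.

```python
from enum import Enum


class TimeUnit(Enum):
    # Member names follow Arrow's TimeUnit; the values (ticks per second)
    # are an assumption made for this sketch only.
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


def rescale(value: int, src: TimeUnit, dst: TimeUnit) -> int:
    """Convert an integer tick count from src to a finer-or-equal dst unit."""
    if dst.value < src.value:
        raise ValueError("rescaling to a coarser unit would lose precision")
    return value * (dst.value // src.value)


def add_duration(ts: int, ts_unit: TimeUnit, dur: int, dur_unit: TimeUnit):
    """Naively sum a timestamp and a duration, ignoring leap seconds."""
    # Work in whichever of the two units is finer so nothing is truncated.
    finer = max(ts_unit, dur_unit, key=lambda u: u.value)
    return rescale(ts, ts_unit, finer) + rescale(dur, dur_unit, finer), finer


# 2019-04-03 00:00:00 UTC expressed in seconds, plus a 1500 ms duration.
total, unit = add_duration(
    1_554_249_600, TimeUnit.SECOND, 1_500, TimeUnit.MILLISECOND
)
```

The sum is a plain 64-bit integer addition once both operands share a unit,
which is the "simply summing the two numbers" behaviour described above.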

** Option 2: New TimeDelta enum on Interval Unit (strong definition around
leap-seconds): **

enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, TIME_DELTA }
// A "calendar" interval which models types that don't necessarily
// have a precise duration without the context of a base timestamp (e.g.
// days can differ in length during daylight saving time transitions).
// In the case of TIME_DELTA, a precise definition may not be possible if
// the base timestamp occurs at an instant when a leap second was added
// (but would only differ by at most 1 second).
// YEAR_MONTH - Indicates the number of elapsed whole months, stored as
//   4-byte integers.
// DAY_TIME - Indicates the number of elapsed days and milliseconds,
//   stored as 2 contiguous 32-bit integers (8 bytes in total).  Support
//   of this IntervalUnit is not required for full Arrow compatibility.
// TIME_DELTA - Indicates the absolute time difference between Unix
//   timestamps (i.e. excluding leap seconds).  This value is always
//   represented as an 8-byte integer.
table Interval {
  unit: IntervalUnit;
  resolution: TimeUnit;  // Only relevant for TIME_DELTA
}
}
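
To make the three value widths in Option 2 concrete, the following Python
sketch packs one value of each kind. The function names are hypothetical,
and the little-endian, signed encodings are assumptions of this example;
the comments above only fix the widths (4 bytes, 2 x 32 bits, 8 bytes).

```python
import struct


def pack_year_month(months: int) -> bytes:
    """YEAR_MONTH: whole months as one 4-byte integer."""
    return struct.pack("<i", months)


def pack_day_time(days: int, millis: int) -> bytes:
    """DAY_TIME: days and milliseconds as two contiguous 32-bit integers."""
    return struct.pack("<ii", days, millis)


def pack_time_delta(ticks: int) -> bytes:
    """TIME_DELTA: a tick count in the chosen resolution as one 8-byte integer."""
    return struct.pack("<q", ticks)


year_month = pack_year_month(14)      # 1 year and 2 months
day_time = pack_day_time(3, 500)      # 3 days and 500 ms
time_delta = pack_time_delta(-1_500)  # a negative delta of 1500 ticks
```

DAY_TIME and TIME_DELTA both occupy 8 bytes but are not interchangeable:
the former is a pair of fields, the latter a single integer whose meaning
depends on the resolution field.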

On Tue, Apr 2, 2019 at 10:03 AM Wes McKinney  wrote:

> Since there were some mentions of leap seconds:
>
> I think the intent of the timedelta/duration type should be to express
> the difference between UNIX timestamps (from second to nanosecond
> resolution), which don't include leap seconds. We use the
> timedelta64[ns] type in pandas for example, which is a
> nanosecond-resolution difference of UNIX timestamps.

Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Wes McKinney
Since there were some mentions of leap seconds:

I think the intent of the timedelta/duration type should be to express
the difference between UNIX timestamps (from second to nanosecond
resolution), which don't include leap seconds. We use the
timedelta64[ns] type in pandas for example, which is a
nanosecond-resolution difference of UNIX timestamps.
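
A small sketch of what "difference between UNIX timestamps" means in
practice, using only the Python standard library. The helper unix_ns is
hypothetical; the example spans the leap second inserted at the end of
2016-12-31, which Unix time does not count.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NS_PER_SECOND = 1_000_000_000


def unix_ns(dt: datetime) -> int:
    """Whole seconds since the Unix epoch, scaled to nanoseconds."""
    return ((dt - EPOCH) // timedelta(seconds=1)) * NS_PER_SECOND


before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 1, tzinfo=timezone.utc)

# Plain integer subtraction of the two timestamps, at ns resolution.
delta_ns = unix_ns(after) - unix_ns(before)
```

Here delta_ns corresponds to 2 seconds, while 3 SI seconds actually elapsed
because 23:59:60 was inserted between the two instants. That one-second gap
is the inaccuracy the proposed type accepts.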

On Tue, Apr 2, 2019 at 10:05 AM Jacques Nadeau  wrote:
>
> >
> > I could go either way, it has some benefits for forward compatibility I
> > suppose, but on the other hand YAGNI, if you feel strongly, I'm ok
> > including it.  However, the more optional fields we have for a specific
> > enum value, makes me lean more towards a new type instead of just an enum.
> >
> I'm okay with skipping for now. Appreciate the focus on only what we
> actually need.
>
>
>
> > Could you elaborate on defining standard arithmetic conversions between
> > time-delta/duration in seconds and other time unit (days, months, years) as
> > part of the standard/format, I'm still not sure I understand what the
> > use-case is here.
> >
>
> Here goes nothing...
>
> Seems like there are two options for durations:
> 1) they aren't related to any other type
> 2) they have a relationship to timestamps and dates.
>
> If 1, then the only thing I can make sense of is real-world duration, i.e.
> how seconds are defined (and fractions thereof). E.g. [1] :D. In this
> situation, there is no way to express any unit of time coarser than a
> second (e.g. days), since it is up to the application implementer to define
> the relationship. This severely limits the expressiveness of the concept
> (I can't ever use something like TimeUnit.DAYS) and, I believe, rules out
> covering the existing interval YEAR_MONTH type (since it has a resolution
> of months).
>
> If 2, then we must define the canonical value of ts + duration; otherwise
> durations are somewhat meaningless. Hence the proposed translation chart
> (which causes its own oddities depending on the resolution of the time type
> you are adding to).
>
> That being said, having started to remember previous discussions on this,
> I'm most inclined to simply pick #1 and ignore the need for anything more.
> The curiousness of interval math in database systems underscores the fact
> that it apparently doesn't matter that much. In most cases, today + 3
> months is close enough to today + 90 days for government work.
>
> Let's +2 a patch and get it merged quickly so we never have to think about
> this again :)
>
> [1] "the duration of 9,192,631,770 periods of the radiation corresponding to
> the transition between the two hyperfine levels of the ground state of the
> caesium-133 atom" (at a temperature of 0 K)


Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Jacques Nadeau
>
> I could go either way, it has some benefits for forward compatibility I
> suppose, but on the other hand YAGNI, if you feel strongly, I'm ok
> including it.  However, the more optional fields we have for a specific
> enum value, makes me lean more towards a new type instead of just an enum.
>
I'm okay with skipping for now. Appreciate the focus on only what we
actually need.



> Could you elaborate on defining standard arithmetic conversions between
> time-delta/duration in seconds and other time unit (days, months, years) as
> part of the standard/format, I'm still not sure I understand what the
> use-case is here.
>

Here goes nothing...

Seems like there are two options for durations:
1) they aren't related to any other type
2) they have a relationship to timestamps and dates.

If 1, then the only thing I can make sense of is real-world duration, i.e.
how seconds are defined (and fractions thereof). E.g. [1] :D. In this
situation, there is no way to express any unit of time coarser than a
second (e.g. days), since it is up to the application implementer to define
the relationship. This severely limits the expressiveness of the concept
(I can't ever use something like TimeUnit.DAYS) and, I believe, rules out
covering the existing interval YEAR_MONTH type (since it has a resolution
of months).

If 2, then we must define the canonical value of ts + duration; otherwise
durations are somewhat meaningless. Hence the proposed translation chart
(which causes its own oddities depending on the resolution of the time type
you are adding to).

That being said, having started to remember previous discussions on this,
I'm most inclined to simply pick #1 and ignore the need for anything more.
The curiousness of interval math in database systems underscores the fact
that it apparently doesn't matter that much. In most cases, today + 3
months is close enough to today + 90 days for government work.
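
For a sense of how close "today + 3 months" and "today + 90 days" are, here
is a short standard-library sketch. The helper add_months is hypothetical,
and clamping to the last day of the target month is just one possible
convention, assumed here for illustration.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Calendar-aware month addition; clamps the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


today = date(2019, 4, 2)
by_months = add_months(today, 3)        # calendar interpretation
by_days = today + timedelta(days=90)    # fixed 30-day months
gap = by_months - by_days               # how far the two answers drift apart

clamped = add_months(date(2019, 1, 31), 1)  # there is no February 31
```

For this date the two results are one day apart (2019-07-02 versus
2019-07-01). The clamping case shows why calendar intervals need a base
date, which a fixed-length duration does not.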

Let's +2 a patch and get it merged quickly so we never have to think about
this again :)

[1] "the duration of 9,192,631,770 periods of the radiation corresponding to
the transition between the two hyperfine levels of the ground state of the
caesium-133 atom" (at a temperature of 0 K)



Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Micah Kornfield
On Mon, Apr 1, 2019 at 4:17 PM Jacques Nadeau  wrote:

>
>>
>> I don't think we should include byte-width unless we have a concrete
>> use-case (it can be added later, using 8 Bytes as the default if not set).
>>
> I'm okay with only allowing one today. I wonder whether we should declare
> it now and only allow 8?
>

I could go either way. It has some benefits for forward compatibility, I
suppose, but on the other hand YAGNI. If you feel strongly, I'm OK
including it.  However, the more optional fields we have for a specific
enum value, the more I lean towards a new type instead of just an enum.


>
>
>>
>> Comment below on equivalences, is that I don't fully understand this.
>>
>
> I don't either :)
>
> "Unix time numbers are repeated in the second immediately following a
> positive leap second. The Unix time number 915148800.50 is thus
> ambiguous: it can refer either to the instant in the middle of the leap
> second, or to the instant one second later, half a second after midnight
> UTC." https://en.wikipedia.org/wiki/Unix_time#Leap_seconds
>
> If that's the case, what does the comment in the format mean exactly when
> you say "unix time excluding leap seconds"? I don't really understand what
> a duration has to do with unix time but my understanding is unix time also
> respects leap seconds typically which means what? I think that a duration
> has to be understood in its relationship to addition to a timestamp to be
> meaningful across systems, doesn't it?
>

The documentation was mostly from a PR put together by Wes a while ago;
maybe he can chime in.  I think any modifications or previous statements
on my part about the proposed time being interpreted separately from a
timestamp don't make sense.  I was assuming it was OK to ignore the few
seconds of inaccuracy that would occur with the interpretation.  I think we
would need yet another type if we wanted to measure time truly
independently of reference timestamps.  As you point out, if a reference
timestamp corresponds to when a leap second is added, then the
Duration/time-delta is always ambiguous.  If the reference timestamp doesn't
fall when a leap second is added, it seems like the conversion to
"actual" elapsed seconds is fairly straightforward.

Could you elaborate on defining standard arithmetic conversions between
time-delta/duration in seconds and other time units (days, months, years) as
part of the standard/format?  I'm still not sure I understand what the
use case is here.

Thanks,
Micah


Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Jacques Nadeau
>
>
>
> I don't think we should include byte-width unless we have a concrete
> use-case (it can be added later, using 8 Bytes as the default if not set).
>
I'm okay with only allowing one today. I wonder whether we should declare
it now and only allow 8?


>
> Comment below on equivalences, is that I don't fully understand this.
>

I don't either :)

"Unix time numbers are repeated in the second immediately following a
positive leap second. The Unix time number 915148800.50 is thus ambiguous:
it can refer either to the instant in the middle of the leap second, or to
the instant one second later, half a second after midnight UTC."
https://en.wikipedia.org/wiki/Unix_time#Leap_seconds

If that's the case, what does the comment in the format mean exactly when
you say "unix time excluding leap seconds"? I don't really understand what
a duration has to do with unix time but my understanding is unix time also
respects leap seconds typically which means what? I think that a duration
has to be understood in its relationship to addition to a timestamp to be
meaningful across systems, doesn't it?
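
The ambiguity in the quoted Wikipedia passage can be reproduced with the
usual POSIX-style arithmetic, in which every day is exactly 86,400 seconds
long. The function posix_seconds below is a hypothetical helper written
for this sketch, not a library call.

```python
from datetime import date

EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def posix_seconds(year, month, day, hour, minute, second):
    """Seconds since the epoch, treating every day as exactly 86,400 s."""
    days = date(year, month, day).toordinal() - EPOCH_ORDINAL
    return days * 86_400 + hour * 3_600 + minute * 60 + second


# The leap second 1998-12-31 23:59:60 UTC and the following midnight.
leap_second = posix_seconds(1998, 12, 31, 23, 59, 60)
midnight = posix_seconds(1999, 1, 1, 0, 0, 0)
```

Both instants map to 915148800, so a fractional value such as 915148800.50
cannot say which of the two seconds it falls in.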



> Defining the new type as a difference between two Unix Epochs seems
> sufficient, and consumers can provide their own rules of thumb.  If we want
> to model explicit DAY, MONTH, YEAR, etc objects, I think we should have a
> type that enumerates those fields explicitly.


Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Micah Kornfield
Sorry sent this too early.
TL;DR;  I'm in favor of moving forward with this declaration:
table Interval {
  unit: IntervalUnit;
  timeUnit: TimeUnit; // defined when using duration
}

I don't think we should include byte-width unless we have a concrete
use-case (it can be added later, using 8 Bytes as the default if not set).

My comment below on the equivalences is that I don't fully understand the
proposal.  Defining the new type as a difference between two Unix epoch
timestamps seems sufficient, and consumers can provide their own rules of
thumb.  If we want to model explicit DAY, MONTH, YEAR, etc. objects, I think
we should have a type that enumerates those fields explicitly.
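
To make the "difference between two Unix epochs" definition concrete, here
is a minimal Python sketch.  The names TICKS_PER_SECOND, to_epoch and
duration_between are hypothetical helpers for illustration, not part of any
Arrow API, and the unit names are assumed to mirror TimeUnit.

```python
from datetime import datetime, timezone

# Hypothetical table: ticks per second for each TimeUnit-like name.
TICKS_PER_SECOND = {
    "SECOND": 1,
    "MILLISECOND": 10**3,
    "MICROSECOND": 10**6,
    "NANOSECOND": 10**9,
}

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch(ts: datetime, unit: str) -> int:
    """Return integer ticks since the Unix epoch (leap seconds are ignored)."""
    delta = ts - UNIX_EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    return micros * TICKS_PER_SECOND[unit] // 10**6


def duration_between(t1: datetime, t2: datetime, unit: str = "MILLISECOND") -> int:
    """Duration as a plain integer difference of two epoch values."""
    return to_epoch(t2, unit) - to_epoch(t1, unit)


# A leap second was inserted at the end of 2016-12-31 (UTC), so the true
# elapsed time across this window is 121 s, but the naive difference is 120 s.
t1 = datetime(2016, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
t2 = datetime(2017, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
naive_ms = duration_between(t1, t2)
```

The value is just an integer plus a unit; no calendar rule is involved, which
is why no DAY/MONTH/YEAR equivalences are needed for this type.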

On Mon, Apr 1, 2019 at 12:01 PM Micah Kornfield 
wrote:

> TL;DR;  I'm in favor of moving forward with this declaration:
>
>
> On Mon, Apr 1, 2019 at 11:38 AM Jacques Nadeau  wrote:
>
>> I'm sorry, I've been busy with several other things.
>>
>> A question, what about this alternative?
>
>
>> enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, DURATION }
>> table Interval {
>>   unit: IntervalUnit;
>>   timeUnit: TimeUnit; // defined when using duration
>>   byteWidth: short; // defined when using duration
>> }
>>
> I would lean towards this, and rename Duration to TimeDelta (I think
> Duration might be confusing, especially given the questions below.
>
>
>>
>> Whether this or the other, I think we should probably declare the
>> byteWidth
>> of the value. Do you disagree?
>>
>
> I disagree, unless we want to support multiple byteWidths for
> Duration/TimeDelta (and i would rather add this in later while choosing a
> default of 8 bytes for it).
>
>
>>
>> Also, I don't think your definition is sufficient for a duration since it
>> is related to epoch time which suggests that the duration is relative to a
>> point in time. I think we have to declare the equivalences. Probably
>> these:
>>
>> 1 century = 100 years
>> 1 year = 12 months
>> 1 month = 30 days
>> 1 day = 24 hours
>> 1 hour = 60 minutes
>> 1 minute = 60 seconds
>>
>> Otherwise, there is no consistency around how the duration maps to a
>> timestamp.
>>
> I think my use of Duration might have added confusion here.  Could you
> elaborate what you are proposing?  I think the  I feel uncomfortable doing
> this conversion in the absence of a timestamp (if implementations want to
> make these approximations that seems fine, but I don't think it should be
> part of the standard), the fact that this is
>
>>
>>


Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Micah Kornfield
TL;DR;  I'm in favor of moving forward with this declaration:


On Mon, Apr 1, 2019 at 11:38 AM Jacques Nadeau  wrote:

> I'm sorry, I've been busy with several other things.
>
> A question, what about this alternative?


> enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, DURATION }
> table Interval {
>   unit: IntervalUnit;
>   timeUnit: TimeUnit; // defined when using duration
>   byteWidth: short; // defined when using duration
> }
>
I would lean towards this, and rename Duration to TimeDelta (I think
Duration might be confusing, especially given the questions below).


>
> Whether this or the other, I think we should probably declare the byteWidth
> of the value. Do you disagree?
>

I disagree, unless we want to support multiple byteWidths for
Duration/TimeDelta (and I would rather add this in later while choosing a
default of 8 bytes for it).


>
> Also, I don't think your definition is sufficient for a duration since it
> is related to epoch time which suggests that the duration is relative to a
> point in time. I think we have to declare the equivalences. Probably these:
>
> 1 century = 100 years
> 1 year = 12 months
> 1 month = 30 days
> 1 day = 24 hours
> 1 hour = 60 minutes
> 1 minute = 60 seconds
>
> Otherwise, there is no consistency around how the duration maps to a
> timestamp.
>
I think my use of Duration might have added confusion here.  Could you
elaborate on what you are proposing?  I feel uncomfortable doing
this conversion in the absence of a timestamp (if implementations want to
make these approximations that seems fine, but I don't think it should be
part of the standard), the fact that this is

>
>


Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Jacques Nadeau
I'm sorry, I've been busy with several other things.

A question, what about this alternative?

enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, DURATION }
table Interval {
  unit: IntervalUnit;
  timeUnit: TimeUnit; // defined when using duration
  byteWidth: short; // defined when using duration
}

Whether this or the other, I think we should probably declare the byteWidth
of the value. Do you disagree?
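
For illustration only, the alternative above could be modeled in Python
roughly as follows.  The class and field names mirror the schema snippet,
but the validation rule (timeUnit and byteWidth set only for DURATION) is
one reading of the "defined when using duration" comments, and the TimeUnit
members are assumed to match the existing TimeUnit enum in Schema.fbs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntervalUnit(Enum):
    YEAR_MONTH = 0
    DAY_TIME = 1
    DURATION = 2


class TimeUnit(Enum):
    # Assumed to match the existing TimeUnit enum.
    SECOND = 0
    MILLISECOND = 1
    MICROSECOND = 2
    NANOSECOND = 3


@dataclass(frozen=True)
class Interval:
    unit: IntervalUnit
    time_unit: Optional[TimeUnit] = None  # defined when using duration
    byte_width: Optional[int] = None      # defined when using duration

    def __post_init__(self) -> None:
        is_duration = self.unit is IntervalUnit.DURATION
        has_extras = self.time_unit is not None or self.byte_width is not None
        if is_duration and (self.time_unit is None or self.byte_width is None):
            raise ValueError("DURATION requires time_unit and byte_width")
        if not is_duration and has_extras:
            raise ValueError("time_unit/byte_width only apply to DURATION")


# Example: a millisecond duration stored in 8 bytes.
ms_duration = Interval(IntervalUnit.DURATION, TimeUnit.MILLISECOND, 8)
```

The sketch also shows the cost of folding everything into one table: two of
the three fields are meaningless for YEAR_MONTH and DAY_TIME.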

Also, I don't think your definition is sufficient for a duration since it
is related to epoch time which suggests that the duration is relative to a
point in time. I think we have to declare the equivalences. Probably these:

1 century = 100 years
1 year = 12 months
1 month = 30 days
1 day = 24 hours
1 hour = 60 minutes
1 minute = 60 seconds

Otherwise, there is no consistency around how the duration maps to a
timestamp.
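
As a rough sketch of what declaring these equivalences would imply, the
list above can be written as fixed constants in Python.  The constant names
and the to_seconds helper are hypothetical and only illustrate the
arithmetic; they are not part of the proposal.

```python
# Fixed equivalences from the list above, expressed in seconds.
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
MONTH = 30 * DAY
YEAR = 12 * MONTH
CENTURY = 100 * YEAR


def to_seconds(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Collapse calendar-style fields into seconds using the fixed table."""
    return (
        years * YEAR
        + months * MONTH
        + days * DAY
        + hours * HOUR
        + minutes * MINUTE
        + seconds
    )


one_month = to_seconds(months=1)
one_year = to_seconds(years=1)
days_per_year = one_year // DAY
```

Under this table a year comes out as 360 days rather than 365 or 366, so
the mapping is consistent but only approximates calendar arithmetic.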






On Mon, Apr 1, 2019 at 10:38 AM Wes McKinney  wrote:

> I would like to propose a vote on this feature this week. Could
> someone from the Java side weigh in since there is some existing code
> relating to intervals there already?
>
> On Wed, Mar 27, 2019 at 10:49 PM Micah Kornfield 
> wrote:
> >
> > Hi Wes,
> > Thanks for the feedback.  I'm happy to update the PR to include c++ and
> python once there is consensus on the format change.  I'd also welcome
> feedback and an extra set of eyes on the issues I raised below, since it is
> hard to change once we make a release.
> >
> > Based on previous discussions, I thought we were OK supporting
> YEAR_MONTH and deprecating or making DAY_TIME optional.  I'm also happy to
> try to add support for these in C++/Python as a separate PR.
> >
> > I'm not sure what the "Apache Way" is on this, but it seems like this
> particular issue has taken a long time to resolve, because these threads
> tend to lose steam (even this thread only had your response over the course
> of a week).  The only guidance I can find is Lazy Consensus [1], but maybe
> that doesn't apply in this situation?
> >
> > In short, it would be nice to get explicit consensus via a PMC vote or
> an alternative proposal made, that can gain consensus.  I'm happy to help
> out in either case, but would like avoid this stalling out yet again.
> >
> > Thanks,
> > Micah
> >
> > [1] https://community.apache.org/committers/lazyConsensus.html
> >
> > On Wednesday, March 27, 2019, Wes McKinney  wrote:
> >>
> >> hi Micah,
> >>
> >> Sorry for the delay.
> >>
> >> I'm in favor of introducing the Duration/DurationInterval type to
> >> unblock the difference-of-timestamps / timedelta use case that many
> >> Arrow users have. I'd like Jacques or someone from the Java side to
> >> comment about this before starting a vote.
> >>
> >> We can merge these changes into a feature branch and I or someone else
> >> can complete the C++ side and work on integration tests (so we
> >> eventually have proof of two complete implementations)
> >>
> >> I'm not sure what to do with the existing YEAR_MONTH and DAY_TIME
> >> interval types. These are featured in a number of SQL database systems
> >> and so one option is to simply leave them as is.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Sat, Mar 23, 2019 at 12:58 AM Micah Kornfield 
> wrote:
> >> >
> >> > Hi arrow-dev,
> >> > I just wanted to bump this thread to see if anyone wanted to comment
> or
> >> > discuss a path forward.
> >> >
> >> > If no one chimes in by Monday evening, could I ask a PMC member to
> start a
> >> > vote on Tuesday (I believe a member of the PMC needs to initiate a
> vote?)
> >> >
> >> > I will implement the C++ side once there is consensus around the
> change to
> >> > the format.
> >> >
> >> > Thanks,
> >> > Micah
> >> >
> >> > On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield <
> emkornfi...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Arrow Dev,
> >> > > Based on the recent thread on discussing and voting on changes to
> files
> >> > > under format, I'd figure I'd try see how the process works for
> changes to
> >> > > Schema.fbs to close out lingering time interval issues.  In
> particular,
> >> > > ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add
> Timedelta
> >> > > type to describe time intervals).
> >> > >
> >> > > I submitted a PR [1] that introduces a new DurationType that models
> >> > > (sub)seconds (excluding leap seconds) as a 8-byte integer type.
> Some of
> >> > > these issues have been discussed previously, the most recent thread
> was
> >> > > within the last month [2].
> >> > >
> >> > > The reason for creating a new type is to avoid breaking changes with
> >> > > existing types (in particular Interval[DAY_TIME] in Java).I
> think
> >> > > things worth discussing are:
> >> > >
> >> > > 1.  Is this a desirable change in principle?
> >> > > 2.  Naming: is DurationInterval a good name (should it be
> TimeDelta)?
> >> > > 3.  New Type: Should this be collapsed as a new enum on Interval
> (because
> >> > > it excludes leap-seconds, I think it still technically falls into
> the class
> >> > > of Calendar like o

Re: [DISCUSS][Format] Time Interval Changes

2019-04-01 Thread Wes McKinney
I would like to propose a vote on this feature this week. Could
someone from the Java side weigh in since there is some existing code
relating to intervals there already?

On Wed, Mar 27, 2019 at 10:49 PM Micah Kornfield  wrote:
>
> Hi Wes,
> Thanks for the feedback.  I'm happy to update the PR to include c++ and 
> python once there is consensus on the format change.  I'd also welcome 
> feedback and an extra set of eyes on the issues I raised below, since it is 
> hard to change once we make a release.
>
> Based on previous discussions, I thought we were OK supporting YEAR_MONTH and 
> deprecating or making DAY_TIME optional.  I'm also happy to try to add 
> support for these in C++/Python as a separate PR.
>
> I'm not sure what the "Apache Way" is on this, but it seems like this 
> particular issue has taken a long time to resolve, because these threads tend 
> to lose steam (even this thread only had your response over the course of a 
> week).  The only guidance I can find is Lazy Consensus [1], but maybe that 
> doesn't apply in this situation?
>
> In short, it would be nice to get explicit consensus via a PMC vote or an 
> alternative proposal made, that can gain consensus.  I'm happy to help out in 
> either case, but would like avoid this stalling out yet again.
>
> Thanks,
> Micah
>
> [1] https://community.apache.org/committers/lazyConsensus.html
>
> On Wednesday, March 27, 2019, Wes McKinney  wrote:
>>
>> hi Micah,
>>
>> Sorry for the delay.
>>
>> I'm in favor of introducing the Duration/DurationInterval type to
>> unblock the difference-of-timestamps / timedelta use case that many
>> Arrow users have. I'd like Jacques or someone from the Java side to
>> comment about this before starting a vote.
>>
>> We can merge these changes into a feature branch and I or someone else
>> can complete the C++ side and work on integration tests (so we
>> eventually have proof of two complete implementations)
>>
>> I'm not sure what to do with the existing YEAR_MONTH and DAY_TIME
>> interval types. These are featured in a number of SQL database systems
>> and so one option is to simply leave them as is.
>>
>> Thanks
>> Wes
>>
>> On Sat, Mar 23, 2019 at 12:58 AM Micah Kornfield  
>> wrote:
>> >
>> > Hi arrow-dev,
>> > I just wanted to bump this thread to see if anyone wanted to comment or
>> > discuss a path forward.
>> >
>> > If no one chimes in by Monday evening, could I ask a PMC member to start a
>> > vote on Tuesday (I believe a member of the PMC needs to initiate a vote?)
>> >
>> > I will implement the C++ side once there is consensus around the change to
>> > the format.
>> >
>> > Thanks,
>> > Micah
>> >
>> > On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield 
>> > wrote:
>> >
>> > > Hi Arrow Dev,
>> > > Based on the recent thread on discussing and voting on changes to files
>> > > under format, I'd figure I'd try see how the process works for changes to
>> > > Schema.fbs to close out lingering time interval issues.  In particular,
>> > > ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
>> > > type to describe time intervals).
>> > >
>> > > I submitted a PR [1] that introduces a new DurationType that models
>> > > (sub)seconds (excluding leap seconds) as a 8-byte integer type.  Some of
>> > > these issues have been discussed previously, the most recent thread was
>> > > within the last month [2].
>> > >
>> > > The reason for creating a new type is to avoid breaking changes with
>> > > existing types (in particular Interval[DAY_TIME] in Java).I think
>> > > things worth discussing are:
>> > >
>> > > 1.  Is this a desirable change in principle?
>> > > 2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
>> > > 3.  New Type: Should this be collapsed as a new enum on Interval (because
>> > > it excludes leap-seconds, I think it still technically falls into the 
>> > > class
>> > > of Calendar like objects).
>> > >
>> > > Please feel free to add items for discussion.
>> > >
>> > > I'm not sure the typical time that discussions are held open for, but it
>> > > would be great if we could try to get to a consensus sometime soon (and
>> > > then schedule a vote).  Maybe early next week is a good goal to aim for?
>> > >
>> > > Thanks,
>> > > Micah
>> > >
>> > >
>> > > [1] https://github.com/apache/arrow/pull/3644
>> > > [2]
>> > > https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E
>> > >


Re: [DISCUSS][Format] Time Interval Changes

2019-03-27 Thread Micah Kornfield
Hi Wes,
Thanks for the feedback.  I'm happy to update the PR to include C++ and
Python once there is consensus on the format change.  I'd also welcome
feedback and an extra set of eyes on the issues I raised below, since the
format is hard to change once we make a release.

Based on previous discussions, I thought we were OK supporting YEAR_MONTH
and deprecating or making DAY_TIME optional.  I'm also happy to try to add
support for these in C++/Python as a separate PR.

I'm not sure what the "Apache Way" is on this, but it seems like this
particular issue has taken a long time to resolve, because these threads
tend to lose steam (even this thread only had your response over the course
of a week).  The only guidance I can find is Lazy Consensus [1], but maybe
that doesn't apply in this situation?

In short, it would be nice to get explicit consensus via a PMC vote, or an
alternative proposal that can gain consensus.  I'm happy to help out
in either case, but would like to avoid this stalling out yet again.

Thanks,
Micah

[1] https://community.apache.org/committers/lazyConsensus.html

On Wednesday, March 27, 2019, Wes McKinney  wrote:

> hi Micah,
>
> Sorry for the delay.
>
> I'm in favor of introducing the Duration/DurationInterval type to
> unblock the difference-of-timestamps / timedelta use case that many
> Arrow users have. I'd like Jacques or someone from the Java side to
> comment about this before starting a vote.
>
> We can merge these changes into a feature branch and I or someone else
> can complete the C++ side and work on integration tests (so we
> eventually have proof of two complete implementations)
>
> I'm not sure what to do with the existing YEAR_MONTH and DAY_TIME
> interval types. These are featured in a number of SQL database systems
> and so one option is to simply leave them as is.
>
> Thanks
> Wes
>
> On Sat, Mar 23, 2019 at 12:58 AM Micah Kornfield 
> wrote:
> >
> > Hi arrow-dev,
> > I just wanted to bump this thread to see if anyone wanted to comment or
> > discuss a path forward.
> >
> > If no one chimes in by Monday evening, could I ask a PMC member to start
> a
> > vote on Tuesday (I believe a member of the PMC needs to initiate a vote?)
> >
> > I will implement the C++ side once there is consensus around the change
> to
> > the format.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield 
> > wrote:
> >
> > > Hi Arrow Dev,
> > > Based on the recent thread on discussing and voting on changes to files
> > > under format, I'd figure I'd try see how the process works for changes
> to
> > > Schema.fbs to close out lingering time interval issues.  In particular,
> > > ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
> > > type to describe time intervals).
> > >
> > > I submitted a PR [1] that introduces a new DurationType that models
> > > (sub)seconds (excluding leap seconds) as a 8-byte integer type.  Some
> of
> > > these issues have been discussed previously, the most recent thread was
> > > within the last month [2].
> > >
> > > The reason for creating a new type is to avoid breaking changes with
> > > existing types (in particular Interval[DAY_TIME] in Java).I think
> > > things worth discussing are:
> > >
> > > 1.  Is this a desirable change in principle?
> > > 2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
> > > 3.  New Type: Should this be collapsed as a new enum on Interval
> (because
> > > it excludes leap-seconds, I think it still technically falls into the
> class
> > > of Calendar like objects).
> > >
> > > Please feel free to add items for discussion.
> > >
> > > I'm not sure the typical time that discussions are held open for, but
> it
> > > would be great if we could try to get to a consensus sometime soon (and
> > > then schedule a vote).  Maybe early next week is a good goal to aim
> for?
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1] https://github.com/apache/arrow/pull/3644
> > > [2]
> > >
> https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E
> > >
>


Re: [DISCUSS][Format] Time Interval Changes

2019-03-27 Thread Wes McKinney
hi Micah,

Sorry for the delay.

I'm in favor of introducing the Duration/DurationInterval type to
unblock the difference-of-timestamps / timedelta use case that many
Arrow users have. I'd like Jacques or someone from the Java side to
comment about this before starting a vote.

We can merge these changes into a feature branch and I or someone else
can complete the C++ side and work on integration tests (so we
eventually have proof of two complete implementations).

I'm not sure what to do with the existing YEAR_MONTH and DAY_TIME
interval types. These are featured in a number of SQL database systems
and so one option is to simply leave them as is.

Thanks
Wes

On Sat, Mar 23, 2019 at 12:58 AM Micah Kornfield  wrote:
>
> Hi arrow-dev,
> I just wanted to bump this thread to see if anyone wanted to comment or
> discuss a path forward.
>
> If no one chimes in by Monday evening, could I ask a PMC member to start a
> vote on Tuesday (I believe a member of the PMC needs to initiate a vote?)
>
> I will implement the C++ side once there is consensus around the change to
> the format.
>
> Thanks,
> Micah
>
> On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield 
> wrote:
>
> > Hi Arrow Dev,
> > Based on the recent thread on discussing and voting on changes to files
> > under format, I'd figure I'd try see how the process works for changes to
> > Schema.fbs to close out lingering time interval issues.  In particular,
> > ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
> > type to describe time intervals).
> >
> > I submitted a PR [1] that introduces a new DurationType that models
> > (sub)seconds (excluding leap seconds) as a 8-byte integer type.  Some of
> > these issues have been discussed previously, the most recent thread was
> > within the last month [2].
> >
> > The reason for creating a new type is to avoid breaking changes with
> > existing types (in particular Interval[DAY_TIME] in Java).I think
> > things worth discussing are:
> >
> > 1.  Is this a desirable change in principle?
> > 2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
> > 3.  New Type: Should this be collapsed as a new enum on Interval (because
> > it excludes leap-seconds, I think it still technically falls into the class
> > of Calendar like objects).
> >
> > Please feel free to add items for discussion.
> >
> > I'm not sure the typical time that discussions are held open for, but it
> > would be great if we could try to get to a consensus sometime soon (and
> > then schedule a vote).  Maybe early next week is a good goal to aim for?
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://github.com/apache/arrow/pull/3644
> > [2]
> > https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E
> >


Re: [DISCUSS][Format] Time Interval Changes

2019-03-22 Thread Micah Kornfield
Hi arrow-dev,
I just wanted to bump this thread to see if anyone wanted to comment or
discuss a path forward.

If no one chimes in by Monday evening, could I ask a PMC member to start a
vote on Tuesday (I believe a member of the PMC needs to initiate a vote?)

I will implement the C++ side once there is consensus around the change to
the format.

Thanks,
Micah

On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield 
wrote:

> Hi Arrow Dev,
> Based on the recent thread on discussing and voting on changes to files
> under format, I'd figure I'd try see how the process works for changes to
> Schema.fbs to close out lingering time interval issues.  In particular,
> ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
> type to describe time intervals).
>
> I submitted a PR [1] that introduces a new DurationType that models
> (sub)seconds (excluding leap seconds) as a 8-byte integer type.  Some of
> these issues have been discussed previously, the most recent thread was
> within the last month [2].
>
> The reason for creating a new type is to avoid breaking changes with
> existing types (in particular Interval[DAY_TIME] in Java).I think
> things worth discussing are:
>
> 1.  Is this a desirable change in principle?
> 2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
> 3.  New Type: Should this be collapsed as a new enum on Interval (because
> it excludes leap-seconds, I think it still technically falls into the class
> of Calendar like objects).
>
> Please feel free to add items for discussion.
>
> I'm not sure the typical time that discussions are held open for, but it
> would be great if we could try to get to a consensus sometime soon (and
> then schedule a vote).  Maybe early next week is a good goal to aim for?
>
> Thanks,
> Micah
>
>
> [1] https://github.com/apache/arrow/pull/3644
> [2]
> https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E
>


[DISCUSS][Format] Time Interval Changes

2019-03-19 Thread Micah Kornfield
Hi Arrow Dev,
Based on the recent thread on discussing and voting on changes to files
under format, I figured I'd try to see how the process works for changes to
Schema.fbs to close out lingering time interval issues.  In particular,
ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
type to describe time intervals).

I submitted a PR [1] that introduces a new DurationType that models
(sub)seconds (excluding leap seconds) as an 8-byte integer type.  Some of
these issues have been discussed previously; the most recent thread was
within the last month [2].
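
As a minimal sketch of what "an 8-byte integer type" means for the values
themselves, the Python below packs durations as signed 64-bit integers.
The helper names are hypothetical, and little-endian byte order is an
assumption made for the example rather than something stated in this thread.

```python
import struct

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def pack_durations(values):
    """Pack integer durations as consecutive little-endian 8-byte signed ints."""
    for v in values:
        if not INT64_MIN <= v <= INT64_MAX:
            raise OverflowError(f"{v} does not fit in 8 bytes")
    return struct.pack(f"<{len(values)}q", *values)


def unpack_durations(buf):
    """Inverse of pack_durations."""
    count, remainder = divmod(len(buf), 8)
    if remainder:
        raise ValueError("buffer length must be a multiple of 8")
    return list(struct.unpack(f"<{count}q", buf))


# Two durations in some agreed unit, e.g. milliseconds.
buf = pack_durations([1500, -250])

# Whole 365-day years representable at nanosecond resolution.
max_years_at_ns = INT64_MAX // (10**9 * 86400 * 365)
```

Even at nanosecond resolution a signed 8-byte value spans roughly 292 years
in either direction, which is the usual argument for a single fixed width.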

The reason for creating a new type is to avoid breaking changes with
existing types (in particular Interval[DAY_TIME] in Java).  I think the
things worth discussing are:

1.  Is this a desirable change in principle?
2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
3.  New Type: Should this be collapsed into a new enum value on Interval?
(Because it excludes leap-seconds, I think it still technically falls into
the class of calendar-like objects.)

Please feel free to add items for discussion.

I'm not sure how long discussions are typically held open, but it
would be great if we could try to get to a consensus sometime soon (and
then schedule a vote).  Maybe early next week is a good goal to aim for?

Thanks,
Micah


[1] https://github.com/apache/arrow/pull/3644
[2]
https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E