Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-17 Thread Francesc Alted
On Tuesday 15 July 2008, Pierre GM wrote:
 On Tuesday 15 July 2008 07:30:09 Francesc Alted wrote:
  Maybe it is only that.  But by using the term 'frequency' I tend to
  think that you are expecting to have one entry (observation) in
  your array for each time 'tick' since time start.  OTOH, the term
  'resolution' doesn't have this implication, and only states the
  precision of the timestamp.

 OK, now I get it.

  I don't know whether my impression is true or not, but after
  reading about your TimeSeries package, I'm still thinking that this
  expectation of one observation per 'tick' was what drove you to
  choose the 'frequency' name.

 Well, we do require one point per tick for some operations, such
 as conversion from one frequency to another, but only for TimeSeries.
 A DateArray doesn't have to be regularly spaced.

Ok, I see.  So, it is just the 'frequency' keyword that was misleading 
me.  Thanks for the clarification.

Cheers,

-- 
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-16 Thread Francesc Alted
On Tuesday 15 July 2008, Anne Archibald wrote:
 2008/7/15 Francesc Alted:
  Maybe it is only that.  But by using the term 'frequency' I tend to
  think that you are expecting to have one entry (observation) in
  your array for each time 'tick' since time start.  OTOH, the term
  'resolution' doesn't have this implication, and only states the
  precision of the timestamp.
 
  Well, after reading the mails from Chris and Anne, I think it is
  best to keep the origin as an int64 with a resolution of
  microseconds (for compatibility with the ``datetime`` module, as
  I've said before).

 A couple of details worth pointing out: we don't need a zillion
 resolutions. One that's as good as the world time standards, and one
 that spans an adequate length of time should cover it. After all, the
 only reason for not using the highest available resolution is if you
 want to cover a larger range of times. So there is no real need for
 microseconds and milliseconds and seconds and days and weeks and...

Maybe you are right, but by providing many resolutions we are trying to 
meet the needs of people who use them a lot.  In particular, we would 
like the authors of the timeseries scikit to find in these new dtypes 
a fair replacement for their Date class (our proposal will not be as 
full-featured, but...).

 There is also no need for the origin to be kept with a resolution as
 high as microseconds; seconds would do just fine, since if necessary
 it can be interpreted as exactly 7000 seconds after the epoch even
 if you are using femtoseconds elsewhere.

Good point.  However, in the end we managed to leave the ``origin`` 
metadata out of our new proposal.  Have a look at the second proposal, 
which I'll be posting soon, for details.

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-15 Thread Francesc Alted
On Monday 14 July 2008, Pierre GM wrote:
 On Monday 14 July 2008 14:17:18 Francesc Alted wrote:
  Well, what we are after is precisely this: a new dtype type.  After
  integrating it in NumPy, I suppose that your DateArray would be
  similar to a NumPy array with a dtype ``datetime64`` (bar the
  conceptual differences between your 'frequency' behind DateArray
  and
  the 'resolution' behind the datetime64 dtype).

 Well, you're losing me on this one: could you explain the difference
 between the two concepts ? It might only be a problem of
 vocabulary...

Maybe it is only that.  But by using the term 'frequency' I tend to think 
that you are expecting to have one entry (observation) in your array 
for each time 'tick' since time start.  OTOH, the term 'resolution' 
doesn't have this implication, and only states the precision of the 
timestamp.

I don't know whether my impression is true or not, but after reading 
about your TimeSeries package, I'm still thinking that this expectation 
of one observation per 'tick' was what drove you to choose 
the 'frequency' name.

  It would start when the origin tells that it should start.  It is
  important to note that our proposal will not force a '7d' (seven
  days) 'tick' to start on monday, or a '1m' (one month) to start the
  1st day of a calendar month, but rather where the user decides to
  set its origin.

 OK, so we need 2 flags, one for the resolution, one for the origin.
 Because there won't be that many possible resolutions, an int8 should
 be sufficient. What do you have in mind for the origin ? When using a
 resolution coarser than 1d (7d, 1m, 3m, 1a), an origin in days is OK.
 What about less than a day ?

Well, after reading the mails from Chris and Anne, I think it is best 
to keep the origin as an int64 with a resolution of microseconds (for 
compatibility with the ``datetime`` module, as I've said before).

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-15 Thread Pierre GM
On Tuesday 15 July 2008 07:30:09 Francesc Alted wrote:
 Maybe it is only that.  But by using the term 'frequency' I tend to think
 that you are expecting to have one entry (observation) in your array
 for each time 'tick' since time start.  OTOH, the term 'resolution'
 doesn't have this implication, and only states the precision of the
 timestamp.

OK, now I get it.

 I don't know whether my impression is true or not, but after reading
 about your TimeSeries package, I'm still thinking that this expectation
 of one observation per 'tick' was what drove you to choose
 the 'frequency' name.

Well, we do require one point per tick for some operations, such as 
conversion from one frequency to another, but only for TimeSeries. A 
DateArray doesn't have to be regularly spaced.
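
To make the distinction concrete, here is a minimal sketch. It is not code from scikits.timeseries, and the function name is hypothetical. It treats dates as integer tick counts at some resolution and checks the "one point per tick" condition that a frequency conversion would rely on; an irregular list of ticks fails the check but is still a perfectly valid set of timestamps.

```python
def has_one_point_per_tick(ticks):
    """Return True if the sorted tick counts are consecutive integers."""
    return all(later - earlier == 1 for earlier, later in zip(ticks, ticks[1:]))


regular = [10, 11, 12, 13]    # one observation for every tick
irregular = [10, 11, 15, 40]  # valid timestamps, but with gaps between them

print(has_one_point_per_tick(regular))
print(has_one_point_per_tick(irregular))
```

Under this reading, 'resolution' only says what one tick means, while 'frequency' additionally promises that no tick is skipped.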


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-15 Thread Anne Archibald
2008/7/15 Francesc Alted:

 Maybe it is only that.  But by using the term 'frequency' I tend to think
 that you are expecting to have one entry (observation) in your array
 for each time 'tick' since time start.  OTOH, the term 'resolution'
 doesn't have this implication, and only states the precision of the
 timestamp.

 Well, after reading the mails from Chris and Anne, I think it is best
 to keep the origin as an int64 with a resolution of microseconds (for
 compatibility with the ``datetime`` module, as I've said before).

A couple of details worth pointing out: we don't need a zillion
resolutions. One that's as good as the world time standards, and one
that spans an adequate length of time should cover it. After all, the
only reason for not using the highest available resolution is if you
want to cover a larger range of times. So there is no real need for
microseconds and milliseconds and seconds and days and weeks and...

There is also no need for the origin to be kept with a resolution as
high as microseconds; seconds would do just fine, since if necessary
it can be interpreted as exactly 7000 seconds after the epoch even
if you are using femtoseconds elsewhere.
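
A small sketch of this point, assuming an origin stored as whole seconds after the epoch and a tick counter in femtoseconds (the function name is hypothetical). Exact rational arithmetic shows that a coarse origin loses nothing: the two parts combine without rounding.

```python
from fractions import Fraction

FEMTOSECONDS_PER_SECOND = 10**15


def absolute_seconds(origin_seconds, ticks_femtoseconds):
    """Combine an integer-second origin with a femtosecond tick count, exactly."""
    return origin_seconds + Fraction(ticks_femtoseconds, FEMTOSECONDS_PER_SECOND)


# Origin exactly 7000 s after the epoch, plus half a second counted in femtoseconds.
moment = absolute_seconds(7000, 5 * 10**14)
print(moment)
```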

Anne


[Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Francesc Alted
Hi,

Before giving more thought to the new proposal of the date/time types 
for NumPy based on the concept of 'resolution', I'd like to gather more 
feedback on your opinions about this.

After pondering the opinions on the first proposal, the idea we 
are incubating is to complement the ``datetime64`` with a 'resolution' 
metainfo.  The ``datetime64`` will still be based on an int64 type, but 
the meaning of the 'ticks' would depend on a 'resolution' property.  
This is best seen with an example:

In [21]: numpy.arange(3, dtype=numpy.dtype('datetime64', 'sec'))
Out [21]: [1970-01-01T00:00:00, 1970-01-01T00:00:01, 
1970-01-01T00:00:02]

In [22]: numpy.arange(3, dtype=numpy.dtype('datetime64', 'hour'))
Out [22]: [1970-01-01T00, 1970-01-01T01, 1970-01-01T02]

i.e. the 'resolution' gives the actual meaning to the 'int64' counter.
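
The same idea can be sketched with the standard ``datetime`` module alone. This is only an illustration of the concept, not the proposed NumPy implementation; the ``TICK`` table and ``ticks_to_datetime`` are hypothetical names, and only resolutions that ``timedelta`` can represent are included.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Hypothetical table: the length of one tick for each resolution name.
TICK = {
    "microsec": timedelta(microseconds=1),
    "millisec": timedelta(milliseconds=1),
    "sec": timedelta(seconds=1),
    "min": timedelta(minutes=1),
    "hour": timedelta(hours=1),
}


def ticks_to_datetime(ticks, resolution):
    """Interpret an integer counter as a moment, given its resolution."""
    return EPOCH + ticks * TICK[resolution]


# The same counters 0, 1, 2 mean different moments at different resolutions.
sec_values = [ticks_to_datetime(n, "sec") for n in range(3)]
hour_values = [ticks_to_datetime(n, "hour") for n in range(3)]

print([d.isoformat() for d in sec_values])
print([d.isoformat() for d in hour_values])
```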

The advantage of this abstraction is that the user can easily choose the 
scale of resolution that best fits their needs.  I'm thinking of 
providing the following resolutions:

[femtosec, picosec, nanosec, microsec, millisec, sec, min,
hour, month, year]

Also, together with the absolute ``datetime64`` one can have a relative 
counterpart, say, ``timedelta64``, that also supports the notion 
of 'resolution'.  Between the two, one would cover the needs of most uses, 
while providing the user with a lot of flexibility, IMO.  We very much 
prefer this new approach to the one stated in our first proposal.

Now comes the tricky part: how to integrate the notion 
of 'resolution' with the 'dtype' data type factory of NumPy?  Well, we 
have thought of a couple of possibilities.

1) Using the NumPy 'dtype' factory:

nanoabs = numpy.dtype('datetime64', resolution=nanosec)
nanorel = numpy.dtype('timedelta64', resolution=nanosec)

2) Extending the string notation by using the '[]' square brackets:

nanoabs = numpy.dtype('datetime64[nanosec]')  # long notation
nanoabs = numpy.dtype('T[nanosec]')  # short notation
nanorel = numpy.dtype('timedelta64[nanosec]')  # long notation
nanorel = numpy.dtype('t[nanosec]')  # short notation

With these building blocks, one may obtain more complex dtype structures 
easily.
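
To give a feel for option 2, here is a rough sketch of how the bracket notation could be split into a type name and a resolution. It is an assumption about how such strings might be parsed, not NumPy code; ``parse_time_dtype`` and the ``SHORT_NAMES`` table are hypothetical.

```python
import re

# Hypothetical mapping from the short notation to the long type names.
SHORT_NAMES = {"T": "datetime64", "t": "timedelta64"}

_SPEC = re.compile(r"^(datetime64|timedelta64|T|t)\[(\w+)\]$")


def parse_time_dtype(spec):
    """Split a string such as 'datetime64[nanosec]' into (type name, resolution)."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a date/time dtype string: {spec!r}")
    kind, resolution = match.groups()
    return SHORT_NAMES.get(kind, kind), resolution


print(parse_time_dtype("datetime64[nanosec]"))
print(parse_time_dtype("t[nanosec]"))
```

One attraction of the string form is that it carries the resolution inside a single token, so it could be used wherever a dtype string is already accepted.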

Now, the question is:  would that proposal conflict with the 
spirit of the current 'dtype' factory?  And another important one: 
would that complicate the implementation too much?

If the answer to both of the previous questions is 'no', then we will study 
this more and provide another proposal based on it.  BTW, I suppose 
that the best candidate to answer these would be Travis O., but if 
anybody feels brave enough ;-) please go ahead and give your advice.

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Pierre GM
On Monday 14 July 2008 09:07:47 Francesc Alted wrote:
 The advantage of this abstraction is that the user can easily choose the
 scale of resolution that best fits their needs.  I'm thinking of
 providing the following resolutions:

 [femtosec, picosec, nanosec, microsec, millisec, sec, min,
 hour, month, year]

In TimeSeries, we don't have anything less than a second, but we 
have 'daily', 'business daily', 'weekly' and 'quarterly' resolutions. 

A very useful feature that Matt Knox coded is the ability to specify 
starting points for switching from one resolution to another. For example, 
you can have a series with an 'ANN_MAR' frequency, which corresponds to 1 point 
a year, the year starting in April. When switching back to a monthly 
resolution, the points from January to March of the first year will be 
masked.

Another useful feature would be to allow users to define their own resolution 
(every 15min, every 12h...). Right now it's a bit clunky in TimeSeries: we 
have to use the lowest resolution of the series (min, hour) and leave a lot 
of blanks (TimeSeries don't have to be regularly spaced, but it helps...).

 Now comes the tricky part: how to integrate the notion
 of 'resolution' with the 'dtype' data type factory of NumPy?  

In TimeSeries, the frequency is stored as an integer. For example, a daily 
frequency is stored as 6000, an annual frequency as 1000, an 'ANN_MAR' 
frequency as 1003...


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Alan G Isaac
On Mon, 14 Jul 2008, Francesc Alted apparently wrote:
 Before giving more thought to the new proposal of the 
 date/time types for NumPy based on the concept of 
 'resolution', I'd like to gather more feedback on your 
 opinions about this. 

It might be a good idea to run the proposal(s) past
Marc-Andre Lemburg mal (at) egenix (dot) com

Cheers,
Alan Isaac





Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Anne Archibald
2008/7/14 Francesc Alted:

 After pondering the opinions on the first proposal, the idea we
 are incubating is to complement the ``datetime64`` with a 'resolution'
 metainfo.  The ``datetime64`` will still be based on an int64 type, but
 the meaning of the 'ticks' would depend on a 'resolution' property.

This is an interesting idea. To be useful, though, you would also need
a flexible offset defining the zero of time. After all, the reason
not to just always use (say) femtosecond accuracy is that 2**64
femtoseconds is only about five hours. So if you're going to use
femtosecond steps, you really want to choose your start point
carefully. (It's also worth noting that there is little need for more
time accuracy than atomic clocks can provide, since anyone looking for
more than that is going to be doing some tricky metrology anyway.)
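
The "about five hours" figure is easy to check. The sketch below computes the total span covered by a 64-bit counter at a given resolution; the function name is hypothetical, and a year is taken as 365.25 days for the comparison with microseconds.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def span_seconds(ticks_per_second, bits=64):
    """Total time span, in seconds, that a counter of the given width can cover."""
    return 2**bits / ticks_per_second


femto_hours = span_seconds(10**15) / SECONDS_PER_HOUR
micro_years = span_seconds(10**6) / SECONDS_PER_YEAR

print(f"femtoseconds: about {femto_hours:.2f} hours")
print(f"microseconds: about {micro_years:,.0f} years")
```

The contrast between the two results is the whole trade-off: a finer resolution buys precision at the cost of range, which is why the choice of the zero point matters so much at the fine end.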

One might take guidance from the FITS format, which represents (arrays
of) quantities as (usually) fixed-point numbers, but has a global
scale and offset parameter for each array. This allows one to
accurately represent many common arrays with relatively few bits. The
FITS libraries transparently convert these quantities. Of course, this
isn't so convenient if you don't have basic machine datatypes with
enough precision to handle all the quantities of interest.
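
The FITS convention mentioned here stores integers and recovers the physical value as offset + scale * stored (the BZERO and BSCALE header keywords). A minimal sketch of the same scheme, with hypothetical function names and plain Python lists standing in for arrays:

```python
def to_physical(stored, scale, offset):
    """Recover physical values from stored integers: offset + scale * stored."""
    return [offset + scale * value for value in stored]


def to_stored(values, scale, offset):
    """Inverse mapping, rounding to the nearest representable integer."""
    return [round((value - offset) / scale) for value in values]


# Times in seconds, stored as half-second steps counted from t = 100 s.
stored = [0, 1, 2]
physical = to_physical(stored, scale=0.5, offset=100.0)
print(physical)
print(to_stored(physical, scale=0.5, offset=100.0))
```

Applied to dates, the scale plays the role of the resolution and the offset plays the role of the origin.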

Anne


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Francesc Alted
On Monday 14 July 2008, Alan G Isaac wrote:
 On Mon, 14 Jul 2008, Francesc Alted apparently wrote:
  Before giving more thought to the new proposal of the
  date/time types for NumPy based on the concept of
  'resolution', I'd like to gather more feedback on your
  opinions about this.

 It might be a good idea to run the proposal(s) past
 Marc-Andre Lemburg mal (at) egenix (dot) com

Sure.  And maybe also to Fred Drake, the original author of the 
``datetime`` module.  However, I'd prefer to send them something in a 
more advanced state of refinement than it is now.

Thanks for the suggestion,

-- 
Francesc Alted


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Francesc Alted
On Monday 14 July 2008, Pierre GM wrote:
 On Monday 14 July 2008 12:50:21 Francesc Alted wrote:
   A very useful feature that Matt Knox coded is the ability to
   specify starting points for switching from one resolution to
   another. For example, you can have a series with an 'ANN_MAR'
   frequency, which corresponds to 1 point a year, the year starting
   in April. When switching back to a monthly resolution, the points
   from January to March of the first year will be masked.
 
  Ok.  Anne was also suggesting that the origin of time would be
  configurable, but then, you are talking about *masking* values. 
  Mmm, I don't think we should try to incorporate masking
  capabilities in the NumPy date/time types.

 Francesc,
 In scikits.timeseries, we have 2 different objects:
 * DateArray, which is basically an ndarray of integers with a given
 'frequency' attribute.
 * TimeSeries, which is basically the combination of a MaskedArray
 (the data part) and a DateArray (which keeps track of the date
 corresponding to each data point). TimeSeries objects have the
 resolution/origin of the companion DateArray, and when they're
 converted from one resolution to another, some masking may occur.

 My understanding is that you intend to define an object similar to
 DateArray. You want to define a new dtype (datetime64 or other), we
 used yet another class instead, Date. A dtype would be easier to
 manipulate, but as neither Matt nor I were particularly experienced
 with that at the time, we followed the simpler approach of a class...

Well, what we are after is precisely this: a new dtype type.  After 
integrating it in NumPy, I suppose that your DateArray would be similar 
to a NumPy array with a dtype ``datetime64`` (bar the conceptual 
differences between your 'frequency' behind DateArray and 
the 'resolution' behind the datetime64 dtype).


  [N]timeunit
 
  where ``timeunit`` can take the values in:
 
  ['y', 'm', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs']
 
  so, for example, '14d' means a resolution of 14 days, or '10ms'
  means a resolution of one hundredth of a second.  Sounds good to me. 
  What do other people think?

 Sounds pretty cool and intuitive to use. However, writing the
 conversion rules from one to another will be a lot of fun. Take
 weekly, for example: that's a period of 7 days, but when does it
 start ? On a monday ? Then, 12/31/2007 was the start of the first
 week of 2008... OK, we can leave that problem for the moment...

It would start when the origin tells that it should start.  It is 
important to note that our proposal will not force a '7d' (seven 
days) 'tick' to start on monday, or a '1m' (one month) to start the 1st 
day of a calendar month, but rather where the user decides to set its 
origin.
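
A minimal sketch of that behaviour, assuming the ``[N]timeunit`` notation quoted above and a user-supplied origin. The names are hypothetical, and only fixed-length units are handled: calendar units such as months and years are left out because their length varies (and because 'm' is ambiguous in the list above).

```python
import re
from datetime import datetime, timedelta

# Hypothetical table of fixed-length units only.
UNIT = {
    "d": timedelta(days=1),
    "h": timedelta(hours=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def parse_resolution(spec):
    """Turn a string such as '7d' or '10ms' into the length of one tick."""
    match = re.fullmatch(r"(\d*)([a-z]+)", spec)
    if match is None or match.group(2) not in UNIT:
        raise ValueError(f"unsupported resolution: {spec!r}")
    count = int(match.group(1) or 1)
    return count * UNIT[match.group(2)]


def tick_index(moment, origin, resolution):
    """Number of whole ticks between the origin and the given moment (floored)."""
    return (moment - origin) // parse_resolution(resolution)


# A '7d' tick anchored on a Wednesday: "weeks" here run Wednesday to Tuesday.
origin = datetime(2008, 7, 16)
print(tick_index(datetime(2008, 7, 22, 23), origin, "7d"))
print(tick_index(datetime(2008, 7, 23), origin, "7d"))
```

Nothing in this sketch knows about Mondays or calendar months; the tick boundaries fall wherever the chosen origin puts them.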

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] NumPy date/time types and the resolution concept

2008-07-14 Thread Pierre GM
On Monday 14 July 2008 14:17:18 Francesc Alted wrote:
 Well, what we are after is precisely this: a new dtype type.  After
 integrating it in NumPy, I suppose that your DateArray would be similar
 to a NumPy array with a dtype ``datetime64`` (bar the conceptual
 differences between your 'frequency' behind DateArray and
 the 'resolution' behind the datetime64 dtype).

Well, you're losing me on this one: could you explain the difference between 
the two concepts ? It might only be a problem of vocabulary...


 It would start when the origin tells that it should start.  It is
 important to note that our proposal will not force a '7d' (seven
 days) 'tick' to start on monday, or a '1m' (one month) to start the 1st
 day of a calendar month, but rather where the user decides to set its
 origin.

OK, so we need 2 flags, one for the resolution, one for the origin. Because 
there won't be that many possible resolutions, an int8 should be sufficient. 
What do you have in mind for the origin ? When using a resolution coarser 
than 1d (7d, 1m, 3m, 1a), an origin in days is OK. What about less than a 
day ?

