Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Tuesday 15 July 2008, Pierre GM wrote:
> On Tuesday 15 July 2008 07:30:09 Francesc Alted wrote:
> > Maybe it is only that. But by using the term 'frequency' I tend to think that you are expecting to have one entry (observation) in your array for each time 'tick' since the start time. OTOH, the term 'resolution' doesn't have this implication, and only states the precision of the timestamp.
>
> OK, now I get it.
>
> > I don't know whether my impression is true or not, but after reading about your TimeSeries package, I'm still thinking that this expectation of one observation per 'tick' was what drove you to choose the 'frequency' name.
>
> Well, we do require one point per tick for some operations, such as conversion from one frequency to another, but only for TimeSeries. A DateArray doesn't have to be regularly spaced.

Ok, I see. So it was just the 'frequency' keyword that was misleading me. Thanks for the clarification.

Cheers,

--
Francesc Alted
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Tuesday 15 July 2008, Anne Archibald wrote:
> 2008/7/15 Francesc Alted [EMAIL PROTECTED]:
> > Maybe it is only that. But by using the term 'frequency' I tend to think that you are expecting to have one entry (observation) in your array for each time 'tick' since the start time. OTOH, the term 'resolution' doesn't have this implication, and only states the precision of the timestamp.
> >
> > Well, after reading the mails from Chris and Anne, I think the best option is to keep the origin as an int64 with a resolution of microseconds (for compatibility with the ``datetime`` module, as I've said before).
>
> A couple of details worth pointing out: we don't need a zillion resolutions. One that's as good as the world time standards, and one that spans an adequate length of time should cover it. After all, the only reason for not using the highest available resolution is if you want to cover a larger range of times. So there is no real need for microseconds and milliseconds and seconds and days and weeks and...

Maybe you are right, but by providing many resolutions we are trying to cope with the needs of people who use them a lot. In particular, we would like the authors of the timeseries scikit to find in this new dtype a fair replacement for their Date class (our proposal will not be as full-featured, but...).

> There is also no need for the origin to be kept with a resolution as high as microseconds; seconds would do just fine, since if necessary it can be interpreted as exactly 7000 seconds after the epoch even if you are using femtoseconds elsewhere.

Good point. However, we finally managed not to include the ``origin`` metadata in our new proposal. Have a look at the second proposal, which I'll be posting soon, for details.

Cheers,

--
Francesc Alted
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Monday 14 July 2008, Pierre GM wrote:
> On Monday 14 July 2008 14:17:18 Francesc Alted wrote:
> > Well, what we are after is precisely this: a new dtype. After integrating it in NumPy, I suppose that your DateArray would be similar to a NumPy array with a dtype ``datetime64`` (bar the conceptual differences between the 'frequency' behind your DateArray and the 'resolution' behind the datetime64 dtype).
>
> Well, you're losing me on this one: could you explain the difference between the two concepts? It might only be a problem of vocabulary...

Maybe it is only that. But by using the term 'frequency' I tend to think that you are expecting to have one entry (observation) in your array for each time 'tick' since the start time. OTOH, the term 'resolution' doesn't have this implication, and only states the precision of the timestamp.

I don't know whether my impression is true or not, but after reading about your TimeSeries package, I'm still thinking that this expectation of one observation per 'tick' was what drove you to choose the 'frequency' name.

> > It would start when the origin says it should start. It is important to note that our proposal will not force a '7d' (seven days) 'tick' to start on Monday, or a '1m' (one month) 'tick' to start on the 1st day of a calendar month, but rather wherever the user decides to set the origin.
>
> OK, so we need 2 flags, one for the resolution, one for the origin. Because there won't be that many possible resolutions, an int8 should be sufficient. What do you have in mind for the origin? When using a resolution coarser than 1d (7d, 1m, 3m, 1a), an origin in days is OK. What about less than a day?

Well, after reading the mails from Chris and Anne, I think the best option is to keep the origin as an int64 with a resolution of microseconds (for compatibility with the ``datetime`` module, as I've said before).

Cheers,

--
Francesc Alted
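A minimal sketch of the origin idea discussed in this message, using only the standard ``datetime`` module. It is an illustration, not code from the proposal: the function names ``origin_from_datetime`` and ``tick_index`` are hypothetical, and the 1970-01-01 epoch is an assumption. The origin is stored as integer microseconds since the epoch, and a '7d' tick is counted from wherever that origin sits rather than from a Monday.

```python
from datetime import datetime, timedelta

# Assumed epoch for this sketch; the thread does not fix one here.
EPOCH = datetime(1970, 1, 1)


def origin_from_datetime(moment):
    """Encode a datetime as integer microseconds since the epoch (hypothetical helper)."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def tick_index(moment, origin_us, width):
    """Count whole ticks of length `width` elapsed between the origin and `moment`."""
    origin = EPOCH + timedelta(microseconds=origin_us)
    # timedelta // timedelta floors to an int, so moments before the
    # origin get negative tick indices.
    return (moment - origin) // width


week = timedelta(days=7)
monday_origin = origin_from_datetime(datetime(2007, 12, 31))    # a Monday
wednesday_origin = origin_from_datetime(datetime(2008, 1, 2))   # a Wednesday

# The same moment falls in a different 7-day tick depending on the origin.
print(tick_index(datetime(2008, 1, 8), monday_origin, week))     # 1
print(tick_index(datetime(2008, 1, 8), wednesday_origin, week))  # 0
```

Because the origin is a plain integer, it would fit in an int64, which is the storage suggested above.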
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Tuesday 15 July 2008 07:30:09 Francesc Alted wrote:
> Maybe it is only that. But by using the term 'frequency' I tend to think that you are expecting to have one entry (observation) in your array for each time 'tick' since the start time. OTOH, the term 'resolution' doesn't have this implication, and only states the precision of the timestamp.

OK, now I get it.

> I don't know whether my impression is true or not, but after reading about your TimeSeries package, I'm still thinking that this expectation of one observation per 'tick' was what drove you to choose the 'frequency' name.

Well, we do require one point per tick for some operations, such as conversion from one frequency to another, but only for TimeSeries. A DateArray doesn't have to be regularly spaced.
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
2008/7/15 Francesc Alted [EMAIL PROTECTED]:
> Maybe it is only that. But by using the term 'frequency' I tend to think that you are expecting to have one entry (observation) in your array for each time 'tick' since the start time. OTOH, the term 'resolution' doesn't have this implication, and only states the precision of the timestamp.
>
> Well, after reading the mails from Chris and Anne, I think the best option is to keep the origin as an int64 with a resolution of microseconds (for compatibility with the ``datetime`` module, as I've said before).

A couple of details worth pointing out: we don't need a zillion resolutions. One that's as good as the world time standards, and one that spans an adequate length of time should cover it. After all, the only reason for not using the highest available resolution is if you want to cover a larger range of times. So there is no real need for microseconds and milliseconds and seconds and days and weeks and...

There is also no need for the origin to be kept with a resolution as high as microseconds; seconds would do just fine, since if necessary it can be interpreted as exactly 7000 seconds after the epoch even if you are using femtoseconds elsewhere.

Anne
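The point about a coarse origin can be checked with exact integer arithmetic. The following is only a sketch under stated assumptions: the ``TICKS_PER_SECOND`` table and the ``absolute_ticks`` function are hypothetical names, not part of any proposal in this thread. An origin stored in whole seconds converts exactly to any finer resolution, so nothing is lost by not storing it in microseconds.

```python
# Hypothetical table: ticks per second for the sub-second resolutions
# named in the thread.
TICKS_PER_SECOND = {
    "sec": 1,
    "millisec": 10**3,
    "microsec": 10**6,
    "nanosec": 10**9,
    "picosec": 10**12,
    "femtosec": 10**15,
}


def absolute_ticks(origin_seconds, ticks, resolution):
    """Ticks since the epoch for data whose origin is stored in whole seconds."""
    # Multiplying an integer number of seconds by an integer scale is exact.
    return origin_seconds * TICKS_PER_SECOND[resolution] + ticks


# An origin exactly 7000 s after the epoch, plus 5 femtosecond ticks.
print(absolute_ticks(7000, 5, "femtosec"))  # 7000000000000000005
```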
[Numpy-discussion] NumPy date/time types and the resolution concept
Hi,

Before giving more thought to the new proposal for the NumPy date/time types based on the concept of 'resolution', I'd like to gather more feedback on your opinions about this.

After pondering the opinions on the first proposal, the idea we are incubating is to complement the ``datetime64`` with a 'resolution' metainfo. The ``datetime64`` will still be based on an int64 type, but the meaning of the 'ticks' would depend on a 'resolution' property. This is best seen with an example:

    In [21]: numpy.arange(3, dtype=numpy.dtype('datetime64', 'sec'))
    Out [21]: [1970-01-01T00:00:00, 1970-01-01T00:00:01, 1970-01-01T00:00:02]

    In [22]: numpy.arange(3, dtype=numpy.dtype('datetime64', 'hour'))
    Out [22]: [1970-01-01T00, 1970-01-01T01, 1970-01-01T02]

i.e. the 'resolution' gives the actual meaning to the 'int64' counter. The advantage of this abstraction is that users can easily choose the scale of resolution that best fits their needs. I'm thinking of providing the following resolutions:

    [femtosec, picosec, nanosec, microsec, millisec, sec, min, hour, month, year]

Also, together with the absolute ``datetime64`` one can have a relative counterpart, say, ``timedelta64``, that also supports the notion of 'resolution'. Between the two, one would cover the needs of most uses while providing the user with a lot of flexibility, IMO. We very much prefer this new approach to the one stated in our first proposal.

Now comes the tricky part: how to integrate the notion of 'resolution' with the 'dtype' data type factory of NumPy? Well, we have thought of a couple of possibilities.

1) Using the NumPy 'dtype' factory:

    nanoabs = numpy.dtype('datetime64', resolution=nanosec)
    nanorel = numpy.dtype('timedelta64', resolution=nanosec)

2) Extending the string notation by using the '[]' square brackets:

    nanoabs = numpy.dtype('datetime64[nanosec]')   # long notation
    nanoabs = numpy.dtype('T[nanosec]')            # short notation
    nanorel = numpy.dtype('timedelta64[nanosec]')  # long notation
    nanorel = numpy.dtype('t[nanosec]')            # short notation

With these building blocks, one may obtain more complex dtype structures easily.

Now, the question is: would this proposal conflict with the spirit of the current 'dtype' factory? And another important one: would it complicate the implementation too much? If the answer to both of the previous questions is 'no', then we will study this more and provide another proposal based on it.

BTW, I suppose that the best candidate to answer these would be Travis O., but if anybody feels brave enough ;-) please go ahead and give your advice.

Cheers,

--
Francesc Alted
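To make the idea concrete, here is a minimal sketch of how a 'resolution' could give meaning to an integer counter, and of how the bracket notation might be split. It uses only the standard ``datetime`` module and does not use NumPy. The names ``TICK``, ``ticks_to_datetime`` and ``split_dtype_spec`` are hypothetical, and the 1970-01-01 epoch is taken from the example output in the post. Only resolutions that ``timedelta`` can represent are included.

```python
from datetime import datetime, timedelta

# Epoch assumed from the example output above (1970-01-01T00:00:00).
EPOCH = datetime(1970, 1, 1)

# Hypothetical table: the length of one tick for a few fixed-length
# resolutions. timedelta cannot go below microseconds, and 'month' and
# 'year' have no fixed length, so those are left out of this sketch.
TICK = {
    "microsec": timedelta(microseconds=1),
    "millisec": timedelta(milliseconds=1),
    "sec": timedelta(seconds=1),
    "min": timedelta(minutes=1),
    "hour": timedelta(hours=1),
}


def ticks_to_datetime(ticks, resolution):
    """Interpret an integer tick count as an absolute time."""
    return EPOCH + ticks * TICK[resolution]


def split_dtype_spec(spec):
    """Split a string such as 'datetime64[sec]' into ('datetime64', 'sec')."""
    base, bracket, rest = spec.partition("[")
    if not bracket or not rest.endswith("]"):
        raise ValueError(f"expected 'name[resolution]', got {spec!r}")
    return base, rest[:-1]


# The same counter values mean different instants under different resolutions.
print([ticks_to_datetime(i, "sec").isoformat() for i in range(3)])
print([ticks_to_datetime(i, "hour").isoformat() for i in range(3)])
print(split_dtype_spec("timedelta64[nanosec]"))  # ('timedelta64', 'nanosec')
```

The stored integers are identical in both lists; only the interpretation changes, which is the abstraction described in the post.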
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Monday 14 July 2008 09:07:47 Francesc Alted wrote:
> The advantage of this abstraction is that users can easily choose the scale of resolution that best fits their needs. I'm thinking of providing the following resolutions:
>
>     [femtosec, picosec, nanosec, microsec, millisec, sec, min, hour, month, year]

In TimeSeries, we don't have anything less than a second, but we have 'daily', 'business daily', 'weekly' and 'quarterly' resolutions. A very useful point that Matt Knox had coded is the possibility to specify starting points for switching from one resolution to another. For example, you can have a series with an 'ANN_MAR' frequency, which corresponds to 1 point a year, the year starting in April. When switching back to a monthly resolution, the points from January to March of the first year will be masked.

Another useful point would be to allow users to define their own resolution (every 15min, every 12h...). Right now it's a bit clunky in TimeSeries: we have to use the lowest resolution of the series (min, hour) and leave a lot of blanks (TimeSeries don't have to be regularly spaced, but it helps...).

> Now comes the tricky part: how to integrate the notion of 'resolution' with the 'dtype' data type factory of NumPy?

In TimeSeries, the frequency is stored as an integer. For example, a daily frequency is stored as 6000, an annual frequency as 1000, an 'ANN_MAR' frequency as 1003...
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Mon, 14 Jul 2008, Francesc Alted apparently wrote:
> Before giving more thought to the new proposal for the NumPy date/time types based on the concept of 'resolution', I'd like to gather more feedback on your opinions about this.

It might be a good idea to run the proposal(s) past Marc-Andre Lemburg
mal (at) egenix (dot) com

Cheers,
Alan Isaac
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
2008/7/14 Francesc Alted [EMAIL PROTECTED]:
> After pondering the opinions on the first proposal, the idea we are incubating is to complement the ``datetime64`` with a 'resolution' metainfo. The ``datetime64`` will still be based on an int64 type, but the meaning of the 'ticks' would depend on a 'resolution' property.

This is an interesting idea. To be useful, though, you would also need a flexible offset defining the zero of time. After all, the reason not to just always use (say) femtosecond accuracy is that 2**64 femtoseconds is only about five hours. So if you're going to use femtosecond steps, you really want to choose your start point carefully. (It's also worth noting that there is little need for more time accuracy than atomic clocks can provide, since anyone looking for more than that is going to be doing some tricky metrology anyway.)

One might take guidance from the FITS format, which represents (arrays of) quantities as (usually) fixed-point numbers, but has a global scale and offset parameter for each array. This allows one to accurately represent many common arrays with relatively few bits. The FITS libraries transparently convert these quantities. Of course, this isn't so convenient if you don't have basic machine datatypes with enough precision to handle all the quantities of interest.

Anne
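The "about five hours" figure follows directly from the size of a 64-bit counter. A small sketch of the arithmetic, with a hypothetical ``TICKS_PER_SECOND`` table and ``span_seconds`` helper (neither comes from the thread):

```python
# Hypothetical table: ticks per second at each sub-second resolution.
TICKS_PER_SECOND = {
    "femtosec": 10**15,
    "picosec": 10**12,
    "nanosec": 10**9,
    "microsec": 10**6,
    "millisec": 10**3,
    "sec": 1,
}


def span_seconds(resolution, bits=64):
    """Total time span, in seconds, that a `bits`-wide tick counter can cover."""
    return 2**bits / TICKS_PER_SECOND[resolution]


SECONDS_PER_YEAR = 365.25 * 86_400  # Julian year, for rough comparison only

for name in TICKS_PER_SECOND:
    seconds = span_seconds(name)
    print(f"{name:>9}: {seconds / 3600:.3g} hours, {seconds / SECONDS_PER_YEAR:.3g} years")

# 2**64 femtoseconds is roughly 18447 s, i.e. about 5.1 hours.
```

A signed int64 centred on its origin reaches only half of each span in either direction, which makes the choice of start point even more important at fine resolutions.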
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Monday 14 July 2008, Alan G Isaac wrote:
> On Mon, 14 Jul 2008, Francesc Alted apparently wrote:
> > Before giving more thought to the new proposal for the NumPy date/time types based on the concept of 'resolution', I'd like to gather more feedback on your opinions about this.
>
> It might be a good idea to run the proposal(s) past Marc-Andre Lemburg
> mal (at) egenix (dot) com

Sure. And maybe also to Fred Drake, the original author of the ``datetime`` module. However, I'd prefer to send them something in a more advanced state of refinement than it is now.

Thanks for the suggestion,

--
Francesc Alted
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Monday 14 July 2008, Pierre GM wrote:
> On Monday 14 July 2008 12:50:21 Francesc Alted wrote:
> > > A very useful point that Matt Knox had coded is the possibility to specify starting points for switching from one resolution to another. For example, you can have a series with an 'ANN_MAR' frequency, which corresponds to 1 point a year, the year starting in April. When switching back to a monthly resolution, the points from January to March of the first year will be masked.
> >
> > Ok. Anne was also suggesting that the origin of time should be configurable, but then, you are talking about *masking* values. Mmm, I don't think we should try to incorporate masking capabilities in the NumPy date/time types.
>
> Francesc,
> In scikits.timeseries, we have 2 different objects:
> * DateArray, which is basically an ndarray of integers with a given 'frequency' attribute.
> * TimeSeries, which is basically the combination of a MaskedArray (the data part) and a DateArray (which keeps track of the date corresponding to each data point).
> TimeSeries objects have the resolution/origin of the companion DateArray, and when they're converted from one resolution to another, some masking may occur.
>
> My understanding is that you intend to define an object similar to DateArray. You want to define a new dtype (datetime64 or other); we used yet another class instead, Date. A dtype would be easier to manipulate, but as neither Matt nor I were particularly experienced with that at the time, we followed the simpler approach of a class...

Well, what we are after is precisely this: a new dtype. After integrating it in NumPy, I suppose that your DateArray would be similar to a NumPy array with a dtype ``datetime64`` (bar the conceptual differences between the 'frequency' behind your DateArray and the 'resolution' behind the datetime64 dtype).
> >     [N]timeunit
> >
> > where ``timeunit`` can take the values in:
> >
> >     ['y', 'm', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs']
> >
> > so, for example, '14d' means a resolution of 14 days, or '10ms' means a resolution of 1 hundredth of a second. Sounds good to me. What do other people think?
>
> Sounds pretty cool and intuitive to use. However, writing the conversion rules from one to another will be a lot of fun. Take weekly, for example: that's a period of 7 days, but when does it start? On a Monday? Then, 12/31/2007 was the start of the first week of 2008... OK, we can leave that problem for the moment...

It would start when the origin says it should start. It is important to note that our proposal will not force a '7d' (seven days) 'tick' to start on Monday, or a '1m' (one month) 'tick' to start on the 1st day of a calendar month, but rather wherever the user decides to set the origin.

Cheers,

--
Francesc Alted
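A rough sketch of how the ``[N]timeunit`` notation could be parsed, offered as an illustration rather than as the proposal's implementation. ``FS_PER_UNIT`` and ``parse_resolution`` are hypothetical names. Two assumptions are worth stating loudly: the unit list above contains 'm' twice, and this sketch reads 'm' as minutes; and calendar units ('y', and month) are left out because they have no fixed length. Widths are returned as exact integers in femtoseconds, the finest unit in the list.

```python
import re

# Femtoseconds per unit, fixed-length units only.
# ASSUMPTION: 'm' is read as minutes here; the list in the thread uses
# 'm' for two different units, and calendar units are omitted entirely.
FS_PER_UNIT = {
    "d": 86_400 * 10**15,
    "h": 3_600 * 10**15,
    "m": 60 * 10**15,
    "s": 10**15,
    "ms": 10**12,
    "us": 10**9,
    "ns": 10**6,
    "fs": 1,
}

_SPEC = re.compile(r"(\d*)([a-z]+)")


def parse_resolution(spec):
    """Return the width of one tick, in femtoseconds, for a spec like '14d'."""
    match = _SPEC.fullmatch(spec)
    if match is None or match.group(2) not in FS_PER_UNIT:
        raise ValueError(f"unrecognised resolution: {spec!r}")
    # A missing multiplier means one unit, so 's' is the same as '1s'.
    count = int(match.group(1) or 1)
    if count < 1:
        raise ValueError(f"multiplier must be positive: {spec!r}")
    return count * FS_PER_UNIT[match.group(2)]


# '10ms' is one hundredth of a second, as stated above.
print(parse_resolution("10ms") * 100 == parse_resolution("1s"))  # True
print(parse_resolution("14d") == 14 * parse_resolution("d"))     # True
```

Integer femtoseconds keep the conversion between fixed-length resolutions exact; the harder conversion rules mentioned in the reply (weeks, months, where a period starts) are deliberately outside this sketch.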
Re: [Numpy-discussion] NumPy date/time types and the resolution concept
On Monday 14 July 2008 14:17:18 Francesc Alted wrote:
> Well, what we are after is precisely this: a new dtype. After integrating it in NumPy, I suppose that your DateArray would be similar to a NumPy array with a dtype ``datetime64`` (bar the conceptual differences between the 'frequency' behind your DateArray and the 'resolution' behind the datetime64 dtype).

Well, you're losing me on this one: could you explain the difference between the two concepts? It might only be a problem of vocabulary...

> It would start when the origin says it should start. It is important to note that our proposal will not force a '7d' (seven days) 'tick' to start on Monday, or a '1m' (one month) 'tick' to start on the 1st day of a calendar month, but rather wherever the user decides to set the origin.

OK, so we need 2 flags, one for the resolution, one for the origin. Because there won't be that many possible resolutions, an int8 should be sufficient. What do you have in mind for the origin? When using a resolution coarser than 1d (7d, 1m, 3m, 1a), an origin in days is OK. What about less than a day?