On Wed, Jun 8, 2011 at 7:36 AM, Chris Barker <chris.bar...@noaa.gov> wrote:
> On 6/7/11 4:53 PM, Pierre GM wrote:
>> Anyhow, each time you read 'frequency' in scikits.timeseries, think
>> 'unit'.
>
> or maybe "precision" -- when I think of unit, I think of something
> that can be represented as a floating point value -- but here, with
> integers, it's the precision that can be represented. Just a thought.
>
>> Well, it can be argued that the epoch is 0...
>
> yes, but that really should be transparent to the user -- the choice
> of epoch should influence as little as possible (e.g. only the range
> of representable values).
>
>> Mmh. How would you define a quarter unit ? [3M] ? But then, what if
>> you want your year to start in December, say (we often use
>> DJF/MAM/JJA/SON as a way to decompose a year into four
>> 'hydrological' seasons, for example)
>
> And the federal fiscal year is Oct - Sept, so the first quarter is
> (Oct, Nov, Dec) -- clearly that needs to be flexible.
>
> -Chris
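A quick aside on the flexible-quarter question before I get going: in
plain Python, "the year starts in month N" is just modular arithmetic
on the month. The helper below is purely hypothetical -- it is not
scikits.timeseries or pandas API -- but it sketches what "flexible"
would need to mean:

from datetime import date

# Hypothetical sketch of a quarter with a configurable year start.
# year_start=10 gives US federal fiscal quarters (Q1 = Oct/Nov/Dec);
# year_start=12 gives the DJF/MAM/JJA/SON seasonal decomposition.
def quarter(d, year_start=1):
    # months elapsed since the start of the (shifted) year
    months_into_year = (d.month - year_start) % 12
    return months_into_year // 3 + 1

quarter(date(2011, 11, 15), year_start=10)  # -> 1 (Oct-Dec is Q1)
quarter(date(2011, 1, 15), year_start=12)   # -> 1 (DJF is Q1)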
You guys' discussion is a bit overwhelming for me in my currently
jet-lagged state ( =) ), but I thought I would comment on a couple of
things, especially now with the input of another financial Python user
(great!). Note that I use scikits.timeseries very little, for a few
reasons (a bit OT, but...):

- Fundamental need to be able to work with multiple time series,
especially performing operations involving cross-sectional data.

- I think it's a bit hard for lay people to use (read: ex-MATLAB/R
users). This is just my opinion, but a few years ago I thought about
using it and concluded that teaching people how to properly use it (a
precision tool, indeed!) was going to cause me grief.

- The data alignment problem, best explained in code:

In [8]: ts
Out[8]:
2000-01-05 00:00:00    0.0503706684002
2000-01-12 00:00:00    -1.7660004939
2000-01-19 00:00:00    1.11716758554
2000-01-26 00:00:00    -0.171029995265
2000-02-02 00:00:00    -0.99876580126
2000-02-09 00:00:00    -0.262729046405

In [9]: ts.index
Out[9]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-02-09 00:00:00]
length: 6

In [10]: ts2 = ts[:4]

In [11]: ts2.index
Out[11]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-01-26 00:00:00]
length: 4

In [12]: ts + ts2
Out[12]:
2000-01-05 00:00:00    0.1007413368
2000-01-12 00:00:00    -3.5320009878
2000-01-19 00:00:00    2.23433517109
2000-01-26 00:00:00    -0.34205999053
2000-02-02 00:00:00    NaN
2000-02-09 00:00:00    NaN

Either (or both) of ts and ts2 could be completely DateRange-naive
(i.e. have no way of knowing that they are fixed-frequency), or even
out of order, and stuff like this will work with no problem. I view
the "fixed frequency" issue as sort of an afterthought: if you need
it, it's there for you (the DateRange class is a valid Index--"label
vector"--for pandas objects, and provides an API for defining custom
time deltas). Which leads me to:

- Inability to derive custom offsets in scikits.timeseries. In pandas,
by contrast, I can do:

In [14]: ts.shift(2, offset=2 * datetools.BDay())
Out[14]:
2000-01-11 00:00:00    0.0503706684002
2000-01-18 00:00:00    -1.7660004939
2000-01-25 00:00:00    1.11716758554
2000-02-01 00:00:00    -0.171029995265
2000-02-08 00:00:00    -0.99876580126
2000-02-15 00:00:00    -0.262729046405

or even generate, say, 5-minutely or 10-minutely date ranges, thusly:

In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00', offset=datetools.Minute(5))
Out[16]:
<class 'pandas.core.daterange.DateRange'>
offset: <5 Minutes>, tzinfo: None
[2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
length: 85

I'm currently working on high-performance reduceat-based resampling
methods (e.g. converting secondly data to 5-minutely data); see the
sketch below for the basic idea.
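To make the reduceat idea concrete, here is a minimal NumPy sketch
(the helper name resample_mean is mine, for illustration only -- this
is not the actual pandas implementation). It assumes sorted,
non-negative integer timestamps and that every bucket contains at
least one observation:

import numpy as np

def resample_mean(times, values, bucket_size):
    # times: sorted non-negative int64 timestamps (e.g. seconds since
    # the epoch); values: same-length float array; bucket_size: bucket
    # width in the same units as times (300 -> 5-minutely for seconds)
    edges = np.arange(times[0] - times[0] % bucket_size,
                      times[-1] + 1, bucket_size)
    # index of the first observation falling in each bucket
    starts = np.searchsorted(times, edges, side='left')
    # per-bucket sums, computed in one pass; assumes no empty buckets
    # (repeated indices would trip up reduceat and divide by zero)
    sums = np.add.reduceat(values, starts)
    counts = np.diff(np.append(starts, len(values)))
    return edges, sums / counts

# one hour of secondly data -> twelve 5-minute means
times = np.arange(0, 3600, dtype=np.int64)
values = np.random.randn(3600)
edges, means = resample_mean(times, values, 300)

The appeal is that np.add.reduceat does all of the per-bucket summing
in a single pass at C speed, with no Python-level loop over buckets.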
So in summary, w.r.t. time series data and datetime, these are the
only things I care about from a datetime / pandas point of view:

- Ability to easily define custom timedeltas
- Ability to generate datetime objects, or some equivalent, which can
be used to back pandas data structures
- (possible now??) Ability to have a set of frequency-naive dates
(possibly not in order)

This last point actually matters. Suppose you wanted to get the worst
5-performing days in the S&P 500 index:

In [7]: spx.index
Out[7]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
length: 2963

# the ordered result is no longer fixed-frequency, but this is OK
In [8]: spx.order()[:5]
Out[8]:
2008-10-15 00:00:00    -0.0903497960942
2008-12-01 00:00:00    -0.0892952780505
2008-09-29 00:00:00    -0.0878970494885
2008-10-09 00:00:00    -0.0761670761671
2008-11-20 00:00:00    -0.0671229140321

- W