Re: [R] Problem with diff(strptime(...

2008-03-21 Thread Jim Lemon
I think I have worked out the problem, and because it may trouble 
others, I take the liberty of explaining it on the mailing list.

When diff is applied to a vector of POSIXt values returned by strptime, 
the units depend upon the smallest interval in the input vector. If that 
interval is less than one day, _all_ of the differences are in a smaller 
unit (seconds in my case, where the smallest interval was zero). 
If the smallest interval is at least one day, all of the differences are 
in days. This is quite sensible behavior, and I assume it is the clue 
that Prof. Ripley mentioned. However, if the units argument is 
included in the diff call, it has no effect on diff.POSIXt, which I 
think does the calculation (in contrast, difftime does return 0 days 
with units="days").
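
The unit selection described above can be sketched as follows. This is a 
minimal illustration, not a statement of the internals: the dates are the 
toy values from the original post, tz = "UTC" is an assumption added here 
so that daylight-saving changes cannot alter the intervals, and the names 
times, d and years are made up for the example. Asking for days explicitly 
when converting to numeric avoids depending on the automatic choice.

```r
# Toy dates from the thread; the middle two are identical (zero interval).
foodate <- c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991")

# tz = "UTC" is an assumption for this sketch, not part of the original code.
times <- as.POSIXct(strptime(foodate, "%d/%m/%Y", tz = "UTC"))

d <- diff(times)
print(units(d))   # the unit picked automatically for this particular input

# Fix the unit at conversion time instead of trusting the automatic choice.
years <- as.numeric(d, units = "days") / 365.25
print(years)
```

The same script then behaves identically on a toy dataset and on real data 
that happens to contain repeated dates.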

There may be quite a few R users like me who set up their script using a 
toy dataset and are puzzled when the real dataset produces what looks 
like garbage. Thus I humbly suggest the following additions to the help 
file.

Value
...
When used with times, diff may return different units depending upon the 
type of time object. An object created by as.Date always returns days, 
while a POSIXt object will return days if all differences are at least 
one day, and seconds, minutes or hours if any are less than one day.


I would also suggest a fix for the underlying code including a units 
argument for diff, except that I could not find it, despite grepping for 
diff in the src directories.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with diff(strptime(...

2008-03-20 Thread Jim Lemon
Hi all,

I have been chipping away at a problem I encountered in calculating 
rates per year from a moderately large data file (46412 rows). When I 
ran the following command, I got obviously wrong output:

interval<-
  c(NA,as.numeric(diff(
  strptime(mkdf$MEAS_DATE,"%d/%m/%Y")))/365.25)

The values in MEAS_DATE looked like this:

mkdf$MEAS_DATE[1:10]
  [1] 1/5/1962  1/5/1963  1/5/1964  1/3/1965  1/4/1966  1/4/1967  1/6/1968
  [8] 25/3/1969 1/4/1971  1/2/1974
146 Levels: 10/10/1967 1/10/1947 1/10/1965 1/10/1967 1/10/1983 ... 9/1/1992

To abbreviate three evenings of work, I finally found that values 17170 
and 17171 were the same. If I ran the entire set, or anything over 
1:17170, I would get output like this:

interval[1:10]
  [1]        NA  86340.86  86577.41  71911.29  93673.92  86340.86 101006.98
  [8]  70255.44 174337.58 245292.81

If I ran any set of values up to 17170, I would get the correct output:

interval[1:10]
  [1]        NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
  [8] 0.8131417 2.0177960 2.8390372

If I changed value 17171 by one day (and added that level), the command 
worked correctly:

interval[1:10]
  [1]        NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
  [8] 0.8131417 2.0177960 2.8390372

There have been a few messages about this problem, but apparently no 
solution. The problem can be seen with these examples (I haven't 
included the real data as it is not mine):

foodate<-c("1/7/1991","1/8/1991","1/8/1991","3/8/1991")
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595    0.0000  473.1006

foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595    0.0000  473.1006

foodate<-factor(c("1/7/1991","1/8/1991","2/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 0.084873374 0.002737851 0.002737851

Beats me.

Jim



Re: [R] Problem with diff(strptime(...

2008-03-20 Thread Prof Brian Ripley
You are throwing away the clue in your use of as.numeric.

First, strptime returns a POSIXlt value, which you will convert to POSIXct 
when you do arithmetic (using diff()).  Why are you doing that?  So

foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
diff(strptime(foodate,"%d/%m/%Y"))
Time differences in secs
[1] 2678400       0  172800
attr(,"tzone")
[1] ""

is correct.  I think you intended

diff(as.Date(foodate,"%d/%m/%Y"))/365.25

or even add as.numeric() inside diff().
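
The as.Date suggestion can be written out as a short sketch. The data are 
the toy values from the original post; the explicit as.character() call and 
the variable names dates and interval are choices made for this example 
rather than anything prescribed in the thread. Because Date objects carry 
no time of day, their differences come back in days even when two 
consecutive dates are equal.

```r
# Toy factor from the thread, including the repeated date.
foodate <- factor(c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991"))

# Convert the factor labels (not the integer codes) to Date objects.
dates <- as.Date(as.character(foodate), format = "%d/%m/%Y")

# Differences between Dates are in days; rescale to years and pad with NA
# so the result lines up with the original rows.
interval <- c(NA, as.numeric(diff(dates)) / 365.25)
print(interval)
```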



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Problem with diff(strptime(...

2008-03-20 Thread Jim Lemon
Prof Brian Ripley wrote:
> You are throwing away the clue in your use of as.numeric.
>
> First, strptime returns a POSIXlt value, which you will convert to
> POSIXct when you do arithmetic (using diff()).  Why are you doing that?  So
>
> foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
> diff(strptime(foodate,"%d/%m/%Y"))
>
> Time differences in secs
> [1] 2678400       0  172800
> attr(,"tzone")
> [1] ""
>
> is correct.  I think you intended
>
> diff(as.Date(foodate,"%d/%m/%Y"))/365.25
>
> or even add as.numeric() inside diff().
 
This is true, but I am puzzled as to why I get the correct output except 
when there are two consecutive input values that are the same. The idea 
was to get the number of years between each date in order to calculate a 
rate per year. If I put the as.numeric inside diff:

diff(as.numeric(strptime(foodate,"%d/%m/%Y"))/365.25)
Error in Ops.POSIXt(as.numeric(strptime(foodate, "%d/%m/%Y")), 365.25) :
   / not defined for "POSIXt" objects
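
One possible reading of "as.numeric() inside diff()" is to drop the 
date-time class before any arithmetic: convert to POSIXct, take plain 
seconds, difference those, and only then rescale to years. The following is 
a sketch under assumptions that are not in the thread: tz = "UTC" is added 
to keep daylight-saving shifts out of the result, and the names secs and 
years are illustrative.

```r
# Toy dates from the thread, including the repeated date.
foodate <- c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991")

# Seconds since the epoch as an ordinary numeric vector (class dropped).
secs <- as.numeric(as.POSIXct(strptime(foodate, "%d/%m/%Y", tz = "UTC")))

# diff() on a plain numeric vector has no units to choose, so the result
# is always in seconds; divide by the number of seconds in a year.
years <- diff(secs) / (86400 * 365.25)
print(years)
```

Note that the division sits outside diff(), so it is applied to plain 
numbers rather than to a date-time object.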

Jim
