[R] Arrange Data
Hi, I have the following data set and want to arrange it as shown below.

structure(list(C1 = structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
    .Label = c("B", "C", "D", "E"), class = "factor"),
    C2 = c(34, 4, 54, 3, 23, 33, 2, 12, 33, 12, 10, 4)),
    .Names = c("C1", "C2"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

OUTPUT:

   B  C  D  E
  34  3  2 12
   4 23 12 10
  54 33 33  4

Please let me know how I can accomplish this in R.

TIA
Sachin

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
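For reference, base R's unstack() reshapes this long format into one column per factor level; a minimal sketch using the data above:

```R
# Reshape: one column of C2 values per level of C1, rows in order of appearance.
df <- data.frame(C1 = factor(rep(c("B", "C", "D", "E"), each = 3)),
                 C2 = c(34, 4, 54, 3, 23, 33, 2, 12, 33, 12, 10, 4))
unstack(df, C2 ~ C1)
#    B  C  D  E
# 1 34  3  2 12
# 2  4 23 12 10
# 3 54 33 33  4
```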
Re: [R] prediction interval for new value
Berton,

Thanks for your input. The 'nist' link you mentioned was one of the reasons for my confusion about how this is implemented in R. For now I am assuming the predict function with the 'prediction' option will provide me a tolerance/prediction interval. Is this a proper assumption?

TIA for your help.
Sachin

Berton Gunter <[EMAIL PROTECTED]> wrote:

Peter et al.:

> With those definitions (which are hardly universal), tolerance
> intervals are the same as prediction intervals with k == m == 1, which
> is what R provides.

I don't believe this is the case. See also:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

This **is** fairly standard, I believe. For example, see the venerable classic text (INTRO TO MATH STAT) by Hogg and Craig. To be clear, since I may also be misinterpreting, what I understand/mean is: Peter's definition of a "tolerance/prediction interval" is a random interval that with a prespecified confidence contains a future predicted value. The definition I understand is a random interval that with a prespecified confidence will contain a prespecified proportion of the distribution of future values; e.g., a "95%/90%" tolerance interval will with 95% confidence contain 90% of future values (and one may well ask, "which 90%?"). Whether this is a useful idea is another issue: the parametric version is extremely sensitive (as one might imagine) to the assumption of exact normality; the nonparametric version relies on order statistics and is more robust. I believe it is nontrivial and perhaps ambiguous to extend the concept from the usual fixed distribution to the linear regression case. I seem to recall some papers on this, perhaps in JASA, in the past few years. As always, I welcome correction of any errors or misunderstandings herein.
Cheers to all,
Bert Gunter
Re: [R] prediction interval for new value
A Google search gave me this:
http://ewr.cee.vt.edu/environmental/teach/smprimer/intervals/interval.html

TIA
Sachin

Peter Dalgaard <[EMAIL PROTECTED]> wrote:

Sachin J writes:

> RUsers:
>
> Just confirming, does the predict function with the interval="prediction"
> option give a prediction interval or a tolerance interval? Sorry for
> reposting this question.

Is there any definition of tolerance interval that is different from prediction interval? (Tolerance intervals in the medical sense mean intervals that are designed to detect patients with abnormal levels of serum cholesterol, say.)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Ă˜ster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
Re: [R] prediction interval for new value
RUsers:

Just confirming, does the predict function with the interval="prediction" option give a prediction interval or a tolerance interval? Sorry for reposting this question.

Thanks in advance
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

Sorry, I think I may have misled you; the documentation describes these rather ambiguously as "prediction (tolerance) intervals", but having done some comparisons with other software I believe they are what most of us call prediction intervals after all!

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

If that's true, then how do I find the prediction interval? Thanx in advance.
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

I believe it is a tolerance interval.

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

David,

Thanks for the quick reply. Just confirming, does predict(s.lm, data.frame(x=3), interval="prediction") give a prediction interval or a tolerance interval?

Thanks
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

> predict(s.lm, data.frame(x=3), interval="prediction")
          fit      lwr      upr
[1,] 16073985 -9981352 42129323
> predict(s.lm, data.frame(x=3), interval="confidence")
          fit     lwr      upr
[1,] 16073985 5978125 26169846

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.
--
David Barron
Said Business School, University of Oxford
Park End Street, Oxford OX1 1HP
Re: [R] prediction interval for new value
David,

Thanks for the quick reply. Just confirming, does predict(s.lm, data.frame(x=3), interval="prediction") give a prediction interval or a tolerance interval?

Thanks
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

> predict(s.lm, data.frame(x=3), interval="prediction")
          fit      lwr      upr
[1,] 16073985 -9981352 42129323
> predict(s.lm, data.frame(x=3), interval="confidence")
          fit     lwr      upr
[1,] 16073985 5978125 26169846

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.

--
David Barron
Said Business School, University of Oxford
Park End Street, Oxford OX1 1HP
[R] prediction interval for new value
Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.
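Both intervals can be obtained directly from predict.lm(); a minimal sketch, assuming the dt data frame from the post:

```R
# Fit the regression and request both interval types at the new point x = 3.
dt <- data.frame(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670, 860, 1000,
                       6180, 3020, 5220, 7190, 5500, 1270),
                 x = c(108000, 136000, 35000, 77000, 178000, 15, 126000,
                       24000, 28000, 214000, 108000, 19, 308000, 252000, 71000))
s.lm <- lm(y ~ x, data = dt)

# 1. 95% prediction interval for a new observation at x = 3
predict(s.lm, newdata = data.frame(x = 3), interval = "prediction", level = 0.95)

# 2. 95% confidence interval for the mean response at x = 3
predict(s.lm, newdata = data.frame(x = 3), interval = "confidence", level = 0.95)
```

The prediction interval is always wider, since it accounts for the residual variance of a single new observation as well as the uncertainty in the fitted mean.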
Re: [R] Quickie : unload library
try

detach("package:zoo")

Sachin

Horace Tso <[EMAIL PROTECTED]> wrote:

Sachin, I did try that, e.g.

> detach(zoo)
Error in detach(zoo) : invalid name
> detach("zoo")
Error in detach("zoo") : invalid name

But zoo has been loaded:

> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "datasets"  "stats"     "tcltk"     "utils"     "graphics"
[7] "grDevices" "base"

other attached packages:
   tseries  quadprog       zoo      MASS      Rpad
  "0.10-1"   "1.4-8"   "1.2-0" "7.2-27.1"  "1.1.1"

Thks, H.

>>> Sachin J 8/25/2006 12:56 PM >>>
see ?detach

Horace Tso wrote:

Dear list,

I know it must be obvious and I did my homework. (In fact I RSiteSearched with keyword "remove AND library" but got timed out. Why?) How do I unload a library? I don't mean getting rid of it permanently, but just unloading it for the time being.

A related problem: I have some libraries loaded at startup in .First(), which I have in .Rprofile. Now, I exited R and commented out the lines in .First(). The next time I launch R, the same libraries are loaded again. I.e., there seems to be a memory of the old .First() somewhere which refuses to die.

Thanks in adv.
Horace
Re: [R] Quickie : unload library
see ?detach

Horace Tso <[EMAIL PROTECTED]> wrote:

Dear list,

I know it must be obvious and I did my homework. (In fact I RSiteSearched with keyword "remove AND library" but got timed out. Why?) How do I unload a library? I don't mean getting rid of it permanently, but just unloading it for the time being.

A related problem: I have some libraries loaded at startup in .First(), which I have in .Rprofile. Now, I exited R and commented out the lines in .First(). The next time I launch R, the same libraries are loaded again. I.e., there seems to be a memory of the old .First() somewhere which refuses to die.

Thanks in adv.
Horace
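The "invalid name" errors in the thread come from passing the bare package name: attached packages sit on the search path under the name "package:<name>". A minimal sketch:

```R
# Attach a package, then remove it from the search path again.
library(zoo)              # attaches as "package:zoo"
detach("package:zoo")     # this form works; detach(zoo) / detach("zoo") do not

# Optionally also release the loaded namespace from memory:
unloadNamespace("zoo")
```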
Re: [R] Dataframe modification
Hi Gabor,

Thanx for the help. I forgot to mention this: column A is something like

A <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

i.e., it repeats. The rest is all the same. How can I modify your solution to take care of this issue?

Thanx in advance.
Sachin

Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

Here are two solutions:

A <- 1:8
B <- c(1, 2, 4, 7, 8)
C <- c(5, 3, 10, 12, 17)

# solution 1 - assignment with subscripting
DF <- data.frame(A, B = A, C = 0)
DF[A %in% B, "C"] <- C

# solution 2 - merge
DF <- with(merge(data.frame(A), data.frame(B, C), by = 1, all = TRUE),
           data.frame(A, B = A, C = ifelse(is.na(C), 0, C)))

On 8/21/06, Sachin J wrote:
> Hi,
>
> How can I accomplish this in R?
>
> I have a data frame with 3 columns. Columns B and C have the same number of
> elements, but column A has more elements than B and C. I want to compare
> column A with B and do the following:
>
> If a value of A is not in B, then insert a new row in B and C and fill these
> new rows with B = A and C = 0. Finally I will have a balanced data frame with
> an equal number of rows (entries) in all the columns.
>
> For example: A[3] = 3 but 3 is not in B, so insert a new row and set B[3] = 3
> (new row) and C[3] = 0. The final result would look like:
>
> A B  C
> 1 1  5
> 2 2  3
> 3 3  0
> 4 4 10
> 5 5  0
> 6 6  0
> 7 7 12
> 8 8 17
>
> These are the columns of DF:
>
> a <- c(1, 2, 3, 4, 5, 6, 7, 8)
> b <- c(1, 2, 4, 7, 8)
> c <- c(5, 3, 10, 12, 17)
>
> Thanx in advance for the help.
>
> Sachin
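For the follow-up (repeating A), a hedged sketch: split A wherever it restarts at 1, apply the fill-with-zero step within each block, and rbind the pieces. It assumes, purely as an illustration, that the same B/C pairs apply within every repetition:

```R
# Group index increments each time A restarts at 1.
A <- c(1:12, 1:12)
B <- c(1, 2, 4, 7, 8)      # positions within one block that carry a C value
C <- c(5, 3, 10, 12, 17)

fill_block <- function(a) {
  out <- data.frame(A = a, B = a, C = 0)
  out$C[a %in% B] <- C[match(a[a %in% B], B)]
  out
}
DF <- do.call(rbind, lapply(split(A, cumsum(A == 1)), fill_block))
head(DF, 8)
```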
[R] Dataframe modification
Hi,

How can I accomplish this in R?

I have a data frame with 3 columns. Columns B and C have the same number of elements, but column A has more elements than B and C. I want to compare column A with B and do the following:

If a value of A is not in B, then insert a new row in B and C and fill these new rows with B = A and C = 0. Finally I will have a balanced data frame with an equal number of rows (entries) in all the columns.

For example: A[3] = 3 but 3 is not in B. So insert a new row and set B[3] = 3 (new row) and C[3] = 0. The final result would look like:

A B  C
1 1  5
2 2  3
3 3  0
4 4 10
5 5  0
6 6  0
7 7 12
8 8 17

These are the columns of DF:

> a <- c(1, 2, 3, 4, 5, 6, 7, 8)
> b <- c(1, 2, 4, 7, 8)
> c <- c(5, 3, 10, 12, 17)

Thanx in advance for the help.

Sachin
Re: [R] dataframe of unequal rows
Bert,

I tried readLines. It reads the data as is, but I can't access individual columns. I still can't figure out how to accomplish this. An example would be of great help.

PS: How do you indicate which fields are present in a record with less than the full number? - Via known delimiters for all fields.

TIA
Sachin

Berton Gunter <[EMAIL PROTECTED]> wrote:

How do you indicate which fields are present in a record with less than the full number? Via known delimiters for all fields? Via the order of values (fields are filled in order, and only the last fields in a record can therefore be missing)?

If the former, see the "sep" parameter in read.table() and friends. If the latter, one way is to open the file as a connection and use readLines() (you would check how many values were present and fill in the NAs as needed). There may be better ways, though. ?connections will get you started.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Sachin J
> Sent: Friday, August 18, 2006 9:14 AM
> To: R-help@stat.math.ethz.ch
> Subject: [R] dataframe of unequal rows
>
> Hi,
>
> How can I read data with an unequal number of observations (rows) as is,
> i.e. without introducing NA for columns with fewer observations than the
> maximum? Example:
>
> A  B C  D
> 1 10 1 12
> 2 10 3 12
> 3 10 4 12
> 4 10
> 5 10
>
> Thanks in advance.
>
> Sachin
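Following Bert's readLines() suggestion, a hedged sketch: splitting each line on whitespace keeps ragged records as a list, so short rows stay short and no NA padding is introduced. Individual fields are then reachable per record:

```R
# Ragged input as it would appear in the file (illustrative values from the post).
lines <- c("1 10 1 12",
           "2 10 3 12",
           "3 10 4 12",
           "4 10",
           "5 10")
fields <- lapply(strsplit(trimws(lines), "[[:space:]]+"), as.numeric)
fields[[4]]           # the short record keeps only its own values
# [1]  4 10
length(fields[[1]])   # full records carry all four fields
# [1] 4
```

In practice `lines <- readLines("myfile.txt")` would replace the literal vector.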
[R] dataframe of unequal rows
Hi,

How can I read data with an unequal number of observations (rows) as is, i.e. without introducing NA for columns with fewer observations than the maximum? Example:

A  B C  D
1 10 1 12
2 10 3 12
3 10 4 12
4 10
5 10

Thanks in advance.

Sachin
Re: [R] Insert rows - how can I accomplish this in R
Gabor,

Thanks a lot for the help. The 1st method works fine. With the 2nd method I am getting the following error:

> do.call(rbind, by(DF, cumsum(DF$A == 1), f))
Error in zoo(, time(as.ts(z)), z, fill = 0) : unused argument(s) (fill ...)

Unable to figure out the cause.

Thanks,
Sachin

Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

Here are two solutions. In both we break up DF into rows which start with 1. In solution #1 we create a new data frame with the required sequence for A and zeros for B, and then fill it in. In solution #2 we convert each set of rows to a zoo object z where column A is the times and B is the data. We convert that zoo object to a ts object (which has the effect of filling in the missing times), create a zoo object with no data from its times, and merge that zoo object with z using a fill of 0. Finally, in both solutions, we reconstruct the rows by rbind'ing everything together.

# 1
f <- function(x) {
  DF <- data.frame(A = 1:max(x$A), B = 0)
  DF[x$A, "B"] <- x$B
  DF
}
do.call(rbind, by(DF, cumsum(DF$A == 1), f))

# 2
library(zoo)
f <- function(x) {
  z <- zoo(x$B, x$A)
  ser <- merge(zoo(, time(as.ts(z)), z, fill = 0)
  data.frame(A = time(ser), B = coredata(ser))
}
do.call(rbind, by(DF, cumsum(DF$A == 1), f)

On 8/18/06, Sachin J wrote:
> Hi,
>
> I have the following data frame. Column A indicates months.
>
> DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
> 2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8), B = c(0, 0, 0, 8,
> 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11, 19, 8, 11, 10, 0, 8,
> 36, 10, 16, 10, 22)), .Names = c("A", "B"), class = "data.frame",
> row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
> "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
> "24", "25", "26", "27"))
>
> There is some discontinuity in the data. For example, month 6, 9, 10 data
> (2nd year) and month 6 data (3rd year) are absent.
> I want to insert rows in place of these missing months and set the
> corresponding B column to zero, i.e., the result should look like:
>
> DFNEW <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8),
> B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11,
> 19, 0, 8, 11, 0, 0, 10, 0, 8, 36, 10, 16, 10, 0, 0, 22)), .Names = c("A",
> "B"), class = "data.frame", row.names = c("1", "2", "3", "4",
> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> "27", "28", "29", "30", "31", "32"))
>
> Thanks in advance.
>
> Sachin
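The reported error most likely comes from an unbalanced parenthesis in solution #2 as posted: the call `zoo(, time(as.ts(z))` is never closed, so `z` and `fill = 0` are parsed as extra arguments to zoo() instead of merge(). A corrected sketch, using the DF from the original post:

```R
library(zoo)

DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
                           2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8),
                     B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5,
                           11, 19, 8, 11, 10, 0, 8, 36, 10, 16, 10, 22)),
                .Names = c("A", "B"), class = "data.frame",
                row.names = as.character(1:27))

f <- function(x) {
  z <- zoo(x$B, x$A)
  # Parentheses balanced: the empty zoo supplies the full time grid for the
  # block, and merge() fills the months missing from z with 0.
  ser <- merge(zoo(, time(as.ts(z))), z, fill = 0)
  data.frame(A = time(ser), B = coredata(ser))
}
DFNEW <- do.call(rbind, by(DF, cumsum(DF$A == 1), f))
```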
[R] Insert rows - how can I accomplish this in R
Hi,

I have the following data frame. Column A indicates months.

DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8), B = c(0, 0, 0, 8,
0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11, 19, 8, 11, 10, 0, 8,
36, 10, 16, 10, 22)), .Names = c("A", "B"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27"))

There is some discontinuity in the data. For example, month 6, 9, 10 data (2nd year) and month 6 data (3rd year) are absent. I want to insert rows in place of these missing months and set the corresponding B column to zero, i.e., the result should look like:

DFNEW <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8),
B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11,
19, 0, 8, 11, 0, 0, 10, 0, 8, 36, 10, 16, 10, 0, 0, 22)), .Names = c("A",
"B"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32"))

Thanks in advance.

Sachin
[R] arima() function - issues
Hi,

My query is related to the ARIMA function in the stats package. While looking through the time series literature I found the following link, which highlights a discrepancy in the "arima" function when dealing with differenced time series. Is there a substitute function, similar to the "sarima" mentioned on that page, implemented in R? Any pointers would be of great help.

http://lib.stat.cmu.edu/general/stoffer/tsa2/Rissues.htm

Thanx in advance.

Sachin
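For context, the issues page concerns how arima() handles the constant/drift term once differencing is requested. Base R can still fit the seasonal models that sarima covers; a hedged sketch on simulated data, with illustrative orders:

```R
# Seasonal ARIMA(1,1,1)(0,1,1)[12] fit with base R's arima().
set.seed(1)
y <- ts(cumsum(rnorm(120)), frequency = 12)   # simulated monthly series
fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit$aic                              # model fit criterion
predict(fit, n.ahead = 12)$pred      # 12-month point forecasts
```

Note that with d > 0, arima() fits no intercept; if a drift term is wanted, it has to be supplied via xreg, which is exactly the point the linked page discusses.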
Re: [R] AICc vs AIC for model selection
Hi Spencer,

I did go through the previous postings on the mailing list, but couldn't find a satisfactory answer to my question. I am dealing with a univariate time series. I suspect that my data may contain some trend and seasonal components. Hence, rather than fitting just an AR(1) model, I am trying to find the right model which fits the data well and then use that model to forecast. To achieve this I am using the best.arima model. If you have any other thoughts on this, please let me know.

Thanx in advance for your help.

Regards
Sachin

Spencer Graves <[EMAIL PROTECTED]> wrote:

Regarding AIC.c, have you tried RSiteSearch("AICc") and RSiteSearch("AIC.c")? This produced several comments that looked to me like they might help answer your question.

Beyond that, I've never heard of the "forecast" package, and I got zero hits for RSiteSearch("best.arima"), so I can't comment directly on your question. Do you have only one series or multiple? If you have only one, I think it would be hard to justify more than a simple AR(1) model. Almost anything else would likely be overfitting. If you have multiple series, have you considered using 'lme' in the 'nlme' package? Are you familiar with Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer)? If not, I encourage you to spend some quality time with this book. My study of it has been amply rewarded, and I believe yours will likely also be.

Best Wishes,
Spencer Graves

Sachin J wrote:
> Hi,
>
> I am using the 'best.arima' function from the forecast package to obtain point
> forecasts for a time series data set. The documentation says it utilizes the
> AIC value to select the best ARIMA model. But in my case the sample size is
> very small - 26 observations (demand data). Is it right to use the AIC value
> for model selection in this case? Should I use AICc instead of AIC? If so,
> how can I modify the best.arima function to change the selection criteria?
> Any pointers would be of great help.
>
> Thanx in advance.
> Sachin
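For small samples (here n = 26) the corrected criterion is AICc = AIC + 2k(k+1)/(n - k - 1), where k is the number of estimated parameters. A hedged sketch of applying it to a fitted ARIMA model (modifying best.arima's internals is not shown):

```R
# Small-sample corrected AIC for an ARIMA fit.
aicc <- function(fit) {
  k <- length(coef(fit)) + 1      # estimated parameters, incl. innovation variance
  n <- length(residuals(fit))     # simple proxy for the effective sample size
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

set.seed(1)
y   <- ts(rnorm(26))              # 26 observations, as in the post
fit <- arima(y, order = c(1, 0, 0))
aicc(fit)                         # compare candidate models on this value
```

The correction term grows as k approaches n, so with 26 observations it penalizes richly parameterized seasonal models much more heavily than plain AIC does.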
[R] Recreate new dataframe based on condition
Hi,

How can I achieve this in R? The dataset is as follows:

> df
  x
1 2
2 4
3 1
4 3
5 3
6 2

structure(list(x = c(2, 4, 1, 3, 3, 2)), .Names = "x",
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

I want to create a new data frame whose rows are the sums of rows (1&2, 3&4, 5&6) of the original df. For example:

> newdf
  x
1 6
2 4
3 5

Thanx in advance for the help.

Sachin
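A minimal sketch of the pairwise row sums, using a grouping index with tapply():

```R
df <- data.frame(x = c(2, 4, 1, 3, 3, 2))
# rep(1:3, each = 2) labels rows 1&2, 3&4, 5&6 with groups 1, 2, 3
newdf <- data.frame(x = as.vector(tapply(df$x, rep(1:3, each = 2), sum)))
newdf
#   x
# 1 6
# 2 4
# 3 5
```

For a data frame of arbitrary even length, the grouping index generalizes to `rep(seq_len(nrow(df) / 2), each = 2)`.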
[R] AICc vs AIC for model selection
Hi,

I am using the 'best.arima' function from the forecast package to obtain point forecasts for a time series data set. The documentation says it utilizes the AIC value to select the best ARIMA model. But in my case the sample size is very small - 26 observations (demand data). Is it right to use the AIC value for model selection in this case? Should I use AICc instead of AIC? If so, how can I modify the best.arima function to change the selection criteria? Any pointers would be of great help.

Thanx in advance.

Sachin
Re: [R] KPSS test
Hi Mark,

Thanx for the help. I will verify my results with the PP and DF tests. Also, as suggested, I will take a look at the references pointed out. One small doubt: how do I decide what terms (trend, constant, seasonality) to include while using these stationarity tests? Any references would be of great help.

Thanx,
Sachin

[EMAIL PROTECTED] wrote:

Sachin:

I think your interpretations are right given the data, but KPSS is quite a different test from the usual tests because it assumes that the null is stationarity, while Dickey-Fuller (DF) and Phillips-Perron (PP) assume that the null is a unit root. Therefore, you should check whether the conclusions you get from KPSS are consistent with what you would get from DF or PP; the results often are not consistent. Also, DF depends on what terms (trend, constant) you used in your estimation of the model; I'm not sure if KPSS does also. People generally report Dickey-Fuller results, but they are a little biased towards accepting a unit root (lower power), so maybe that's why you are using KPSS? Eric Zivot has a nice explanation of a lot of the stationarity tests in his S+FinMetrics book. Testing for cyclical variation is pretty complex because that's basically the same as testing for seasonality; check Ord's or Enders' book for relatively simple ways of doing that.

> Sachin J wrote:
>
> Hi,
>
> Am I interpreting the results properly? Are my conclusions correct?
>
> > KPSS.test(df)
>
> KPSS test
>
> Null hypotheses: Level stationarity and stationarity around a linear trend.
> Alternative hypothesis: Unit root.
> Statistic for the null hypothesis of level stationarity: 1.089
> Critical values:
>   0.10  0.05 0.025  0.01
>  0.347 0.463 0.574 0.739
>
> Statistic for the null hypothesis of trend stationarity: 0.13
> Critical values:
>   0.10  0.05 0.025  0.01
>  0.119 0.146 0.176 0.216
>
> Lag truncation parameter: 1
>
> CONCLUSION: Reject Ho (level stationarity) at the 0.05 sig level.
>             Fail to reject Ho (trend stationarity) at the 0.05 sig level.
>
> > kpss.test(df, null = c("Trend"))
>
> KPSS Test for Trend Stationarity
> data: tsdata[, 6]
> KPSS Trend = 0.1298, Truncation lag parameter = 1, p-value = 0.07999
>
> CONCLUSION: Fail to reject Ho (trend stationarity), as p-value (0.08) >
> sig. level (0.05).
>
> > kpss.test(df, null = c("Level"))
>
> KPSS Test for Level Stationarity
> data: tsdata[, 6]
> KPSS Level = 1.0891, Truncation lag parameter = 1, p-value = 0.01
> Warning message:
> p-value smaller than printed p-value in: kpss.test(tsdata[, 6], null = c("Level"))
>
> CONCLUSION: Reject Ho (level stationarity), as p-value (<= 0.01) <
> sig. level (0.05).
>
> Following is my data set:
>
> structure(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08,
> 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83,
> 9.83, 20.83, 10.83, 12.83, 15.83, 11.83), .Tsp = c(2004, 2005.917,
> 12), class = "ts")
>
> Also, how do I test this time series for cyclical variations?
>
> Thanks in advance.
>
> Sachin
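Cross-checking KPSS (null: stationarity) against ADF and PP (null: unit root), as Mark suggests, can be sketched with the tseries package on the posted series:

```R
library(tseries)

y <- ts(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08,
          6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83,
          10.83, 12.83, 15.83, 11.83), start = c(2004, 1), frequency = 12)

kpss.test(y, null = "Level")   # H0: level stationarity
kpss.test(y, null = "Trend")   # H0: trend stationarity
adf.test(y)                    # H0: unit root
pp.test(y)                     # H0: unit root
```

If KPSS rejects stationarity while ADF/PP fail to reject a unit root, the evidence points the same way; conflicting outcomes (common on 24 observations) mean the series is too short to decide firmly.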
[R] Access values in kpssstat-class
Hi,

How can I access the values stored in the kpssstat-class object returned by the KPSS.test function and store them in a variable? For example:

> x <- rnorm(1000)
> test <- KPSS.test(ts(x))
> test

KPSS test

Null hypotheses: Level stationarity and stationarity around a linear trend.
Alternative hypothesis: Unit root.

Statistic for the null hypothesis of level stationarity: 0.138
Critical values:
  0.10  0.05 0.025  0.01
 0.347 0.463 0.574 0.739

Statistic for the null hypothesis of trend stationarity: 0.038
Critical values:
  0.10  0.05 0.025  0.01
 0.119 0.146 0.176 0.216

Lag truncation parameter: 7

I then want to store the test statistic values in some variable, say - result.

Thanx in advance.
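KPSS.test() (it appears to come from the uroot package) returns a formal S4 object, so its pieces live in slots rather than list elements. A hedged sketch: the actual slot names are not guessed here but read off slotNames()/str():

```R
library(uroot)   # assumed to provide KPSS.test()

x    <- rnorm(1000)
test <- KPSS.test(ts(x))

slotNames(test)   # lists the slots the object actually carries
str(test)         # shows each slot's name and contents

# After identifying the slot holding the statistics (its real name appears
# in the slotNames() output), extract and store it, e.g.:
# result <- slot(test, "statistic")   # "statistic" is a hypothetical name
# result <- test@statistic            # equivalent @-syntax
```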
[R] KPSS test
Hi, Am I interpreting the results properly? Are my conclusions correct?

> KPSS.test(df)
KPSS test
Null hypotheses: Level stationarity and stationarity around a linear trend.
Alternative hypothesis: Unit root.
Statistic for the null hypothesis of level stationarity: 1.089
Critical values: 0.10  0.05  0.025 0.01
                 0.347 0.463 0.574 0.739
Statistic for the null hypothesis of trend stationarity: 0.13
Critical values: 0.10  0.05  0.025 0.01
                 0.119 0.146 0.176 0.216
Lag truncation parameter: 1

CONCLUSION: Reject Ho of level stationarity at the 0.05 sig. level (1.089 > 0.463), i.e. not level stationary. Fail to reject Ho of trend stationarity at the 0.05 sig. level (0.13 < 0.146), i.e. trend stationary.

> kpss.test(df,null = c("Trend"))
KPSS Test for Trend Stationarity
data: tsdata[, 6]
KPSS Trend = 0.1298, Truncation lag parameter = 1, p-value = 0.07999

CONCLUSION: Fail to reject Ho - trend stationary, as p-value > sig. level (0.05).

> kpss.test(df,null = c("Level"))
KPSS Test for Level Stationarity
data: tsdata[, 6]
KPSS Level = 1.0891, Truncation lag parameter = 1, p-value = 0.01
Warning message:
p-value smaller than printed p-value in: kpss.test(tsdata[, 6], null = c("Level"))

CONCLUSION: Reject Ho - not level stationary, as p-value < sig. level (0.05).

Following is my data set:

structure(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83), .Tsp = c(2004, 2005.917, 12), class = "ts")

Also, how do I test this time series for cyclical variations?

Thanks in advance.

Sachin
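A sketch of the decision rule and a quick cyclical check, using kpss.test() from the tseries package (reject the stationarity null when the p-value is below alpha, or equivalently when the statistic exceeds the critical value):

```r
# KPSS decision rule plus a quick look for cyclical variation,
# using the data set from the post.
library(tseries)

tsdata <- ts(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08,
               8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83,
               20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83),
             frequency = 12, start = c(2004, 1))

# Reject H0 (stationarity) when p-value < alpha:
kpss.test(tsdata, null = "Level")   # small p-value -> reject level stationarity
kpss.test(tsdata, null = "Trend")   # p-value above 0.05 -> fail to reject

# Periodic/cyclical behaviour shows up as regularly spaced spikes in the
# autocorrelation function and as peaks in the periodogram:
acf(tsdata)
spectrum(tsdata)
```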
[R] Run-Sequence Plot
Hi, How can I get a run-sequence plot and an autocorrelation plot (to visually test for stationarity of time series data) in R? Thanks in advance. Sachin

This is my df:

>df
structure(list(V1 = c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83)), .Names = "V1", class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"))
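Both plots need only base graphics: a run-sequence plot is just the series plotted against observation order, and acf() draws the autocorrelation plot. A sketch with the data frame from the post:

```r
# Run-sequence plot and autocorrelation plot for a single column.
df <- data.frame(V1 = c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08,
                        32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83,
                        19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83,
                        12.83, 15.83, 11.83))

plot(df$V1, type = "b", xlab = "Observation order", ylab = "V1",
     main = "Run-sequence plot")
acf(df$V1, main = "Autocorrelation plot")
```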
[R] write.table & csv help
Hi, How can I produce the following output in .csv format using the write.table function?

for(i in seq(1:2)) {
  df <- rnorm(4, mean=0, sd=1)
  write.table(df,"C:/output.csv", append = TRUE, quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}

Current output:

x
0.287816
-0.81803
-0.15231
-0.25849
x
2.26831
0.863174
0.269914
0.181486

Desired output:

x1 x2
0.287816 2.26831
-0.81803 0.863174
-0.15231 0.269914
-0.25849 0.181486

Thanks in advance
Sachin
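One way to get side-by-side columns is to build them all first and write the file once, instead of appending one column per loop iteration. A sketch (the file path is the one from the post):

```r
# Collect each iteration's vector as a column of a matrix, then write once.
out <- sapply(1:2, function(i) rnorm(4, mean = 0, sd = 1))  # 4 x 2 matrix
colnames(out) <- paste("x", 1:2, sep = "")                  # x1, x2

write.table(out, "C:/output.csv", quote = FALSE, sep = ",",
            row.names = FALSE, col.names = TRUE)
```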
Re: [R] converting to time series object : ts - package:stats
Hi Gabor, You are correct. The real problem is with read.csv. I am not sure why? My data looks V1,V2,V3 11.08,21.73,13.08 7.08,37.73,6.08 7.08,11.73,21.08 I never had this problem earlier. Anyway I did >df <- read.csv("Data.csv") >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) it works fine. But still puzzled with read.csv behavior. Any thoughts? Thanx Gabor, Achim and Brian for your help. Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: df[] <- sapply(format(df), as.numeric) will convert it to numeric but I think the real problem is the read.csv statement. Do commas represent separators or decimals since you have specified comma for both? Assuming it looks like: A,B,C 1,2,3 4,5,6 just do: DF <- read.csv("Data.csv") str(DF) On 6/26/06, Sachin J wrote: > > > It seems I have problem in reading the data as dataframe. It is reading it > as factors. Here is the df > > df <- > read.csv("C:/Data.csv",header=TRUE,sep=",",na.strings="NA", > dec=",", strip.white=TRUE) > > > dput(df) > > > df <- structure(list(V1 = structure(c(2, 15, 15, 14, 14, 14, 12, 13, > + 16, 2, 14, 5, 6, 8, 10, 17, 11, 9, 18, 11, 1, 4, 7, 3), .Label = > c("10.83", > + "11.08", "11.83", "12.83", "13.08", "13.83", "15.83", "16.83", > + "17.83", "19.83", "20.83", "23.08", "32.08", "6.08", "7.08", > + "8.08", "8.83", "9.83"), class = "factor"), V2 = structure(c(8, > + 15, 2, 10, 9, 18, 1, 4, 10, 2, 8, 6, 17, 5, 16, 13, 5, 14, 3, > + 11, 3, 12, 7, 7), .Label = c("10.73", "11.73", "11.75", "12.73", > + "15.75", "19.73", "19.75", "21.73", "25.73", "26.73", "26.75", > + "27.75", "32.75", "33.75", "37.73", "42.75", "61.75", "9.73"), class = > "factor"), > + V3 = structure(c(3, 8, 7, 9, 11, 9, 3, 8, 10, 9, 11, 10, > + 2, 1, 12, 12, 6, 5, 4, 6, 2, 5, 5, 1), .Label = c("10.33", > + "12.33", "13.08", "13.33", "14.33", "15.33", "21.08", "6.08", > + "7.08", "8.08", "9.08", "9.33"), class = "factor")), .Names = c("V1", > + "V2", "V3"), class = "data.frame", row.names = c("1", "2", "3", > + "4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", "14", "15", > + "16", "17", "18", "19", "20", "21", "22", "23", "24")) > > TIA > > Sachin > > > > > Gabor Grothendieck wrote: > > Sorry I meant issue dput(df) and > post > > df <- ...the output your got from dput(df)... > ...rest of your code... > > Now its reproducible. > > > On 6/26/06, Gabor Grothendieck wrote: > > We don't have data.csv so its still not ***reproducible*** by anyone > > else. To be reproducible it means that anyone can copy the code > > in your post, paste it into R and get the same answer. > > > > Suggest you post the output of > > dput(df) > > > > and then post > > dput <- ...the output you got from dput(df)... > > > > Now its reproducible. > > > > On 6/26/06, Sachin J wrote: > > > Hi Achim, > > > > > > I did the following: > > > > > > >df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", > dec=",", strip.white=TRUE) > > > > > > Note: data.csv has 10 (V1...V10) columns. > > > > > > >df[1] > > > V1 > > > 1 11.08 > > > 2 7.08 > > > 3 7.08 > > > 4 6.08 > > > 5 6.08 > > > 6 6.08 > > > 7 23.08 > > > 8 32.08 > > > 9 8.08 > > > 10 11.08 > > > 11 6.08 > > > 12 13.08 > > > 13 13.83 > > > 14 16.83 > > > 15 19.83 > > > 16 8.83 > > > 17 20.83 > > > 18 17.83 > > > 19 9.83 > > > 20 20.83 > > > 21 10.83 > > > 22 12.83 > > > 23 15.83 > > > 24 11.83 > > > > > > >tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1)) > > > > > > The resulting
Re: [R] converting to time series object : ts - package:stats
It seems I have problem in reading the data as dataframe. It is reading it as factors. Here is the df df <- read.csv("C:/Data.csv",header=TRUE,sep=",",na.strings="NA", dec=",", strip.white=TRUE) > dput(df) > df <- structure(list(V1 = structure(c(2, 15, 15, 14, 14, 14, 12, 13, + 16, 2, 14, 5, 6, 8, 10, 17, 11, 9, 18, 11, 1, 4, 7, 3), .Label = c("10.83", + "11.08", "11.83", "12.83", "13.08", "13.83", "15.83", "16.83", + "17.83", "19.83", "20.83", "23.08", "32.08", "6.08", "7.08", + "8.08", "8.83", "9.83"), class = "factor"), V2 = structure(c(8, + 15, 2, 10, 9, 18, 1, 4, 10, 2, 8, 6, 17, 5, 16, 13, 5, 14, 3, + 11, 3, 12, 7, 7), .Label = c("10.73", "11.73", "11.75", "12.73", + "15.75", "19.73", "19.75", "21.73", "25.73", "26.73", "26.75", + "27.75", "32.75", "33.75", "37.73", "42.75", "61.75", "9.73"), class = "factor"), + V3 = structure(c(3, 8, 7, 9, 11, 9, 3, 8, 10, 9, 11, 10, + 2, 1, 12, 12, 6, 5, 4, 6, 2, 5, 5, 1), .Label = c("10.33", + "12.33", "13.08", "13.33", "14.33", "15.33", "21.08", "6.08", + "7.08", "8.08", "9.08", "9.33"), class = "factor")), .Names = c("V1", + "V2", "V3"), class = "data.frame", row.names = c("1", "2", "3", + "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", + "16", "17", "18", "19", "20", "21", "22", "23", "24")) TIA Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: Sorry I meant issue dput(df) and post df <- ...the output your got from dput(df)... ...rest of your code... Now its reproducible. On 6/26/06, Gabor Grothendieck wrote: > We don't have data.csv so its still not ***reproducible*** by anyone > else. To be reproducible it means that anyone can copy the code > in your post, paste it into R and get the same answer. > > Suggest you post the output of > dput(df) > > and then post > dput <- ...the output you got from dput(df)... > > Now its reproducible. 
> > On 6/26/06, Sachin J wrote: > > Hi Achim, > > > > I did the following: > > > > >df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", > > >dec=",", strip.white=TRUE) > > > > Note: data.csv has 10 (V1...V10) columns. > > > > >df[1] > > V1 > > 1 11.08 > > 2 7.08 > > 3 7.08 > > 4 6.08 > > 5 6.08 > > 6 6.08 > > 7 23.08 > > 8 32.08 > > 9 8.08 > > 10 11.08 > > 11 6.08 > > 12 13.08 > > 13 13.83 > > 14 16.83 > > 15 19.83 > > 16 8.83 > > 17 20.83 > > 18 17.83 > > 19 9.83 > > 20 20.83 > > 21 10.83 > > 22 12.83 > > 23 15.83 > > 24 11.83 > > > > >tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1)) > > > > The resulting time series is different from the df. I don't know why? I > > think I am doing something silly. > > > > TIA > > > > Sachin > > > > > > Achim Zeileis wrote: > > On Mon, 26 Jun 2006, Sachin J wrote: > > > > > Hi, > > > > > > I am trying to convert a dataset (dataframe) into time series object > > > using ts function in stats package. My dataset is as follows: > > > > > > >df > > > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 > > > 13.83 16.83 19.83 8.83 20.83 17.83 > > > [19] 9.83 20.83 10.83 12.83 15.83 11.83 > > > > Please provide a reproducible example. You just showed us the print output > > for an object, claiming that it is an object of class "data.frame" which > > is rather unlikely given the print output. > > > > > I converted this into time series object as follows > > > > > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) > > > > which produces the right result for me if `df' is a vector or a > > data.frame: > > > > df <- c(11.08, 7.08,
Re: [R] converting to time series object : ts - package:stats
You are right. The df is as follows:

>df[1]
   V1
1  11.08
2   7.08
3   7.08
4   6.08
5   6.08
6   6.08
7  23.08
8  32.08
9   8.08
10 11.08
11  6.08
12 13.08
13 13.83
14 16.83
15 19.83
16  8.83
17 20.83
18 17.83
19  9.83
20 20.83
21 10.83
22 12.83
23 15.83
24 11.83

But when I provide df[,1] it prints as earlier in factor form. How do I take care of this (factor) issue?

TIA Sachin

Prof Brian Ripley <[EMAIL PROTECTED]> wrote: On Mon, 26 Jun 2006, Sachin J wrote: > I am trying to convert a dataset (dataframe) into time series object > using ts function in stats package. My dataset is as follows: > > >df > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 > 16.83 19.83 8.83 20.83 17.83 > [19] 9.83 20.83 10.83 12.83 15.83 11.83 No data frame will print like that, so it seems that your description and printout do not match. > I converted this into time series object as follows > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) >From the help page for ts: data: a numeric vector or matrix of the observed time-series values. A data frame will be coerced to a numeric matrix via 'data.matrix'. I suspect you have a single-column data frame with a factor column. Look up what data.matrix does for factors. > The resulting time series is as follows: > > Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > 1999 2 15 15 14 14 14 12 13 16 2 14 5 > 2000 6 8 10 17 11 9 18 11 1 4 7 3 > > I am unable to understand why the values of df and tsdata does not > match. I looked at ts function and I couldn't find any data > transformation. Am I missing something here? Any pointers would be of > great help. > > Thanks in advance. > > Sachin > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
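To take care of the factor issue directly: convert via as.character() first, because as.numeric() applied to a factor returns the internal level codes (which is exactly what data.matrix() did to produce the 2 15 15 14 ... series). A small sketch:

```r
# Recovering the displayed numbers from a factor column.
f <- factor(c("11.08", "7.08", "23.08"))

as.numeric(f)                 # 1 3 2 -- the level codes, not the data
as.numeric(as.character(f))   # 11.08 7.08 23.08 -- the actual values
```

(Better still, fix the read.csv call so the column is read as numeric in the first place.)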
Re: [R] converting to time series object : ts - package:stats
Hi Achim, I did the following:

>df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", dec=",", strip.white=TRUE)

Note: data.csv has 10 (V1...V10) columns.

>df[1]
   V1
1  11.08
2   7.08
3   7.08
4   6.08
5   6.08
6   6.08
7  23.08
8  32.08
9   8.08
10 11.08
11  6.08
12 13.08
13 13.83
14 16.83
15 19.83
16  8.83
17 20.83
18 17.83
19  9.83
20 20.83
21 10.83
22 12.83
23 15.83
24 11.83

>tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1))

The resulting time series is different from the df, and I don't know why. I think I am doing something silly.

TIA Sachin

Achim Zeileis <[EMAIL PROTECTED]> wrote: On Mon, 26 Jun 2006, Sachin J wrote: > Hi, > > I am trying to convert a dataset (dataframe) into time series object > using ts function in stats package. My dataset is as follows: > > >df > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 > 16.83 19.83 8.83 20.83 17.83 > [19] 9.83 20.83 10.83 12.83 15.83 11.83 Please provide a reproducible example. You just showed us the print output for an object, claiming that it is an object of class "data.frame" which is rather unlikely given the print output. > I converted this into time series object as follows > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) which produces the right result for me if `df' is a vector or a data.frame: df <- c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83) ts(df, frequency = 12, start = c(1999, 1)) ts(as.data.frame(df), frequency = 12, start = c(1999, 1)) > The resulting time series is as follows: > > Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > 1999 2 15 15 14 14 14 12 13 16 2 14 5 > 2000 6 8 10 17 11 9 18 11 1 4 7 3 > > I am unable to understand why the values of df and tsdata does not match. So are we because you didn't really tell us enough about df... Best, Z > I looked at ts function and I couldn't find any data transformation. Am > I missing something here?
Any pointers would be of great help. > > Thanks in advance. > > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] converting to time series object : ts - package:stats
Hi, I am trying to convert a dataset (a data frame) into a time series object using the ts function in the stats package. My dataset is as follows:

>df
[1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 16.83 19.83 8.83 20.83 17.83
[19] 9.83 20.83 10.83 12.83 15.83 11.83

I converted this into a time series object as follows:

>tsdata <- ts((df),frequency = 12, start = c(1999, 1))

The resulting time series is as follows:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1999   2  15  15  14  14  14  12  13  16   2  14   5
2000   6   8  10  17  11   9  18  11   1   4   7   3

I am unable to understand why the values of df and tsdata do not match. I looked at the ts function and couldn't find any data transformation. Am I missing something here? Any pointers would be of great help.

Thanks in advance.

Sachin
Re: [R] conditional replacement
Thank you Gabor,Marc,Dimitrios and Sundar. Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: x <- 10*1:10 pmin(pmax(x, 30), 60) # 30 30 30 40 50 60 60 60 60 60 On 5/23/06, Sachin J wrote: > Hi > > How can do this in R. > > >df > > 48 > 1 > 35 > 32 > 80 > > If df < 30 then replace it with 30 and else if df > 60 replace it with 60. I > have a large dataset so I cant afford to identify indexes and then replace. > Desired o/p: > > 48 > 30 > 35 > 32 > 60 > > Thanx in advance. > > Sachin > __ > > > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] conditional replacement
Hi, How can I do this in R?

>df
48
1
35
32
80

If df < 30, replace it with 30; else if df > 60, replace it with 60. I have a large dataset, so I can't afford to identify indexes and then replace. Desired output:

48
30
35
32
60

Thanks in advance.

Sachin
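Two equivalent vectorised ways to clamp the values into [30, 60], with no explicit indexing (the first is the pmin/pmax approach from the reply above in the thread):

```r
# Clamp every element of x into the interval [30, 60].
x <- c(48, 1, 35, 32, 80)

pmin(pmax(x, 30), 60)                       # 48 30 35 32 60
ifelse(x < 30, 30, ifelse(x > 60, 60, x))   # same result
```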
[R] Distribution Identification/Significance testing
Hi, What are the methods for identifying the right distribution for a dataset? As far as I know, a goodness-of-fit test (p > alpha) for statistical significance or minimizing the squared error are two criteria for deciding. What are the other alternatives (confidence intervals?), and if any, how can I accomplish them in R? Thanks in advance. Sachin
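One common workflow is to fit each candidate distribution by maximum likelihood and compare the fits, e.g. by AIC, a goodness-of-fit test, or a Q-Q plot. A sketch using MASS::fitdistr() (note the caveat in the comment about ks.test with estimated parameters):

```r
# Fit candidate distributions by ML, then compare.
library(MASS)

x <- rgamma(200, shape = 2, rate = 1)   # example data

fit.gamma <- fitdistr(x, "gamma")
fit.lnorm <- fitdistr(x, "lognormal")

AIC(fit.gamma, fit.lnorm)   # smaller AIC = better fit

# Kolmogorov-Smirnov check of the gamma fit; the p-value is optimistic
# when the parameters were estimated from the same data.
ks.test(x, "pgamma", shape = fit.gamma$estimate["shape"],
        rate = fit.gamma$estimate["rate"])

# Visual check:
qqplot(qgamma(ppoints(length(x)), shape = fit.gamma$estimate["shape"],
              rate = fit.gamma$estimate["rate"]), x)
```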
Re: [R] writing 100 files
Try this:

x <- 1:12
for (i in 1:2){
  bb8 = sample(x)
  a <- sprintf("whatever%d.txt", i)
  write.table(bb8, quote = F, sep = '\t', row.names = F, col.names = F, file = a)
}

HTH Sachin

Duncan Murdoch <[EMAIL PROTECTED]> wrote: On 5/22/2006 11:24 AM, Federico Calboli wrote: > Hi All, > > I need to write as text files 1000 ish variation of the same data frame, > once I permute a row. > > I would like to use the function write.table() to write the files, and > use a loop to do it: > > for (i in 1:1000){ > > bb8[2,] = sample(bb8[2,]) > write.table(bb8, quote = F, sep = '\t', row.names = F, col.names = F, > file = 'whatever?.txt') > } > so all the files are called whatever1: whatever1000 > > Any idea? Use the paste() function to construct the name, e.g. file = paste("whatever",i,".txt", sep="") Duncan Murdoch
[R] write.csv + appending output (FILE I/O)
Hi, How can I write the output to an Excel (csv) file without printing row names (i.e. without breaks)? Here is my code:

library(
fn <- function() {
  q <- c(1,2,3)
  write.csv(q,"C:/Temp/op.xls", append = TRUE, row.names = FALSE, quote = FALSE)
}
# Function call
for(i in 1:3) {
  fn()
}

Present output:

x
1
2
3
x
1
2
3
x
1
2
3

Desired output:

1
2
3
1
2
3
1
2
3

Also, it displays the following warning messages:

Warning messages:
1: appending column names to file in: write.table(q, "C:/Temp/op.xls",
2: appending column names to file in: write.table(q, "C:/Temp/op.xls",
3: appending column names to file in: write.table(q, "C:/Temp/op.xls",

I am using the R 2.2.1 Windows version. I tried using write.xls from the "marray" package but with no success. Thanks in advance.

Sachin
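write.csv() forces the header on, which is where the repeated "x" lines and the warnings come from. Using write.table() directly lets you switch the column names off on every append. A sketch:

```r
# Append without headers: write.table() with col.names = FALSE.
fn <- function(file) {
  q <- c(1, 2, 3)
  write.table(q, file, append = TRUE, sep = ",",
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

for (i in 1:3) fn("C:/Temp/op.csv")
# file contents: 1 2 3 1 2 3 1 2 3, one value per line, no headers
```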
[R] boxplot - labelling
Hi, How can I get the values of the mean and median (not only points but the values too) on a boxplot? I am using the boxplot function from the graphics package. Following is my data set:

> df
[1] 5 1 1 0 0 10 38 47 2 5 0 28 5 8 81 21 12 9 1 12 2 4 22 3

> mean.val <- sapply(df,mean)
> boxplot(df,las = 1,col = "light blue")
> points(seq(df), mean.val, pch = 19)

I could get the mean as a dot symbol, but I need the values too. Also, how do I print the x-axis labels vertically instead of horizontally? Is there any other function to achieve these? Thanks in advance.

Sachin
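text() can print the numeric values next to the marks, and las = 2 rotates the axis labels perpendicular to the axis. A sketch with the data from the post:

```r
# Boxplot with the mean marked and the mean/median values printed.
df <- c(5, 1, 1, 0, 0, 10, 38, 47, 2, 5, 0, 28, 5, 8, 81, 21, 12, 9,
        1, 12, 2, 4, 22, 3)

boxplot(df, las = 2, col = "light blue")       # las = 2: vertical labels
points(1, mean(df), pch = 19)                  # mark the mean
text(1.2, mean(df),   labels = round(mean(df), 2))    # print its value
text(1.2, median(df), labels = round(median(df), 2))  # and the median's
```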
[R] using parnor (lmomco package) - output
Hi, I am using the parnor function of the lmomco package. I believe it provides the mean and std. dev. for a set of data, but the std. dev. it provides does not match the actual std. dev. of the data, which is 247.9193 (using the sd function). Am I missing something here?

> lmr <- lmom.ub(c(123,34,4,654,37,78))
> parnor(lmr)
$type
[1] "nor"
$para
[1] 155.0000 210.2130

> sd(c(123,34,4,654,37,78))
[1] 247.9193
> mean(c(123,34,4,654,37,78))
[1] 155

TIA
Sachin
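The two numbers disagree because parnor() estimates sigma from L-moments rather than from the sample variance: for the normal distribution, sigma = sqrt(pi) * lambda_2 (the second L-moment). The two estimators agree only in expectation, not sample by sample. A sketch of the relationship, assuming lmom.ub() returns the second L-moment as the L2 component (check str(lmr) for the exact name):

```r
# Why parnor()'s sigma differs from sd(): it is an L-moment estimator.
library(lmomco)

x <- c(123, 34, 4, 654, 37, 78)
lmr <- lmom.ub(x)

sqrt(pi) * lmr$L2   # L-moment sigma estimate, matching parnor()'s ~210.2
sd(x)               # ordinary sample standard deviation, 247.9193
```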
[R] Rcmdr problem - SciViews R
Hi, I am getting the following error messages while using SciViews R. It displays a message saying: "Package or Bundle Rcmdr was not found in C:\Software\R-22.1.1\Library. Would you like to install now?" However, the Rcmdr package is there in the library. I reinstalled Rcmdr, but it still gives me the same error message every time I try to use one of the GUI functions. Any pointers would be of great help.

ERROR:

Loading required package: datasets
Loading required package: utils
Loading required package: grDevices
Loading required package: graphics
Loading required package: stats
Loading required package: methods
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: R2HTML
Loading required package: svMisc
Loading required package: svIO
Loading required package: svViews
Loading required package: Rcmdr
Loading required package: car
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'
trying URL 'http://www.sciviews.org/SciViews-R/Rcmdr_1.1-2.zip'
Content type 'application/zip' length 788628 bytes
opened URL
downloaded 770Kb
package 'Rcmdr' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'Rcmdr'
The downloaded packages are in C:\Documents and Settings\Local settings\Temp\Rtmp2g5Kpb\downloaded_packages
updating HTML package descriptions
Loading required package: Rcmdr
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'

TIA. Sachin
Re: [R] Error in rm.outlier method
Thank you Marc. That was of great help. There was some problem with the environment. I closed and reopened the workspace. Works fine now. Sachin "Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: Sachin, I don't have a definitive thought, but some possibilities might be a conflict somewhere in your environment with a local function or with one in the searchpath. Use ls() to review the current objects in your environment to see if something looks suspicious. It did not look like 'outliers' is using a namespace, so a conflict of some nature is a little more possible here. Also use searchpaths() to get a feel for where R is searching for the function. See what is getting searched "above" the outliers package in the search order, which might provide a clue. Also, try to start R from the command line using 'R --vanilla', which should give you a clean working environment. Then use library(outliers) and your code below to see if the same behavior is present. If so, perhaps there was a corruption in the package installation. If not, it would support some type of conflict or perhaps a corruption in your default working environment. HTH, Marc On Fri, 2006-04-28 at 11:57 -0700, Sachin J wrote: > Hi Marc: > > I am using rm.outlier() function from outliers package (reference: > CRAN package help). > You are right. I too couldn't find this error message in rm.outlier > function. Thats why I am unable to understand the cause of error. Any > further thoughts? I will take a look at the robust analytic methods as > suggested. > > Thanx > Sachin > > > "Marc Schwartz (via MN)" wrote: > On Fri, 2006-04-28 at 11:17 -0700, Sachin J wrote: > > Hi, > > > > I am trying to use rm.outlier method but encountering > following error: > > > > > y <- rnorm(100) > > > rm.outlier(y) > > > > Error: > > Error in if (nrow(x) != ncol(x)) stop("x must be a square > matrix") : > > argument is of length zero > > > > Whats wrong here? 
> > > > TIA > > Sachin > > It would be helpful to know which rm.outlier() function you > are using > and from which package it comes. > > The only one that I noted in a search is in the 'outliers' > CRAN package > and it can take a vector as the 'x' argument. > > The above square matrix test and resultant error message is > not in the > tarball R code for either outlier() or rm.outlier() in that > package, so > the source of the error is unclear. > > As an aside, you may wish to consider robust analytic methods > rather > than doing post hoc outlier removal. A search of the list > archives will > provide some insights here. RSiteSearch("outlier") will get > you there. > > HTH, > > Marc Schwartz > > > > > > > __ > save big. __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error in rm.outlier method
Hi Marc: I am using the rm.outlier() function from the outliers package (reference: CRAN package help). You are right: I too couldn't find this error message in the rm.outlier function, which is why I am unable to understand the cause of the error. Any further thoughts? I will take a look at the robust analytic methods as suggested.

Thanks
Sachin

"Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: On Fri, 2006-04-28 at 11:17 -0700, Sachin J wrote: > Hi, > > I am trying to use rm.outlier method but encountering following error: > > > y <- rnorm(100) > > rm.outlier(y) > > Error: > Error in if (nrow(x) != ncol(x)) stop("x must be a square matrix") : > argument is of length zero > > Whats wrong here? > > TIA > Sachin It would be helpful to know which rm.outlier() function you are using and from which package it comes. The only one that I noted in a search is in the 'outliers' CRAN package and it can take a vector as the 'x' argument. The above square matrix test and resultant error message is not in the tarball R code for either outlier() or rm.outlier() in that package, so the source of the error is unclear. As an aside, you may wish to consider robust analytic methods rather than doing post hoc outlier removal. A search of the list archives will provide some insights here. RSiteSearch("outlier") will get you there. HTH, Marc Schwartz
[R] Error in rm.outlier method
Hi, I am trying to use the rm.outlier method but am encountering the following error:

> y <- rnorm(100)
> rm.outlier(y)
Error in if (nrow(x) != ncol(x)) stop("x must be a square matrix") :
argument is of length zero

What's wrong here?

TIA
Sachin
[R] cdf of weibull distribution
Hi, I have a data set which is assumed to follow a Weibull distribution. How can I find the cdf for this data? For example, for normal data I used (package lmomco):

>cdfnor(15,parnor(lmom.ub(c(df$V1

Also, the lmomco package does not have functions for finding the cdf for some distributions, such as the lognormal. Is there any other package which can handle these distributions? Thanks in advance.

Sachin
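Base R's stats package already has the cdfs for both distributions: pweibull() and plnorm(). One approach is to estimate the parameters first (here by maximum likelihood via MASS::fitdistr(), as an alternative to lmomco's L-moment fits), then evaluate the cdf at the point of interest:

```r
# Weibull and lognormal cdfs from base R, with ML-fitted parameters.
library(MASS)

x <- rweibull(100, shape = 1.5, scale = 10)   # example data
fit <- fitdistr(x, "weibull")

# P(X <= 15) under the fitted Weibull:
pweibull(15, shape = fit$estimate["shape"], scale = fit$estimate["scale"])

# The lognormal cdf is also built in:
plnorm(15, meanlog = 2, sdlog = 0.5)
```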
Re: [R] Handling large dataset & dataframe
Mark: Thanks for the pointers. As suggested, I will explore the scan() method.

Andy: How can I use colClasses in my case? I tried it unsuccessfully, encountering the following error.

coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor", "numeric","numeric","factor","factor","numeric","numeric","numeric","numeric", "numeric","numeric","numeric")
mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = coltypes, strip.white=TRUE)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got 'V1'

Thanks again.
Sachin

"Liaw, Andy" <[EMAIL PROTECTED]> wrote: Much easier to use colClasses in read.table, and in many cases just as fast (or even faster). Andy From: Mark Stephens > > From ?scan: "the *type* of what gives the type of data to be > read". So list(integer(), integer(), double(), raw(), ...) In > your code all columns are being read as character regardless > of the contents of the character vector. > > I have to admit that I have added the *'s in *type*. I have > been caught out by this too. Its not the most convenient way > to specify the types of a large number of columns either. As > you have a lot of columns you might want to do something like > this: as.list(rep(integer(1),250)), assuming your dummies > are together, to save typing. Also storage.mode() is useful > to tell you the precise type (and therefore size) of an > object e.g. sapply(coltypes, > storage.mode) is actually the types scan() will use. Note > that 'numeric' could be 'double' or 'integer' which are > important in your case to fit inside the 1GB limit, because > 'integer' (4 bytes) is half 'double' (8 bytes). > > Perhaps someone on r-devel could enhance the documentation to > make "type" stand out in capitals in bold in help(scan)? Or > maybe scan could be clever enough to accept a character > vector 'what'. Or maybe I'm missing a good reason why this > isn't possible - anyone?
How about allowing a character > vector length one, with each character representing the type > of that column e.g. what="DDCD" would mean 4 integers > followed by 2 double's followed by a character column, > followed finally by a double column, 8 columns in total. > Probably someone somewhere has done that already, but I'm not > aware anyone has wrapped it up conveniently? > > On 25/04/06, Sachin J wrote: > > > > Mark: > > > > Here is the information I didn't provide in my earlier > post. R version > > is R2.2.1 running on Windows XP. My dataset has 16 variables with > > following data type. > > ColNumber: 1 2 3 ...16 > > Datatypes: > > > > > "numeric","numeric","numeric","numeric","numeric","numeric","character > > > ","numeric","numeric","character","character","numeric","numeric","num > > eric","numeric","numeric","numeric","numeric" > > > > Variable (2) which is numeric and variables denoted as > character are > > to be treated as dummy variables in the regression. > > > > Search in R help list suggested I can use read.csv with colClasses > > option also instead of using scan() and then converting it to > > dataframe as you suggested. I am trying both these methods > but unable > > to resolve syntactical error. > > > > >coltypes<- > > > c("numeric","factor","numeric","numeric","numeric","numeric","factor", > > > "numeric","numeric","factor","factor","numeric","numeric","numeric","n > > umeric","numeric","numeric","numeric") > > > > >mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = > > >coltypes, > > strip.white=TRUE) > > > > ERROR: Error in scan(file = file, what = what, sep = sep, quote = > > quote, dec = dec, : > > scan() expected 'a real', got 'V1' > > > > No idea whats the problem. > > > > AS PER YOUR SUGGESTION I TRIED scan() as follows: > > > > > > > >col
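A plausible cause of the `scan() expected 'a real', got 'V1'` error above is that the first line of the file is a header row (V1, V2, ...): with header=FALSE, read.csv tries to coerce the header text into the first numeric column. A sketch reproducing and fixing this with a small made-up file (the real file path and column layout are not known here):

```r
# Sketch: the "got 'V1'" error typically means a header row was read as data.
# A tiny made-up CSV with a header line, standing in for C:/temp/data.csv:
tf <- tempfile(fileext = ".csv")
writeLines(c("V1,V2,V3", "1,a,2.5", "2,b,3.5"), tf)

coltypes <- c("numeric", "factor", "numeric")

# header=FALSE reproduces the error: "V1" cannot be coerced to numeric
bad <- try(read.csv(tf, header = FALSE, colClasses = coltypes), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# header=TRUE skips past the header, so colClasses applies to the data rows
mydf <- read.csv(tf, header = TRUE, colClasses = coltypes, strip.white = TRUE)
sapply(mydf, class)  # "numeric" "factor" "numeric"
```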
[R] NA in dummy regression coefficients
I'm running a regression model with dummy variables and getting NA for some coefficients. I believe this is due to a singularity problem. How can I exclude some of the dummy variables from the regression model in R to take care of this issue? I read in the R help that lm() takes care of this automatically, but in my case it's not happening. Any pointers would be of great help. Regression Model: reg06 <- lm(mydf$y ~ mydf$x1 + factor(mydf$x2) + factor(mydf$x3) + factor(mydf$x4) + mydf$x5, singular.ok = TRUE) Thanks in advance Sachin - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
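For context on the NA coefficients: when hand-built 0/1 dummy columns are linearly dependent (e.g. a complete set of indicators plus an intercept), lm() signals the redundancy by reporting NA for the aliased coefficients rather than failing; passing the categorical variable as a single factor lets lm() build non-redundant contrasts itself. A minimal sketch on simulated data (variable names are made up):

```r
set.seed(1)
# Hypothetical data: x2 has 3 levels, encoded both as dummies and as a factor
n  <- 30
x2 <- sample(c("a", "b", "c"), n, replace = TRUE)
d1 <- as.numeric(x2 == "a"); d2 <- as.numeric(x2 == "b"); d3 <- as.numeric(x2 == "c")
y  <- 1 + 2 * d1 - d2 + rnorm(n)

# Full dummy set plus intercept is singular: d1 + d2 + d3 == 1 for every row,
# so one coefficient comes back NA
fit.dummies <- lm(y ~ d1 + d2 + d3)
any(is.na(coef(fit.dummies)))   # TRUE: d3 is aliased

# Letting lm() derive the contrasts from a factor avoids the NA entirely
fit.factor <- lm(y ~ factor(x2))
any(is.na(coef(fit.factor)))    # FALSE
```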
Re: [R] Handling large dataset & dataframe
Mark: Here is the information I didn't provide in my earlier post. The R version is 2.2.1 running on Windows XP. My dataset has 16 variables with the following data types.

ColNumber: 1 2 3 ... 16
Datatypes: "numeric","numeric","numeric","numeric","numeric","numeric","character","numeric","numeric","character","character","numeric","numeric","numeric","numeric","numeric","numeric","numeric"

Variable (2), which is numeric, and the variables denoted as character are to be treated as dummy variables in the regression. A search of the R-help list suggested I can also use read.csv with the colClasses option instead of using scan() and then converting to a data frame as you suggested. I am trying both methods but am unable to resolve a syntax error.

> coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor","numeric","numeric","factor","factor","numeric","numeric","numeric","numeric","numeric","numeric","numeric")
> mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = coltypes, strip.white=TRUE)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got 'V1'

No idea what the problem is. As per your suggestion I tried scan() as follows:

> coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor","numeric","numeric","factor","factor","numeric","numeric","numeric","numeric","numeric","numeric","numeric")
> x <- scan(file = "C:/temp/data.dbf", what=as.list(coltypes), sep=",", quiet=TRUE, skip=1)
> names(x) <- scan(file = "C:/temp/data.dbf", what="", nlines=1, sep=",")
> x <- as.data.frame(x)

This runs, but x has no data in it and contains:

> x
[1] X._. NA.NA..1 NA..2 NA..3 NA..4 NA..5 NA..6 NA..7 NA..8 NA..9 NA..10 NA..11
[14] NA..12 NA..13 NA..14 NA..15 NA..16
<0 rows> (or 0-length row.names)

Please let me know how to properly use the scan or colClasses option.
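A sketch of Mark's point about scan(): the 'what' argument takes *typed* list elements (integer(), double(), ...), not a character vector of type names, so what=as.list(coltypes) reads every column as character. Note also that scan() reads text; pointing it at a binary .dbf file, as in the call above, cannot work. The example below uses a hypothetical 4-column CSV with a header row:

```r
# 'what' must carry the *types* of the columns, via typed empty vectors
tf <- tempfile(fileext = ".csv")
writeLines(c("c1,c2,c3,c4", "1,a,2.5,10", "2,b,3.5,20"), tf)

what <- list(integer(), character(), double(), integer())

x <- scan(tf, what = what, sep = ",", skip = 1, quiet = TRUE)
names(x) <- scan(tf, what = "", nlines = 1, sep = ",", quiet = TRUE)
x <- as.data.frame(x, stringsAsFactors = TRUE)

# storage.mode() reveals the actual types (a factor is stored as integer)
sapply(x, storage.mode)  # integer, integer (factor), double, integer
```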
Sachin Mark Stephens <[EMAIL PROTECTED]> wrote: Sachin, With your dummies stored as integer, the size of your object would appear to be 35 * (4*250 + 8*16) bytes = 376MB. You said "PC" but did not provide R version information, assuming windows then ... With 1GB RAM you should be able to load a 376MB object into memory. If you can store the dummies as 'raw' then object size is only 126MB. You don't say how you attempted to load the data. Assuming your input data is in text file (or can be) have you tried scan()? Setup the 'what' argument with length 266 and make sure the dummy column are set to integer() or raw(). Then x = scan(...); class(x)=" data.frame". What is the result of memory.limit()? If it is 256MB or 512MB, then try starting R with --max-mem-size=800M (I forget the syntax exactly). Leave a bit of room below 1GB. Once the object is in memory R may need to copy it once, or a few times. You may need to close all other apps in memory, or send them to swap. I don't really see why your data should not fit into the memory you have. Purchasing an extra 1GB may help. Knowing the object size calculation (as above) should help you guage whether it is worth it. Have you used process monitor to see the memory growing as R loads the data? This can be useful. If all the above fails, then consider 64-bit and purchasing as much memory as you can afford. R can use over 64GB RAM+ on 64bit machines. Maybe you can hire some time on a 64-bit server farm - i heard its quite cheap but never tried it myself. You shouldn't need to go that far with this data set though. Hope this helps, Mark Hi Roger, I want to carry out regression analysis on this dataset. So I believe I can't read the dataset in chunks. Any other solution? 
TIA Sachin roger koenker < [EMAIL PROTECTED]> wrote: You can read chunks of it at a time and store it in sparse matrix form using the packages SparseM or Matrix, but then you need to think about what you want to do with it least squares sorts of things are ok, but other options are somewhat limited... url: www.econ.uiuc.edu/~roger Roger Koenker email [EMAIL PROTECTED] Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-
Re: [R] Handling large dataset & dataframe
Hi Andy: I searched through R-archive to find out how to handle large data set using readLines and other related R functions. I couldn't find any single post which elaborates the process. Can you provide me with an example or any pointers to the postings elaborating the process. Thanx in advance Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: Instead of reading the entire data in at once, you read a chunk at a time, and compute X'X and X'y on that chunk, and accumulate (i.e., add) them. There are examples in "S Programming", taken from independent replies by the two authors to a post on S-news, if I remember correctly. Andy From: Sachin J > > Gabor: > > Can you elaborate more. > > Thanx > Sachin > > Gabor Grothendieck wrote: > You just need the much smaller cross product matrix X'X and > vector X'Y so you can build those up as you read the data in > in chunks. > > > On 4/24/06, Sachin J wrote: > > Hi, > > > > I have a dataset consisting of 350,000 rows and 266 columns. Out of > > 266 columns 250 are dummy variable columns. I am trying to > read this > > data set into R dataframe object but unable to do it due to memory > > size limitations (object size created is too large to > handle in R). Is > > there a way to handle such a large dataset in R. > > > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > > > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
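Andy's and Gabor's suggestion, accumulating X'X and X'y chunk by chunk and then solving the normal equations, can be sketched as follows. The data here is simulated; in practice each chunk would come from repeated reads of the file (e.g. read.csv with skip= and nrows=):

```r
set.seed(42)
# Simulated full design matrix and response, standing in for the file on disk
X <- cbind(1, matrix(rnorm(1000 * 3), ncol = 3))
y <- X %*% c(2, 1, -1, 0.5) + rnorm(1000)

xtx <- matrix(0, ncol(X), ncol(X))
xty <- numeric(ncol(X))

# Process 100 rows at a time, accumulating the cross products
for (start in seq(1, nrow(X), by = 100)) {
  idx   <- start:min(start + 99, nrow(X))
  chunk <- X[idx, , drop = FALSE]
  xtx   <- xtx + crossprod(chunk)           # running sum of X'X
  xty   <- xty + crossprod(chunk, y[idx])   # running sum of X'y
}

# Solve the normal equations; matches lm() on the full data
beta.chunked <- solve(xtx, xty)
beta.lm      <- coef(lm(y ~ X - 1))
all.equal(as.vector(beta.chunked), as.vector(beta.lm), tolerance = 1e-6)  # TRUE
```

Only the p-by-p matrix X'X and the length-p vector X'y ever live in memory, so the 350,000-row file never has to be loaded at once.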
Re: [R] Handling large dataset & dataframe
Gabor: Can you elaborate more. Thanx Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: You just need the much smaller cross product matrix X'X and vector X'Y so you can build those up as you read the data in in chunks. On 4/24/06, Sachin J wrote: > Hi, > > I have a dataset consisting of 350,000 rows and 266 columns. Out of 266 > columns 250 are dummy variable columns. I am trying to read this data set > into R dataframe object but unable to do it due to memory size limitations > (object size created is too large to handle in R). Is there a way to handle > such a large dataset in R. > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > Any pointers would be of great help. > > TIA > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Handling large dataset & dataframe
Hi Richard: Even if I dont read the dummy var columns, i.e. just read the original dataset with 350,000 rows and 16 columns, when I try to run the regression - using >lm(y ~ c1 + factor(c2) + factor(c3) ) ; where c2, c3 are dummy variables, The procedure fails saying not enough memory. But, > lm(y ~ c1 + factor(c2) ) works fine. Any thoughts. Thanks Sachin "Richard M. Heiberger" <[EMAIL PROTECTED]> wrote: Where is the excess size being identified? Is it the read? or in the lm(). If it is in the reading of the data, then why are you reading the dummy variables? Would it make sense to read a single column of a factor instead of 80 columns of dummy variables? - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Handling large dataset & dataframe
Hi Roger, I want to carry out regression analysis on this dataset. So I believe I can't read the dataset in chunks. Any other solution? TIA Sachin roger koenker <[EMAIL PROTECTED]> wrote: You can read chunks of it at a time and store it in sparse matrix form using the packages SparseM or Matrix, but then you need to think about what you want to do with it least squares sorts of things are ok, but other options are somewhat limited... url: www.econ.uiuc.edu/~roger Roger Koenker email [EMAIL PROTECTED] Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-6678 Champaign, IL 61820 On Apr 24, 2006, at 12:41 PM, Sachin J wrote: > Hi, > > I have a dataset consisting of 350,000 rows and 266 columns. Out > of 266 columns 250 are dummy variable columns. I am trying to read > this data set into R dataframe object but unable to do it due to > memory size limitations (object size created is too large to handle > in R). Is there a way to handle such a large dataset in R. > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > Any pointers would be of great help. > > TIA > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting- > guide.html - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Handling large dataset & dataframe
Hi, I have a dataset consisting of 350,000 rows and 266 columns. Out of the 266 columns, 250 are dummy-variable columns. I am trying to read this data set into an R data frame but am unable to do so due to memory limitations (the object created is too large for R to handle). Is there a way to handle such a large dataset in R? My PC has 1GB of RAM and 55GB of hard disk space, running Windows XP. Any pointers would be of great help. TIA Sachin
Re: [R] Creat new column based on condition
Hi Gabor, The first one works fine. Just out of curiosity, in second solution: I dont want to create a matrix. I want to add a new column to the existing dataframe (i.e. V2 based on the values in V1). Is there a way to do it? TIA Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: Try: V1 <- matrix(c(10, 20, 30, 10, 10, 20), nc = 1) V2 <- 4 * (V1 == 10) + 6 * (V1 == 20) + 10 * (V1 == 30) or V2 <- matrix(c(4, 6, 10)[V1/10], nc = 1) On 4/21/06, Sachin J wrote: > Hi, > > How can I accomplish this task in R? > > V1 > 10 > 20 > 30 > 10 > 10 > 20 > > Create a new column V2 such that: > If V1 = 10 then V2 = 4 > If V1 = 20 then V2 = 6 > V1 = 30 then V2 = 10 > > So the O/P looks like this > > V1 V2 > 10 4 > 20 6 > 30 10 > 10 4 > 10 4 > 20 6 > > Thanks in advance. > > Sachin > > __ > > > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
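On the follow-up question, the lookup-vector idea also adds the column directly to an existing data frame rather than building a matrix; a minimal sketch with the posted values:

```r
# Add V2 to an existing data frame via a named lookup vector
mydf   <- data.frame(V1 = c(10, 20, 30, 10, 10, 20))
lookup <- c("10" = 4, "20" = 6, "30" = 10)

mydf$V2 <- unname(lookup[as.character(mydf$V1)])
mydf$V2  # 4 6 10 4 4 6
```

Indexing by as.character(V1) works for any set of V1 values, not just multiples of 10 as in Gabor's V1/10 trick.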
[R] Creat new column based on condition
Hi, How can I accomplish this task in R?

V1
10
20
30
10
10
20

Create a new column V2 such that:
If V1 = 10 then V2 = 4
If V1 = 20 then V2 = 6
If V1 = 30 then V2 = 10

So the output looks like this:

V1 V2
10 4
20 6
30 10
10 4
10 4
20 6

Thanks in advance. Sachin
Re: [R] Conditional Row Sum
Thanx Marc and Gabor for your help. Sachin "Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: On Thu, 2006-04-20 at 11:46 -0700, Sachin J wrote: > Hi, > > How can I accomplish this in R. Example: > > R1 R2 > 3 101 > 4 102 > 3 102 > 18 102 > 11 101 > > I want to find Sum(101) = 14 - i.e SUM(R1) where R2 = 101 > Sum(102) = 25 - SUM(R2) where R2 = 102 > > TIA > Sachin Presuming that your data is in a data frame called DF: > DF R1 R2 1 3 101 2 4 102 3 3 102 4 18 102 5 11 101 At least three options: > with(DF, tapply(R1, R2, sum)) 101 102 14 25 > aggregate(DF$R1, list(R2 = DF$R2), sum) R2 x 1 101 14 2 102 25 > by(DF$R1, DF$R2, sum) INDICES: 101 [1] 14 -- INDICES: 102 [1] 25 See ?by, ?aggregate and ?tapply and ?with. HTH, Marc Schwartz - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Conditional Row Sum
Hi, How can I accomplish this in R? Example:

R1 R2
3  101
4  102
3  102
18 102
11 101

I want to find Sum(101) = 14, i.e. SUM(R1) where R2 = 101, and Sum(102) = 25, i.e. SUM(R1) where R2 = 102. TIA Sachin
Re: [R] Count Unique Rows/Values
x.unique$V1 gives the list of individual column's unique values. Thank you again Andy. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. 
> > > > TIA > > Sachin
Re: [R] Count Unique Rows/Values
But it is not giving me the list of unique elements. Count works fine. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
Re: [R] Count Unique Rows/Values
Thanks Andy. That works. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
Re: [R] Count Unique Rows/Values
Hi, This one is not working for me. It is listing all the rows instead of unique ones. My dataset has 30 odd rows and following is the resulting o/p [[308313]] [1] 126 [[308314]] [1] 126 [[308315]] [1] 126 [[308316]] [1] 126 [[308317]] [1] 126 [[308318]] [1] 126 [[308319]] [1] 126 [[308320]] [1] 126 [[308321]] [1] 126 I used following set of commands. > (x.unique <- lapply(x$V1, unique)) > sapply(x.unique, length) x$V1 is numeric field. where x is my data frame already read (therefore i ignored your first step). Am I missing something. ? Thanks Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: This might help: > x <- read.table("clipboard", colClasses=c("numeric", "character")) > (x.unique <- lapply(x, unique)) $V1 [1] 155 138 126 123 103 143 111 156 $V2 [1] "A" "B" "C" "D" > sapply(x.unique, length) V1 V2 8 4 Andy From: Sachin J > > Hi, > > I have a dataset which has both numeric and character > values with dupllicates. For example: > > 155 A > 138 A > 138 B > 126 C > 126 D > 123 A > 103 A > 103 B > 143 D > 111 C > 111 D > 156 C > > How can I count the number of unqiue entries without > counting duplicate entries. Also can I extract the list in a > object. What I mean is > Col1 unique count = 8 Unique Elements are : > 103,111,123,126,138,143,155,156 > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > Any pointers would be of great help. > > TIA > Sachin > > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > -- -- - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Count Unique Rows/Values
Hi, I have a dataset which has both numeric and character values, with duplicates. For example:

155 A
138 A
138 B
126 C
126 D
123 A
103 A
103 B
143 D
111 C
111 D
156 C

How can I count the number of unique entries without counting duplicates? Also, can I extract the list into an object? What I mean is:
Col1 unique count = 8; unique elements are 103, 111, 123, 126, 138, 143, 155, 156
Col2 unique count = 4; unique elements are A, B, C, D

Any pointers would be of great help. TIA Sachin
[R] Nonlinear Regression model: Diagnostics
Hi, I am trying to run the following nonlinear regression model. > nreg <- nls(y ~ exp(-b*x), data = mydf, start = list(b = 0), alg = "default", trace = TRUE) OUTPUT: 24619327 : 0 24593178 : 0.0001166910 24555219 : 0.0005019005 24521810 : 0.001341571 24500774 : 0.002705402 24490713 : 0.004401078 24486658 : 0.00607728 24485115 : 0.007484372 24484526 : 0.008552635 24484298 : 0.009314779 24484208 : 0.009837009 24484172 : 0.01018542 24484158 : 0.01041381 24484152 : 0.01056181 24484150 : 0.01065700 24484149 : 0.01071794 24484148 : 0.01075683 24484148 : 0.01078161 24484148 : 0.01079736 24484148 : 0.01080738 24484148 : 0.01081374 Nonlinear regression model model: y ~ exp(-b * x) data: mydf b 0.01081374 residual sum-of-squares: 24484148 My question is how do I interpret the results of this model. > profile(nreg) 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : Error in prof$getProfile() : number of iterations exceeded maximum of 50 I am unable to understand the error cause. Any pointers would be of great help. Regards, Sachin - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
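One plausible reason profile() fails here, and the residual sum-of-squares is so large, is that y ~ exp(-b*x) has no scale parameter: the fitted curve is forced through 1 at x = 0, while the data is evidently orders of magnitude larger. A hedged sketch on simulated data (the real data is unknown; the values below are made up) adding a scale parameter a:

```r
set.seed(7)
# Simulated exponential-decay data with a large scale, roughly matching
# the magnitude implied by the posted RSS (purely illustrative numbers)
x <- seq(0, 400, by = 4)
y <- 5000 * exp(-0.01 * x) + rnorm(length(x), sd = 50)
mydf <- data.frame(x = x, y = y)

# With a scale parameter 'a', nls converges and profiling behaves sensibly
nreg <- nls(y ~ a * exp(-b * x), data = mydf,
            start = list(a = max(y), b = 0.005))

summary(nreg)   # standard errors and t-values for a and b
confint(nreg)   # profile-likelihood confidence intervals
```

The coefficient table from summary(), together with confint(), is the usual way to interpret an nls fit: point estimates for a and b plus their uncertainty.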
Re: [R] Subset dataframe based on condition
Thanks Steve and Tony for your help.

Sachin

Tony Plate <[EMAIL PROTECTED]> wrote:

Works OK for me:

> x <- data.frame(a=10^(-2:7), b=10^(10:1))
> subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
> subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10

Do you get all "numeric" for the following?

> sapply(x, class)
        a         b
"numeric" "numeric"

If not, then your data frame is probably encoding the information in some way that you don't want (though if it was as factors, I would have expected a warning from the comparison operator). You might get more help by distilling your problem to a simple example that can be tried out by others.

-- Tony Plate

Sachin J wrote:
> Hi,
>
> I am trying to extract a subset of data from my original data frame based on some condition. For example (mydf - original data frame, submydf - subset data frame):
>
> >submydf = subset(mydf, a > 1 & b <= a)
>
> here column a contains values ranging from 0.01 to 10. I want to extract only those rows matching condition 1, i.e. a > 1. But when I execute this command it is not giving me the appropriate result. The subset df - submydf contains rows with 0.01 also. Please help me to resolve this problem.
>
> Thanks in advance.
>
> Sachin
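A short sketch following Tony's diagnosis. If a numeric-looking column was read in as a factor, comparisons operate on the factor's internal codes rather than the printed values, which would produce exactly the symptom described. The data frame and column names below are assumed from the thread:

```r
## Hypothetical reproduction: column 'a' accidentally stored as a factor.
mydf <- data.frame(a = factor(c(0.01, 0.5, 2, 10)), b = c(5, 0.1, 1, 3))

sapply(mydf, class)   # reveals the problem: 'a' is "factor", not "numeric"

## Safe factor-to-numeric conversion: go through the level labels,
## never as.numeric(factor) directly (that returns the level codes).
mydf$a <- as.numeric(as.character(mydf$a))

submydf <- subset(mydf, a > 1 & b <= a)   # now behaves as expected
```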
[R] Subset dataframe based on condition
Hi,

I am trying to extract a subset of data from my original data frame based on some condition. For example (mydf - original data frame, submydf - subset data frame):

>submydf = subset(mydf, a > 1 & b <= a)

here column a contains values ranging from 0.01 to 10. I want to extract only those rows matching condition 1, i.e. a > 1. But when I execute this command it is not giving me the appropriate result. The subset df - submydf contains rows with 0.01 also. Please help me to resolve this problem.

Thanks in advance.

Sachin