[R] array vs matrix vs dataframe?

2006-10-01 Thread r user
What is the difference among an array, a dataframe and
a matrix?

Why is the size of a dataframe so much larger? (see
example below)


a<-c(rep(1:100,1))
b<-c(rep(1:100,1))
c1<-cbind(a,b)
cdf<-as.data.frame(cbind(a,b))
cm<-as.matrix(cbind(a,b))

object.size(a)/100
object.size(b)/100
object.size(c1)/100
object.size(cdf)/100
object.size(cm)/100
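Below is a minimal sketch of the structural difference, using the vectors from the question. A matrix is one atomic vector with a dim attribute, and an array is the same thing with any number of dimensions. A data frame is a list of equal-length columns that carries names, class and row.names attributes. Those attributes plausibly account for the extra size, particularly the row names, which older R versions could store as a character vector. Byte counts depend on the R version and platform, so none are asserted:

```r
a <- 1:100
b <- 1:100
cm  <- cbind(a, b)                    # cbind() of two vectors already returns a matrix
cdf <- as.data.frame(cm)              # a list of two column vectors
arr <- array(1:24, dim = c(2, 3, 4))  # an array may have any number of dimensions

is.matrix(cm)              # TRUE: a single integer vector with a dim attribute
length(cm)                 # 200: all values live in one vector
attributes(cm)             # dim and dimnames only
is.list(cdf)               # TRUE: a data frame is a list of columns
names(attributes(cdf))     # includes names, class and row.names

# Sizes differ by R version and platform; compare them directly:
object.size(cm)
object.size(cdf)
```

Note also that cbind(a, b) already returns a matrix, so c1 and cm in the question are the same kind of object.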

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] "summary.lm" and NA values

2006-09-21 Thread r user
Gentlemen,

(I am using R 2.2.1 in a Windows environment.)

I apologize, but I did not fully comprehend all of your
answer.  I have a dataframe called “data1”.  I run
several linear regressions using the lm function, similar
to:

reg <- ( lm(lm(data1[,2] ~., data1[,2:4])) )


I see from generous answers below how I can use 
"coef(reg)" to extract the coefficient estimates.  (If
the coefficient for a variable is for some reason NA,
"coef(reg)"  returns  NA for that coefficient, which
is what I want.)

My question: 
What is the best way to get the standard errors,
including NA values that “go with” each of these
coefficient estimates?  (i.e. If the coefficient
estimate is NA, I similarly want the standard error to
come back as NA, so that the length of coef(reg) is
the same as the length of the vector that contains the
standard errors.)
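One way to carry out the name-based approach Bert Gunter describes in his reply is sketched below on made-up data (the data frame d and its columns are hypothetical). It starts from the full-length named vector returned by coef(). It then fills in the standard errors by name from the coefficient table of summary(), which only has rows for the coefficients that could be estimated. A perfectly collinear column is used to force an NA coefficient:

```r
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- 2 * d$x1                  # perfectly collinear, so its coefficient is NA

reg <- lm(y ~ ., data = d)
b <- coef(reg)                    # named, full length, NA for x3

s <- coef(summary(reg))           # rows only for the estimable coefficients
se <- rep(NA_real_, length(b))
names(se) <- names(b)
se[rownames(s)] <- s[, "Std. Error"]   # fill by name; x3 stays NA

out <- c(b, se)                   # one row for the results table (here 4 + 4 values)
```

With 10 regressors plus an intercept, out would have the 22 values wanted per regression, whatever subset of coefficients turned out to be NA.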

Thanks very much for all your help, and I apologize
for my need of additional assistance.






--- Berton Gunter <[EMAIL PROTECTED]> wrote:

> "Is there a way to..." always has the answer "yes" in R (or C or any
> language for that matter). The question is: "Is there a GOOD way...?"
> where "good" depends on the specifics of the situation. So after that
> polemic, below is an effort to answer (adding to what Petr Pikal
> already said):
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific
> learning process."  - George E. P. Box
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of r user
> > Sent: Tuesday, August 15, 2006 7:01 AM
> > To: rhelp
> > Subject: [R] question re: "summary.lm" and NA values
> >
> > Is there a way to get the following code to include
> > NA values where the coefficients are "NA"?
> >
> > ((summary(reg))$coefficients)
>
> BAAAD! Don't do this. Use the extractor on the object: coef(reg)
> This suggests that you haven't read the documentation carefully,
> which tends to arouse the ire of would-be helpers.
>
> >
> > explanation:
> >
> > Using a loop, I am running regressions on several
> > "subsets" of "data1".
> >
> > "reg <- ( lm(lm(data1[,1] ~., data1[,2:l])) )"
>
> ??? There's an error here I think. Do you mean update()? Do you have
> your subscripting correct?
>
> >
> > My regression has 10 independent variables, and I
> > therefore expect 11 coefficients.
> > After each regression, I wish to save the coefficients
> > and standard errors of the coefficients in a table
> > with 22 columns.
> >
> > I successfully extract the coefficients using the
> > following code:
> > "reg$coefficients"
>
> Use the extractor, coef()
>
> >
> > I attempt to extract the standard errors using:
> >
> > aperm((summary(reg))$coefficients)[2,]
>
> BAAAD! Use the extractor vcov():
> sqrt(diag(vcov(reg)))
>
> >
> > ((summary(reg))$coefficients)
> >
> > My problem:
> > For some of my subsets, I am missing data for one or
> > more of the independent variables.  This of course
> > causes the coefficients and standard errors for this
> > variable to be "NA".
>
> No it doesn't, as Petr said.
>
> One possible approach: Assuming that a variable is actually missing
> (all NA's), note that coef(reg) is a named vector, so that the
> character string names of the regressors actually used are available.
> You can thus check for what's missing and add them as NA's at each
> return. Though I confess that I see no reason to put things in a
> matrix rather than just using a list. But that's a matter of personal
> taste I suppose.
>
> >
> > Is there a way to include the NA standard errors, so
> > that I have the same number of standard errors and
> > coefficients for each regression, and can then store
> > the coefficients and standard errors in my table of 22
> > columns?
> >


[R] getting sapply to skip columns with non-numeric data?

2006-08-17 Thread r user
I have a dataframe “x” of w columns.

Some columns are numeric, some are not.

I wish to create a function to calculate the mean and
standard deviation of each numeric column, and then
“bind” the column mean and standard deviation to the
bottom of the dataframe.

e.g. 

tempmean <- apply(data.frame(x), 2, mean, na.rm = T)
xnew <- rbind(x,tempmean)

I am running into one small problem: what is the best
way to have sapply “skip” the non-numeric columns and
return NAs for them?
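apply() first converts the data frame to a matrix, so a single non-numeric column turns every value into character. A sketch that works column by column instead, with sapply() and is.numeric(); the data frame x and the helper col_stat() are made up for illustration:

```r
x <- data.frame(a = c(1, 2, NA, 4),
                b = c("p", "q", "r", "s"),
                c = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)

# Apply f to a numeric column; return NA for any other column
col_stat <- function(v, f) if (is.numeric(v)) f(v, na.rm = TRUE) else NA

tempmean <- sapply(x, col_stat, f = mean)
tempsd   <- sapply(x, col_stat, f = sd)

# Append the mean and standard deviation rows to the bottom of x
xnew <- rbind(x,
              as.data.frame(as.list(tempmean)),
              as.data.frame(as.list(tempsd)))
```

rbind() on data frames matches columns by name, so the NA lands in the non-numeric column b.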



[R] question re: "summary.lm" and NA values

2006-08-15 Thread r user
Is there a way to get the following code to include
NA values where the coefficients are “NA”?

((summary(reg))$coefficients)

explanation:

Using a loop, I am running regressions on several
“subsets” of “data1”.

“reg <- ( lm(lm(data1[,1] ~., data1[,2:l])) )”

My regression has 10 independent variables, and I
therefore expect 11 coefficients.
After each regression, I wish to save the coefficients
and standard errors of the coefficients in a table
with 22 columns.

I successfully extract the coefficients using the
following code:
“reg$coefficients”

I attempt to extract the standard errors using :

aperm((summary(reg))$coefficients)[2,]

((summary(reg))$coefficients)

My problem:
For some of my subsets, I am missing data for one or
more of the independent variables.  This of course
causes the coefficients and standard errors for this
variable to be “NA”.

Is there a way to include the NA standard errors, so
that I have the same number of standard errors and
coefficients for each regression, and can then store
the coefficients and standard errors in my table of 22
columns?



[R] Getting summary.lm to include data for coefficients that are NAs?

2006-08-11 Thread r user
Is there a way to get the following code to include
lines where the coefficients are “NA”?

((summary(reg))$coefficients)

explanation:

Using a loop, I am running regressions on several
“subsets” of “data1”.

“reg <- ( lm(lm(data1[,1] ~., data1[,2:l])) )”

My regression has 10 independent variables, and I
therefore expect 11 coefficients.
After each regression, I wish to save the coefficients
and standard errors of the coefficients in a table
with 22 columns.

I successfully extract the coefficients using the
following code:
“reg$coefficients”

I attempt to extract the standard errors using:

aperm((summary(reg))$coefficients)[2,]

((summary(reg))$coefficients)

My problem:
For some of my subsets, I am missing data for one or
more of the independent variables.  This of course
causes the coefficients and standard errors for this
variable to be “NA”.

Is there a way to include the NA standard errors, so
that I have the same number of standard errors and
coefficients for each regression, and can then store
the coefficients and standard errors in my table of 22
columns?



[R] basic question re lm()

2006-08-10 Thread r user
I am using R in a Windows environment.

I have a basic question regarding lm().

I have a dataframe “data1” with ncol=w.

I know that my dependent variable is in column 1.

Is there a way to write the regression formula so that
I can use columns 2 thru w as my independent
variables?



e.g. something like:  “lm(data1[,1] ~ data1[,2:w])”
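In a model formula, `.` stands for every column of `data` that is not already in the formula, so naming the response column is enough. A sketch with a made-up data1; its default column names X1, X2, ... come from data.frame(matrix(...)). reformulate() builds the same formula from column names when they are only known at run time:

```r
set.seed(42)
data1 <- data.frame(matrix(rnorm(50 * 4), ncol = 4))   # columns X1..X4

fit1 <- lm(X1 ~ ., data = data1)                       # X1 on X2, X3 and X4

# Same model, built from whatever the column names happen to be
fml  <- reformulate(names(data1)[-1], response = names(data1)[1])
fit2 <- lm(fml, data = data1)
```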

Thanks



[R] average by group...

2006-05-30 Thread r user
I have a dataframe with 700,000 rows and 2 vectors
(columns): “group” and “score”.

I wish to calculate a third vector, with one value per
row: the average score by group.  Even though the average
value will repeat, I wish to return the average for
that particular group for each row.

(I know I can do this by calculating each group’s
average and then using the merge command, but as my
calculations get more complex and my data set gets
larger, the merge command seems to be fairly slow.)
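ave() does this without a merge. It splits the vector by group, applies a function (mean by default) within each group, and returns a result aligned with the original rows. A sketch on a made-up five-row data frame:

```r
d <- data.frame(group = c("a", "a", "b", "b", "b"),
                score = c(1, 3, 2, 4, 6))

# One value per row: the mean score of that row's group
d$group_mean <- ave(d$score, d$group)

# Any other per-group function can be supplied through FUN
d$group_max <- ave(d$score, d$group, FUN = max)
```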



[R] help converting code to a function

2006-05-05 Thread r user
I want to write a function that loads a data frame
from my hard drive, creates a new dataframe
containing the difference between column n and
column n+4, saves this new dataframe to my
hard drive, and finally removes both the new and old
data frames from memory.

Here is the code I am using.

How do I convert this into a function that can be used
to perform the same process on any dataframe?


load('c:/r_pit/sampledf.r')
w<-ncol(sampledf)
l<-nrow(sampledf)
sampledf_yychg <- data.frame(matrix(data=NA,nrow=l,ncol=w-4))
for(j in 1:(w-4)) { sampledf_yychg[, j] <- sampledf[, j] - sampledf[, j+4] }
save(sampledf_yychg, file='c:/r_pit/sampledf_yychg.r')
rm(sampledf, sampledf_sq, sampledf_yychg)
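A sketch of the same steps wrapped in a function. The name yychg_file(), its arguments and the default lag of 4 are hypothetical, and all columns are assumed to be numeric. The function loads the file into a private environment, so nothing is left in the workspace when it returns. It also subtracts the two shifted blocks of columns in one step instead of looping:

```r
yychg_file <- function(infile, outfile, lag = 4) {
  env <- new.env()
  obj_name <- load(infile, envir = env)[1]   # load() returns the names it restored
  df <- get(obj_name, envir = env)

  w <- ncol(df)
  # Column j of the result is column j minus column j + lag
  result <- df[, 1:(w - lag), drop = FALSE] - df[, (1 + lag):w, drop = FALSE]

  new_name <- paste(obj_name, "_yychg", sep = "")
  assign(new_name, result, envir = env)
  save(list = new_name, file = outfile, envir = env)
  invisible(new_name)   # env and its contents are discarded when the function returns
}
```

Usage would look like yychg_file('c:/r_pit/sampledf.r', 'c:/r_pit/sampledf_yychg.r'), which saves an object called sampledf_yychg.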



[R] function to check if an object is present, and if not, load it from my hard drive

2006-05-05 Thread r user
I want to check if an "object" (dataset, vector, etc)
is “present”.  If it is present, I will do nothing. 
If it is not present, I will load it from my hard
drive.

Is there a function to determine if an object is
present?
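exists() answers this. Below is a sketch of a helper (load_if_missing() is a hypothetical name) that checks the global workspace and loads a saved file only when the object is absent. It assumes the file was written with save() and contains an object of that name:

```r
load_if_missing <- function(name, file) {
  # inherits = FALSE: look only in the workspace itself, not in attached packages
  if (!exists(name, envir = globalenv(), inherits = FALSE)) {
    load(file, envir = globalenv())
  }
  invisible(exists(name, envir = globalenv(), inherits = FALSE))
}
```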



[R] gc(), memory.size()

2006-05-05 Thread r user
Can someone please explain to me what the “used” column
means for Ncells and Vcells when I run gc()?
> gc()
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells   882296  23.6   13812157 368.9  19400892  518.1
Vcells 14811586 113.1  114763459 875.6 317464335 2422.1
>

(I read the help file, but still do not fully
understand.)
Also, how do I determine the total memory being
used? Do I simply run memory.size()?
Finally, when I run memory.size(max=T), I get a
negative value.  What does this mean?
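The following is a rough guide based on ?Memory. It assumes a 32-bit Windows build, which the memory.size() question suggests. Ncells are fixed-size cons cells, which ?Memory describes as the building blocks of the language itself (parse trees and so on). Each one takes 28 bytes on a 32-bit build and usually 56 bytes on a 64-bit build. Vcells are 8-byte units of the heap that holds vector data. Each "(Mb)" column appears to be the cell count converted to megabytes (2^20 bytes) and rounded up to the next 0.1. Checked against the "used" column above:

$$
\begin{aligned}
\text{Ncells: } & 882\,296 \times 28 = 24\,704\,288 \text{ bytes}, \quad 24\,704\,288 / 2^{20} \approx 23.56 \;\to\; 23.6 \text{ Mb} \\
\text{Vcells: } & 14\,811\,586 \times 8 = 118\,492\,688 \text{ bytes}, \quad 118\,492\,688 / 2^{20} \approx 113.003 \;\to\; 113.1 \text{ Mb}
\end{aligned}
$$

On that reading, R's object heap was using roughly 23.6 + 113.1 = 136.7 Mb when gc() was called.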



[R] converting code into a function - seperating a data frame with n columns into n individual vectors

2006-05-04 Thread r user
I have many very large dataframes with 20 columns
each.

In order to conserve memory, I wish to separate the
data frame into 20 vectors, each named with the name of the
dataframe followed by .1, .2, .3, … .20.

(For example purposes, one data frame is named
“testa”.)

e.g. testa.1, testa.2, testa.3

I have written the code to do this (see below). I am
trying to convert this into a function that I can
reuse.  Suggestions are appreciated.

(I am not sure if this is the best way to approach the
problem, but I do think it will work. FYI, I really do
need all the data, so selecting a subset of the data is
not a good option.)

Here is the code I’ve been using:

load('c:/testa.r')
testa.1<-testa[ , 1]
testa.2<-testa[ , 2]
testa.3<-testa[ , 3]
testa.4<-testa[ , 4]
testa.5<-testa[ , 5]
testa.6<-testa[ , 6]
testa.7<-testa[ , 7]
testa.8<-testa[ , 8]
testa.9<-testa[ , 9]
testa.10<-testa[ , 10]
testa.11<-testa[ , 11]
testa.12<-testa[ , 12]
testa.13<-testa[ , 13]
testa.14<-testa[ , 14]
testa.15<-testa[ , 15]
testa.16<-testa[ , 16]
testa.17<-testa[ , 17]
testa.18<-testa[ , 18]
testa.19<-testa[ , 19]
testa.20<-testa[ , 20]
rm(testa)
gc()
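Below is a sketch of a reusable version (split_columns() is a hypothetical name). It loops over the columns and uses assign() to create prefix.1, prefix.2, ... in the calling environment. A data frame already stores each column as a separate vector, so splitting it up is probably unlikely to save much memory by itself:

```r
split_columns <- function(df, prefix, envir = parent.frame()) {
  new_names <- paste(prefix, seq_along(df), sep = ".")
  for (j in seq_along(df)) {
    assign(new_names[j], df[[j]], envir = envir)   # e.g. testa.1 <- testa[[1]]
  }
  invisible(new_names)
}
```

With the data from the question this would be load('c:/testa.r'), then split_columns(testa, "testa"), rm(testa) and gc().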



[R] Determining the "memory" used by a dataset or vector?

2006-05-04 Thread r user

Is there a function that reports the amount of memory
used by a dataset and/or vector?

If I have a dataset with only 1 column, does it use
more memory than the same data arranged as a vector?
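object.size() reports an estimate of the memory used by a single object. Below is a quick comparison on made-up data, between a numeric vector and the same values held as a one-column data frame. Exact byte counts depend on the R version and platform, so only the direction of the difference is checked:

```r
v <- rnorm(1000)
d <- data.frame(v = v)

object.size(v)
object.size(d)                        # slightly larger: names, class and row.names
print(object.size(d), units = "Kb")   # human-readable units
```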



[R] function to replace missing values with median value?

2006-05-03 Thread r user
I have a data set with ~10 variables (i.e. columns).

I wrote this little function to replace missing values
with zero.  

sz <- function(x) { ifelse(is.na(x)==F,x,0) }

Can anyone help with a function that replaces missing
values with the median of the non-missing values?
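The same idea works with median() instead of zero: compute the median of the non-missing values once, then put it into the missing positions. The names sm() and df are only illustrative:

```r
sm <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)   # median of the non-missing values
  x
}

df <- data.frame(u = c(1, NA, 3, 10), w = c(NA, 2, 2, 8))
df[] <- lapply(df, sm)   # apply sm() to every column, keeping the data frame
```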



[R] pros and cons of "robust regression"? (i.e. rlm vs lm)

2006-04-06 Thread r user
Can anyone comment or point me to a discussion of the
pros and cons of robust regressions, vs. a more
"manual" approach to trimming outliers and/or
"normalizing" data used in regression analysis?
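One way to see the difference in practice uses made-up data with a single gross outlier. lm() lets that point pull the fitted slope, while MASS::rlm(), an M-estimator with Huber weights by default, down-weights it. MASS is one of the recommended packages distributed with R. This illustrates only one side of the trade-off and does not address when trimming or transforming is preferable:

```r
library(MASS)

set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20)   # true slope is 2
y[20] <- 100             # one gross outlier (the line predicts about 40)

fit_ls  <- lm(y ~ x)
fit_rob <- rlm(y ~ x)

coef(fit_ls)             # slope pulled upward by the outlier
coef(fit_rob)            # slope much closer to 2
round(fit_rob$w, 2)      # final IWLS weights: the outlier's is far below 1
```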



[R] rowVars

2006-03-31 Thread r user
I am using R 2.2.1 in a Windows XP environment.

I have a dataframe with 12 columns and 1,000 rows.
(Some of the rows have 1 or fewer non-missing values.)

I am trying to use rowVars to calculate the variance
of each row.

I am getting the following message:
“Error in na.remove.default(x) : length of 'dimnames'
[1] not equal to array extent”


Is there a good work-around?
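A base-R work-around is sketched below, assuming the goal is the sample variance of the non-missing values in each row. It uses apply() over rows with a small wrapper that returns NA when a row has fewer than two values, since the variance is undefined there. row_var() and the toy data are illustrative:

```r
row_var <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) < 2) NA_real_ else var(v)
}

df <- data.frame(a = c(1, NA, 2), b = c(3, NA, NA), c = c(5, 7, 4))
rv <- apply(as.matrix(df), 1, row_var)   # one variance per row
```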



[R] calculating a trailing 12 column mean in a dataframe?

2006-03-29 Thread r user
I have a dataframe of 25 columns and 100,000 rows
called “testdf”.

I wish to build a new dataframe, with 14 columns and
100,000 rows.

I wish the new dataframe to have the “trailing 12
column” mean.  That is, I want column 1 of the new
dataframe to have something like:

“mean(testdf[,1:12],na.rm=T)”

What is the best way to accomplish this?
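Applying rowMeans() to each window of 12 adjacent columns does this. With 25 columns there are 25 - 12 + 1 = 14 windows, which matches the 14 columns wanted. Below is a sketch (trailing_means() and k are hypothetical names) that assumes all columns are numeric:

```r
trailing_means <- function(df, k = 12) {
  m <- as.matrix(df)
  n_win <- ncol(m) - k + 1
  out <- matrix(NA_real_, nrow = nrow(m), ncol = n_win)
  for (j in seq_len(n_win)) {
    # window j covers columns j, j + 1, ..., j + k - 1
    out[, j] <- rowMeans(m[, j:(j + k - 1), drop = FALSE], na.rm = TRUE)
  }
  as.data.frame(out)
}
```

The loop runs over only 14 windows; each rowMeans() call handles all 100,000 rows at once.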



[R] "renaming" dataframe1 using "column" names from dataframe2?

2006-03-17 Thread r user
I have a dataframe named “temp”, and another dataframe
named “descriptions”.

I wish to “rename” temp, and to “call” it by a name taken
from a certain column in the dataframe “descriptions”.

Is there a good way to do this?

A similar question:

I am using a “for loop” to create several new
dataframes.
e.g.
for(j in 1:9){…..

I’d like each dataframe to be named d1, d2, d3, with
the number being tied to the j (the iteration).

Is this possible?
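For both questions, assign() creates an object whose name is built at run time, and get() retrieves it. Often it is simpler to keep the results in a named list. Below is a sketch; the toy data frames, the descriptions column `label` and the row used are hypothetical:

```r
# Loop version: collect d1, d2, ..., d9 in a named list
results <- vector("list", 9)
for (j in 1:9) {
  results[[j]] <- data.frame(iter = j, value = j^2)
}
names(results) <- paste("d", 1:9, sep = "")
results$d3                            # access one result by name

# If separate objects are really needed:
for (nm in names(results)) assign(nm, results[[nm]])

# Renaming: give 'temp' the name stored in a column of 'descriptions'
temp <- data.frame(x = 1:3)
descriptions <- data.frame(label = c("sales", "costs"), stringsAsFactors = FALSE)
assign(descriptions$label[1], temp)   # creates an object called 'sales'
```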



[R] using a value in a column to "lookup" data in a certain column of a dataset?

2006-03-14 Thread r user
I have a dataset with 20 columns and ~600,000 rows.

Column 1 has a number from 2-19.  This number tells
me, for each row, which column has the “applicable”
data.  (i.e. the data that I wish to use for each
individual row)

I want to create a vector that contains, for each row,
the data from the column indicated by the value in column 1.

e.g.
If column 1, row 1, has a value of “6”, I want to
obtain the value in column 6, row 1.

If column 1, row 2, has a value of “2”, I want to obtain
the value in column 2, row 2, etc.


I have created a for loop to do this, but am
looking for a more efficient approach.
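Indexing a matrix with a two-column matrix of (row, column) pairs picks one element per pair, so the loop can be replaced by a single subscript. Below is a sketch on a made-up 3-row example. It assumes every column is numeric, because as.matrix() would otherwise convert everything to character:

```r
d <- data.frame(idx = c(3, 2, 4),
                v2  = c(10, 20, 30),
                v3  = c(11, 21, 31),
                v4  = c(12, 22, 32))

m <- as.matrix(d)
# Row i, column d$idx[i], for every row at once
picked <- m[cbind(seq_len(nrow(m)), d$idx)]
```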

Thanks.



[R] vector math: calculating a rolling 12 row product?

2006-02-28 Thread r user
I have a dataframe of numeric values with 30 “rows”
and 7 “columns”.

For each column, beginning at “row” 12 and down to
“row” 30, I wish to calculate the “rolling 12 row
product”.  I.e., within each column, I wish to
multiply all the values in row 1:12, 2:13,…19:30.

I wish to save the results as a new dataframe, which
will have 19 rows and 7 columns.
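Below is a sketch of the rolling product (rolling_prod() and k are hypothetical names). For each starting row i, it multiplies rows i through i + 11 within every column. With 30 rows and a 12-row window there are 30 - 12 + 1 = 19 windows, giving the 19 x 7 result described:

```r
rolling_prod <- function(df, k = 12) {
  m <- as.matrix(df)
  n_win <- nrow(m) - k + 1
  out <- matrix(NA_real_, nrow = n_win, ncol = ncol(m),
                dimnames = list(NULL, colnames(m)))
  for (i in seq_len(n_win)) {
    # product of rows i..(i + k - 1), separately for each column
    out[i, ] <- apply(m[i:(i + k - 1), , drop = FALSE], 2, prod)
  }
  as.data.frame(out)
}
```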



[R] memory management under Windows XP

2006-02-23 Thread r user
I am using R 2.2.1 in a Windows XP environment.

I work with very large datasets, and occasionally run
out of memory.

I have modified my boot.ini file to use the "/3gb
switch".

I also run the following line after I launch R (I am
unsure if it is helpful):

"memory.limit(size = 4095)"

Please point me to useful references on how to better
manage memory, or suggest other actions.



[R] "Conditional" match?

2006-01-27 Thread r user
I have two datasets, big and small.

s_date<-c('2005-12-02', '2005-12-01',
'2004-11-02','2002-10-05','2000-12-15')
s_id<-c('a','a','b','c','d')

b_date<- c('2005-12-31', '2005-12-31',
'2004-12-31','2002-10-05','2001-10-31','1999-12-31')

b_id<-c('a','b','c','d','e','c')

small<-data.frame(date_=as.Date(s_date),id=s_id)
big<-data.frame(date_=as.Date(b_date),id=b_id)

For each row in “big”, I want to look for a match in
small where two conditions are met:

a.  big$id == small$id
b.  big$date_ >= small$date_

If a match is found, I wish to return the value of
small$date_.  If no match is found, I want NA.

If more than 1 match is found, I wish to return the
match where small$date_ is greatest.

I’m thinking I might be able to do this using the
match function, and by sorting the “small” dataset by
date_ in descending order.  

However, I do not know how to make the match
conditional on big$date_>=small$date_.

Any help is appreciated.
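Below is a straightforward sketch, not necessarily the fastest approach. For each row of big, it keeps the rows of small that have the same id and a date on or before big's date, then takes the latest of those. It rebuilds the example data with plain quotes and uses the illustrative names match_num and match_date. Dates are handled as numbers inside vapply() and converted back at the end:

```r
s_date <- c("2005-12-02", "2005-12-01", "2004-11-02", "2002-10-05", "2000-12-15")
s_id   <- c("a", "a", "b", "c", "d")
b_date <- c("2005-12-31", "2005-12-31", "2004-12-31", "2002-10-05", "2001-10-31", "1999-12-31")
b_id   <- c("a", "b", "c", "d", "e", "c")

small <- data.frame(date_ = as.Date(s_date), id = s_id, stringsAsFactors = FALSE)
big   <- data.frame(date_ = as.Date(b_date), id = b_id, stringsAsFactors = FALSE)

match_num <- vapply(seq_len(nrow(big)), function(i) {
  ok <- small$id == big$id[i] & small$date_ <= big$date_[i]    # both conditions
  if (any(ok)) as.numeric(max(small$date_[ok])) else NA_real_   # latest qualifying date
}, numeric(1))

big$match_date <- as.Date(match_num, origin = "1970-01-01")
big
```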



Re: [R] paste - eliminate spaces?

2006-01-25 Thread r user
I found the answer:

add sep="" to the paste command

paste('test',1,sep="")



--- r user <[EMAIL PROTECTED]> wrote:

> I am trying to combine the value of a variable and
> text.
> 
> e.g.
> I want “test1”, with no spaces.
> 
> I try:
> 
> h=1
> paste('test',1)
>
> But get:
> [1] "test 1"
>
> (i.e. there is a space between “test” and “1”)
> 
> Is there a way to eliminate the space?
> 
> 


[R] paste - eliminate spaces?

2006-01-25 Thread r user
I am trying to combine the value of a variable and
text.

e.g.
I want “test1”, with no spaces.

I try:

h=1
paste('test',1)

But get:
[1] "test 1"

(i.e. there is a space between “test” and “1”)

Is there a way to eliminate the space?



[R] importing a VERY LARGE database from Microsoft SQL into R

2006-01-24 Thread r user
I am using R 2.1.1 in a Windows XP environment.

I need to import a large database from Microsoft SQL
into R.

I am currently using the “sqlQuery” function/command.

This works, but I sometimes run out of memory if my
database is too big, or it takes quite a long time for
the data to import into R.

Is there a better way to bring a large SQL database
into R?

Is there an efficient way to convert the data into R
format prior to bringing it into R? (E.g. directly
from Microsoft SQL?)



[R] exporting dates into Microsoft SQL Server

2006-01-23 Thread r user
I am running R 2.1.1 in a Windows XP environment.

I wish to use the sqlSave command to export a
dataframe into Microsoft SQL.

My dataframe is called temp and has 2 “columns”,
“monthenddate” and “value”.

Monthenddate is in POSIXct format (str() reports
'POSIXct', format: chr  "1984-01-31" "1984-01-31" "1984-01-31"
"1984-01-31" ...).

How can I export this dataframe into SQL so that the
column is stored in one of the “standard” SQL date
formats?

I am using the following r code:

db <- odbcConnect("testserver")
sqlSave(db, temp)



[R] Converting from a dataset to a single "column"

2006-01-23 Thread r user
I have a dataset of 3 “columns” and 5 “rows”.

temp<-data.frame(col1=c(5,10,14,56,7),col2=c(4,2,8,3,34),col3=c(28,4,52,34,67))

I wish to convert this to a single “column”, with
column 1 on “top” and column 3 on “bottom”.

i.e.

5
10
14
56
7
4
2
8
3
34
28
4
52
34
67

Are there any functions that do this, and that will
work well on much larger datasets (e.g. 1000 rows,
6000 columns)?
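A data frame is a list of columns, so unlist() concatenates them in order, with column 1 first. For an all-numeric data frame, as.vector(as.matrix(temp)) gives the same result. A sketch using the example data from the question:

```r
temp <- data.frame(col1 = c(5, 10, 14, 56, 7),
                   col2 = c(4, 2, 8, 3, 34),
                   col3 = c(28, 4, 52, 34, 67))

stacked    <- unlist(temp, use.names = FALSE)   # col1, then col2, then col3
stacked_df <- data.frame(value = stacked)       # as a one-column data frame, if preferred
```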



[R] matrix logic

2006-01-10 Thread r user
I have 2 dataframes, each with 5 columns and 20 rows.
They are called data1 and data2. I wish to create a
third dataframe called data3, also with 5 columns and
20 rows.

I want data3 to contain the values in data1 when the
value in data1 is not NA.  Otherwise it should contain
the values in data2.

I have tried a few methods, but they do not seem to
work as intended:

data3<-ifelse(is.na(data1)=F,data1,data2)

and 

data3[,]<-ifelse(is.na(data1[,])=F,data1[,],data2[,])

Please suggest the “best” way.
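Two things likely go wrong in the attempts above. First, `is.na(data1)=F` is an assignment, not a comparison (that would be `==`, or more simply `!is.na(data1)`). Second, ifelse() is not well suited to data frames as its yes/no arguments. The sketch below instead copies data1 and fills its missing cells from data2 using a logical-matrix index. The toy data are made up, and both data frames are assumed to have the same shape:

```r
data1 <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
data2 <- data.frame(a = c(10, 20, 30), b = c(40, 50, 60))

data3 <- data1
miss <- is.na(data1)            # logical matrix, TRUE where data1 is missing
data3[miss] <- data2[miss]      # take exactly those cells from data2
```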



[R] matrix math

2006-01-04 Thread r user
>   I am using R 2.1.1 in a Windows XP environment.
>
>   I have 2 dataframes, temp1 and temp2.
>
>   Each dataframe has 20 variables (“columns”) and
> 525 observations (“rows”).  All variables are
> numeric.
>
>   I want to create a new dataframe that also has 20
> columns and 525 rows.  The values in this dataframe
> should be the sum of the 2 other dataframes.
>
>   (i.e. temp1$column1+temp2$column1,
> temp1$column2+temp2$column2, etc)
>
>   What is the best/easiest way to accomplish this?
>
>   If I wish to "multiply" (instead of sum) the
> columns, how do I?
>
>   I tried:
>
>   temp3<-as.matrix(temp1)+as.matrix(temp2)
>
>   I get the following error message: “Error in
> as.matrix(temp1) + as.matrix(temp2) :
>   non-numeric argument to binary operator”


[R] matrix math

2006-01-04 Thread r user
I am using R 2.1.1 in a Windows XP environment.

I have 2 dataframes, temp1 and temp2.

Each dataframe has 20 variables (“columns”) and 525 observations (“rows”).
All variables are numeric.

I want to create a new dataframe that also has 20 columns and 525 rows.  The
values in this dataframe should be the sum of the 2 other dataframes.

(i.e. temp1$column1+temp2$column1, temp1$column2+temp2$column2, etc)

What is the best/easiest way to accomplish this?

If I wish to "multiply" (instead of sum) the columns, how do I?

I tried:

temp3<-as.matrix(temp1)+as.matrix(temp2)

I get the following error message: “Error in as.matrix(temp1) + as.matrix(temp2) :
  non-numeric argument to binary operator”
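Arithmetic operators work element by element on two data frames of the same shape. So `temp1 + temp2` and `temp1 * temp2` return new data frames directly. The error quoted suggests that at least one column is not actually numeric, for example a factor or character column created on import. A sketch with made-up data, including a quick check of column types:

```r
temp1 <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
temp2 <- data.frame(x = c(4, 5, 6), y = c(1, 1, 1))

temp3 <- temp1 + temp2    # element-wise sum, still a data frame
temp4 <- temp1 * temp2    # element-wise product

# Any FALSE here points at a column that would trigger the error
sapply(temp1, is.numeric)
sapply(temp2, is.numeric)
```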




[R] For loop gets exponentially slower as dataset gets larger...

2006-01-03 Thread r user
I am running R 2.1.1 in a Microsoft Windows XP environment.

I have a matrix with three vectors (“columns”) and ~2 million “rows”.  The
three vectors are date_, id, and price.  The data is ordered (sorted) by code
and date_.

(The matrix contains daily prices for several thousand stocks, and has ~2
million “rows”. If a stock did not trade on a particular date, its price is set
to “NA”.)

I wish to add a fourth vector that is “next_price”. (“Next price” is the
current price as long as the current price is not “NA”.  If the current price
is NA, “next_price” is the next price at which the security with this same ID
trades.  If the stock does not trade again, “next_price” is set to NA.)

I wrote the following loop to calculate next_price.  It works as intended,
but I have one problem.  When I have only 10,000 rows of data, the calculations
are very fast.  However, when I run the loop on the full 2 million rows, it
seems to take ~1 second per row.

Why is this happening?  What can I do to speed up the calculations when running
the loop on the full 2 million rows?

(I am not running low on memory, but I am maxing out my CPU at 100%.)

Here is my code and some sample data:

data <- data[order(data$code, data$date_),]
l <- dim(data)[1]
w <- 3
data[l, w+1] <- NA

for (i in (l-1):1) {
  data[i, w+1] <- ifelse(is.na(data[i, w]) == F, data[i, w],
                         ifelse(data[i, 2] == data[i+1, 2], data[i+1, w+1], NA))
}
   
   
date        id    price     next_price
6/24/2005   1635  444.7838  444.7838
6/27/2005   1635  448.4756  448.4756
6/28/2005   1635  455.4161  455.4161
6/29/2005   1635  454.6658  454.6658
6/30/2005   1635  453.9155  453.9155
7/1/2005    1635  453.3153  453.3153
7/4/2005    1635  NA        453.9155
7/5/2005    1635  453.9155  453.9155
7/6/2005    1635  453.0152  453.0152
7/7/2005    1635  452.8651  452.8651
7/8/2005    1635  456.0163  456.0163
12/19/2005  1635  442.6982  442.6982
12/20/2005  1635  446.5159  446.5159
12/21/2005  1635  452.4714  452.4714
12/22/2005  1635  451.074   451.074
12/23/2005  1635  454.6453  454.6453
12/27/2005  1635  NA        NA
12/28/2005  1635  NA        NA
12/1/2003   1881  66.1562   66.1562
12/2/2003   1881  64.9192   64.9192
12/3/2003   1881  66.0078   66.0078
12/4/2003   1881  65.8098   65.8098
12/5/2003   1881  64.1275   64.1275
12/8/2003   1881  64.8697   64.8697
12/9/2003   1881  63.5337   63.5337
12/10/2003  1881  62.9399   62.9399
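The loop is probably slow because each `data[i, w+1] <- ...` on a data frame can copy the column (or more) on every iteration. Total work then grows much faster than the number of rows. A vectorised sketch instead finds, within each id, the position of the first traded price at or after each row, using a reversed cumulative minimum, and looks the price up there. ave() applies this per id, so a value never carries over from one stock to the next. next_price_vec() is a hypothetical name, and the rows are assumed to be sorted by id and then date, as in the question:

```r
next_price_vec <- function(p) {
  pos <- ifelse(is.na(p), Inf, seq_along(p))  # own position if traded, Inf if NA
  nxt <- rev(cummin(rev(pos)))                # first traded position at or after i
  nxt[is.infinite(nxt)] <- NA                 # never trades again within this id
  p[nxt]
}

# A few rows taken from the sample data above
data <- data.frame(id    = c(1635, 1635, 1635, 1635, 1635, 1881, 1881),
                   price = c(453.3153, NA, 453.9155, 454.6453, NA, 66.1562, 64.9192))

data$next_price <- ave(data$price, data$id, FUN = next_price_vec)
```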



Re: [R] Compare rows of two matrices

2005-02-21 Thread R user
> y <-  matrix( c(20,  NA,  NA,  45,  50,  19,  32, 101,  10,  22,  NA,  NA,  
> 80,  49,  61, 190), ncol=4 )
> x <-  matrix( c(20,  NA,  NA,  NA,  50,  19,  32, 101,  10,  22,  NA,  NA,  
> 80,  49,  61, 190), ncol=4 )
> 
> #Whereas x contains all NAs from y plus some additional NAs.
> #I want to find the index of these additional NAs. I think there must be a 
> very easy way to do this.


How about this:

  is.na(x) & !is.na(y)
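which() with arr.ind = TRUE turns that logical matrix into row and column indices. A sketch with the matrices from the question:

```r
y <- matrix(c(20, NA, NA, 45, 50, 19, 32, 101, 10, 22, NA, NA,
              80, 49, 61, 190), ncol = 4)
x <- matrix(c(20, NA, NA, NA, 50, 19, 32, 101, 10, 22, NA, NA,
              80, 49, 61, 190), ncol = 4)

extra <- is.na(x) & !is.na(y)     # NA in x but not in y
which(extra)                      # linear index: 4
which(extra, arr.ind = TRUE)      # row 4, column 1
```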


Jonne.



Re: [R] Extracting a numeric prefix from a string

2005-01-31 Thread R user
You could use something like

y <- gsub('([0-9]+(\\.[0-9]+)?)?.*','\\1',x)
as.numeric(y)

But maybe there's a much nicer way.

Jonne.

On Mon, 2005-01-31 at 08:51 +, Mike White wrote:
> Hi
> Does anyone know if there is a function similar to as.numeric that will
> extract a numeric prefix from a string as in the following examples?
> 
> x<-c(3, "abc", 5.67, "2.4a", "6a", "6b", "2.4.a", 3, "4.2a")
> df.x<-data.frame(Code=x)
> x.str<-levels(df.x[,1])
> # required function  result
> 2.40 3.00 4.20 5.67 6.00 NA
> 
> Thanks
> Mike White
> 
-- 
R user <[EMAIL PROTECTED]>



Re: [R] parameter couldn't be set in high-level plot() function

2005-01-25 Thread R user
I think the problem I had with the bandplot (gplots) function is solved by
changing expand.dots = FALSE to expand.dots = TRUE.
I don't actually understand why it says FALSE here, because that means the
extra arguments are *not* passed to plot as separate arguments.
If I change it to TRUE, my main/xlab/ylab arguments are passed just as
I wanted.

fragment of bandplot[gplots]

if (!add) {
    m <- match.call(expand.dots = FALSE)
    m$width <- m$add <- m$sd <- m$sd.col <- NULL
    m$method <- m$n <- NULL
    m[[1]] <- as.name("plot")
    mf <- eval(m, parent.frame())
}
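The effect of expand.dots can be seen in isolation with two toy functions, f() and g(). With FALSE, match.call() bundles all extra arguments into a single `...` component. The rebuilt call therefore does not hand plot() separate xlab and ylab arguments, which would be consistent with the warnings in the original post. With TRUE, the extra arguments appear as ordinary named arguments:

```r
f <- function(x, ...) match.call(expand.dots = FALSE)
g <- function(x, ...) match.call(expand.dots = TRUE)

cf <- f(1, xlab = "blah")
cg <- g(1, xlab = "blah")

names(as.list(cf))   # "" "x" "..."  : extra arguments bundled together
names(as.list(cg))   # "" "x" "xlab" : extra arguments spliced in by name
```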

Jonne.


On Mon, 2005-01-24 at 15:50 +0100, R user wrote:
> 
> Dear R users,
> 
> I am using function bandplot from the gplots package.
> To my understanding (viewing the source of bandplot) it calls
> function plot (add = FALSE) with the same parameters (except for a few
> removed).
> 
> I would like to give extra parameters 'xlab' and 'ylab' to function
> bandplot, but, as can be seen below, that raises warnings (and the
> labels do not show up at the end).
> 
> It does work to call title(... xlab="blah", ylab="foo") after
> bandplot(), but then I have two labels on top of each other, which is
> even more ugly.
>
> Can anyone explain to me why this goes wrong?
> 
> Thanks in advance,
> Jonne.
> 
-- 
R user <[EMAIL PROTECTED]>



[R] parameter couldn't be set in high-level plot() function

2005-01-24 Thread R user


Dear R users,

I am using the function bandplot from the gplots package.
As I understand it (from reading the source of bandplot), it calls
plot() (when add = FALSE) with the same arguments, except for a few it
removes.

I would like to pass the extra arguments 'xlab' and 'ylab' to
bandplot, but, as can be seen below, that raises warnings (and the
labels do not show up in the end).

Calling title(..., xlab="blah", ylab="foo") after bandplot() does work,
but then I have two labels on top of each other, which is even
uglier.

Can anyone explain to me why this goes wrong?

Thanks in advance,
Jonne.


> x11() ; bandplot(x=xdata, y=zdata)

[works fine]

> x11() ; bandplot(x=xdata, y=zdata, xlab="blah", ylab="foo")
There were 22 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: parameter "xlab" couldn't be set in high-level plot() function
2: parameter "ylab" couldn't be set in high-level plot() function
3: parameter "xlab" couldn't be set in high-level plot() function
4: parameter "ylab" couldn't be set in high-level plot() function
5: parameter "xlab" couldn't be set in high-level plot() function
6: parameter "ylab" couldn't be set in high-level plot() function
7: parameter "xlab" couldn't be set in high-level plot() function
8: parameter "ylab" couldn't be set in high-level plot() function
9: parameter "xlab" couldn't be set in high-level plot() function
10: parameter "ylab" couldn't be set in high-level plot() function
11: parameter "xlab" couldn't be set in high-level plot() function
12: parameter "ylab" couldn't be set in high-level plot() function
13: parameter "xlab" couldn't be set in high-level plot() function
14: parameter "ylab" couldn't be set in high-level plot() function
15: parameter "xlab" couldn't be set in high-level plot() function
16: parameter "ylab" couldn't be set in high-level plot() function
17: parameter "xlab" couldn't be set in high-level plot() function
18: parameter "ylab" couldn't be set in high-level plot() function
19: parameter "xlab" couldn't be set in high-level plot() function
20: parameter "ylab" couldn't be set in high-level plot() function
21: parameter "xlab" couldn't be set in high-level plot() function
22: parameter "ylab" couldn't be set in high-level plot() function
There were 22 warnings (use warnings() to see them)



[R] 3d bar plot

2005-01-17 Thread R user
This graph:
http://www.math.hope.edu/~tanis/dallas/images/disth36.gif
is an example created with Maple, which I found at
http://www.math.hope.edu/~tanis/dallas/disth1.html

Does anybody know how to create something similar in R?

I have a feeling it could be possible using scatterplot3d
(perhaps with type="h", as in the fourth example in help('scatterplot3d')?),
but I cannot figure it out.

Thanks in advance,
Jonne.
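One possible approach, sketched with base R only rather than scatterplot3d, is a "needle" plot: persp() sets up an empty 3-D box and returns its viewing transformation, trans3d() projects points into that view, and thick segments() stand in for the bars. The joint probability table p(x, y) = (x + y)/80 for x, y in 1:4 is a made-up example, not the distribution in the Maple graph, and the viewing angles are arbitrary. The bars are thick lines, not true 3-D boxes.

```r
# Hypothetical joint pmf p(x, y) = (x + y) / 80 on x, y in 1:4 (sums to 1)
x <- 1:4
y <- 1:4
p <- outer(x, y, function(x, y) (x + y) / 80)

# Empty 3-D box: a flat zero surface with no fill and no facet borders.
# persp() invisibly returns the 4 x 4 viewing transformation matrix.
pmat <- persp(x, y, matrix(0, length(x), length(y)),
              zlim = c(0, max(p)), theta = 30, phi = 25,
              col = NA, border = NA, ticktype = "detailed",
              xlab = "x", ylab = "y", zlab = "p(x, y)")

# One row per (x, y) cell; as.vector(p) is column-major, so x varies
# fastest, matching the row order of expand.grid()
g <- expand.grid(x = x, y = y)
g$p <- as.vector(p)

# Project the bottom and the top of each bar into 2-D plot coordinates
base <- trans3d(g$x, g$y, 0, pmat)
top  <- trans3d(g$x, g$y, g$p, pmat)

# Draw each bar as a thick vertical segment with square ("butt") ends
segments(base$x, base$y, top$x, top$y, lwd = 8, lend = 1, col = "steelblue")
```

If scatterplot3d is installed, something like scatterplot3d(g$x, g$y, g$p, type = "h", lwd = 8, pch = " ") should give a similar picture, though that call is untested here.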



[R] evaluate expression on several dataframe columns

2004-12-20 Thread R user
Hi R-users,

I have a collection of data frames and know how to build a string
that names one of them; in this example the string is stored in
name_infra_alg_inc. I also have a character string yval, which the
user selects from a drop-down list containing the column names of the
data frames.

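# store ci() of column yval of the data frame named by name_infra_alg_inc
# as an object called <name_infra_alg_inc>.ci
assign(paste(name_infra_alg_inc, "ci", sep="."),
  ci(get(name_infra_alg_inc)[[yval]], confidence=0.95))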

My problem is that I sometimes want to combine columns.
For example, if there are columns A, B and C, could yval
hold the value "A+B*C" and then be passed to some sort of evaluate function?
Maybe I could attach the data frame and then call some function, but
I can't figure out how, so I hope someone can help me.

Thanks in advance
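A minimal sketch of the "evaluate function" idea, using base R's parse() and eval() with the data frame as the evaluation environment, so no attach() is needed. my_df and eval_cols() are hypothetical stand-ins for one of the user's data frames and a helper. Because a plain column name such as "B" is also a valid expression, the same call handles both single columns and combinations.

```r
# Hypothetical data frame standing in for one of the user's data frames
my_df <- data.frame(A = 1:3, B = c(2, 4, 6), C = c(10, 20, 30))
name_infra_alg_inc <- "my_df"   # string naming the data frame, as in the post

# Hypothetical helper: evaluate a column name or an expression of columns.
# Names are looked up among the columns of df first, then in the
# environment eval_cols() was called from.
eval_cols <- function(df, yval) {
  expr <- parse(text = yval)[[1]]   # text -> unevaluated R expression
  eval(expr, envir = df, enclos = parent.frame())
}

eval_cols(get(name_infra_alg_inc), "A+B*C")  # 21 82 183
eval_cols(get(name_infra_alg_inc), "B")      # 2 4 6
```

In the assign() call above, get(name_infra_alg_inc)[[yval]] could then be replaced by eval_cols(get(name_infra_alg_inc), yval). Since parse() will run any R code the string contains, this is only safe when yval comes from a trusted list such as the drop-down.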
