Re: [R] evaluate a string variable

2010-06-29 Thread Bill.Venables
You need to parse it before you evaluate it.

eval(parse(text = avar)) 
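
To spell out the two steps (a minimal sketch; the names unparsed, expr, result and f are only illustrative): parse() turns the string into an unevaluated expression, and eval() then runs it. Calling eval() on the bare string just hands the string back.

```r
avar <- "getwd()"

# eval() on a character vector simply returns it unchanged
unparsed <- eval(avar)

# parse() converts the text into an expression object ...
expr <- parse(text = avar)

# ... which eval() then runs, i.e. it calls getwd()
result <- eval(expr)
print(result)

# If only the function name is stored as a string, match.fun() avoids parse()
f <- match.fun("getwd")
print(f())
```

Note that eval(parse(text = ...)) runs whatever the string contains, so it is best kept away from untrusted input.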

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jeremiah H. Savage
Sent: Wednesday, 30 June 2010 3:45 PM
To: r-help@r-project.org
Subject: [R] evaluate a string variable

Hello,

I was wondering how to evaluate a string variable in R.

eg.

> avar <- "getwd()"
> eval(avar)

gives:
[1] "getwd()"

but I would like to see:
[1] "/home/myhomedir"

Thanks,
Jeremiah

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] evaluate a string variable

2010-06-29 Thread Jeremiah H. Savage
Hello,

I was wondering how to evaluate a string variable in R.

eg.

> avar <- "getwd()"
> eval(avar)

gives:
[1] "getwd()"

but I would like to see:
[1] "/home/myhomedir"

Thanks,
Jeremiah



Re: [R] Logistic regression with multiple imputation

2010-06-29 Thread Simon Blomberg
mitools is useful too, and I can vouch for mice. mice is easy to use, 
and easy to write new imputation methods too. So it is also very flexible.


Simon.

On 30/06/10 15:31, Jeremy Miles wrote:

Hi Daniel

First, newer versions of SPSS have dramatically improved their ability
to do stuff with missing data - I believe it's an additional module,
and in SPSS-world, each additional module = $$$.

Analyzing missing data is a 3 step process.  First, you impute,
creating multiple datasets, then you analyze each dataset in the
conventional way, then you combine the results.   There are two (that
I know of) packages for imputation - these are mi and mice.  rseek.org
will find them for you.

Hope that helps,

Jeremy




On 29 June 2010 22:14, Daniel Chen  wrote:
   

Hi,

I am a long time SPSS user but new to R, so please bear with me if my
questions seem to be too basic for you guys.

I am trying to figure out how to analyze survey data using logistic
regression with multiple imputation.

I have a survey data of about 200,000 cases and I am trying to predict the
odds ratio of a dependent variable using 6 categorical independent variables
(dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
in one or more of the independent variables. The percentage of missing
ranges from 0.01% to 10% for the independent variables.

My current thinking is to conduct a logistic regression with multiple
imputation, but I don't know how to do it in R. I searched the web but
couldn't find instructions or examples on how to do this. Since SPSS is
hopeless with missing data, I have to learn to do this in R. I am new to R,
so I would really appreciate if someone can show me some examples or tell me
where to find resources.

Thank you!

Daniel

--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
http://www.uq.edu.au/~uqsblomb/

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem

Statistics is the grammar of science - Karl Pearson.



Re: [R] Logistic regression with multiple imputation

2010-06-29 Thread Jeremy Miles
Hi Daniel

First, newer versions of SPSS have dramatically improved their ability
to do stuff with missing data - I believe it's an additional module,
and in SPSS-world, each additional module = $$$.

Analyzing missing data is a 3 step process.  First, you impute,
creating multiple datasets, then you analyze each dataset in the
conventional way, then you combine the results.   There are two (that
I know of) packages for imputation - these are mi and mice.  rseek.org
will find them for you.
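
A minimal sketch of those three steps with mice, on simulated data. The data frame dat and its columns y, x1 and x2 are made up for illustration, and the column name est$estimate assumes a recent mice 3.x release; treat this as a starting point rather than a recipe for the survey in question.

```r
library(mice)

# Simulated stand-in for the survey: binary outcome, two dummy-coded predictors
set.seed(42)
n <- 500
dat <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.3))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.4 * dat$x2))
dat$x1[sample(n, 50)] <- NA          # make 10% of x1 missing

# Step 1: impute, creating m completed datasets
imp <- mice(dat, m = 5, seed = 1, printFlag = FALSE)

# Step 2: fit the logistic regression to each completed dataset
fit <- with(imp, glm(y ~ x1 + x2, family = binomial))

# Step 3: combine the results
pooled <- pool(fit)
est <- summary(pooled)
print(est)

# Pooled odds ratios (assumes summary() returns an 'estimate' column)
odds_ratios <- exp(est$estimate)
print(odds_ratios)
```

With real categorical predictors it may be preferable to store them as factors, so that mice picks an imputation method suited to categories.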

Hope that helps,

Jeremy




On 29 June 2010 22:14, Daniel Chen  wrote:
> Hi,
>
> I am a long time SPSS user but new to R, so please bear with me if my
> questions seem to be too basic for you guys.
>
> I am trying to figure out how to analyze survey data using logistic
> regression with multiple imputation.
>
> I have a survey data of about 200,000 cases and I am trying to predict the
> odds ratio of a dependent variable using 6 categorical independent variables
> (dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
> in one or more of the independent variables. The percentage of missing
> ranges from 0.01% to 10% for the independent variables.
>
> My current thinking is to conduct a logistic regression with multiple
> imputation, but I don't know how to do it in R. I searched the web but
> couldn't find instructions or examples on how to do this. Since SPSS is
> hopeless with missing data, I have to learn to do this in R. I am new to R,
> so I would really appreciate if someone can show me some examples or tell me
> where to find resources.
>
> Thank you!
>
> Daniel
>



-- 
Jeremy Miles
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com



[R] Logistic regression with multiple imputation

2010-06-29 Thread Daniel Chen
Hi,

I am a long time SPSS user but new to R, so please bear with me if my
questions seem to be too basic for you guys.

I am trying to figure out how to analyze survey data using logistic
regression with multiple imputation.

I have a survey data of about 200,000 cases and I am trying to predict the
odds ratio of a dependent variable using 6 categorical independent variables
(dummy-coded). Approximatively 10% of the cases (~20,000) have missing data
in one or more of the independent variables. The percentage of missing
ranges from 0.01% to 10% for the independent variables.

My current thinking is to conduct a logistic regression with multiple
imputation, but I don't know how to do it in R. I searched the web but
couldn't find instructions or examples on how to do this. Since SPSS is
hopeless with missing data, I have to learn to do this in R. I am new to R,
so I would really appreciate if someone can show me some examples or tell me
where to find resources.

Thank you!

Daniel



Re: [R] using zoo() to coerce time series to a different reference frame

2010-06-29 Thread Gabor Grothendieck
On Tue, Jun 29, 2010 at 5:58 PM, Jonathan Greenberg
 wrote:
> Folks:
>
> I have two sets of dates, and one set of data:
>
> ***
>
> require("chron")
> require("zoo")
> reference_dates=seq.dates("01/01/92", "12/31/92", by = "months")
> data_dates=seq.dates("01/15/91", "12/15/93", by = "months")
> data=1:length(data_dates)
>
> reference_zoo=zoo(order.by=reference_dates)
> data_zoo=zoo(data,data_dates)
>
> ***
>
> What I would like is to have a zoo object that uses the index from
> reference_dates, but grabs the data for each of the dates (using a
> spline interpolation) from data_zoo object.  I feel like my solution
> is a bit slow, can someone let me know if there is a quicker way to do
> this?  Thanks:
>
> ***
>
> reference_data_zoo_merge=merge(reference_zoo,data_zoo)
> reference_data_zoo_data=na.spline(reference_data_zoo_merge)
> reference_data_zoo_data=merge(reference_zoo,reference_data_zoo_data,all=FALSE)
>

Try this:

> na.spline(data_zoo, xout = reference_dates)
01/01/92 02/01/92 03/01/92 04/01/92 05/01/92 06/01/92 07/01/92
08/01/92 09/01/92 10/01/92 11/01/92 12/01/92
12.55383 13.53979 14.51858 15.55116 16.53268 17.54855 18.53231
19.55283 20.54369 21.53461 22.54817 23.53190
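
For reference, a self-contained sketch of the same call using Date indexes instead of chron (an assumption made here only to drop the extra package; the object names follow the original post, interp is new):

```r
library(zoo)

# Monthly data on the 15th of each month, 1991-1993
data_dates <- seq(as.Date("1991-01-15"), as.Date("1993-12-15"), by = "month")
data_zoo <- zoo(seq_along(data_dates), data_dates)

# Target index: the 1st of each month in 1992
reference_dates <- seq(as.Date("1992-01-01"), by = "month", length.out = 12)

# Spline-interpolate data_zoo directly at the reference dates
interp <- na.spline(data_zoo, xout = reference_dates)
print(interp)
```

This replaces the merge / na.spline / merge sequence with a single call, since xout tells na.spline where to evaluate the spline.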



Re: [R] using zoo() to coerce time series to a different reference frame

2010-06-29 Thread Achim Zeileis

On Tue, 29 Jun 2010, Jonathan Greenberg wrote:


> Folks:
>
> I have two sets of dates, and one set of data:
>
> ***
>
> require("chron")
> require("zoo")
> reference_dates=seq.dates("01/01/92", "12/31/92", by = "months")
> data_dates=seq.dates("01/15/91", "12/15/93", by = "months")
> data=1:length(data_dates)
>
> reference_zoo=zoo(order.by=reference_dates)
> data_zoo=zoo(data,data_dates)
>
> ***
>
> What I would like is to have a zoo object that uses the index from
> reference_dates, but grabs the data for each of the dates (using a
> spline interpolation) from data_zoo object.  I feel like my solution
> is a bit slow, can someone let me know if there is a quicker way to do
> this?  Thanks:


With current versions of "zoo" you can simply do:

  na.spline(data_zoo, xout = reference_dates)

hth,
Z


> ***
>
> reference_data_zoo_merge=merge(reference_zoo,data_zoo)
> reference_data_zoo_data=na.spline(reference_data_zoo_merge)
> reference_data_zoo_data=merge(reference_zoo,reference_data_zoo_data,all=FALSE)
>
> ***
>
> --j



Re: [R] seq.dates in reverse?

2010-06-29 Thread Marc Schwartz
On Jun 29, 2010, at 7:17 PM, Jonathan Greenberg wrote:

> Pardon the barrage of time series related questions, but another issue
> I'm trying to solve is how to determine a sequence of dates a la
> seq.dates() except going BACKWARDS in time, e.g. if seq.dates()
> allowed for the "to" variables to be set alone, rather than the from=.
> Ultimately, I'd like to have a set of dates preceding a given date in
> predefined intervals (the same ones seq.dates() uses would be fine).
> Thoughts?  Would there be an easy way to "reverse engineer" a starting
> date given the "by=" variable and the "length="?  With the exception
> of using "by=days", I'm a bit unfamiliar with how to easily determine
> what date was, say, 4 months ago without doing a lot of string hacking
> (seq.dates() conveniently keeps the days of the month constant when
> generate date sequences, which is what I'd like).
> 
> --j

Jonathan,

Do you mean something like:

> seq(as.Date("2010-07-29"), length = 2, by = "-4 months")
[1] "2010-07-29" "2010-03-29"

?

Note that the 'by' argument can be a negative interval. See the third bullet in 
the Details section of ?seq.Date.
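
A short sketch of a whole backward sequence built the same way (the names end_date, back and forward are illustrative): start at the given date, step by a negative month interval, and reverse the result if chronological order is wanted.

```r
end_date <- as.Date("2010-07-29")

# Five dates, one month apart, walking backwards from end_date
back <- seq(end_date, by = "-1 month", length.out = 5)
print(back)

# Same dates in chronological order
forward <- rev(back)
print(forward)
```

One caveat: a day of the month that does not exist in a target month (for example the 31st) is counted forward into the following month, so month-end start dates need extra care.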

HTH,

Marc Schwartz



[R] Interpretation of gam intercept parameter

2010-06-29 Thread Lidia Dobria

Dear All:

I apologize for asking such an elementary question, but I could not find an 
adequate response on line. I am hoping to receive some help with the 
interpretation of the Intercept coefficient in the gam model below.
 
I1 through I3 are dummy coded "Item difficulty" parameters in a data set that 
includes 4 items. If the Intercept is the value of Y when all other terms are 
0, am I correct in assuming that it also equals the difficulty of item 4 (dummy 
coded 0 0 0 )?

Thank you for your help.
Lidia


Family: gaussian 
Link function: identity 

Formula:
Score ~ I1 + I2 + I3 + s(TimeI1, bs = "cr", k = 7) +
s(TimeI2, bs = "cr", k = 7) + s(TimeI3, bs = "cr", k = 7)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.70968    0.09547  49.330  < 2e-16 ***
I1          -0.22188    0.21767  -1.019 0.308157
I2           0.51236    0.16592   3.088 0.002042 **
I3          -0.60697    0.18258  -3.324 0.000902 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df     F  p-value
s(TimeI1) 3.820  3.820 4.587 0.001331 **
s(TimeI2) 2.491  2.491 6.271 0.000784 ***
s(TimeI3) 3.481  3.481 8.997 1.54e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.057  Scale est. = 2.131 n = 2079
  


Re: [R] Multiline and grouping in R

2010-06-29 Thread Pablo Cerdeira
Wow! Many thanks to you all. A lot of answers in just a few minutes!! I'd
like to thank you Dennis, Joshua, David, Erik and Bill.

I'll try these solutions and read something about xyplot.

Best regards to you.


On Tue, Jun 29, 2010 at 10:54 PM, Dennis Murphy  wrote:

> Hi:
>
> This is to add a couple of bells and whistles to the excellent replies of
> David and Bill. You may not want the 0.5 increments in the x-axis ticks, and
> if you want all groups in a single plot, then you may want a legend. Below
> are some basic ways to specify these (to head off the obvious follow-up
> questions :)
>
> # (1) Individual panels per AREA:
> xyplot(CASES ~ YEAR|AREA, data=df, type="b",
>   scales = list(x = list(at = c(1988, 1989, 1990))))
>
> The scales = argument provides control over various aspects of the axes,
> such as labels and tick marks. In this case, we want to limit the x-axis
> ticks to occur at 1988, 1989 and 1990 only.
>
> # (2) Single graph with multiple AREA profiles over time:
> xyplot(CASES ~ YEAR, data = df, type = "b", groups = AREA,
>   scales = list(x = list(at = c(1988, 1989, 1990))),
>   auto.key = list(points = FALSE, lines = TRUE))
>
> The auto.key = argument is used to produce a simple legend. By default, it
> is listed on top of the plot; if, for example, you wanted it on the right
> instead, you could add space = 'right' to the auto.key list.
>
> There are **many** options in xyplot() and other Lattice graphics
> functions, so you have the capability of fine tuning a graph to meet your
> specifications.
>
> HTH,
> Dennis
>
>
> On Tue, Jun 29, 2010 at 5:42 PM, Pablo Cerdeira 
> wrote:
>
>> Hi All,
>>
>> this is my first mail here.
>>
>> I'm trying to plot a multiline chart grouping values with no success. I
>> have
>> read a lot in the official Wiki and also searched via Google, but I did
>> not
>> find anything.
>>
>> I'm importing some data from a csv file. Here is a sample:
>>
>> YEAR,AREA,CASES
>> 1988,CONTRACTS,286
>> 1988,INTERNATIONAL,189
>> 1988,FAMILY,385
>> 1988,TAXATION,177
>> 1989,CONTRACTS,233
>> 1989,INTERNATIONAL,431
>> 1989,FAMILY,425
>> 1989,TAXATION,201
>> 1990,CONTRACTS,190
>> 1990,INTERNATIONAL,302
>> 1990,FAMILY,303
>> 1990,TAXATION,209
>> ...
>>
>> "t <- read.csv("file.csv", header=TRUE)"
>>
>> So far so good...
>>
>> But the problem is: I'd like to create a multiline plot, one line per
>> AREA,
>> showing the evolution of the number of CASES per YEAR.
>>
>> I know how to do it in Excel, using a Pivot Table. But I'm trying hard to
>> do
>> the same with R but I have no idea on how to do it.
>>
>> Can someone help me?
>>
>> Thanks in advanced
>>
>> --
>> Pablo de Camargo Cerdeira
>> pablo.cerde...@gmail.com
>> +55 (21) 3799-6065
>>
>
>


-- 
Pablo de Camargo Cerdeira
pablo.cerde...@gmail.com
+55 (21) 3799-6065



Re: [R] How to delete rows based on replicate values in one column with some extra calcuation

2010-06-29 Thread Nikhil Kaza
You can do this in reshape package as mentioned earlier.

However, if you need a solution with aggregate here it is

a <- with(data, aggregate(cbind(v1,v2), by=list(x,y,z),sum))
names (a) <- c("x","y","z","v1","v2")
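
The same result can also be sketched with aggregate's formula interface, which keeps the column names so the renaming step is not needed. This rebuilds Yi's example data from the quoted message; nothing else is assumed.

```r
# Yi's example data
z <- c("ab", "ah", "bc", "ah", "dv")
data <- data.frame(x = substr(z, 1, 1),
                   y = substr(z, 2, 2),
                   z = z,
                   v1 = 5:9,
                   v2 = 7:11)

# Sum v1 and v2 within each x/y/z combination; x and y ride along
# because they are constant within each value of z
a <- aggregate(cbind(v1, v2) ~ x + y + z, data = data, FUN = sum)
print(a)
```

Grouping by x, y and z together gives the same groups as grouping by z alone here, because x and y are derived from z.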



Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com

On Jun 29, 2010, at 7:56 PM, Yi wrote:

> Great help. It works when the first and the second columns are  
> ordered the same way. But aggregate does not work for the following  
> case:
>  z=c('ab','ah','bc','ah','dv')
> x=substr(z,start=1,stop=1)
> y=substr(z,start=2,stop=2)
> v1=5:9
> v2=7:11
> data=data.frame(x,y,z,v1,v2)
> > data
>   x y  z v1 v2
> 1 a b ab  5  7
> 2 a h ah  6  8
> 3 b c bc  7  9
> 4 a h ah  8 10
> 5 d v dv  9 11
>
> ##I want to do the aggregate WRT z and sum up v1 and v2. The  
> expected output is:
>
>x y  z v1 v2
> 1 a b ab  5  7
> 2 a h ah 14 18
> 3 b c bc  7  9
> 4 d v dv  9 11
>
> ### I do this almost manually.  As you see here:
>
> newdata=aggregate(data$v1,by=list(data$z),sum)
> newdata2=aggregate(data$v2,by=list(data$z),sum)
> x=substr(newdata$Group.1,start=1,stop=1)
> y=substr(newdata$Group.1,start=2,stop=2)
> data.frame(x,y,newdata$Group.1,newdata$x,newdata2$x)
> new=data.frame(x,y,newdata$Group.1,newdata$x,newdata2$x)
> names(new)=c('x','y','z','v1','v2')
> new
>
> I do it this way because I do not think 'aggregate' can group by z and at
> the same time keep x and y for z.
>
> Any tips? I mean my way is too 'silly'.
>
> Thanks all in advance!
>
> Yi
>
> On Mon, Jun 28, 2010 at 7:58 PM, Nikhil Kaza   
> wrote:
>
> aggregate(data$third, by=list(data$first), sum)
>
> or
>
> require(reshape)
> cast(melt(data), ~first, sum)
>
>
>
> On Jun 28, 2010, at 9:30 PM, Yi wrote:
>
>
> first=c('u','b','e','k','j','c','u','f','c','e')
> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)
>
>




Re: [R] Multiline and grouping in R

2010-06-29 Thread Dennis Murphy
Hi:

This is to add a couple of bells and whistles to the excellent replies of
David and Bill. You may not want the 0.5 increments in the x-axis ticks, and
if you want all groups in a single plot, then you may want a legend. Below
are some basic ways to specify these (to head off the obvious follow-up
questions :)

# (1) Individual panels per AREA:
xyplot(CASES ~ YEAR|AREA, data=df, type="b",
  scales = list(x = list(at = c(1988, 1989, 1990))))

The scales = argument provides control over various aspects of the axes,
such as labels and tick marks. In this case, we want to limit the x-axis
ticks to occur at 1988, 1989 and 1990 only.

# (2) Single graph with multiple AREA profiles over time:
xyplot(CASES ~ YEAR, data = df, type = "b", groups = AREA,
  scales = list(x = list(at = c(1988, 1989, 1990))),
  auto.key = list(points = FALSE, lines = TRUE))

The auto.key = argument is used to produce a simple legend. By default, it
is listed on top of the plot; if, for example, you wanted it on the right
instead, you could add space = 'right' to the auto.key list.

There are **many** options in xyplot() and other Lattice graphics functions,
so you have the capability of fine tuning a graph to meet your
specifications.
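
For anyone who wants to run the second variant as-is, here is a self-contained sketch. The data frame df is rebuilt by hand from the sample in the original post, and space = "right" is the optional legend placement mentioned above.

```r
library(lattice)

# Sample data from the original post
df <- data.frame(
  YEAR  = rep(1988:1990, each = 4),
  AREA  = rep(c("CONTRACTS", "INTERNATIONAL", "FAMILY", "TAXATION"), times = 3),
  CASES = c(286, 189, 385, 177, 233, 431, 425, 201, 190, 302, 303, 209)
)

# One line per AREA, ticks only at whole years, legend on the right
p <- xyplot(CASES ~ YEAR, data = df, groups = AREA, type = "b",
            scales = list(x = list(at = 1988:1990)),
            auto.key = list(points = FALSE, lines = TRUE, space = "right"))
print(p)
```

Storing the plot in p and printing it explicitly also makes the code work inside functions and scripts, where lattice plots are not drawn automatically.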

HTH,
Dennis


On Tue, Jun 29, 2010 at 5:42 PM, Pablo Cerdeira wrote:

> Hi All,
>
> this is my first mail here.
>
> I'm trying to plot a multiline chart grouping values with no success. I
> have
> read a lot in the official Wiki and also searched via Google, but I did not
> find anything.
>
> I'm importing some data from a csv file. Here is a sample:
>
> YEAR,AREA,CASES
> 1988,CONTRACTS,286
> 1988,INTERNATIONAL,189
> 1988,FAMILY,385
> 1988,TAXATION,177
> 1989,CONTRACTS,233
> 1989,INTERNATIONAL,431
> 1989,FAMILY,425
> 1989,TAXATION,201
> 1990,CONTRACTS,190
> 1990,INTERNATIONAL,302
> 1990,FAMILY,303
> 1990,TAXATION,209
> ...
>
> "t <- read.csv("file.csv", header=TRUE)"
>
> So far so good...
>
> But the problem is: I'd like to create a multiline plot, one line per AREA,
> showing the evolution of the number of CASES per YEAR.
>
> I know how to do it in Excel, using a Pivot Table. But I'm trying hard to
> do
> the same with R but I have no idea on how to do it.
>
> Can someone help me?
>
> Thanks in advanced
>
> --
> Pablo de Camargo Cerdeira
> pablo.cerde...@gmail.com
> +55 (21) 3799-6065
>


[R] seq.dates in reverse?

2010-06-29 Thread Jonathan Greenberg
Pardon the barrage of time series related questions, but another issue
I'm trying to solve is how to determine a sequence of dates a la
seq.dates() except going BACKWARDS in time, e.g. if seq.dates()
allowed for the "to" variables to be set alone, rather than the from=.
 Ultimately, I'd like to have a set of dates preceding a given date in
predefined intervals (the same ones seq.dates() uses would be fine).
Thoughts?  Would there be an easy way to "reverse engineer" a starting
date given the "by=" variable and the "length="?  With the exception
of using "by=days", I'm a bit unfamiliar with how to easily determine
what date was, say, 4 months ago without doing a lot of string hacking
(seq.dates() conveniently keeps the days of the month constant when
generate date sequences, which is what I'd like).

--j



Re: [R] How to delete the replicate rows by summing up the numeric columns

2010-06-29 Thread Yi
Thank you very much for the responses. In the end I took David's approach. The
others work well for this specific case, but I found a problem when there is
more than one column of character variables.

z=c('ab','ah','bc','ah','dv')
x=substr(z,start=1,stop=1)
y=substr(z,start=2,stop=2)
v1=5:9
v2=7:11
data=data.frame(x,y,z,v1,v2)

### I want to sum up v1 and v2 wrt z only, and delete duplicate rows. I do
not care about x and y here; I just want to keep them.

data$summed <- ave(data$v1, data$z, FUN=sum)
data[!duplicated(data$z),c('z','x','y','summed')]
### I tried to use 'melt' or 'cast' to solve the problem (as Nikhil and
Dennis suggested). But since x and y are also character variables,
it just does not work here.

Basically, my problem is perfectly answered.  But if you want to comment on
this example, please do it.  I really appreciate it.

For instance, if you think cast or aggregate can be used to delete
duplicate rows by summing the numeric columns when several columns are
character variables, please say so. I see no way to deal with the two types
at the same time.
Thank you.
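
Extending the ave() approach above to both numeric columns, as a runnable sketch (the column names v1_sum and v2_sum and the object out are made up here):

```r
# Example data from this message
z <- c("ab", "ah", "bc", "ah", "dv")
data <- data.frame(x = substr(z, 1, 1),
                   y = substr(z, 2, 2),
                   z = z,
                   v1 = 5:9,
                   v2 = 7:11)

# ave() returns the group total on every row of the group
data$v1_sum <- ave(data$v1, data$z, FUN = sum)
data$v2_sum <- ave(data$v2, data$z, FUN = sum)

# Keep the first row per z; the character columns come along untouched
out <- data[!duplicated(data$z), c("x", "y", "z", "v1_sum", "v2_sum")]
print(out)
```

Because the totals are attached before the duplicates are dropped, any number of character columns can be carried along without being part of the grouping.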

On Tue, Jun 29, 2010 at 6:04 PM, Dennis Murphy  wrote:

> Hi:
>
> If you can deal with alphabetic order, the following seems to work:
>
> v <- aggregate(third ~ first, data = data, FUN = sum)
> v$second <- levels(data$second)
> v[, c(1, 3, 2)]
>   first  second third
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> v$second works in this case because the levels are ordered and all are used
> when inserted in v. That's not a guarantee in more complicated problems and
> frankly, this one is a kludge.
>
> A plyr version would be
>
> v <- ddply(data, .(first), summarise, third = sum(third), second = second)
> v[!duplicated(v$first), c(1, 3, 2)]
>   first  second third
> 1     b  Brazil     2
> 2     c   China    15
> 4     e England    13
> 6     f  France     8
> 7     j   Japan     5
> 8     k   Korea     4
> 9     u     usa     8
>
> The advantage of ddply over aggregate in this case is that ddply allows one
> to insert second as an 'identity' of sorts; however, the result contains
> duplicate rows, so we need to remove them in the second statement.
>
> Using melt and cast from the reshape package,
> mm <- melt(data, id = c('first', 'second'))
> (ms <- cast(mm, first + second ~ . , sum))
>   first  second (all)
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> names(ms)[3] <- 'third'
>
> This seems to be the cleanest version of the three in terms of getting both
> ID variables into the final result.
>
> HTH,
> Dennis
>
>  On Tue, Jun 29, 2010 at 12:05 PM, Yi  wrote:
>
>> Hi, folks,
>>
>> I am sorry that I did not state the problem correctly yesterday.
>>
>> Please let me address the problem by the following codes:
>>
>> first=c('u','b','e','k','j','c','u','f','c','e')
>>
>> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
>> third=1:10
>> data=data.frame(first,second,third)
>>
>> ## You may understand values in the first column are the unique codes for
>> those in the second column.
>> So 'u' is only for usa. Replicate values appear the same rows for the
>> first and second columns.
>> ### Now I want to delete replicate rows with the same values in first
>> (second) rows
>> and sum up values in the third column for the same values.
>>
>> mm=melt(data,id='first')
>> sum=cast(mm,first~variable,sum) ### This does not work.
>>
>> ###I tried another way to do this
>> mm= melt(data, id='first',measure='third')
>> sum=cast(mm,first~variable,sum)
>>
>> ## But then the problem is how to 'merge' the result with the second
>> column
>> in the dataset.
>>
>>
>> The expected dataframe is like this:
>>
>> (I showed a wrong expected dataframe yesterday.)
>>
>>   first  second third
>> 1     u     usa     8
>> 2     b  Brazil     2
>> 3     e England    13
>> 4     k   Korea     4
>> 5     j   Japan     5
>> 6     c   China    15
>> 8     f  France     8
>>
>> Thanks in advance.
>>
>
>



Re: [R] How to delete rows based on replicate values in one column with some extra calcuation

2010-06-29 Thread Yi
Great help. It works when the first and the second columns are ordered the
same way. But aggregate does not work for the following case:
 z=c('ab','ah','bc','ah','dv')
x=substr(z,start=1,stop=1)
y=substr(z,start=2,stop=2)
v1=5:9
v2=7:11
data=data.frame(x,y,z,v1,v2)
> data
  x y  z v1 v2
1 a b ab  5  7
2 a h ah  6  8
3 b c bc  7  9
4 a h ah  8 10
5 d v dv  9 11

##I want to do the aggregate WRT z and sum up v1 and v2. The expected output
is:

   x y  z v1 v2
1 a b ab  5  7
2 a h ah 14 18
3 b c bc  7  9
4 d v dv  9 11
### I do this almost manually.  As you see here:

newdata=aggregate(data$v1,by=list(data$z),sum)
newdata2=aggregate(data$v2,by=list(data$z),sum)
x=substr(newdata$Group.1,start=1,stop=1)
y=substr(newdata$Group.1,start=2,stop=2)
data.frame(x,y,newdata$Group.1,newdata$x,newdata2$x)
new=data.frame(x,y,newdata$Group.1,newdata$x,newdata2$x)
names(new)=c('x','y','z','v1','v2')
new

I do it this way because I do not think 'aggregate' can group by z and at the
same time keep x and y for z.

Any tips? I mean my way is too 'silly'.

Thanks all in advance!

Yi

On Mon, Jun 28, 2010 at 7:58 PM, Nikhil Kaza  wrote:

>
> aggregate(data$third, by=list(data$first), sum)
>
> or
>
> require(reshape)
> cast(melt(data), ~first, sum)
>
>
>
> On Jun 28, 2010, at 9:30 PM, Yi wrote:
>
>
>> first=c('u','b','e','k','j','c','u','f','c','e')
>> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
>> third=1:10
>> data=data.frame(first,second,third)
>>
>
>



Re: [R] Multiline and grouping in R

2010-06-29 Thread Joshua Wiley
Hello,

There are many ways to do what you want in R.  I am showing one way
using one of my favorite graphics packages, ggplot2.  If you do not
have it installed yet, uncomment the first line.

#install.packages("ggplot2")
library(ggplot2) #load the package

#Read in data
samp.dat <- structure(list(YEAR = c(1988L, 1988L, 1988L, 1988L, 1989L,
1989L, 1989L, 1989L, 1990L, 1990L, 1990L, 1990L), AREA =
structure(c(1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L), .Label =
c("CONTRACTS", "FAMILY", "INTERNATIONAL", "TAXATION"), class =
"factor"), CASES = c(286L, 189L, 385L, 177L, 233L, 431L, 425L, 201L,
190L, 302L, 303L, 209L)), .Names = c("YEAR", "AREA", "CASES"), class =
"data.frame", row.names = c(NA, -12L))

ggplot(data=samp.dat, aes(x=YEAR, y=CASES, group=AREA, colour=AREA)) +
geom_line()
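
For comparison, a base-graphics sketch that mirrors the pivot-table idea: reshape to one column per AREA, then draw one line per column. The names csv_text, dat, wide and years are illustrative, and csv_text simply stands in for file.csv.

```r
# Inline copy of the sample rows, standing in for file.csv
csv_text <- "YEAR,AREA,CASES
1988,CONTRACTS,286
1988,INTERNATIONAL,189
1988,FAMILY,385
1988,TAXATION,177
1989,CONTRACTS,233
1989,INTERNATIONAL,431
1989,FAMILY,425
1989,TAXATION,201
1990,CONTRACTS,190
1990,INTERNATIONAL,302
1990,FAMILY,303
1990,TAXATION,209"
dat <- read.csv(text = csv_text, header = TRUE)

# "Pivot": rows are years, columns are areas, cells are case counts
wide <- with(dat, tapply(CASES, list(YEAR, AREA), sum))
years <- as.numeric(rownames(wide))

# One line per column of the matrix
cols <- seq_len(ncol(wide))
matplot(years, wide, type = "b", pch = 1, lty = 1, col = cols,
        xaxt = "n", xlab = "YEAR", ylab = "CASES")
axis(1, at = years)
legend("topleft", legend = colnames(wide), col = cols, lty = 1, pch = 1)
```

The tapply() step is the part that corresponds to the Excel pivot table; the plotting calls only draw the resulting matrix.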



HTH,

Josh



On Tue, Jun 29, 2010 at 5:42 PM, Pablo Cerdeira
 wrote:
> Hi All,
>
> this is my first mail here.
>
> I'm trying to plot a multiline chart grouping values with no success. I have
> read a lot in the official Wiki and also searched via Google, but I did not
> find anything.
>
> I'm importing some data from a csv file. Here is a sample:
>
> YEAR,AREA,CASES
> 1988,CONTRACTS,286
> 1988,INTERNATIONAL,189
> 1988,FAMILY,385
> 1988,TAXATION,177
> 1989,CONTRACTS,233
> 1989,INTERNATIONAL,431
> 1989,FAMILY,425
> 1989,TAXATION,201
> 1990,CONTRACTS,190
> 1990,INTERNATIONAL,302
> 1990,FAMILY,303
> 1990,TAXATION,209
> ...
>
> "t <- read.csv("file.csv", header=TRUE)"
>
> So far so good...
>
> But the problem is: I'd like to create a multiline plot, one line per AREA,
> showing the evolution of the number of CASES per YEAR.
>
> I know how to do it in Excel, using a Pivot Table. But I'm trying hard to do
> the same with R but I have no idea on how to do it.
>
> Can someone help me?
>
> Thanks in advance
>
> --
> Pablo de Camargo Cerdeira
> pablo.cerde...@gmail.com
> +55 (21) 3799-6065



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] How to delete the replicate rows by summing up the numeric columns

2010-06-29 Thread Nikhil Kaza

require(reshape)
cast(data, first+second~ ., sum)


Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com

On Jun 29, 2010, at 3:05 PM, Yi wrote:


first=c('u','b','e','k','j','c','u','f','c','e')
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third=1:10
data=data.frame(first,second,third)



Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Aadhithya

Hello Steve 
   Thanks for quick responses its really helping me out .Ya I made the
necessary changes you had mentioned. I was not sure of that 'type' argument
where u had told me to set it to SVM . Do you mean I have to give that
argument in this line "cl <- c(c(rep("ALL",10), rep("AML",10)));" and when I
ran the code the following output I had got : 
result: 
pred  ALL AML 
  ALL   7   5 
  AML   3   5
Does this mean that 7 samples of ALL from test file has been classified as
ALL and 5 samples of ALL are classified as AML and so on or is there any
other way we can interpret this result . I had done one more thing I had
taken the transpose of both my test and train files as given below: 
model<- svm(t(train),cl); 
pred <- predict(model,t(test)); 
And the result I had got is : 
Result: 
pred   ALL     AML 
  ALL   10       0 
  AML   0       10 

why is there a difference in the result which I had given in the before
post?does this mean doing transpose classifies the samples better? or is
there any reason for this? 
I am sorry I am troubling you a lot  but seriously its a very timely help I
am really thankful to you. 

-Aadhithya
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Need-help-for-SVM-code-for-microarray-classification-tp2271652p2272658.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] How to delete the replicate rows by summing up the numeric columns

2010-06-29 Thread David Winsemius


On Jun 29, 2010, at 3:05 PM, Yi wrote:


Hi, folks,

I am sorry that I did not state the problem correctly yesterday.

Please let me address the problem by the following codes:

first=c('u','b','e','k','j','c','u','f','c','e')
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third=1:10
data=data.frame(first,second,third)

## You may understand values in the first column are the unique codes for
those in the second column.
So 'u' is only for usa. Replicate values appear the same rows for the
first and second columns.
### Now I want to delete replicate rows with the same values in first
(second) rows
and sum up values in the third column for the same values.

mm=melt(data,id='first')
sum=cast(mm,first~variable,sum) ### This does not work.

###I tried another way to do this
mm= melt(data, id='first',measure='third')
sum=cast(mm,first~variable,sum)

## But then the problem is how to 'merge' the result with the second column
in the dataset.


> data$summed <- ave(data$third, data$first, FUN=sum)
#computed sums within groups defined by "first"
> data[!duplicated(data$first), c("first", "second", "summed")]
#remove duplicates and leave out "third"

  first  second summed
1     u     usa      8
2     b  Brazil      2
3     e England     13
4     k   Korea      4
5     j   Japan      5
6     c   China     15
8     f  France      8




The expected dataframe is like this:

(I showed a wrong expected dataframe yesterday.)

  first  second third
1     u     usa     8
2     b  Brazil     2
3     e England    13
4     k   Korea     4
5     j   Japan     5
6     c   China    15
8     f  France     8

Thanks in advance.


David Winsemius, MD
West Hartford, CT


Re: [R] Stacking several vectors from the list

2010-06-29 Thread Arsenio Starodoumov
Monday, June 28, 2010, 4:40:11 PM, you wrote:

> On Mon, Jun 28, 2010 at 7:30 PM,   wrote:
>> Hi everybody,
>>
>> I'm working on the very
>> messy data, I have tried to clean it up in SAS and
>> SAS/IML but there is not enough info on how to handle certain things
>> in SAS so I have turned to R. The thing itself should be rather
>> simple, so i was wondering if someone could help me out.
>>
>> The original .csv has ([1] 7138 6338 ) dimensions with funds with the 
>> corresponding dates and observations for each date for around 10 years and 
>> 4000+ funds, meaning in COL5 has the next fund's name and so on.
>>
>> COL1                  COL2               COL3           COL4
>> HBNNF US Equity Date            EQY_SH_OUT      PX_VOLUME
>>                        #NAME?         #N/A N/A   135000
>>                        7/7/2008        #N/A N/A          105000
>>                        7/17/2008       #N/A N/A          59
>>                        7/22/2008       #N/A N/A          4
>>
>>
>> so in R this .csv is somehow read as list (using typeof) and not as 
>> dataframe, and a lot of stuff like regexpr searches in the

> The typeof of a data.frame is "list" so you do have a data frame --
> not a list.  Perhaps the problem is that you do not want factor
> columns but want character columns instead.  Use read.csv(..., as.is =
> TRUE)

Thanks!! This "as.is" trick solved the list issue and the whole
indexing problem. Now the table is a true dataframe, searchable and
indexable. I'm still reading up on the differences between the "list"
and "dataframe" types.
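As a small illustration of the typeof() and as.is points in this thread, here
is a sketch. The CSV text and the column names "fund" and "volume" are
invented for the example and are not the poster's data.

```r
# Hypothetical two-column CSV, supplied inline through the 'text' argument
csv_text <- "fund,volume
HBNNF US Equity,135000
ABCDE US Equity,105000"

funds <- read.csv(text = csv_text, as.is = TRUE)

typeof(funds)         # "list": a data frame is stored as a list of columns
is.data.frame(funds)  # TRUE: it is nevertheless a data frame
class(funds$fund)     # "character": as.is = TRUE prevents factor conversion

# Character columns can be searched directly with regular expressions
grepl("^HBNNF", funds$fund)
```

So typeof() reporting "list" is expected and does not mean the import failed;
is.data.frame() or class() is the more informative check.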

Arsenio


[R] using zoo() to coerce time series to a different reference frame

2010-06-29 Thread Jonathan Greenberg
Folks:

I have two sets of dates, and one set of data:

***

require("chron")
require("zoo")
reference_dates=seq.dates("01/01/92", "12/31/92", by = "months")
data_dates=seq.dates("01/15/91", "12/15/93", by = "months")
data=1:length(data_dates)

reference_zoo=zoo(order.by=reference_dates)
data_zoo=zoo(data,data_dates)

***

What I would like is to have a zoo object that uses the index from
reference_dates, but grabs the data for each of the dates (using a
spline interpolation) from data_zoo object.  I feel like my solution
is a bit slow, can someone let me know if there is a quicker way to do
this?  Thanks:

***

reference_data_zoo_merge=merge(reference_zoo,data_zoo)
reference_data_zoo_data=na.spline(reference_data_zoo_merge)
reference_data_zoo_data=merge(reference_zoo,reference_data_zoo_data,all=FALSE)

***

--j


Re: [R] How to delete the replicate rows by summing up the numeric columns

2010-06-29 Thread Dennis Murphy
Hi:

If you can deal with alphabetic order, the following seems to work:

v <- aggregate(third ~ first, data = data, FUN = sum)
v$second <- levels(data$second)
v[, c(1, 3, 2)]
  first  second third
1     b  Brazil     2
2     c   China    15
3     e England    13
4     f  France     8
5     j   Japan     5
6     k   Korea     4
7     u     usa     8

v$second works in this case because the levels are ordered and all are used
when inserted in v. That's not a guarantee in more complicated problems and
frankly, this one is a kludge.

A plyr version would be

v <- ddply(data, .(first), summarise, third = sum(third), second = second)
v[!duplicated(v$first), c(1, 3, 2)]
  first  second third
1     b  Brazil     2
2     c   China    15
4     e England    13
6     f  France     8
7     j   Japan     5
8     k   Korea     4
9     u     usa     8

The advantage of ddply over aggregate in this case is that ddply allows one
to insert second as an 'identity' of sorts; however, the result contains
duplicate rows, so we need to remove them in the second statement.

Using melt and cast from the reshape package,
mm <- melt(data, id = c('first', 'second'))
(ms <- cast(mm, first + second ~ . , sum))
  first  second (all)
1     b  Brazil     2
2     c   China    15
3     e England    13
4     f  France     8
5     j   Japan     5
6     k   Korea     4
7     u     usa     8

names(ms)[3] <- 'third'

This seems to be the cleanest version of the three in terms of getting both
ID variables into the final result.
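For comparison, a base-R sketch of the same idea (grouping on both ID columns
at once) is given below. It is not one of the three versions discussed in this
message, just an alternative that assumes each value of "first" maps to exactly
one value of "second", as the original question states.

```r
first  <- c('u','b','e','k','j','c','u','f','c','e')
second <- c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third  <- 1:10
data   <- data.frame(first, second, third)

# Group on both ID columns; since every (first, second) pair is unique,
# 'second' is carried into the result without a separate merge step
res <- aggregate(third ~ first + second, data = data, FUN = sum)
res
```

The result has one row per code, with columns first, second and third; the
row order may differ from the order of first appearance in the input.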

HTH,
Dennis

On Tue, Jun 29, 2010 at 12:05 PM, Yi  wrote:

> Hi, folks,
>
> I am sorry that I did not state the problem correctly yesterday.
>
> Please let me address the problem by the following codes:
>
> first=c('u','b','e','k','j','c','u','f','c','e')
>
> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)
>
> ## You may understand values in the first column are the unique codes for
> those in the second column.
> So 'u' is only for usa. Replicate values appear the same rows for the
> first and second columns.
> ### Now I want to delete replicate rows with the same values in first
> (second) rows
> and sum up values in the third column for the same values.
>
> mm=melt(data,id='first')
> sum=cast(mm,first~variable,sum) ### This does not work.
>
> ###I tried another way to do this
> mm= melt(data, id='first',measure='third')
> sum=cast(mm,first~variable,sum)
>
> ## But then the problem is how to 'merge' the result with the second column
> in the dataset.
>
>
> The expected dataframe is like this:
>
> (I showed a wrong expected dataframe yesterday.)
>
>   first  second third
> 1     u     usa     8
> 2     b  Brazil     2
> 3     e England    13
> 4     k   Korea     4
> 5     j   Japan     5
> 6     c   China    15
> 8     f  France     8
>
> Thanks in advance.


Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Steve Lianoglou
> Hello Steve
>   Thanks for quick responses its really helping me out .Ya I made the
> necessary changes you had mentioned. I was not sure of that 'type' argument
> where u had told me to set it to SVM . Do you mean I have to give that
> argument in this line "cl <- c(c(rep("ALL",10), rep("AML",10)));"

No, look at the help for svm: ?svm

There is an argument in the `svm` function called "type" where you
tell the svm *what* it is that you are trying to do. Among other
things, SVMs can do classification and regression, and they can do so
in slightly different ways (like C-classification vs.
nu-classification).

If you don't specify what you want to do, the `svm` function takes a
guess. From the error output you gave from your last email, you see
that the SVM guessed incorrectly, eg:

> Error in svm.default(train, cl) :
> Need numeric dependent variable for regression.

You see that the SVM guessed regression, when you expected it to do
something else.

Had you specified that you wanted to do some type of classification, I
suspect you would have gotten a more informative error.

(and from your second email)

> I had done one more thing I had taken the transpose of both my test and
> train files as given below:
> model<- svm(t(train),cl);
> pred <- predict(model,t(test));

> why is there a difference in the result which I had given in the before
> post?does this mean doing transpose classifies the samples better? or is

I already mentioned in my previous email that the `svm` function
expects the rows of the data that you supply it to represent different
*observations* in your dataset (in your case, the different "people" in
your sample). The columns represent the *features* (or dimensions) of
the data (in your case, I assume, the expression level of different
genes for each person).

Given that explanation, it should be clear why transposing your data
(or not) has a profound effect on your result.
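A minimal sketch of the orientation and `type` points, using made-up data: the
matrix `expr` (50 genes by 20 people) is random noise invented for the
example, and the fit is only attempted if the e1071 package is installed.

```r
set.seed(1)
# Hypothetical expression matrix laid out as genes x samples (50 genes, 20 people)
expr <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
cl   <- factor(c(rep("ALL", 10), rep("AML", 10)))  # one label per person

# svm() wants one row per observation, so put the samples on the rows
x <- t(expr)
dim(x)                  # 20 rows (people) by 50 columns (genes)
nrow(x) == length(cl)   # TRUE: there is now one label per row

# State the task explicitly instead of letting svm() guess it
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::svm(x, cl, type = "C-classification")
  # Predictions on the training rows only; this is not a test-set evaluation
  table(pred = predict(model, x), truth = cl)
}
```

Passing the labels as a factor and naming the type both signal classification,
so neither depends on the function's guess.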

Hope that helps.

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] Multiline and grouping in R

2010-06-29 Thread Bill.Venables
> require(lattice)
Loading required package: lattice
> xyplot(CASES ~ YEAR | AREA, data = t, type = "b")
> xyplot(CASES ~ YEAR, data = t, type = "b", groups = AREA)
  

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Pablo Cerdeira
Sent: Wednesday, 30 June 2010 10:42 AM
To: r-help@r-project.org
Subject: [R] Multiline and grouping in R

Hi All,

this is my first mail here.

I'm trying to plot a multiline chart grouping values with no success. I have
read a lot in the official Wiki and also searched via Google, but I did not
find anything.

I'm importing some data from a CSV file. Here is a sample:

YEAR,AREA,CASES
1988,CONTRACTS,286
1988,INTERNATIONAL,189
1988,FAMILY,385
1988,TAXATION,177
1989,CONTRACTS,233
1989,INTERNATIONAL,431
1989,FAMILY,425
1989,TAXATION,201
1990,CONTRACTS,190
1990,INTERNATIONAL,302
1990,FAMILY,303
1990,TAXATION,209
...

"t <- read.csv("file.csv", header=TRUE)"

So far so good...

But the problem is: I'd like to create a multiline plot, one line per AREA,
showing the evolution of the number of CASES per YEAR.

I know how to do it in Excel, using a Pivot Table. But I'm trying hard to do
the same with R but I have no idea on how to do it.

Can someone help me?

Thanks in advance

-- 
Pablo de Camargo Cerdeira
pablo.cerde...@gmail.com
+55 (21) 3799-6065


Re: [R] Multiline and grouping in R

2010-06-29 Thread Erik Iverson

Pablo Cerdeira wrote:

Hi All,

this is my first mail here.

I'm trying to plot a multiline chart grouping values with no success. I have
read a lot in the official Wiki and also searched via Google, but I did not
find anything.




I know how to do it in Excel, using a Pivot Table. But I'm trying hard to do
the same with R but I have no idea on how to do it.


Can you link to an example of the type of graphical output you'd expect?


Re: [R] Multiline and grouping in R

2010-06-29 Thread David Winsemius


On Jun 29, 2010, at 8:42 PM, Pablo Cerdeira wrote:


Hi All,

this is my first mail here.

I'm trying to plot a multiline chart grouping values with no success. I have
read a lot in the official Wiki and also searched via Google, but I did not
find anything.

I'm importing some data from a CSV file. Here is a sample:

YEAR,AREA,CASES
1988,CONTRACTS,286
1988,INTERNATIONAL,189
1988,FAMILY,385
1988,TAXATION,177
1989,CONTRACTS,233
1989,INTERNATIONAL,431
1989,FAMILY,425
1989,TAXATION,201
1990,CONTRACTS,190
1990,INTERNATIONAL,302
1990,FAMILY,303
1990,TAXATION,209
...

"t <- read.csv("file.csv", header=TRUE)"

So far so good...

But the problem is: I'd like to create a multiline plot, one line per AREA,
showing the evolution of the number of CASES per YEAR.

I know how to do it in Excel, using a Pivot Table. But I'm trying hard to do
the same with R but I have no idea on how to do it.

Can someone help me?


> txt <-textConnection("YEAR,AREA,CASES
+ 1988,CONTRACTS,286
+ 1988,INTERNATIONAL,189
+ 1988,FAMILY,385
+ 1988,TAXATION,177
+ 1989,CONTRACTS,233
+ 1989,INTERNATIONAL,431
+ 1989,FAMILY,425
+ 1989,TAXATION,201
+ 1990,CONTRACTS,190
+ 1990,INTERNATIONAL,302
+ 1990,FAMILY,303
+ 1990,TAXATION,209")
> t <- read.csv(txt, header=TRUE)
> t
   YEAR          AREA CASES
1  1988     CONTRACTS   286
2  1988 INTERNATIONAL   189
3  1988        FAMILY   385
4  1988      TAXATION   177
5  1989     CONTRACTS   233
6  1989 INTERNATIONAL   431
7  1989        FAMILY   425
8  1989      TAXATION   201
9  1990     CONTRACTS   190
10 1990 INTERNATIONAL   302
11 1990        FAMILY   303
12 1990      TAXATION   209
> require(lattice)
> help(package=lattice)
> xyplot(CASES ~ YEAR|AREA, data=t)
> ?xyplot
starting httpd help server ... done
> xyplot(CASES ~ YEAR|AREA, data=t, type="l")
> xyplot(CASES ~ YEAR, groups=AREA, data=t, type="l")
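For readers coming from an Excel pivot table, a base-R sketch of the same
"pivot, then plot one line per AREA" idea is shown here. It is an alternative
to the lattice calls, not part of the session; the object names `cases` and
`wide` are chosen for the example.

```r
cases <- read.csv(text = "YEAR,AREA,CASES
1988,CONTRACTS,286
1988,INTERNATIONAL,189
1988,FAMILY,385
1988,TAXATION,177
1989,CONTRACTS,233
1989,INTERNATIONAL,431
1989,FAMILY,425
1989,TAXATION,201
1990,CONTRACTS,190
1990,INTERNATIONAL,302
1990,FAMILY,303
1990,TAXATION,209")

# Pivot: one row per YEAR, one column per AREA
wide <- tapply(cases$CASES, list(cases$YEAR, cases$AREA), sum)
wide

# matplot() draws one line per column of the pivoted matrix
years <- as.numeric(rownames(wide))
matplot(years, wide, type = "b", pch = 1, lty = 1,
        col = seq_len(ncol(wide)), xlab = "YEAR", ylab = "CASES", xaxt = "n")
axis(1, at = years)
legend("topright", legend = colnames(wide),
       col = seq_len(ncol(wide)), lty = 1, pch = 1)
```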

--

David Winsemius, MD
West Hartford, CT


[R] Multiline and grouping in R

2010-06-29 Thread Pablo Cerdeira
Hi All,

this is my first mail here.

I'm trying to plot a multiline chart grouping values with no success. I have
read a lot in the official Wiki and also searched via Google, but I did not
find anything.

I'm importing some data from a CSV file. Here is a sample:

YEAR,AREA,CASES
1988,CONTRACTS,286
1988,INTERNATIONAL,189
1988,FAMILY,385
1988,TAXATION,177
1989,CONTRACTS,233
1989,INTERNATIONAL,431
1989,FAMILY,425
1989,TAXATION,201
1990,CONTRACTS,190
1990,INTERNATIONAL,302
1990,FAMILY,303
1990,TAXATION,209
...

"t <- read.csv("file.csv", header=TRUE)"

So far so good...

But the problem is: I'd like to create a multiline plot, one line per AREA,
showing the evolution of the number of CASES per YEAR.

I know how to do it in Excel, using a Pivot Table. But I'm trying hard to do
the same with R but I have no idea on how to do it.

Can someone help me?

Thanks in advance

-- 
Pablo de Camargo Cerdeira
pablo.cerde...@gmail.com
+55 (21) 3799-6065


Re: [R] RPostgreSQL - Unable to locate required modules/DLLs on WinXP/7

2010-06-29 Thread João Gonçalves

Thank you for the fast replies.
I've set the PATH env var to include the PostgreSQL bin directory and 
it's working fine.
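For anyone who prefers to do this from within R rather than in the system
settings, a sketch follows. The directory in `pg_bin` is a hypothetical
install location, not one taken from this thread, and the change lasts only
for the current R session.

```r
# Hypothetical install location; substitute the real PostgreSQL bin directory
pg_bin <- "C:/Program Files/PostgreSQL/8.4/bin"

# Prepend it to PATH for this R session only
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(pg_bin, old_path, sep = .Platform$path.sep))

# Check that PATH now starts with the new directory
startsWith(Sys.getenv("PATH"), pg_bin)

# Then try loading the package again:
# library(RPostgreSQL)
```

Setting PATH in the Windows environment settings, as described above, makes
the change permanent instead.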


On 29-06-2010 23:44, Joe Conway wrote:

On 06/29/2010 03:35 PM, João Gonçalves wrote:
   

Error: package 'RPostgreSQL' could not be loaded
 
   

exists which makes RPostgreSQL loading to fail. The message appears for
any of the following DLLs (that actually exist on
X:/PostgreSQL_installation_directory/bin):
 
   

To "solve" this problem the actual DLLs from the PostgreSQL installation
directory must be copied into the X:/WINDOWS/System32 shared libraries
folder in order to make the package operational.
Is there a way to solve this from the package internals without having
to copy the DLLs? Am I missing something here?
 

If I am not mistaken, you need to be sure that the directory containing
the DLLs is in your PATH.

HTH,

Joe

   


On 29-06-2010 23:45, Dirk Eddelbuettel wrote:

On Tue, Jun 29, 2010 at 11:35:44PM +0100, João Gonçalves wrote:
   

Dear list users,

The problem occurs when library(RPostgreSQL) is issued on R. This issue
has previously appeared on R mailing list without any robust solution.
The error message issued by R:

Loading required package: RPostgreSQL
Loading required package: DBI
Error in inDL(x, as.logical(local), as.logical(now), ...) :
   unable to load shared library
'C:/PROGRA~1/R/R-210~1.1/library/RPostgreSQL/libs/RPostgreSQL.dll':
   LoadLibrary failure:  Unable to locate the specified module.
Error: package 'RPostgreSQL' could not be loaded

At the same time an error box appears saying that a given DLL does not
exists which makes RPostgreSQL loading to fail. The message appears for
any of the following DLLs (that actually exist on
X:/PostgreSQL_installation_directory/bin):

libpq.dll
ssleay32.dll
libeay32.dll
libintl-8.dll
libiconv-2.dll
krb5_32.dll
comerr32.dll
k5sprt32.dll
msvcr71.dll
gssapi32.dll

To "solve" this problem the actual DLLs from the PostgreSQL installation
directory must be copied into the X:/WINDOWS/System32 shared libraries
folder in order to make the package operational.
Is there a way to solve this from the package internals without having
to copy the DLLs? Am I missing something here?
 

You are missing the correct use of the PATH environment variable.

   

I'm using R-2.11.1 on WinXP/7.
 





Re: [R] More than two font in a plot

2010-06-29 Thread Paul Murrell

Hi

On 6/30/2010 2:17 AM, Jinsong Zhao wrote:

Hi there,

I am a Chinese R user. I hope to display Chinese characters in a plot,
and then save it in PostScript format. I have read the article titled
"Non-Standard Fonts in PostScript and PDF Graphics", especially the
section about CJK fonts. I also tried the code:


pdf("chinese.pdf", width=3, height=1)
grid.text("\u4F60\u597D", y=2/3, gp=gpar(fontfamily="CNS1"))
grid.text("is 'hello' in (Traditional) Chinese", y=1/3)
dev.off()


however, it's not valid with postscript(). It seems that postscript()
needs the family set via postscript(..., family = "CNS1"). Then all the
characters are in the CJK font, which is not what I want. I hope the
Latin characters can be displayed in Helvetica.

Any suggestions? Thanks in advance!


Try this ...

# Use "Helvetica" as default, but include "CNS1" as a font that
# will be used somewhere within the file
library(grid)  # provides grid.text() and gpar()
postscript("chinese.ps", width=3, height=1, fonts="CNS1")
grid.text("\u4F60\u597D", y=2/3, gp=gpar(fontfamily="CNS1"))
grid.text("is 'hello' in (Traditional) Chinese", y=1/3)
dev.off()

Paul


Regards,
Jinsong


--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/


[R] process of stepwise selection

2010-06-29 Thread elaine kuo
Dear list,

I wanna select the significant variables relative to bird distribution,
using stepwise method.
However, the result is always the best-fit model.

Please kindly suggest if it is possible to show the selection process.
Thank you

Elaine


Re: [R] RPostgreSQL - Unable to locate required modules/DLLs on WinXP/7

2010-06-29 Thread Dirk Eddelbuettel
On Tue, Jun 29, 2010 at 11:35:44PM +0100, João Gonçalves wrote:
> Dear list users,
>
> The problem occurs when library(RPostgreSQL) is issued on R. This issue  
> has previously appeared on R mailing list without any robust solution.  
> The error message issued by R:
>
> Loading required package: RPostgreSQL
> Loading required package: DBI
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>   unable to load shared library  
> 'C:/PROGRA~1/R/R-210~1.1/library/RPostgreSQL/libs/RPostgreSQL.dll':
>   LoadLibrary failure:  Unable to locate the specified module.
> Error: package 'RPostgreSQL' could not be loaded
>
> At the same time an error box appears saying that a given DLL does not  
> exists which makes RPostgreSQL loading to fail. The message appears for  
> any of the following DLLs (that actually exist on  
> X:/PostgreSQL_installation_directory/bin):
>
> libpq.dll
> ssleay32.dll
> libeay32.dll
> libintl-8.dll
> libiconv-2.dll
> krb5_32.dll
> comerr32.dll
> k5sprt32.dll
> msvcr71.dll
> gssapi32.dll
>
> To "solve" this problem the actual DLLs from the PostgreSQL installation  
> directory must be copied into the X:/WINDOWS/System32 shared libraries  
> folder in order to make the package operational.
> Is there a way to solve this from the package internals without having  
> to copy the DLLs? Am I missing something here?

You are missing the correct use of the PATH environment variable.

> I'm using R-2.11.1 on WinXP/7.
>
> Best regards,
> João Gonçalves.

-- 
Three out of two people have difficulties with fractions.


Re: [R] RPostgreSQL - Unable to locate required modules/DLLs on WinXP/7

2010-06-29 Thread Joe Conway
On 06/29/2010 03:35 PM, João Gonçalves wrote:
> Error: package 'RPostgreSQL' could not be loaded

> exists which makes RPostgreSQL loading to fail. The message appears for
> any of the following DLLs (that actually exist on
> X:/PostgreSQL_installation_directory/bin):

> To "solve" this problem the actual DLLs from the PostgreSQL installation
> directory must be copied into the X:/WINDOWS/System32 shared libraries
> folder in order to make the package operational.
> Is there a way to solve this from the package internals without having
> to copy the DLLs? Am I missing something here?

If I am not mistaken, you need to be sure that the directory containing
the DLLs is in your PATH.

HTH,

Joe




[R] RPostgreSQL - Unable to locate required modules/DLLs on WinXP/7

2010-06-29 Thread João Gonçalves

Dear list users,

The problem occurs when library(RPostgreSQL) is issued on R. This issue 
has previously appeared on R mailing list without any robust solution. 
The error message issued by R:


Loading required package: RPostgreSQL
Loading required package: DBI
Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 
'C:/PROGRA~1/R/R-210~1.1/library/RPostgreSQL/libs/RPostgreSQL.dll':

  LoadLibrary failure:  Unable to locate the specified module.
Error: package 'RPostgreSQL' could not be loaded

At the same time an error box appears saying that a given DLL does not 
exists which makes RPostgreSQL loading to fail. The message appears for 
any of the following DLLs (that actually exist on 
X:/PostgreSQL_installation_directory/bin):


libpq.dll
ssleay32.dll
libeay32.dll
libintl-8.dll
libiconv-2.dll
krb5_32.dll
comerr32.dll
k5sprt32.dll
msvcr71.dll
gssapi32.dll

To "solve" this problem the actual DLLs from the PostgreSQL installation 
directory must be copied into the X:/WINDOWS/System32 shared libraries 
folder in order to make the package operational.
Is there a way to solve this from the package internals without having 
to copy the DLLs? Am I missing something here?


I'm using R-2.11.1 on WinXP/7.

Best regards,
João Gonçalves.


Re: [R] merging/intersecting 2 data frames

2010-06-29 Thread Greg Snow
Use the merge function: look at the by.x and by.y arguments, and also at the
all.x and all.y arguments and the suffixes argument.  You may need to delete
some columns after the merge (or replace missing values in one column with
those in the same location from the next column; see the ifelse function).
So it may take a couple of steps, but that is probably the most
straightforward approach.
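A small sketch of how those merge arguments interact, on made-up tables: the
column names PATIENT_ID and ID follow the question, but the values, the shared
DATE column and the suffix strings are invented for the example.

```r
# Small made-up tables; both carry a non-key column called DATE
a.df <- data.frame(PATIENT_ID = c(101, 102, 103),
                   DATE = c("4/16/2009", "4/16/2009", "4/16/2009"),
                   AGE = c(45, 35, 30))
b.df <- data.frame(ID = c(102, 999),
                   DATE = c("4/19/2009", "4/20/2009"))

# Default: keep only rows whose key appears in both tables;
# suffixes disambiguate the two DATE columns
inner <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID",
               suffixes = c("_VISIT", "_DEATH"))
inner

# all.x = TRUE keeps every row of a.df and fills unmatched rows with NA
left <- merge(a.df, b.df, by.x = "PATIENT_ID", by.y = "ID", all.x = TRUE,
              suffixes = c("_VISIT", "_DEATH"))
left
```

With the default settings unmatched rows are dropped, which is the behaviour
asked for in the question; all.x and all.y are only needed when unmatched rows
should be kept.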

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Erin Hodgess
> Sent: Tuesday, June 29, 2010 1:22 PM
> To: R help
> Subject: [R] merging/intersecting 2 data frames
> 
> Dear R People:
> 
> I have two data frames, a.df and b.df as seen here:
> 
> > a.df[1:10,]
>         DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
> > b.df[1:10,]
>    DATE_OF_DEATH    ID
> 1      4/19/2009 21676
> 2      4/19/2009 13717
> 3      4/19/2009 20498
> 4      4/19/2009 14281
> 5      4/19/2009 38848
> 6      4/20/2009   331
> 7      4/20/2009  4084
> 8      4/20/2009 19616
> 9      4/20/2009 17965
> 10     4/20/2009 11863
> >
> 
> a.df will always be larger than b.df.
> 
> I want to create a third data frame that is matched on PATIENT_ID from
> a.df and ID from b.df.
> 
> If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> row from the new data.frame.
> 
> If there is a match, we include the DATE_OF_DEATH column from b.df.
> 
> I've tried all kinds of tricks, but nothing works exactly as I wish.
> 
> Thanks in advance,
> Sincerely,
> Erin
> 
> 
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodg...@gmail.com
> 


Re: [R] merging/intersecting 2 data frames

2010-06-29 Thread jim holtman
Use 'merge':

> a.df
        DATE GENDER PATIENT_ID AGE             SYNDROME
1  4/16/2009      F      23686  45         RASH ON BODY
2  4/16/2009      F      13840  35         CANT URINATE
3  4/16/2009      M      12895  30       BLURRED VISION
4  4/16/2009      M      18375  33       UNABLE TO VOID
5  4/16/2009      M       2237  44         SOB WEAKNESS
6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
7  4/16/2009      M      10783  37          RT ARM PAIN
8  4/16/2009      M      12610  65        L FOOT INJURY
9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
10 4/16/2009      F        351  36           PT STS MVA
> b.df
   DATE_OF_DEATH    ID
1      4/19/2009 23686
2      4/19/2009 13840
3      4/19/2009 12895
4      4/19/2009 18375
5      4/19/2009   351
6      4/20/2009  3495
7      4/20/2009  4084
8      4/20/2009 19616
9      4/20/2009 17965
10     4/20/2009 11863
> merge(a.df, b.df, by.x="PATIENT_ID", by.y="ID")
  PATIENT_ID      DATE GENDER AGE             SYNDROME DATE_OF_DEATH
1        351 4/16/2009      F  36           PT STS MVA     4/19/2009
2       3495 4/16/2009      F  29 URINARY DIFFICULTIES     4/20/2009
3      12895 4/16/2009      M  30       BLURRED VISION     4/19/2009
4      13840 4/16/2009      F  35         CANT URINATE     4/19/2009
5      18375 4/16/2009      M  33       UNABLE TO VOID     4/19/2009
6      23686 4/16/2009      F  45         RASH ON BODY     4/19/2009
>


On Tue, Jun 29, 2010 at 3:21 PM, Erin Hodgess  wrote:
> Dear R People:
>
> I have two data frames, a.df and b.df as seen here:
>
>> a.df[1:10,]
>        DATE GENDER PATIENT_ID AGE             SYNDROME
> 1  4/16/2009      F      23686  45         RASH ON BODY
> 2  4/16/2009      F      13840  35         CANT URINATE
> 3  4/16/2009      M      12895  30       BLURRED VISION
> 4  4/16/2009      M      18375  33       UNABLE TO VOID
> 5  4/16/2009      M       2237  44         SOB WEAKNESS
> 6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
> 7  4/16/2009      M      10783  37          RT ARM PAIN
> 8  4/16/2009      M      12610  65        L FOOT INJURY
> 9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
> 10 4/16/2009      F        351  36           PT STS MVA
>> b.df[1:10,]
>   DATE_OF_DEATH    ID
> 1      4/19/2009 21676
> 2      4/19/2009 13717
> 3      4/19/2009 20498
> 4      4/19/2009 14281
> 5      4/19/2009 38848
> 6      4/20/2009   331
> 7      4/20/2009  4084
> 8      4/20/2009 19616
> 9      4/20/2009 17965
> 10     4/20/2009 11863
>>
>
> a.df will always be larger than b.df.
>
> I want to create a third data frame that is matched on PATIENT_ID from
> a.df and ID from b.df.
>
> If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
> row from the new data.frame.
>
> If there is a match, we include the DATE_OF_DEATH column from b.df.
>
> I've tried all kinds of tricks, but nothing works exactly as I wish.
>
> Thanks in advance,
> Sincerely,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodg...@gmail.com
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] installing multicore package

2010-06-29 Thread Patrick Connolly
On Fri, 25-Jun-2010 at 10:09AM +0530, suman dhara wrote:

|> Sir,
|> I want to apply mclapply() function for my analysis. So, I have to install
|> multicore package. But I can not install the package.
|> 
|> >install.packages("multicore")
|>  It gives that package multicore is not available.
|> 
|> Can you help me?

With minimal information, we'll have to guess what the problem is.  My
guess is that your OS is Windows for which multicore is not available
-- see under packages on CRAN.
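
For readers on a later R: the parallel package, bundled with R since version 2.14.0, also provides mclapply(). The sketch below is only an illustration under that assumption, not the setup discussed in this thread; on Windows mclapply() runs only with mc.cores = 1, so the core count is chosen from the OS type.

```r
library(parallel)

# Forking is not available on Windows, where mclapply() only accepts
# mc.cores = 1; elsewhere use two worker processes for this toy example.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same calling convention as lapply(), with the work spread over 'cores'
squares <- mclapply(1:4, function(i) i^2, mc.cores = cores)
unlist(squares)
```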



-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Aadhithya

Hi Steve,
I tried one more thing: I took the transpose of both my test and
train files, as given below:
model <- svm(t(train), cl)
pred <- predict(model, t(test))
The result I got is:
Result:
pred   ALL AML
  ALL   10   0
  AML    0  10

Why does this differ from the result I gave in my previous post? Does
this mean that transposing classifies the samples better, or is there
another reason for this?
Thanks a ton in advance.
-Aadhithya


Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Aadhithya

Hello Steve,
   Thanks for the quick responses; they are really helping me out. Yes, I made the
necessary changes you mentioned. I was not sure about the 'type' argument
that you told me to set to SVM. Do you mean I have to give that
argument in this line: "cl <- c(c(rep("ALL",10), rep("AML",10)));"? When I
ran the code, I got the following output:
result:
pred  ALL AML
  ALL   7   5
  AML   3   5
Does this mean that 7 samples of ALL from the test file have been classified as
ALL and 5 samples of ALL are classified as AML, and so on, or is there
another way to interpret this result? I am sorry to trouble you so
much, but this is very timely help and I am really thankful to you.
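
One way to read such a table, as a sketch in base R. It assumes the table was built as table(pred, truth), with predictions in rows and true classes in columns (the post does not show how it was built), and the prediction vector below is made up to reproduce the counts shown.

```r
# True labels as in the post: 10 ALL samples followed by 10 AML samples
truth <- factor(c(rep("ALL", 10), rep("AML", 10)))

# Made-up predictions chosen to reproduce the counts in the table above
pred <- factor(c(rep("ALL", 7), rep("AML", 3),   # the 10 true ALL samples
                 rep("ALL", 5), rep("AML", 5)),  # the 10 true AML samples
               levels = levels(truth))

# Rows are predicted classes, columns are true classes
tab <- table(pred, truth)
tab

# Correct predictions sit on the diagonal
accuracy <- sum(diag(tab)) / sum(tab)
```

Under that assumption, each column describes one true class: of the 10 true ALL samples, 7 were predicted ALL and 3 were predicted AML; of the 10 true AML samples, 5 were predicted ALL and 5 AML.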

-Aadhithya


[R] How to delete the replicate rows by summing up the numeric columns

2010-06-29 Thread Yi
Hi, folks,

I am sorry that I did not state the problem correctly yesterday.

Let me describe the problem with the following code:

library(reshape)  ## provides melt() and cast()

first=c('u','b','e','k','j','c','u','f','c','e')
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third=1:10
data=data.frame(first,second,third)

## The values in the first column are unique codes for those in the
## second column, so 'u' is only for usa. Replicated rows have the same
## values in both the first and second columns.
## Now I want to delete the replicated rows (those with the same values
## in the first and second columns) and sum up the values in the third
## column for each of them.

mm=melt(data,id='first')
sum=cast(mm,first~variable,sum) ### This does not work.

### I tried another way to do this:
mm= melt(data, id='first',measure='third')
sum=cast(mm,first~variable,sum)

## But then the problem is how to 'merge' the result with the second column
## in the dataset.


The expected dataframe is like this:

(I showed a wrong expected dataframe yesterday.)

  first  second third
1     u     usa     8
2     b  Brazil     2
3     e England    13
4     k   Korea     4
5     j   Japan     5
6     c   China    15
8     f  France     8

Thanks in advance.
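
One possible base-R route to the expected data frame, offered as a sketch and not as the answer given on the list: aggregate() with a formula groups on both label columns at once, so no later merge is needed. The reordering step, which restores the order of first appearance, is an added assumption about the desired row order.

```r
first  <- c('u','b','e','k','j','c','u','f','c','e')
second <- c('usa','Brazil','England','Korea','Japan','China','usa',
            'France','China','England')
third  <- 1:10
data   <- data.frame(first, second, third, stringsAsFactors = FALSE)

# Sum 'third' within each (first, second) combination
totals <- aggregate(third ~ first + second, data = data, FUN = sum)

# aggregate() sorts by the grouping columns; put the rows back in the
# order in which each code first appears in the original data
totals <- totals[order(match(totals$first, unique(data$first))), ]
rownames(totals) <- NULL
totals
```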



Re: [R] merging/intersecting 2 data frames

2010-06-29 Thread Weidong Gu
Erin,

?merge

Try 
c.df=merge(a.df,b.df,by.x="PATIENT_ID",by.y="ID")

hope it helps

Weidong



Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2

2010-06-29 Thread Uwe Ligges



On 29.06.2010 19:19, Bert Gunter wrote:

Uwe:

Did you forget to add the "fixed = TRUE" parameter to your gsub call in your
reply?



Yes, thanks.

Uwe





gsub("N\\A", "NA", "N\\A")

[1] "N\\A"


gsub("N\\A","NA","N\\A",fixed=TRUE)

[1] "NA"

I only mention it because there is already sufficient confusion that the
typo may totally bewilder people.
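
A small sketch of the distinction, using a hypothetical variable name `original`: the R literal "N\\A" holds three characters, and the pattern has to be written differently depending on whether gsub treats it as a fixed string or as a regular expression.

```r
# "N\\A" in R source is the three-character string N, backslash, A
original <- "N\\A"
nchar(original)

# fixed = TRUE: the pattern is matched literally, character for character
literal <- gsub("N\\A", "NA", original, fixed = TRUE)

# Regular expression: the backslash must itself be escaped in the regex,
# which takes four backslashes in the R string literal
regex <- gsub("N\\\\A", "NA", original)
```

Both calls turn the three-character input into "NA".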

-- Bert

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Uwe Ligges
Sent: Tuesday, June 29, 2010 4:11 AM
To: Jason Rupert
Cc: r-help@r-project.org
Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2



On 29.06.2010 12:47, Jason Rupert wrote:

Previously in R 2.9.2 I used the following to convert from an improperly

formatted NA string into one that is a bit more consistent.



gsub("N\A", "NA", "N\A", fixed=TRUE)

This worked in R 2.9.2, but now in R 2.11.1 it doesn't seem to work an

throws the following error.

Error: '\A' is an unrecognized escape in character string starting "N\A"

I guess my questions are the following:
(1) Is this expected behavior?
(2) If it is expected behavior, what is the proper way to replace "N\A"

with "NA" and "N\\A" with "NA"?


If your original text "thestring" contains "N\A", then the R
representation is "N\\A", and hence

gsub("N\\A", "NA", thestring)

If you want to try it explicitly, you need to write

gsub("N\\A", "NA", "N\\A")

If your original text contains two backslashes, both have to be escaped, as
in

gsub("N\\\\A", "NA", thestring)

Uwe Ligges



Thank you again for all the help and insight.



Re: [R] transposing a data frame from horizontal to vertical (stacking)

2010-06-29 Thread Dimitri Liakhovitski
Thank you very much for the reference to "reshape".
I found that melt(MyData), followed by re-sorting, gives me exactly what I want!
Dimitri
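
For comparison, a base-R sketch of the same stacking that does not need the reshape package. The use of stack() and the column names month/values/person (taken from the loop in the original question) are choices made for this illustration, not part of the replies in the thread.

```r
MyData <- data.frame(names = c("John", "Mary", "Paul", "Debby"),
                     jan = c(10, 15, 20, 25),
                     feb = c(1, 2, 3, 4))

# stack() puts the chosen columns on top of each other and records the
# source column in 'ind'
stacked <- stack(MyData[c("jan", "feb")])

result <- data.frame(month  = as.character(stacked$ind),
                     values = stacked$values,
                     person = rep(as.character(MyData$names), times = 2),
                     stringsAsFactors = FALSE)

# Re-sort so that each person's months are together, in the original order
result <- result[order(match(result$person, MyData$names)), ]
rownames(result) <- NULL
result
```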


On Tue, Jun 29, 2010 at 1:46 PM, Henrique Dallazuanna  wrote:
> Try this:
>
> reshape(MyData, direction = 'long', varying = list(c('jan', 'feb')), idvar =
> 2:3)
>
> On Tue, Jun 29, 2010 at 2:22 PM, Dimitri Liakhovitski
>  wrote:
>>
>> Hello, everyone!
>> I have a very simple task - I have a data frame (see MyData below) and
>> I need to stack the data (see result below).
>> I wrote the syntax below - it's very basic and it does what I need.
>> But I am sure what I am trying to do is a very typical task and there
>> must be a much shorter/more elegant way of doing it.
>> Any advice?
>>
>> Thank you very much!
>>
>>
>>
>> MyData<-data.frame(names=c("John","Mary","Paul","Debby"),jan=c(10,15,20,25),feb=c(1,2,3,4))
>> (MyData)
>> months<-names(MyData)[-1]
>> people<-as.character(MyData[[1]])
>>
>> ### Creating a temp matrix with people as columns and months as rows:
>> transposed<-apply(MyData[-1],1,t)
>>
>> ### Putting vertical data (months as rows) - for each person - into a
>> list:
>> list.of.stacked<-list()
>> for(i in 1:ncol(transposed)){
>>
>>  list.of.stacked[[i]]<-as.data.frame(matrix(ncol=3,nrow=length(months)))
>>        names(list.of.stacked[[i]])<-c("month","values","person")
>>        list.of.stacked[[i]][["month"]]<-months
>>        list.of.stacked[[i]][["values"]]<-transposed[1:nrow(transposed),i]
>>        list.of.stacked[[i]][["person"]]<-people[i]
>> }
>> (list.of.stacked)
>>
>> ### Creating a data frame from the list:
>> result<-do.call(rbind,list.of.stacked)
>> (result)
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



Re: [R] Conditionally constructing columns in a data frame

2010-06-29 Thread David Winsemius


On Jun 29, 2010, at 2:54 PM, Stuart Luppescu wrote:


Hello, I have to construct 5 new columns in a data frame depending on
the value of another of the columns in the data frame. The only way I
could figure out to do this was to subset the data frame five times, do
the variable construction, and then rbind the subsets back together.
Here's part of the code I used:

read001 <- read[read$existstr=="001",]

 read001$era1end <- NA
 read001$era2base <- NA
 read001$era2end <- NA
 read001$era3base <- read001$era1base
 read001$era3end <- read001$era3base + (6 * read001$era3tr)

read011 <- read[read$existstr=="011",]

 read011$era1end <- NA
 read011$era2base <- read011$era1base
 read011$era2end <- read011$era2base + (4 * read011$era2tr)
 read011$era3base <- read011$era2end
 read011$era3end <- read011$era2end + (6 * read011$era3tr)

...


?split

processed_list <- split(read, read$existstr)
# then you have the dataframe in sections determined by existstr's value
# process within groups (but your example does not generalize in an
# obvious manner.)

# then:
final <- do.call(rbind , processed_list)
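
The split / process / rbind outline above could be filled in roughly as follows. This is a sketch only: the four-row data frame is made up, the helper add_eras() is hypothetical, and it covers just the two patterns ("001" and "011") whose rules appear in the question.

```r
# Small made-up data frame with only the columns needed for the sketch
read <- data.frame(era1base = c(207.9, 205.6, 204.5, 200.7),
                   era2tr   = c(NA, NA, 0.978, 0.853),
                   era3tr   = c(0.404, 0.244, 0.252, 0.139),
                   existstr = c("001", "001", "011", "011"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: apply the rule that belongs to one existstr group
add_eras <- function(d) {
  pattern <- d$existstr[1]
  if (pattern == "001") {
    d$era2base <- NA_real_
    d$era2end  <- NA_real_
    d$era3base <- d$era1base
  } else if (pattern == "011") {
    d$era2base <- d$era1base
    d$era2end  <- d$era2base + 4 * d$era2tr
    d$era3base <- d$era2end
  }
  d$era3end <- d$era3base + 6 * d$era3tr
  d
}

# Split by group, process each piece, and bind the pieces back together
pieces <- lapply(split(read, read$existstr), add_eras)
final  <- do.call(rbind, pieces)
```

Every branch has to create the same columns in the same order, otherwise the final rbind will complain about mismatched names.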

--
David.


read2 <- rbind(read001, read011, read100, read110, read111)


Isn't there an easier way to do this?

Thanks.

--
Stuart Luppescu 
University of Chicago



David Winsemius, MD
West Hartford, CT



Re: [R] generate irregular series of dates

2010-06-29 Thread Gabor Grothendieck
On Tue, Jun 29, 2010 at 6:22 AM, Simon Kiss  wrote:
> Dear colleagues, particularly academic ones,
> So I'm creating a Microsoft Word template for myself so that every time I
> teach a new course, I don't have to enter in the dates manually for each
> class session.
> I'd like to use an R script that can generate an irregular series of dates
> starting from one date (semester begin) to another (semester end) using an
> irregular interval in between (Tuesdays and Thursdays, for example).
> I know that a regular series of dates is no problem, but what about an
> irregular series?

Generate all the dates in the range of interest and then pick off the
Tuesdays and Thursdays:

dd <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), "day")
dd[weekdays(dd) %in% c("Tuesday", "Thursday")]
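
Note that weekdays() returns day names in the current locale, so the comparison with "Tuesday" and "Thursday" assumes an English locale. A locale-independent variant, sketched here with made-up semester dates, uses the numeric weekday from as.POSIXlt() (0 = Sunday, so 2 = Tuesday and 4 = Thursday).

```r
# Hypothetical semester boundaries, chosen only for the example
semester_start <- as.Date("2010-06-29")
semester_end   <- as.Date("2010-07-08")

dd <- seq(semester_start, semester_end, by = "day")

# $wday counts days from Sunday = 0, independent of the locale
classes <- dd[as.POSIXlt(dd)$wday %in% c(2, 4)]
format(classes)
```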



Re: [R] Conditionally constructing columns in a data frame

2010-06-29 Thread Stuart Luppescu
On Tue, 2010-06-29 at 13:59 -0500, Erik Iverson wrote:
> I'm sure there's an easier way, but it's going to be easiest to get a 
> useful response if we have a reproducible, minimal example, as the 
> posting guide requests. ?tapply is probably involved.

A minimal example of data? How about this:

    era1base   eb1base  era1tr   eb1tr  era2tr    eb2tr   era3tr   eb3tr existstr
11  207.9367  -5.08916      NA      NA      NA       NA  0.40376 0.70781      001
25  205.5631  -7.46273      NA      NA      NA       NA  0.24351 0.54757      001
38  211.3405  -1.68539      NA      NA      NA       NA  0.16300 0.46706      001
40  207.7364  -5.28944      NA      NA      NA       NA  0.15421 0.45827      001
125 210.8997  -2.12617      NA      NA      NA       NA -0.06747 0.23659      001
128 210.4231  -2.60274      NA      NA      NA       NA -0.07540 0.22865      001
129 209.0014  -4.02449      NA      NA      NA       NA -0.07750 0.22656      001
140 205.7868  -7.23908      NA      NA      NA       NA -0.09669 0.20737      001
147 204.7511  -8.27474      NA      NA      NA       NA -0.12341 0.18065      001
199 214.5837   1.55783      NA      NA      NA       NA -0.19735 0.10671      001
217 210.2797  -2.74620      NA      NA      NA       NA -0.22830 0.07576      001
220 210.8241  -2.20176      NA      NA      NA       NA -0.23292 0.07114      001
230 214.1677   1.14188      NA      NA      NA       NA -0.24546 0.05860      001
23  204.5346  -8.49127      NA      NA 0.97800  0.05710  0.25214 0.55620      011
45  200.6664 -12.35943      NA      NA 0.85315 -0.06774  0.13930 0.44336      011
61  206.5377  -6.48822      NA      NA 1.35338  0.43248  0.08575 0.38980      011
78  204.1361  -8.88975      NA      NA 1.23077  0.30988  0.02840 0.33246      011
87  205.8586  -7.16726      NA      NA 1.27300  0.35210  0.01372 0.31778      011
98  214.8767   1.85082      NA      NA 0.05363 -0.86727 -0.01741 0.28665      011
100 204.2501  -8.77575      NA      NA 1.75940  0.83850 -0.01860 0.28545      011
133 215.9762   2.95033      NA      NA 0.54176 -0.37914 -0.08791 0.21615      011
211 209.5723  -3.45358      NA      NA 0.85139 -0.06951 -0.21637 0.08769      011
523 206.0448  -6.98104 1.02349 0.65789      NA       NA       NA      NA      100
524 215.9634   2.93754 0.37856 0.01297      NA       NA       NA      NA      100

-- 
Stuart Luppescu 
University of Chicago


[R] merging/intersecting 2 data frames

2010-06-29 Thread Erin Hodgess
Dear R People:

I have two data frames, a.df and b.df as seen here:

> a.df[1:10,]
        DATE GENDER PATIENT_ID AGE             SYNDROME
1  4/16/2009      F      23686  45         RASH ON BODY
2  4/16/2009      F      13840  35         CANT URINATE
3  4/16/2009      M      12895  30       BLURRED VISION
4  4/16/2009      M      18375  33       UNABLE TO VOID
5  4/16/2009      M       2237  44         SOB WEAKNESS
6  4/16/2009      F      21484  41 TOOTH PAINTOOTH PAIN
7  4/16/2009      M      10783  37          RT ARM PAIN
8  4/16/2009      M      12610  65        L FOOT INJURY
9  4/16/2009      F       3495  29 URINARY DIFFICULTIES
10 4/16/2009      F        351  36           PT STS MVA
> b.df[1:10,]
   DATE_OF_DEATH    ID
1      4/19/2009 21676
2      4/19/2009 13717
3      4/19/2009 20498
4      4/19/2009 14281
5      4/19/2009 38848
6      4/20/2009   331
7      4/20/2009  4084
8      4/20/2009 19616
9      4/20/2009 17965
10     4/20/2009 11863
>

a.df will always be larger than b.df.

I want to create a third data frame that is matched on PATIENT_ID from
a.df and ID from b.df.

If there is no match from a.df$PATIENT_ID to b.df$ID, then we omit the
row from the new data.frame.

If there is a match, we include the DATE_OF_DEATH column from b.df.

I've tried all kinds of tricks, but nothing works exactly as I wish.

Thanks in advance,
Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] How to draw multi group plot?

2010-06-29 Thread Greg Snow
It is hard to know exactly what you want without a description of your data or 
what you want the final plot to look like.  But you can do the equivalent of 
plot followed by multiple calls to points by using a loop, the apply functions, 
the lattice package, or the ggplot2 package (I'm guessing the latter two would 
be the better place for you to start).
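
The loop variant could look something like the sketch below, in base graphics only. The data frame d, its column names and the three groups are invented for the example, since the original post does not describe the data.

```r
# Made-up data: 30 points in three groups
set.seed(1)
d <- data.frame(x = runif(30), y = runif(30),
                group = rep(c("a", "b", "c"), each = 10),
                stringsAsFactors = FALSE)
groups <- unique(d$group)

# Set up empty axes that cover all the data, then add one group at a time
plot(d$x, d$y, type = "n", xlab = "x", ylab = "y")
for (i in seq_along(groups)) {
  one <- d[d$group == groups[i], ]
  points(one$x, one$y, col = i, pch = i)
}
legend("topright", legend = groups,
       col = seq_along(groups), pch = seq_along(groups))
```

The loop replaces the repeated manual calls to points(); the number of groups only changes the length of `groups`.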

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of ??
> Sent: Monday, June 28, 2010 7:57 PM
> To: R0; R1
> Subject: [R] How to draw multi group plot?
> 
> As in the attachment, I want to draw a multi-group plot.
> But I can only use:
> plot(x,y...)
> points(...)
> 
> It is heavy work to use these commands if there are many groups to
> be drawn, because I have to call points() many times.
> 
> I want to know whether there is a command which can draw the multi-group
> plot directly.
> 
> Thanks
> 
> My best.
> 



Re: [R] generate irregular series of dates

2010-06-29 Thread Greg Snow
A fairly simple way is to generate one series with all the Tuesdays, then 
another with all the Thursdays, combine and sort.

> sort( c( seq.Date( as.Date('2010-6-29'), by='week', length.out=10), 
+ seq.Date( as.Date('2010-7-1'), by='week', length.out=10) )
+ )
 [1] "2010-06-29" "2010-07-01" "2010-07-06" "2010-07-08" "2010-07-13"
 [6] "2010-07-15" "2010-07-20" "2010-07-22" "2010-07-27" "2010-07-29"
[11] "2010-08-03" "2010-08-05" "2010-08-10" "2010-08-12" "2010-08-17"
[16] "2010-08-19" "2010-08-24" "2010-08-26" "2010-08-31" "2010-09-02"
>
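
The same idea can be bounded by an end date instead of a fixed length. This is a sketch with a made-up semester end; seq() on Date objects accepts a `to` argument together with by = "week".

```r
# Hypothetical end of the semester, chosen only for the example
semester_end <- as.Date("2010-07-31")

# One weekly series starting on the first Tuesday, one on the first Thursday
tuesdays  <- seq(as.Date("2010-06-29"), semester_end, by = "week")
thursdays <- seq(as.Date("2010-07-01"), semester_end, by = "week")

# Combine the two series and put them in calendar order
sessions <- sort(c(tuesdays, thursdays))
format(sessions)
```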

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Simon Kiss
> Sent: Tuesday, June 29, 2010 4:22 AM
> To: r-help@r-project.org
> Subject: [R] generate irregular series of dates
> 
> Dear colleagues, particularly academic ones,
> So I'm creating a Microsoft Word template for myself so that every
> time I teach a new course, I don't have to enter in the dates manually
> for each class session.
> I'd like to use an R script that can generate an irregular series of
> dates starting from one date (semester begin) to another (semester
> end) using an irregular interval in between (Tuesdays and Thursdays,
> for example).
> I know that a regular series of dates is no problem, but what about an
> irregular series?
> Yours,
> Simon Kiss
> *
> Simon J. Kiss, PhD
> SSHRC and DAAD Post-Doctoral Fellow
> John F. Kennedy Institute of North America Studies
> Free University of Berlin
> Lansstraße 7-9
> 14195 Berlin, Germany
> Cell: +49 (0)1525-300-2812,
> Web: http://www.jfki.fu-berlin.de/index.html
> 


Re: [R] formating chron date times for printing

2010-06-29 Thread Gabor Grothendieck
Also the paste solution can be abbreviated to just:

   paste(as.Date(x), x - floor(x))

On Tue, Jun 29, 2010 at 2:57 PM, Gabor Grothendieck
 wrote:
> The time zone independent solution, i.e.  paste(as.Date(x), format(x -
> floor(x))),
> is the safer although format(as.POSIXlt(x, tz = "GMT")) seems to work too.
>
> On Tue, Jun 29, 2010 at 2:44 PM, stephen sefick  wrote:
>> Thank you both!  If I don't want to deal with a Time Zone potentailly
>> converting some of the dates, which would be your suggestions.  Or,
>> are they all the same way to skin a cat.  Again thank you for your
>> wonderful help.
>> kindest regards,
>>
>> Stephen Sefick
>>
>>
>> On Tue, Jun 29, 2010 at 1:39 PM, Gabor Grothendieck
>>  wrote:
>>> On Tue, Jun 29, 2010 at 2:22 PM, Gabor Grothendieck
>>>  wrote:
>>>> On Tue, Jun 29, 2010 at 2:01 PM, stephen sefick  wrote:
>>>>> the date were created with chron with this argument
>>>>>
>>>>> format=c(dates="Y/m/d", times="H:M:S"))
>>>>>
>>>>> so I have the dates being displayed as
>>>>>
>>>>> (10/06/22 12:00:00)
>>>>>
>>>>> I would like to have them displayed as
>>>>>
>>>>> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>>>>>
>>>>> and then I can convert these for merging with another data frame
>>>>>
>>>>> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
>>>>> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names =
>>>>> c("dates",
>>>>> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
>>>>> "times")))
>>>>>
>>>>> reading through old posts I found this:
>>>>>
>>>>> format(x, enclosed = c("", ""))
>>>>>
>>>>> which surrounds the date time with "" instead of ()
>>>>> now I would like to change the format of the dates to print like the
>>>>> above specified.
>>>>> kindest regards,
>>>>>
>>>>
>>>> Try this:
>>>>
>>>>> format(as.POSIXlt(x, tz = "GMT"))
>>>> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:29:59"
>>>> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>>>>
>>>
>>> Also here is another solution:
>>>
>>>> paste(as.Date(x), format(x - floor(x)))
>>> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:30:00"
>>> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>>>
>



Re: [R] Conditionally constructing columns in a data frame

2010-06-29 Thread Erik Iverson
I'm sure there's an easier way, but it's going to be easiest to get a 
useful response if we have a reproducible, minimal example, as the 
posting guide requests. ?tapply is probably involved.


Stuart Luppescu wrote:

Hello, I have to construct 5 new columns in a data frame depending on
the value of another of the columns in the data frame. The only way I
could figure out to do this was to subset the data frame five times, do
the variable construction, and then rbind the subsets back together.
Here's part of the code I used:

read001 <- read[read$existstr=="001",]

  read001$era1end <- NA
  read001$era2base <- NA
  read001$era2end <- NA
  read001$era3base <- read001$era1base
  read001$era3end <- read001$era3base + (6 * read001$era3tr)

read011 <- read[read$existstr=="011",]

  read011$era1end <- NA
  read011$era2base <- read011$era1base
  read011$era2end <- read011$era2base + (4 * read011$era2tr)
  read011$era3base <- read011$era2end
  read011$era3end <- read011$era2end + (6 * read011$era3tr)

...

read2 <- rbind(read001, read011, read100, read110, read111)


Isn't there an easier way to do this?

Thanks.





Re: [R] formating chron date times for printing

2010-06-29 Thread Gabor Grothendieck
The time zone independent solution, i.e.  paste(as.Date(x), format(x -
floor(x))),
is the safer one, although format(as.POSIXlt(x, tz = "GMT")) seems to work too.
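
To see where the "00:29:59" in the POSIXlt output comes from, the sketch below works in base R on the raw numbers stored in x (days since 1970-01-01, per the origin in the structure() call) and does not use chron at all. Rounding to whole seconds before converting is an added suggestion for illustration, not something proposed in the thread.

```r
# The numeric values underlying the chron object x from the question
days <- c(14464, 14464.010417, 14464.020833, 14464.03125, 14464.041667)

# The third value is a hair short of 00:30:00 when expressed in seconds,
# so truncating to whole seconds shows 00:29:59
days[3] * 86400 - 14464 * 86400

# Round to whole seconds first, then convert and format in GMT
secs   <- round(days * 86400)
stamps <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "GMT"),
                 "%Y-%m-%d %H:%M:%S")
stamps
```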

On Tue, Jun 29, 2010 at 2:44 PM, stephen sefick  wrote:
> Thank you both!  If I don't want to deal with a Time Zone potentailly
> converting some of the dates, which would be your suggestions.  Or,
> are they all the same way to skin a cat.  Again thank you for your
> wonderful help.
> kindest regards,
>
> Stephen Sefick
>
>
> On Tue, Jun 29, 2010 at 1:39 PM, Gabor Grothendieck
>  wrote:
>> On Tue, Jun 29, 2010 at 2:22 PM, Gabor Grothendieck
>>  wrote:
>>> On Tue, Jun 29, 2010 at 2:01 PM, stephen sefick  wrote:
>>>> the date were created with chron with this argument
>>>>
>>>> format=c(dates="Y/m/d", times="H:M:S"))
>>>>
>>>> so I have the dates being displayed as
>>>>
>>>> (10/06/22 12:00:00)
>>>>
>>>> I would like to have them displayed as
>>>>
>>>> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>>>>
>>>> and then I can convert these for merging with another data frame
>>>>
>>>> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
>>>> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names =
>>>> c("dates",
>>>> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
>>>> "times")))
>>>>
>>>> reading through old posts I found this:
>>>>
>>>> format(x, enclosed = c("", ""))
>>>>
>>>> which surrounds the date time with "" instead of ()
>>>> now I would like to change the format of the dates to print like the
>>>> above specified.
>>>> kindest regards,
>>>>
>>>
>>> Try this:
>>>
>>>> format(as.POSIXlt(x, tz = "GMT"))
>>> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:29:59"
>>> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>>>
>>
>> Also here is another solution:
>>
>>> paste(as.Date(x), format(x - floor(x)))
>> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:30:00"
>> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>>



Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2

2010-06-29 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Bert Gunter
> Sent: Tuesday, June 29, 2010 11:08 AM
> To: 'Jason Rupert'; 'Duncan Murdoch'
> Cc: r-help@r-project.org
> Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2
> 
> Jason:
> 
> I think it's actually even a bit worse than what Duncan said, which
> was:
> 
> ---
> "You need to double the backslashes to enter them in an R string.  So
> 
> gsub("N\\A", "NA", original, fixed=TRUE)
> 
> should work if original contains a single backslash, and
> 
> gsub("N\\\\A", "NA", original, fixed=TRUE)
> 
> should work if it contains a double one.  Two things add to the
> confusion
> here:  First, a single backslash will be displayed doubled by print().
> .. "
> --
> 
> Well, let's see: (On R version 2.11.1, 2010-5-31 for Windows)
> 
> > astring <- "n\a"
> > print(astring)
> [1] "n\a"
> 
> So Duncan's last sentence appears to be incorrect. The "\" is not
> displayed
> doubled. However ...

But Duncan's statement is correct.   In your example above, there is no 
backslash character in the variable astring.  It contains the letter 'n' and 
the control character '\a', which is a single character (the backslash is 
printed by print() to indicate the control character).  If there were actually 
a backslash character in the string, print() would have doubled it.
  

> 
> > bstring <- "N\A"
> Error: '\A' is an unrecognized escape in character string starting "N\A"
> 
> What's going on? Well, the "\a" in astring is a _single_ escape sequence
> (for a beep/bell sound, on Windows anyway: cat("\a") should make a sound).
> So the "\" in "\a" is correctly printed undoubled. However, since the "\A"
> in bstring does _not_ correspond to any escape sequence, the expression
> "\A" cannot be parsed and an error is thrown. But:
> 
> > bstring <- "N\\A"
> > print(bstring)
> [1] "N\\A"   ## is fine
> 
> ## ... Noting that
> 
> > nchar("\\A")
> [1] 2
> 
> So whether a "\" needs to be doubled or not depends on whether the
> parser can interpret it as part of a legitimate escape sequence, whence
> 
> gsub("\a","","\a") ## works but
> gsub("\A","","\A") ## does not.

Whether "\" needs to be doubled depends on what you want the string value to 
be.  If you want the single control character, '\a', then you don't want to 
double it.  If you want the string to contain 2 characters '\' and 'a', then 
you must enter '\\a'.
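A small sketch of that distinction, in case it helps (the variable names are
made up for illustration): nchar() and utf8ToInt() show what the string really
contains, regardless of how print() displays it.

```r
bell  <- "\a"    # one character: the BEL control character (code 7)
slash <- "\\a"   # two characters: a backslash followed by the letter a

print(nchar(bell))        # number of characters actually stored
print(nchar(slash))
print(utf8ToInt(slash))   # character codes: 92 is the backslash, 97 is 'a'

print(slash)              # print() shows the escaped form, backslash doubled
cat(slash, "\n")          # cat() shows the raw contents, one backslash
```

So the doubling seen with print() is only a display convention; the stored
string has a single backslash.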

> 
> To avoid such confusion, I think Duncan's advice to double backslashes
> should be heeded as much as possible. Unfortunately, I don't think it's
> always possible:

In this case, if you actually want a newline character, then you don't want to 
use a double backslash.

> 
> > newlineString <- "first line\nsecond line\n"
> > print(newlineString)
> [1] "first line\nsecond line\n"
> > cat(newlineString)
> first line
> second line
> 
> Cheers,

Hope this is helpful,

Dan


Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204




[R] Conditionally constructing columns in a data frame

2010-06-29 Thread Stuart Luppescu
Hello, I have to construct 5 new columns in a data frame depending on
the value of another of the columns in the data frame. The only way I
could figure out to do this was to subset the data frame five times, do
the variable construction, and then rbind the subsets back together.
Here's part of the code I used:

read001 <- read[read$existstr=="001",]

  read001$era1end <- NA
  read001$era2base <- NA
  read001$era2end <- NA
  read001$era3base <- read001$era1base
  read001$era3end <- read001$era3base + (6 * read001$era3tr)

read011 <- read[read$existstr=="011",]

  read011$era1end <- NA
  read011$era2base <- read011$era1base
  read011$era2end <- read011$era2base + (4 * read011$era2tr)
  read011$era3base <- read011$era2end
  read011$era3end <- read011$era2end + (6 * read011$era3tr)

...

read2 <- rbind(read001, read011, read100, read110, read111)


Isn't there an easier way to do this?
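One possible shortcut, sketched here under assumptions: logical indexing on
existstr can fill the new columns in place, with no subsetting and no rbind.
The toy data values below are invented, and only the two patterns shown above
("001" and "011") are handled; the remaining patterns would need their own
rules in the same style.

```r
# Toy stand-in for the real data frame (values are invented).
read <- data.frame(
  existstr = c("001", "011", "001", "011"),
  era1base = c(10, 20, 30, 40),
  era2tr   = c(1, 2, 3, 4),
  era3tr   = c(0.5, 1, 1.5, 2),
  stringsAsFactors = FALSE
)

is001 <- read$existstr == "001"
is011 <- read$existstr == "011"

read2 <- read
read2$era1end  <- NA_real_
# era2base exists only for "011"; NA propagates into era2end for "001".
read2$era2base <- ifelse(is011, read2$era1base, NA)
read2$era2end  <- read2$era2base + 4 * read2$era2tr
# era3base starts from era1base for "001" and from era2end for "011".
read2$era3base <- NA_real_
read2$era3base[is001] <- read2$era1base[is001]
read2$era3base[is011] <- read2$era2end[is011]
read2$era3end  <- read2$era3base + 6 * read2$era3tr

print(read2)
```

Unlike the rbind approach, this keeps the rows in their original order.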

Thanks.

-- 
Stuart Luppescu 
University of Chicago



Re: [R] formating chron date times for printing

2010-06-29 Thread stephen sefick
Thank you both!  If I don't want to deal with a time zone potentially
converting some of the dates, which approach would you suggest?  Or
are they all just different ways to skin a cat?  Again, thank you for your
wonderful help.
kindest regards,

Stephen Sefick


On Tue, Jun 29, 2010 at 1:39 PM, Gabor Grothendieck
 wrote:
> On Tue, Jun 29, 2010 at 2:22 PM, Gabor Grothendieck
>  wrote:
>> On Tue, Jun 29, 2010 at 2:01 PM, stephen sefick  wrote:
>>> the date were created with chron with this argument
>>>
>>> format=c(dates="Y/m/d", times="H:M:S"))
>>>
>>> so I have the dates being displayed as
>>>
>>> (10/06/22 12:00:00)
>>>
>>> I would like to have them displayed as
>>>
>>> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>>>
>>> and then I can convert these for mergeing with another data frame
>>>
>>> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
>>> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names = 
>>> c("dates",
>>> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
>>> "times")))
>>>
>>> reading through old posts I found this:
>>>
>>> format(x, enclosed = c("", ""))
>>>
>>> which put the which surrounds the date time with "" instead of ()
>>> now I would like to change the format of the dates to print like the
>>> above specified.
>>> kindest regards,
>>>
>>
>> Try this:
>>
>>> format(as.POSIXlt(x, tz = "GMT"))
>> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:29:59"
>> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>>
>
> Also here is another solution:
>
>> paste(as.Date(x), format(x - floor(x)))
> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:30:00"
> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>



-- 
Stephen Sefick

| Auburn University   |
| Department of Biological Sciences   |
| 331 Funchess Hall  |
| Auburn, Alabama   |
| 36849|
|___|
| sas0...@auburn.edu |
| http://www.auburn.edu/~sas0025 |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] formating chron date times for printing

2010-06-29 Thread Gabor Grothendieck
On Tue, Jun 29, 2010 at 2:22 PM, Gabor Grothendieck
 wrote:
> On Tue, Jun 29, 2010 at 2:01 PM, stephen sefick  wrote:
>> the date were created with chron with this argument
>>
>> format=c(dates="Y/m/d", times="H:M:S"))
>>
>> so I have the dates being displayed as
>>
>> (10/06/22 12:00:00)
>>
>> I would like to have them displayed as
>>
>> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>>
>> and then I can convert these for mergeing with another data frame
>>
>> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
>> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names = 
>> c("dates",
>> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
>> "times")))
>>
>> reading through old posts I found this:
>>
>> format(x, enclosed = c("", ""))
>>
>> which put the which surrounds the date time with "" instead of ()
>> now I would like to change the format of the dates to print like the
>> above specified.
>> kindest regards,
>>
>
> Try this:
>
>> format(as.POSIXlt(x, tz = "GMT"))
> [1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:29:59"
> [4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
>

Also here is another solution:

> paste(as.Date(x), format(x - floor(x)))
[1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:30:00"
[4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"
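For the time zone question, here is a base-R sketch that does not need chron
at all. It assumes the numbers inside x are fractional days since 1970-01-01
(as the origin attribute above suggests) and that the recorded times should be
read as GMT; both the rounding step and the variable names are my own choices.

```r
# Fractional days since 1970-01-01, copied from the chron object x.
days <- c(14464, 14464.010417, 14464.020833, 14464.03125, 14464.041667)

# Convert to whole seconds; rounding avoids results such as 00:29:59.
secs <- round(days * 86400)

# Fix the time zone both when creating and when formatting the times,
# so the local time zone of the machine never enters the picture.
stamps <- as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
out <- format(stamps, "%Y-%m-%d %H:%M:%S", tz = "GMT")
print(out)
```

Leaving tz unset is what shifts the times to the local zone, so naming the
zone explicitly in both calls is the safer habit.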



[R] how to distinguish bi-mode distribution from mono-mode distribution

2010-06-29 Thread Changbin Du
HI, Dear community,

How can I distinguish a bimodal distribution from a unimodal one? I have
only the histograms of 3500 data sets.

Thanks!

-- 
Sincerely,
Changbin
--



Re: [R] formating chron date times for printing

2010-06-29 Thread Gabor Grothendieck
On Tue, Jun 29, 2010 at 2:01 PM, stephen sefick  wrote:
> the date were created with chron with this argument
>
> format=c(dates="Y/m/d", times="H:M:S"))
>
> so I have the dates being displayed as
>
> (10/06/22 12:00:00)
>
> I would like to have them displayed as
>
> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>
> and then I can convert these for mergeing with another data frame
>
> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names = c("dates",
> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
> "times")))
>
> reading through old posts I found this:
>
> format(x, enclosed = c("", ""))
>
> which put the which surrounds the date time with "" instead of ()
> now I would like to change the format of the dates to print like the
> above specified.
> kindest regards,
>

Try this:

> format(as.POSIXlt(x, tz = "GMT"))
[1] "2009-08-08 00:00:00" "2009-08-08 00:15:00" "2009-08-08 00:29:59"
[4] "2009-08-08 00:45:00" "2009-08-08 01:00:00"



Re: [R] Exponential Smoothing: Forecast package

2010-06-29 Thread Stephan Kolassa

Hi Phani,

something like this looks promising:

#

library(forecast)
library(Mcomp)

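MAPE.for.Holt <- function (x, series, bignum=10e6) {
  # Fit Holt's linear trend model (additive error and trend, no seasonality)
  # with the smoothing parameters fixed at alpha = x[1] and beta = x[2].
  foo <- try(ets(series$x, model="AAN", damped=FALSE, alpha=x[1], beta=x[2],
                 restrict=FALSE), silent=TRUE)

  if ( inherits(foo, "try-error") ) {
    bignum   # penalize parameter values for which the fit fails
  } else {
    mean(abs(fitted(foo)-series$x)/series$x, na.rm=TRUE)   # in-sample MAPE
  }
}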

bar <- optim(par=c(.1,.1),fn=MAPE.for.Holt,series=M3[[1]])

#

At least it converges. However, I have had problems with parameters
leaving the allowed space (that's what the try() and the bignum are for),
and even with convergence, some unrealistically big smoothing constants
resulted, which in turn were not very stable for varying starting
parameters...
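To check that kind of sensitivity, one can run optim() from several starting
values and compare the results. The sketch below is only an illustration under
assumptions: it uses hand-written classic Holt recursions in base R rather
than ets() (whose parameterization differs), a made-up series, and
hypothetical names (holt_mape, starts).

```r
# Classic Holt (double exponential smoothing) one-step-ahead recursions.
# This is a hand-rolled illustration, not the ets() state space model.
holt_mape <- function(par, y, bignum = 1e6) {
  alpha <- par[1]
  beta  <- par[2]
  # Penalize parameters outside (0, 1), in the spirit of the bignum trick.
  if (any(par <= 0) || any(par >= 1)) return(bignum)

  level <- y[1]
  trend <- y[2] - y[1]
  fitted <- rep(NA_real_, length(y))
  for (t in 2:length(y)) {
    fitted[t] <- level + trend                       # forecast made at t - 1
    new_level <- alpha * y[t] + (1 - alpha) * (level + trend)
    trend <- beta * (new_level - level) + (1 - beta) * trend
    level <- new_level
  }
  mean(abs(fitted - y) / abs(y), na.rm = TRUE)       # in-sample MAPE
}

set.seed(1)
y <- 100 + 2 * (1:40) + rnorm(40, sd = 3)            # made-up trending series

# Run optim() from a grid of starting values and keep the best result.
starts <- expand.grid(alpha = c(0.1, 0.5, 0.9), beta = c(0.1, 0.5, 0.9))
fits <- lapply(seq_len(nrow(starts)), function(i) {
  optim(par = unlist(starts[i, ]), fn = holt_mape, y = y)
})
values <- sapply(fits, function(f) f$value)
best <- fits[[which.min(values)]]
print(best$par)
print(range(values))   # a wide range suggests sensitivity to starting values
```

The same loop should work with MAPE.for.Holt in place of holt_mape, passing
series= instead of y=.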


HTH,
Stephan


phani kishan schrieb:

Hey,
Thanks for the tip, Stephan. But could you tell me how to pass the series to
the function calling ets?
Initially I planned to do it this way:

wrapper<-function(x)
{
alpha<-x[1]
beta<-x[2]
ph<-x[3]
series<-x[4]
foofit<-ets(series,model="AZZ",alpha=alpha,beta=beta,phi=phi,additive.only=T,opt.crit=c("mse"))
accuracy(foofit)[5]  ##for MAPE
}

I then planned to use the optim using
optim(c(alpha,beta,phi,series),wrapper)

What I hoped to do is also select MAPE as the criterion for choosing my
alpha and beta.
However, I shouldn't pass my series in this form, right? It would be
"optimized" too in the process. Could you suggest a way around this?
And if there is a way around it, would that allow me to set MAPE as the
criterion?

Phani



On Tue, Jun 29, 2010 at 12:47 AM, Stephan Kolassa wrote:


Hi Phani,

to get the best Holt's model, I would simply wrap a suitable function
calling ets() within optim() and optimize for alpha and beta - the values
given by ets() without constraints would probably be good starting values,
but you had better start the optimization with a variety of starting values
to make sure you don't end up in a local minimum.

I know of no comparison just between Holt and Brown - but you could use the
above methods and the M3 Competition dataset (in Mcomp) to look how the two
methods compare on a (more or less) benchmark dataset.

HTH
Stephan


phani kishan schrieb:

 Hey,

I am using the ets() function in the forecast package to find out the best
fit parameters for my time-series. I have about 50 sets of time series
data.

I'm currently using the function as follows:

ets(x,model="AZZ",opt.crit="mse")


As far as I can tell, about 5-10 of them have been identified by ets as
having a trend, and the alpha and beta values it returns are the same in
all of these cases. When I read up online, this came up as Brown's double
exponential smoothing as opposed to Holt's exponential smoothing (where
alpha and beta differ). I am guessing this is happening because AIC/AICc/BIC
select a model based on accuracy as well as a penalty on the number of
parameters (1 in the case of Brown's, 2 in the case of Holt's). Now if I
want to see the results of the best parameters from Holt's method, how
should I go about it?

And is there any study comparing the accuracy of Brown's double exponential
model versus Holt's exponential model?

Thanks in advance,
Phani









Re: [R] formating chron date times for printing

2010-06-29 Thread stephen sefick
Phil,
Thank you very much for your swift reply.  I have not used POSIXct
much, but I do remember having issues with time zones in POSIXx.
Should I be aware of this, or is this not a problem?  I have set the
data recorders to stay on one time and one time zone and never change, so
that I don't have to worry about this.
kindest regards,

Stephen

On Tue, Jun 29, 2010 at 1:14 PM, Phil Spector  wrote:
> Stephen -
>
>> format(as.POSIXct(x),"%Y-%m-%d %H:%M:%S")
>
> [1] "2009-08-07 17:00:00" "2009-08-07 17:15:00" "2009-08-07 17:29:59"
> [4] "2009-08-07 17:45:00" "2009-08-07 18:00:00"
>
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spec...@stat.berkeley.edu
>
>
> On Tue, 29 Jun 2010, stephen sefick wrote:
>
>> the date were created with chron with this argument
>>
>> format=c(dates="Y/m/d", times="H:M:S"))
>>
>> so I have the dates being displayed as
>>
>> (10/06/22 12:00:00)
>>
>> I would like to have them displayed as
>>
>> "2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"
>>
>> and then I can convert these for mergeing with another data frame
>>
>> x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
>> 14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names =
>> c("dates",
>> "times")), origin = c(1, 1, 1970), class = c("chron", "dates",
>> "times")))
>>
>> reading through old posts I found this:
>>
>> format(x, enclosed = c("", ""))
>>
>> which put the which surrounds the date time with "" instead of ()
>> now I would like to change the format of the dates to print like the
>> above specified.
>> kindest regards,
>>
>>
>



-- 
Stephen Sefick

| Auburn University   |
| Department of Biological Sciences   |
| 331 Funchess Hall  |
| Auburn, Alabama   |
| 36849|
|___|
| sas0...@auburn.edu |
| http://www.auburn.edu/~sas0025 |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] formating chron date times for printing

2010-06-29 Thread Phil Spector

Stephen -


format(as.POSIXct(x),"%Y-%m-%d %H:%M:%S")

[1] "2009-08-07 17:00:00" "2009-08-07 17:15:00" "2009-08-07 17:29:59"
[4] "2009-08-07 17:45:00" "2009-08-07 18:00:00"


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Tue, 29 Jun 2010, stephen sefick wrote:


the date were created with chron with this argument

format=c(dates="Y/m/d", times="H:M:S"))

so I have the dates being displayed as

(10/06/22 12:00:00)

I would like to have them displayed as

"2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"

and then I can convert these for mergeing with another data frame

x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names = c("dates",
"times")), origin = c(1, 1, 1970), class = c("chron", "dates",
"times")))

reading through old posts I found this:

format(x, enclosed = c("", ""))

which put the which surrounds the date time with "" instead of ()
now I would like to change the format of the dates to print like the
above specified.
kindest regards,



Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2

2010-06-29 Thread Bert Gunter
Jason:

I think it's actually even a bit worse than what Duncan said, which was:

---
"You need to double the backslashes to enter them in an R string.  So

gsub("N\\A", "NA", original, fixed=TRUE)

should work if original contains a single backslash, and

gsub("N\\\\A", "NA", original, fixed=TRUE)

should work if it contains a double one.  Two things add to the confusion
here:  First, a single backslash will be displayed doubled by print(). .. "
--

Well, let's see: (On R version 2.11.1, 2010-5-31 for Windows)

> astring <- "n\a"
> print(astring)
[1] "n\a"

So Duncan's last sentence appears to be incorrect. The "\" is not displayed
doubled. However ...

> bstring <- "N\A"
Error: '\A' is an unrecognized escape in character string starting "N\A"

What's going on? Well, the "\a" in astring is a _single_ escape sequence (for
a beep/bell sound, on Windows anyway: cat("\a") should make a sound). So the
"\" in "\a" is correctly printed undoubled. However, since the "\A" in
bstring does _not_ correspond to any escape sequence, the expression "\A"
cannot be parsed and an error is thrown. But:

> bstring <- "N\\A"
> print(bstring)
[1] "N\\A"   ## is fine

## ... Noting that  

> nchar("\\A")
[1] 2

So whether a "\" needs to be doubled or not depends on whether the parser
can interpret it as part of a legitimate escape sequence, whence

gsub("\a","","\a") ## works but
gsub("\A","","\A") ## does not.

To avoid such confusion, I think Duncan's advice to double backslashes
should be heeded as much as possible. Unfortunately, I don't think it's
always possible:

> newlineString <- "first line\nsecond line\n"
> print(newlineString)
[1] "first line\nsecond line\n"
> cat(newlineString)
first line
second line

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Statistics


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Uwe Ligges
> Sent: Tuesday, June 29, 2010 4:11 AM
> To: Jason Rupert
> Cc: r-help@r-project.org
> Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2
> 
> 
> 
> On 29.06.2010 12:47, Jason Rupert wrote:
> > Previously in R 2.9.2 I used the following to convert from an improperly
> formatted NA string into one that is a bit more consistent.
> >
> >
> > gsub("N\A", "NA", "N\A", fixed=TRUE)
> >
> > This worked in R 2.9.2, but now in R 2.11.1 it doesn't seem to work and
> throws the following error.
> > Error: '\A' is an unrecognized escape in character string starting "N\A"
> >
> > I guess my questions are the following:
> > (1) Is this expected behavior?
> > (2) If it is expected behavior, what is the proper way to replace "N\A"
> with "NA" and "N\\A" with "NA"?
> 
> 
> If your original text "thestring" contains "N\A", then the R
> representation is "N\\A", and hence
> 
> gsub("N\\A", "NA", thestring)
> 
> If you want to try explicitly, you need to write
> 
> gsub("N\\A", "NA", "N\\A")
> 
> If your original text contains two backslashes, both have to be escaped as
> in
> 
> gsub("N\\\\A", "NA", thestring)
> 
> Uwe Ligges
> 
> 
> > Thank you again for all the help and insight.
> >



[R] formating chron date times for printing

2010-06-29 Thread stephen sefick
The dates were created with chron with this argument:

format=c(dates="Y/m/d", times="H:M:S"))

so I have the dates being displayed as

(10/06/22 12:00:00)

I would like to have them displayed as

"2010-06-22 12:00:00" or "%Y-%m-%d %H:%M:%S"

and then I can convert these for merging with another data frame

x <- (structure(c(14464, 14464.010417, 14464.020833, 14464.03125,
14464.041667), format = structure(c("Y/m/d", "H:M:S"), .Names = c("dates",
"times")), origin = c(1, 1, 1970), class = c("chron", "dates",
"times")))

reading through old posts I found this:

format(x, enclosed = c("", ""))

which surrounds the date-time with "" instead of ().
Now I would like to change the format of the dates so they print as
specified above.
kindest regards,

-- 
Stephen Sefick

| Auburn University   |
| Department of Biological Sciences   |
| 331 Funchess Hall  |
| Auburn, Alabama   |
| 36849|
|___|
| sas0...@auburn.edu |
| http://www.auburn.edu/~sas0025 |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] Remove observations deemed influential by influential.measure

2010-06-29 Thread GL

dbs_influential_obs <- which(apply(fit.influential.observations$is.inf, 1,
any))
dbs_sans_influential_obs <- dbs1[-dbs_influential_obs,]
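A self-contained variant of the same idea, as a sketch: since dbs is not
available here, it uses the built-in LifeCycleSavings data and the model from
the influence.measures help page as stand-ins. Indexing with a logical vector
instead of negative row numbers also behaves sensibly when no observation is
flagged (negative indexing with an empty vector would drop every row).

```r
# Stand-in data and model; replace with dbs and the real formula.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
infl <- influence.measures(fit)

# TRUE for every row flagged as influential by at least one measure.
flagged <- apply(infl$is.inf, 1, any)

# Keep only the unflagged rows, then refit the same model on them.
trimmed <- LifeCycleSavings[!flagged, ]
refit <- update(fit, data = trimmed)

print(names(flagged)[flagged])
print(cbind(full = coef(fit), trimmed = coef(refit)))
```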



Re: [R] transposing a data frame from horizontal to vertical (stacking)

2010-06-29 Thread Henrique Dallazuanna
Try this:

reshape(MyData, direction = 'long', varying = list(c('jan', 'feb')), idvar =
2:3)
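For comparison, a sketch of the same reshape() call with the output columns
named explicitly, so the result has month, values and person-style columns
like the hand-built one. The choices v.names = "values", timevar = "month"
and idvar = "names" are my assumptions about the desired layout.

```r
MyData <- data.frame(names = c("John", "Mary", "Paul", "Debby"),
                     jan = c(10, 15, 20, 25), feb = c(1, 2, 3, 4))

long <- reshape(MyData, direction = "long",
                varying = list(c("jan", "feb")),  # columns to stack
                v.names = "values",               # name of the stacked column
                timevar = "month",                # column holding the old names
                times = c("jan", "feb"),
                idvar = "names")

# Sort by person so each person's months sit together.
long <- long[order(long$names), ]
print(long)
```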

On Tue, Jun 29, 2010 at 2:22 PM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Hello, everyone!
> I have a very simple task - I have a data frame (see MyData below) and
> I need to stack the data (see result below).
> I wrote the syntax below - it's very basic and it does what I need.
> But I am sure what I am trying to do is a very typical task and there
> must be a much shorter/more elegant way of doing it.
> Any advice?
>
> Thank you very much!
>
>
>
> MyData<-data.frame(names=c("John","Mary","Paul","Debby"),jan=c(10,15,20,25),feb=c(1,2,3,4))
> (MyData)
> months<-names(MyData)[-1]
> people<-as.character(MyData[[1]])
>
> ### Creating a temp matrix with people as columns and months as rows:
> transposed<-apply(MyData[-1],1,t)
>
> ### Putting vertical data (months as rows) - for each person - into a list:
> list.of.stacked<-list()
> for(i in 1:ncol(transposed)){
>
>  list.of.stacked[[i]]<-as.data.frame(matrix(ncol=3,nrow=length(months)))
>names(list.of.stacked[[i]])<-c("month","values","person")
>list.of.stacked[[i]][["month"]]<-months
>list.of.stacked[[i]][["values"]]<-transposed[1:nrow(transposed),i]
>list.of.stacked[[i]][["person"]]<-people[i]
> }
> (list.of.stacked)
>
> ### Creating a data frame from the list:
> result<-do.call(rbind,list.of.stacked)
> (result)
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



[R] Reordering the correlation matrix

2010-06-29 Thread A Ezhil
Hi,

I have a correlation matrix of 1000 x 1000 genes. I have grouped (20 groups) 
genes based on function and each gene has a index. I would like to reorder the 
correlation matrix based on the group index. Could you please suggest me how to 
do that?
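A minimal sketch of one way to do this, assuming the group index is a vector
with one entry per gene, in the same order as the rows and columns of the
matrix. The small random matrix and the group values below are made up.

```r
set.seed(42)
n_genes <- 6
expr <- matrix(rnorm(20 * n_genes), ncol = n_genes,
               dimnames = list(NULL, paste0("gene", 1:n_genes)))
cor_mat <- cor(expr)

# Hypothetical group index, one value per gene (same order as the columns).
group <- c(2, 1, 3, 1, 2, 3)

# order() gives the permutation that sorts genes by group; apply it to
# rows and columns together so the matrix stays symmetric.
ord <- order(group)
cor_sorted <- cor_mat[ord, ord]

print(colnames(cor_sorted))
print(group[ord])
```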

Thanks in advance.

Kind regards,
Ezhil



[R] how to create a shape file from a polygone

2010-06-29 Thread Bastien Ferland-Raymond
Dear R-users,

I have created a map with plot location using longitude/latitude coordinates
with the PlotOnStaticMap() function of the RgoogleMaps package.  Everything
works fine until I try to put a polygon on the map.  The polygon() function
doesn’t work and I need to use the special function PlotPolysOnStaticMap()
however this requires a shapefile (which I'm not sure what it is) and not XY
coordinates.

So my question is quite simple:  How do I transform XY (lon/lat) coordinates
into a shape file.

My polygon:
x.pol.mat<-c(-66.67,-66.46,-66.54,-66.82,-67.06,-66.97)
y.pol.mat<-c(48.87,48.89,48.745,48.62,48.55,48.75)
pol.mat<-matrix(c(x.pol.mat,y.pol.mat),6,2)
polygon(pol.mat)

The writePolyShape() function from the maptools package seems to be what I'm
looking for but I can't figure how to create the SpatialPolygonsDataFrame
object that is required.
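A sketch of how the SpatialPolygonsDataFrame might be built with the sp
package, under assumptions: the ID "plot1" and the attribute column name are
invented, no coordinate reference system is set, and the ring is closed by
hand by repeating the first point. The sp calls are wrapped in
requireNamespace() so the code still runs where sp is not installed.

```r
x.pol.mat <- c(-66.67, -66.46, -66.54, -66.82, -67.06, -66.97)
y.pol.mat <- c(48.87, 48.89, 48.745, 48.62, 48.55, 48.75)
pol.mat <- cbind(x.pol.mat, y.pol.mat)

# Close the ring: the last vertex repeats the first one.
ring <- rbind(pol.mat, pol.mat[1, ])

if (requireNamespace("sp", quietly = TRUE)) {
  poly   <- sp::Polygon(ring)                         # one ring
  polys  <- sp::Polygons(list(poly), ID = "plot1")    # one feature
  sp_pol <- sp::SpatialPolygons(list(polys))          # geometry only
  # The row names of the attribute table must match the polygon IDs.
  attrs  <- data.frame(name = "plot1", row.names = "plot1")
  spdf   <- sp::SpatialPolygonsDataFrame(sp_pol, data = attrs)
  print(class(spdf))
}
```

The resulting spdf object is the kind of input writePolyShape() expects.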

I'm using WinXP pro 32bit and R 2.11.0

Thanks for your help, 

Bastien Ferland-Raymond, 
M.Sc. candidate
Université Laval
Québec, Canada
 
 
 



Re: [R] Remove observations deemed influential by influential.measure

2010-06-29 Thread David Winsemius


On Jun 29, 2010, at 1:10 PM, GL wrote:



dbs is an existing dataframe. I fit a lm and looked at influential
observations. I want now to delete the influential observations from dbs,
fit another lm, and see how different the results are. What is the syntax to
remove the influential observations from dbs?

fit <- lm(NI ~ PG + log(TG), data=dbs)
fit.influential.observations <- influence.measures(fit)

dbs.without.influential.observations <- ?



Look (more?) carefully at the examples on the help page. The first
example shows how to extract a vector of row numbers associated with
influential cases. If you need help applying that information to a
dataframe using "[" then you probably need to re-read the Introduction
to R as well.


--

David Winsemius, MD
West Hartford, CT



Re: [R] transposing a data frame from horizontal to vertical (stacking)

2010-06-29 Thread Hadley Wickham
On Tue, Jun 29, 2010 at 12:22 PM, Dimitri Liakhovitski
 wrote:
> Hello, everyone!
> I have a very simple task - I have a data frame (see MyData below) and
> I need to stack the data (see result below).
> I wrote the syntax below - it's very basic and it does what I need.
> But I am sure what I am trying to do is a very typical task and there
> must be a much shorter/more elegant way of doing it.
> Any advice?

library(reshape)
melt(MyData)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] transposing a data frame from horizontal to vertical (stacking)

2010-06-29 Thread Dimitri Liakhovitski
Hello, everyone!
I have a very simple task - I have a data frame (see MyData below) and
I need to stack the data (see result below).
I wrote the syntax below - it's very basic and it does what I need.
But I am sure what I am trying to do is a very typical task and there
must be a much shorter/more elegant way of doing it.
Any advice?

Thank you very much!


MyData<-data.frame(names=c("John","Mary","Paul","Debby"),jan=c(10,15,20,25),feb=c(1,2,3,4))
(MyData)
months<-names(MyData)[-1]
people<-as.character(MyData[[1]])

### Creating a temp matrix with people as columns and months as rows:
transposed<-apply(MyData[-1],1,t)

### Putting vertical data (months as rows) - for each person - into a list:
list.of.stacked<-list()
for(i in 1:ncol(transposed)){
  list.of.stacked[[i]]<-as.data.frame(matrix(ncol=3,nrow=length(months)))
  names(list.of.stacked[[i]])<-c("month","values","person")
  list.of.stacked[[i]][["month"]]<-months
  list.of.stacked[[i]][["values"]]<-transposed[1:nrow(transposed),i]
  list.of.stacked[[i]][["person"]]<-people[i]
}
(list.of.stacked)

### Creating a data frame from the list:
result<-do.call(rbind,list.of.stacked)
(result)


-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2

2010-06-29 Thread Bert Gunter
Uwe:

Did you forget to add the "fixed = TRUE" parameter to your gsub call in your
reply? 

> gsub("N\\A", "NA", "N\\A")
[1] "N\\A"

> gsub("N\\A","NA","N\\A",fixed=TRUE)
[1] "NA"

I only mention it because there is already sufficient confusion that the
typo may totally bewilder people.
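
To make the two escaping levels concrete, here is a small sketch; the variable names are illustrative only. It contrasts literal matching (fixed = TRUE) with regular-expression matching, where the backslash has to be escaped a second time for the regex engine.

```r
x <- "N\\A"   # the R literal for the three characters N, backslash, A
nchar(x)      # 3

# Literal matching: the pattern is used as-is, so one escaped backslash is enough
fixed_result <- gsub("N\\A", "NA", x, fixed = TRUE)

# Regex matching: the backslash must also be escaped for the regex engine,
# hence four backslashes in the R source
regex_result <- gsub("N\\\\A", "NA", x)

fixed_result
regex_result
```

Both calls should give "NA"; which form to prefer is mostly a matter of whether the rest of the pattern needs regular-expression features.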

-- Bert

Bert Gunter
Genentech Nonclinical Statistics

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Uwe Ligges
> Sent: Tuesday, June 29, 2010 4:11 AM
> To: Jason Rupert
> Cc: r-help@r-project.org
> Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2
> 
> 
> 
> On 29.06.2010 12:47, Jason Rupert wrote:
> > Previously in R 2.9.2 I used the following to convert from an improperly
> formatted NA string into one that is a bit more consistent.
> >
> >
> > gsub("N\A", "NA", "N\A", fixed=TRUE)
> >
> > This worked in R 2.9.2, but now in R 2.11.1 it doesn't seem to work and
> throws the following error.
> > Error: '\A' is an unrecognized escape in character string starting "N\A"
> >
> > I guess my questions are the following:
> > (1) Is this expected behavior?
> > (2) If it is expected behavior, what is the proper way to replace "N\A"
> with "NA" and "N\\A" with "NA"?
> 
> 
> If your original text "thestring" contains "N\A", then the R
> representation is "N\\A", and hence
> 
> gsub("N\\A", "NA", thestring)
> 
> If you want to try explicitly, you need to write
> 
> gsub("N\\A", "NA", "N\\A")
> 
> If your original text contains two backslashes, both have to be escaped as
> in
> 
> gsub("N\\\\A", "NA", thestring)
> 
> Uwe Ligges
> 
> 
> > Thank you again for all the help and insight.
> >


[R] Remove observations deemed influential by influential.measure

2010-06-29 Thread GL

dbs is an existing dataframe. I fit a lm and looked at influential
observations. I want now to delete the influential observations from dbs,
fit another lm, and see how different the results are. What is the syntax to
remove the influential observations from dbs?

fit <- lm(NI ~ PG + log(TG), data=dbs)
fit.influential.observations <- influence.measures(fit)

dbs.without.influential.observations <- ?
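
One possible way to fill in the missing line, sketched on simulated data because the real dbs is not shown: influence.measures() returns a list whose is.inf component is a logical matrix with one row per observation, so rows flagged by any measure can be dropped by logical indexing. Only the names dbs, NI, PG and TG come from the post; the data values, the planted outlier, and the assumption that no rows were dropped for missing values are all hypothetical.

```r
set.seed(42)
# Simulated stand-in for dbs (hypothetical values)
dbs <- data.frame(PG = runif(60, 1, 10), TG = runif(60, 1, 100))
dbs$NI <- 2 + 0.5 * dbs$PG + log(dbs$TG) + rnorm(60)
dbs$NI[1] <- dbs$NI[1] + 15   # plant one clearly unusual observation

fit <- lm(NI ~ PG + log(TG), data = dbs)
fit.influential.observations <- influence.measures(fit)

# TRUE for every observation flagged by at least one influence measure
flagged <- apply(fit.influential.observations$is.inf, 1, any)

# Keep only the rows that were not flagged, then refit the same model
# (positional indexing assumes lm() dropped no rows for missing values)
dbs.without.influential.observations <- dbs[!flagged, ]
refit <- update(fit, data = dbs.without.influential.observations)

summary(fit)$coefficients
summary(refit)$coefficients
```

Whether "flagged by any measure" is the right rule is a judgment call; restricting the test to a single column of is.inf would be an equally reasonable variant.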



Re: [R] How to allocate more memories to R?

2010-06-29 Thread stephen sefick
I am not entirely sure, but the actual amount of RAM is less than the
GB specified by the specs.  Are you directing R to a max memory
size...  If so it is too large.

On Tue, Jun 29, 2010 at 11:32 AM, Bogaso  wrote:
>
> When I use this I am getting following warning at the time of opening R from
> that shortcut :
>
> "-max-mem-size=2048MB:too large and taken as 2047M"
>
> Why I am getting this? I have 3GB ram installed and using within Vista.
>
> Thanks
>



-- 
Stephen Sefick

| Auburn University   |
| Department of Biological Sciences   |
| 331 Funchess Hall  |
| Auburn, Alabama   |
| 36849|
|___|
| sas0...@auburn.edu |
| http://www.auburn.edu/~sas0025 |
|___|

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] How to allocate more memories to R?

2010-06-29 Thread Bogaso

When I use this I am getting following warning at the time of opening R from
that shortcut :

"-max-mem-size=2048MB:too large and taken as 2047M"

Why I am getting this? I have 3GB ram installed and using within Vista.

Thanks


Re: [R] Sweave, xtable plus/minus sign

2010-06-29 Thread Ottorino-Luca Pantani

On 29/06/2010 17:36, Marc Schwartz wrote:
> On Jun 29, 2010, at 10:08 AM, Ottorino-Luca Pantani wrote:
>
> > paste("$\\pm$", 1.34, sep="")
> [1] "$\\pm$1.34"
>
> I believe you then need to tweak the sanitize.text.function argument in
> print.xtable() to properly handle the backslashes.
>
> HTH,
>
> Marc Schwartz


Thanks, Marc.
I modified the code as follows:

foo.df$Std.Dev <-
 paste("$\\pm$", round(mySD,2), sep="")
tmpTable <-
   xtable(foo.df, caption ="Simulated data",
  label="tab:five", digits=2)
print(tmpTable, caption.placement="top",
  sanitize.text.function= function(x){x})

which result in a .tex file like

..
\begin{table}[ht]
\begin{center}
\caption{Simulated data}
\label{tab:five}
\begin{tabular}{rrl}
  \hline
& Mean & Std.Dev \\
  \hline
  1 & 0.46 & $\pm$0.42 \\
  2 & 0.81 & $\pm$0.69 \\
  3 & 0.17 & $\pm$0.56 \\
  4 & 0.15 & $\pm$1.02 \\
  5 & 0.60 & $\pm$1.37 \\
  6 & 0.48 & $\pm$1.39 \\
   \hline
\end{tabular}
\end{center}
\end{table}
\end{document}

so it seems that there's no need to tweak the sanitize.text.function

Thanks again.

--
Ottorino



Re: [R] Help asked for automated generation of ncdf variables

2010-06-29 Thread David Pierce
> Date: Wed, 16 Jun 2010 17:53:56 +0200
> From: "Adolf STIPS" 
>
> Hi,
>
> I'm using the "ncdf" library for creating ncdf files.
> But I need to create about 100 variables per file (e.g. single rivers),
> So I do not like to create each variable separately.
>
> Unfortunately I found no way to make this work, as I'm unable to create a
> correct list of class "var.ncdf".

Hello,

the var.def.ncdf() call returns an object of class var.ncdf.  Since
objects are lists, you can't assign them to arrays.  However, you can
easily assign them to other lists.  So instead of this:

> riv=
> var.def.ncdf(nam,"m**3/s",d1,msvf,longname=names[iriv],prec="single")
>  #class(river[iriv]) ="var.ncdf"
>  river[iriv] = riv
>  }
>
> ncw=create.ncdf(wfile,list(river))

Try something like this:

  river = list()
> riv=
> var.def.ncdf(nam,"m**3/s",d1,msvf,longname=names[iriv],prec="single")
>  river[[iriv]] = riv
>  }
>
> ncw=create.ncdf(wfile,river)
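
The difference between [[ ]] and [ ] assignment can be sketched without the ncdf package. In the snippet below make_var() is a hypothetical stand-in for var.def.ncdf() that merely returns a classed list, and the river names are invented.

```r
# Hypothetical stand-in for var.def.ncdf(): returns a list carrying a class
make_var <- function(name) {
  structure(list(name = name, units = "m**3/s"), class = "var.ncdf")
}

river_names <- c("river1", "river2", "river3")   # invented names
river <- vector("list", length(river_names))

# Double brackets store each whole object as one list element
for (iriv in seq_along(river_names)) {
  river[[iriv]] <- make_var(river_names[iriv])
}
sapply(river, class)

# Single brackets treat the object as a list of replacement values,
# so only its first component ends up in the slot (with a warning)
broken <- list()
suppressWarnings(broken[1] <- make_var("river1"))
broken[[1]]
```

With the real package the finished list would then be handed to create.ncdf() directly, as in the reply above, rather than wrapped in another list().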

Regards,

--Dave



Re: [R] Sweave, xtable plus/minus sign

2010-06-29 Thread Marc Schwartz
On Jun 29, 2010, at 10:08 AM, Ottorino-Luca Pantani wrote:

> Dear R-users,
> please consider the following minimal example:
> 
> \documentclass[a4paper,titlepage,onecolumn,12pt]{article}
> \usepackage[italian]{babel}
> \usepackage{amssymb}
> \usepackage[utf8x]{inputenc}
> \usepackage[pdftex]{graphicx}
> \begin{document}
> 
> <>=
> df.data1 <-
> cbind.data.frame(A = rnorm(18),
>  B =factor(rep(LETTERS[1:6], each=3)))
> myMean <- tapply(df.data1$A, df.data1$B, FUN = mean)
> mySD <- tapply(df.data1$A, df.data1$B, FUN = sd)
> foo <- matrix(c(myMean, mySD), ncol=2, nrow=6)
> colnames(foo) <- c("Mean", "Std.Dev")
> tmpTable <- xtable(foo, caption ="Simulated data",
>label="tab:four", digits=2)
> print(tmpTable, caption.placement="top")
> @
> 
> \end{document}
> 
> Is it possible to insert the plus/minus sign (±)  between the two columns ?
> I mean within R/Sweave and not in the resulting .tex file ?
> 
> A possible workaround could be :
> ...
> foo.df <- as.data.frame(foo)
> foo.df$Std.Dev <- paste("±", round(mySD,2), sep="")
> tmpTable <- xtable(foo.df, caption ="Simulated data",
>label="tab:five", digits=2)
> print(tmpTable, caption.placement="top")
> @
> 
> Any other solution?


Don't use "±" as the character, as that will be impacted upon by various 
issues, such as locale and fonts.

Use the available LaTeX symbols, which in this case is \pm. See:

  http://www.ctan.org/tex-archive/info/symbols/comprehensive/symbols-letter.pdf

In the case of this symbol, you need to put LaTeX into math mode by using '$' 
to surround the symbol:

  $\pm$

However, with R, you need to double the backslashes, otherwise the backslash 
will be interpreted as an escape sequence. Thus, you need:

 $\\pm$


So, for example:

> paste("$\\pm$", 1.34, sep="")
[1] "$\\pm$1.34"


I believe you then need to tweak the sanitize.text.function argument in 
print.xtable() to properly handle the backslashes.
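
A short sketch of both points, under stated assumptions: the standard deviations are illustrative numbers, and the xtable part runs only if that package happens to be installed. The doubled backslash exists only in R's source and printed representation; cat() shows the characters that would actually reach the .tex file.

```r
mySD <- c(0.42, 0.69, 0.56)   # illustrative values only

# One escaped backslash in the R source yields a single backslash in the string
cell <- paste("$\\pm$", format(round(mySD, 2), nsmall = 2), sep = "")
print(cell)             # printed representation, backslash shown doubled
cat(cell, sep = "\n")   # actual characters, e.g. $\pm$0.42
nchar("$\\pm$")         # 5 characters: $ \ p m $

# Passing the text through unchanged keeps the LaTeX markup intact;
# xtable's default sanitizing would otherwise escape the special characters
if (requireNamespace("xtable", quietly = TRUE)) {
  tab <- xtable::xtable(data.frame(Mean = c(0.46, 0.81, 0.17), Std.Dev = cell))
  print(tab, sanitize.text.function = identity)
}
```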

HTH,

Marc Schwartz



Re: [R] Link to a pdf document

2010-06-29 Thread cgenolin

Thanks
>
>
>
> On 29/06/2010 6:55 AM, cgenolin wrote:
>> Thanks, Duncan.
>> And where am I suppose to put the file toto.pdf ?
>> In "/myPack/doc/" or in "myPack/inst/doc/"
>>
>
>
> The latter.  When your package is installed, it will be installed to
> myPack/doc.
>
> Duncan Murdoch
>


[R] Sweave, xtable plus/minus sign

2010-06-29 Thread Ottorino-Luca Pantani

Dear R-users,
please consider the following minimal example:

\documentclass[a4paper,titlepage,onecolumn,12pt]{article}
\usepackage[italian]{babel}
\usepackage{amssymb}
\usepackage[utf8x]{inputenc}
\usepackage[pdftex]{graphicx}
\begin{document}

<>=
df.data1 <-
 cbind.data.frame(A = rnorm(18),
  B =factor(rep(LETTERS[1:6], each=3)))
myMean <- tapply(df.data1$A, df.data1$B, FUN = mean)
mySD <- tapply(df.data1$A, df.data1$B, FUN = sd)
foo <- matrix(c(myMean, mySD), ncol=2, nrow=6)
colnames(foo) <- c("Mean", "Std.Dev")
tmpTable <- xtable(foo, caption ="Simulated data",
label="tab:four", digits=2)
print(tmpTable, caption.placement="top")
@

\end{document}

Is it possible to insert the plus/minus sign (±)  between the two columns ?
I mean within R/Sweave and not in the resulting .tex file ?

A possible workaround could be :
...
foo.df <- as.data.frame(foo)
foo.df$Std.Dev <- paste("±", round(mySD,2), sep="")
tmpTable <- xtable(foo.df, caption ="Simulated data",
label="tab:five", digits=2)
print(tmpTable, caption.placement="top")
@

Any other solution?
--
Ottorino-Luca Pantani, Università di Firenze
Dip.to di Scienze delle Produzioni Vegetali,
del Suolo e dell'Ambiente Forestale (DiPSA)
P.zle Cascine 28 50144 Firenze Italia
Ubuntu 10.04 -- GNU Emacs 23.1.50.1 (x86_64-pc-linux-gnu, GTK+ Version 
2.18.0)

ESS version 5.8 -- R 2.10.1



Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Steve Lianoglou
Hi,

On Tue, Jun 29, 2010 at 7:16 AM, Aadhithya  wrote:
>
> Following is the error I am getting:
> Error in svm.default(train, cl) :
>  Need numeric dependent variable for regression.

Here's a problem that you may not even know you had yet.

By the looks of your code, it seems as if you want to do
classification, but the SVM is trying to do regression.

You have to make sure that all of the columns in your data.frames are
of the right type. If you're doing classification, change your label
column to a factor. In your case, where you are explicitly passing a y
vector, instead of this:

R> model<- svm(train,cl);

do this

R> cl <- factor(cl)
R> model <- svm(train, cl)

Also, explicitly set the `type` argument in your call to `svm` so you
are doing the right thing (i.e. 'C-classification', 'nu-classification',
etc).

Also, be sure that there aren't any NA values in your data.
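
Put together, the advice above might look like the following sketch. The expression matrix and class labels are simulated, since the poster's data is not shown, and the model is fitted only if the e1071 package is installed.

```r
set.seed(1)
# Simulated stand-in: 40 samples (rows) by 5 features (columns)
train <- matrix(rnorm(40 * 5), nrow = 40)
cl <- rep(c("tumor", "normal"), each = 20)   # hypothetical class labels

# Labels must be a factor for classification
cl <- factor(cl)

# Check for missing values before fitting
stopifnot(!anyNA(train), !anyNA(cl))

if (requireNamespace("e1071", quietly = TRUE)) {
  # State the type explicitly rather than relying on the default
  model <- e1071::svm(train, cl, type = "C-classification")
  # New data must have samples in rows, like the training matrix
  pred <- predict(model, train)
  print(table(predicted = pred, actual = cl))
}
```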

> My dataset looks like this in both training and testing:

If both of your datasets look the same, why are you transposing your
`test` matrix when passing it to `predict` (I'm looking at the code in
your original post)?

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Help with dates and characters

2010-06-29 Thread Allan Engelhardt

If the vector elements are (still) strings, you could simply try

x<- c("2000-01-01", "2000-01-23", "2001-03-12", "2009-12-31")
substring(x, 1, 7)
# [1] "2000-01" "2000-01" "2001-03" "2009-12"


Hope this helps a little.

Allan



On 29/06/10 14:36, Thomas Jensen wrote:

Dear R Experts,

I have a vector of dates in character format like this:

date
"2000-01-01"
"2000-01-23"
"2001-03-12"
...
...
...
"2009-12-31"

I would like to delete the last part of the character string (i.e. the 
"day" part), so the vector looks like this:


date
"2000-01"
"2000-01"
"2001-03"
...
...
...
"2009-12"

I have been looking into regular expressions, but i find this very 
confusing.


Thank you for your help,
Thomas



Re: [R] Help with dates and characters

2010-06-29 Thread Marc Schwartz
On Jun 29, 2010, at 8:36 AM, Thomas Jensen wrote:

> Dear R Experts,
> 
> I have a vector of dates in character format like this:
> 
> date
> "2000-01-01"
> "2000-01-23"
> "2001-03-12"
> ...
> ...
> ...
> "2009-12-31"
> 
> I would like to delete the last part of the character string (i.e. the "day" 
> part), so the vector looks like this:
> 
> date
> "2000-01"
> "2000-01"
> "2001-03"
> ...
> ...
> ...
> "2009-12"
> 
> I have been looking into regular expressions, but i find this very confusing.
> 
> Thank you for your help,
> Thomas


As is typically the case with R, there is more than one possible solution:

> x
[1] "2000-01-01" "2000-01-23" "2001-03-12"

 
# See ?substr
> substr(x, 1, 7)
[1] "2000-01" "2000-01" "2001-03"


# See ?sub and ?regex
# Replace the trailing '-' and 2 occurrences of 0-9
# with "". The '$' indicates the end of the string
> sub("-[0-9]{2}$", "", x)
[1] "2000-01" "2000-01" "2001-03"


Each of the above returns a character vector, not a "Date" class object.

If you actually want the vector as a Date class object, but just 'format' the 
output as YYYY-MM, then you can use:

# See ?as.Date and ?strptime
> format(as.Date(x, format = "%Y-%m-%d"), "%Y-%m")
[1] "2000-01" "2000-01" "2001-03"
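
As a quick cross-check, the three approaches can be run side by side on the dates from the question. This is only a sketch; the variable names are illustrative.

```r
x <- c("2000-01-01", "2000-01-23", "2001-03-12", "2009-12-31")

by_substr <- substr(x, 1, 7)                    # keep characters 1 to 7
by_regex  <- sub("-[0-9]{2}$", "", x)           # strip the trailing "-dd"
by_date   <- format(as.Date(x, format = "%Y-%m-%d"), "%Y-%m")

# All three give the same character vector
identical(by_substr, by_regex) && identical(by_regex, by_date)

# The year-month strings can then be used for grouping, e.g. counts per month
table(by_date)
```

The Date-based route has the side benefit of failing visibly (returning NA) on strings that are not valid dates, which the purely textual routes would pass through.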


HTH,

Marc Schwartz



[R] Help with dates and characters

2010-06-29 Thread Thomas Jensen

Dear R Experts,

I have a vector of dates in character format like this:

date
"2000-01-01"
"2000-01-23"
"2001-03-12"
...
...
...
"2009-12-31"

I would like to delete the last part of the character string (i.e. the  
"day" part), so the vector looks like this:


date
"2000-01"
"2000-01"
"2001-03"
...
...
...
"2009-12"

I have been looking into regular expressions, but i find this very  
confusing.


Thank you for your help,
Thomas



[R] Constructing a model with multilevel response variables

2010-06-29 Thread Sam

Dear List,

I am a little unsure how to structure my model and was after some advice. I am 
also unsure whether this question is appropriate for this list; if it is not, 
please just delete it and accept my apologies.

I have 10 factors that are categorical variables and a response variable with 
5 levels -

    Factors
A   B   C   D      RESPONSE
2   2   2   2      1
2   4   2   2      2
2   1   2   2      2
2   1   2   1      1
2   3   2   2      3
2   1   1   2      4
2   1   2   3      4
1   1   3   2      2
2   2   2   2      1
2   1   5   2      1

The response levels relate to how threatened the species is - from not 
threatened to extinct (1-5).

My first approach was to divide the 5 response levels into 2 - threatened 
(levels 1+2) or non threatened (levels 3,4+5) and call

model1 <- lmer(THREAT~1+(1|ORDER/FAMILY) + A+B+C+D, family=binomial) 

This worked well. Now I want to see how the factors influence the individual 
response levels, i.e. do species with a response level of 1, for instance, 
possess certain factors, and it is this I am unsure how to build into a model.

My overall goal would be to use the model as a predictive model and ask - "if a 
species has factors a, b, c for instance, can I predict what the response level 
(1-5) would be" 

Thanks, and once again I apologise if this is not the right place to ask this 
type of question.

Sam,




[R] Constructing a model with multilevel response variables

2010-06-29 Thread Sam


Dear List

I am a little unsure how to structure my model and was after some advice. I am 
also unsure whether this question is appropriate for this list; if it is not, 
please just delete it and accept my apologies.

I have 10 factors that are categorical variables and a response variable with 
5 levels -

    Factors
A   B   C   D      RESPONSE
2   2   2   2      1
2   4   2   2      2
2   1   2   2      2
2   1   2   1      1
2   3   2   2      3
2   1   1   2      4
2   1   2   3      4
1   1   3   2      2
2   2   2   2      1
2   1   5   2      1

The response levels relate to how threatened the species is - from not 
threatened to extinct (1-5).

My first approach was to divide the 5 response levels into 2 - threatened 
(levels 1+2) or non threatened (levels 3,4+5) and call

model1 <- lmer(THREAT~1+(1|ORDER/FAMILY) + A+B+C+D..., family=binomial) 

This worked well. Now I want to see how the factors influence the individual 
response levels, i.e. do species with a response level of 1, for instance, 
possess certain combinations of factors, and it is this I am unsure how to 
build into a model.

My overall goal would be to use the model as a predictive model and ask - "if a 
species has factors a, b, c for instance, can I predict what the response level 
(1-5) would be". 

Thanks, and once again I apologise if this is not the right place to ask this 
type of question.

Sam,




[R] More than two font in a plot

2010-06-29 Thread Jinsong Zhao
Hi there,

I am a Chinese R user. I hope to display Chinese characters in a plot,
and then save it in PostScript format. I have read the article titled
"Non-Standard Fonts in PostScript and PDF Graphics", especially the
section about CJK fonts. I also tried the code:

> pdf("chinese.pdf", width=3, height=1)
> grid.text("\u4F60\u597D", y=2/3, gp=gpar(fontfamily="CNS1"))
> grid.text("is 'hello' in (Traditional) Chinese", y=1/3)
> dev.off()

However, this does not work with postscript(). It seems that postscript()
needs the family set in the call, as in postscript(..., family = "CNS1").
Then all the characters are in the CJK font, which is not what I want: I
would like the Latin characters to be displayed in Helvetica.

Any suggestions? Thanks in advance!

Regards,
Jinsong
-- 
Jinsong Zhao, Ph.D.
College of Resources and Environment
Huazhong Agricultural University
Wuhan 430070, P.R. China
E-mail: jsz...@mail.hzau.edu.cn



Re: [R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Marc Schwartz
I suspect that it was Intel's marketing department, after a few beers at the 
local bar...

;-)

Regards,

Marc

On Jun 29, 2010, at 9:09 AM, Joris Meys wrote:

> *slaps forehead*
> Thanks. So out it goes, that hyperthreading. Who invented
> hyperthreading on a quad-core anyway?
> 
> 
> Cheers
> Joris
> 
> 2010/6/29 Uwe Ligges :
>> 
>> 
>> On 29.06.2010 15:30, Joris Meys wrote:
>>> 
>>> Dear all,
>>> 
>>> I've recently purchased a new 64bit system with an intel i7 quadcore
>>> processor. As I understood (maybe wrongly) that to date the 32bit
>>> version of R is more stable than the 64bit, I installed the 32bit
>>> version and am happily using it ever since. Now I'm running a whole
>>> lot of models, which goes smoothly, and I thought out of curiosity to
>>> check how much processor I'm using. I would have thought I used 25%
>>> (being one core), as on my old dual core R uses 50% of the total
>>> processor capacity. Funny, it turns out that R is currently using only
>>> 12-13% of my cpu, which is about half of what I expected.
>>> 
>> 
>> An Intel Core i7 Quadcore has 8 virtual cores since it supports
>> hyperthreading. R uses one of these virtual cores. Note that 2 virtual cores
>> won't be twice as fast since they are running on the same physical core.
>> Hence this is expected.
>> 
>> Uwe Ligges
>> 
>> 
>> 
>>> Did I miss something somewhere? Should I change some settings? I'm
>>> running on a Windows 7 enterprise. I looked around already, but I have
>>> the feeling I overlooked something.
>>> 
>>> Cheers
>>> Joris
>>> 
>>> sessionInfo()
>>> R version 2.10.1 (2009-12-14)
>>> i386-pc-mingw32
>>> 
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>> States.1252LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C   LC_TIME=English_United
>>> States.1252
>>> 
>>> attached base packages:
>>> [1] grDevices datasets  splines   graphics  stats tcltk utils
>>>methods   base
>>> 
>>> other attached packages:
>>> [1] svSocket_0.9-48 TinnR_1.0.3 R2HTML_2.0.0Hmisc_3.7-0
>>> survival_2.35-7
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.12.3 grid_2.10.1lattice_0.18-3 svMisc_0.9-57
>>>  tools_2.10.1
>>> 
>>> 
>> 
>



Re: [R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
*slaps forehead*
Thanks. So out it goes, that hyperthreading. Who invented
hyperthreading on a quad-core anyway?


Cheers
Joris

2010/6/29 Uwe Ligges :
>
>
> On 29.06.2010 15:30, Joris Meys wrote:
>>
>> Dear all,
>>
>> I've recently purchased a new 64bit system with an intel i7 quadcore
>> processor. As I understood (maybe wrongly) that to date the 32bit
>> version of R is more stable than the 64bit, I installed the 32bit
>> version and am happily using it ever since. Now I'm running a whole
>> lot of models, which goes smoothly, and I thought out of curiosity to
>> check how much processor I'm using. I would have thought I used 25%
>> (being one core), as on my old dual core R uses 50% of the total
>> processor capacity. Funny, it turns out that R is currently using only
>> 12-13% of my cpu, which is about half of what I expected.
>>
>
> An Intel Core i7 Quadcore has 8 virtual cores since it supports
> hyperthreading. R uses one of these virtual cores. Note that 2 virtual cores
> won't be twice as fast since they are running on the same physical core.
> Hence this is expected.
>
> Uwe Ligges
>
>
>
>> Did I miss something somewhere? Should I change some settings? I'm
>> running on a Windows 7 enterprise. I looked around already, but I have
>> the feeling I overlooked something.
>>
>> Cheers
>> Joris
>>
>> sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C                           LC_TIME=English_United
>> States.1252
>>
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils
>>    methods   base
>>
>> other attached packages:
>> [1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0
>> survival_2.35-7
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57
>>  tools_2.10.1
>>
>>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] mixed-effects model with two fixed effects: interaction

2010-06-29 Thread Bilonick, Richard A
On Tue, 2010-06-29 at 09:09 +0000, Ilona Leyer wrote:
> Dear all,
> In a greenhouse experiment we tested performance of 4 different species 
> (B,H,P,R) under 3 different water levels in 10 replications. As response 
> variable e.g. the number of emerging sprouts were measured on three dates. A 
> simple Anova considering every measurement date separately shows a highly 
> significant effect of species and moisture (and partly the interaction of 
> both). The mixed-effects model with species and moisture shows a highly 
> significant effect of species and moisture as well. However, when I included 
> the interaction the t-values of the species dropped strongly and the SE 
> increase and the results for the species are not significant anymore. For me 
> this does not seem plausible. Has anybody an idea, how this can be 
> interpreted and if I have made a mistake in calculating the data? 
> 
> Thanks in advance for any help!
> Ilona
> 
> 
> model1<-lme(sprouts~species+moisture,random=~time|ID)
> model2<-lme(sprouts~species*moisture,random=~time|ID)
> 
> 
> Fixed effects: sprouts ~ species + moisture 
>                          Value Std.Error  DF   t-value p-value
> (Intercept)           7.971267  1.330500 240  5.991180  0.0000
> speciesH             -6.459344  1.536329 114 -4.204400  0.0001
> speciesP            -10.063604  1.536329 114 -6.550421  0.0000
> speciesR             -5.051894  1.536329 114 -3.288288  0.0013
> moisturemoist         2.228835  1.330500 114  1.675185  0.0966
> moisturewaterlogged  17.111149  1.330500 114 12.860688  0.0000
> 
> Fixed effects: sprouts ~ species * moisture 
>                                   Value Std.Error  DF   t-value p-value
> (Intercept)                    4.831965  1.750970 240  2.759594  0.0062
> speciesH                      -4.464197  2.476245 108 -1.802809  0.0742
> speciesP                      -3.986787  2.476245 108 -1.610013  0.1103
> speciesR                      -0.809376  2.476245 108 -0.326856  0.7444
> moisturemoist                  3.505506  2.476245 108  1.415654  0.1598
> moisturewaterlogged           24.766934  2.476245 108 10.001811  0.0000
> speciesH:moisturemoist        -0.457291  3.501939 108 -0.130582  0.8963
> speciesP:moisturemoist        -2.458125  3.501939 108 -0.701932  0.4842
> speciesR:moisturemoist        -2.555356  3.501939 108 -0.729697  0.4672
> speciesH:moisturewaterlogged  -5.597498  3.501939 108 -1.598400  0.1129
> speciesP:moisturewaterlogged -15.538272  3.501939 108 -4.437048  0.0000
> speciesR:moisturewaterlogged -10.206874  3.501939 108 -2.914635  0.0043
> 
> 
> 
When there is an interaction effect, the main effects are difficult to
interpret, because the model is no longer a simple additive one. You
can't state the effect of one factor without knowing the level of the
other factor. Given there is an interaction between these factors, you
could reparameterize the model as a one-way analysis (i.e., just create
12 separate treatment groups, one per species-by-moisture combination).
With an interaction, the two factors on their own do not give a simple
interpretation.
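
A minimal sketch of the one-way reparameterization, on simulated data.
Everything here is an assumption made for illustration: the data are
invented, the baseline moisture level is called "dry", and the random
part is simplified to a random intercept per pot instead of the
random=~time|ID structure from the original post.

```r
library(nlme)  # recommended package, ships with R

set.seed(42)
# Hypothetical layout mirroring the thread: 4 species x 3 moisture levels
# x 10 replicate pots, each pot measured on 3 dates.
d <- expand.grid(time = 1:3, rep = 1:10,
                 species = c("B", "H", "P", "R"),
                 moisture = c("dry", "moist", "waterlogged"))
d$ID <- interaction(d$species, d$moisture, d$rep, drop = TRUE)  # 120 pots

# Invented response: a pot-level random shift plus a waterlogging effect.
pot_effect <- rnorm(nlevels(d$ID), sd = 2)
d$sprouts <- 5 + 10 * (d$moisture == "waterlogged") +
  pot_effect[as.integer(d$ID)] + rnorm(nrow(d))

# One factor with a level for every species-by-moisture combination.
d$treatment <- interaction(d$species, d$moisture, drop = TRUE)
nlevels(d$treatment)   # 12

# One-way model on the combined factor.
fit <- lme(sprouts ~ treatment, random = ~ 1 | ID, data = d)
```

Each coefficient of `fit` then compares one treatment group with the
reference group, which may be easier to read than main effects plus
interaction terms.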


Re: [R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Uwe Ligges



On 29.06.2010 15:30, Joris Meys wrote:

Dear all,

I've recently purchased a new 64bit system with an intel i7 quadcore
processor. As I understood (maybe wrongly) that to date the 32bit
version of R is more stable than the 64bit, I installed the 32bit
version and am happily using it ever since. Now I'm running a whole
lot of models, which goes smoothly, and I thought out of curiosity to
check how much processor I'm using. I would have thought I used 25%
(being one core), as on my old dual core R uses 50% of the total
processor capacity. Funny, it turns out that R is currently using only
12-13% of my cpu, which is about half of what I expected.



An Intel Core i7 Quadcore has 8 virtual cores since it supports 
hyperthreading. R uses one of these virtual cores. Note that 2 virtual 
cores won't be twice as fast since they are running on the same physical 
core. Hence this is expected.
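
The arithmetic behind this can be checked from within R. The sketch
below assumes a newer R than the 2.10.1 in the question, since the
parallel package only became part of R with version 2.14.0; `share` is a
hypothetical helper, not an existing function.

```r
library(parallel)  # part of R since 2.14.0

# Number of logical CPUs (hyperthreads counted) and physical cores.
# On some platforms the 'logical' argument is ignored or NA is returned.
logical_cpus  <- detectCores(logical = TRUE)
physical_cpus <- detectCores(logical = FALSE)

# Share of the total CPU that one fully busy single-threaded process
# shows in the task manager, given the number of logical CPUs.
share <- function(n_logical) 100 / n_logical

share(8)   # 12.5: quad-core with hyperthreading, as in the question
share(2)   # 50:   the old dual-core machine
```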


Uwe Ligges




Did I miss something somewhere? Should I change some settings? I'm
running on a Windows 7 enterprise. I looked around already, but I have
the feeling I overlooked something.

Cheers
Joris

sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils
[8] methods   base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0
[5] survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57  tools_2.10.1






[R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
Dear all,

I've recently purchased a new 64bit system with an intel i7 quadcore
processor. As I understood (maybe wrongly) that to date the 32bit
version of R is more stable than the 64bit, I installed the 32bit
version and am happily using it ever since. Now I'm running a whole
lot of models, which goes smoothly, and I thought out of curiosity to
check how much processor I'm using. I would have thought I used 25%
(being one core), as on my old dual core R uses 50% of the total
processor capacity. Funny, it turns out that R is currently using only
12-13% of my cpu, which is about half of what I expected.

Did I miss something somewhere? Should I change some settings? I'm
running on a Windows 7 enterprise. I looked around already, but I have
the feeling I overlooked something.

Cheers
Joris

sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils
[8] methods   base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0
[5] survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57  tools_2.10.1


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] mixed-effects model with two fixed effects: interaction

2010-06-29 Thread Thomas Stewart
Ilona -

I think you may be misinterpreting the t-test.

In model 1, consider the speciesH coefficient.  A test that speciesH = 0,
essentially asks: Is speciesH the same as speciesB?  The test statistic for
this hypothesis is the t-value reported in the table.  (t-value= -4.2,
p-value=0.0001)

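In model 2, the speciesH coefficient, t-value, and p-value do not
correspond to the same hypothesis test as in model 1. With the interaction
in the model, speciesH compares H with B only at the reference moisture
level (the level absorbed into the intercept).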
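
As a rough illustration, the H-versus-B difference at each moisture level
can be rebuilt from the model 2 estimates quoted below. This is only
arithmetic on the reported numbers; the label "baseline" is a placeholder
because the name of the reference moisture level does not appear in the
output.

```r
# Fixed-effect estimates copied from the model 2 table in the thread
b_H             <- -4.464197   # speciesH
b_H_moist       <- -0.457291   # speciesH:moisturemoist
b_H_waterlogged <- -5.597498   # speciesH:moisturewaterlogged

# H minus B at each moisture level
h_vs_b <- c(baseline    = b_H,
            moist       = b_H + b_H_moist,
            waterlogged = b_H + b_H_waterlogged)
h_vs_b

# Average difference over the three moisture levels
mean(h_vs_b)
```

The average comes out near -6.48, close to (though not identical with)
the single speciesH estimate of -6.459344 in model 1, which is the kind
of overall comparison that model 1 reports.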

If your goal is to test the overall species effect, then the test you want
is:

library(nlme)

# method="ML" so that models with different fixed effects can be compared
model0 <- lme(sprouts ~ moisture, random = ~time|ID, method = "ML")
model1 <- lme(sprouts ~ species + moisture, random = ~time|ID, method = "ML")
model2 <- lme(sprouts ~ species * moisture, random = ~time|ID, method = "ML")

anova(model0, model2)  # test of species effect in model with interaction
anova(model0, model1)  # test of species effect in model without interaction

And as an added bonus, (which should probably be done before the test of
Species Effect)

anova(model1, model2)  # test of interaction: do I really need the more complex model?

Hope that helps.
-tgs


On Tue, Jun 29, 2010 at 5:09 AM, Ilona Leyer  wrote:

> Dear all,
> In a greenhouse experiment we tested the performance of 4 different species
> (B, H, P, R) under 3 different water levels in 10 replications. As a response
> variable, e.g., the number of emerging sprouts was measured on three dates. A
> simple ANOVA considering every measurement date separately shows a highly
> significant effect of species and moisture (and partly of the interaction of
> both). The mixed-effects model with species and moisture shows a highly
> significant effect of species and moisture as well. However, when I included
> the interaction, the t-values of the species dropped strongly, the SEs
> increased, and the results for the species are no longer significant. To me
> this does not seem plausible. Does anybody have an idea how this can be
> interpreted, and whether I have made a mistake in analysing the data?
>
> Thanks in advance for any help!
> Ilona
>
>
> model1<-lme(sprouts~species+moisture,random=~time|ID)
> model2<-lme(sprouts~species*moisture,random=~time|ID)
>
>
> Fixed effects: sprouts ~ species + moisture
>                          Value Std.Error  DF   t-value p-value
> (Intercept)           7.971267  1.330500 240  5.991180  0.
> speciesH             -6.459344  1.536329 114 -4.204400  0.0001
> speciesP            -10.063604  1.536329 114 -6.550421  0.
> speciesR             -5.051894  1.536329 114 -3.288288  0.0013
> moisturemoist         2.228835  1.330500 114  1.675185  0.0966
> moisturewaterlogged  17.49      1.330500 114 12.860688  0.
>
>
> Fixed effects: sprouts ~ species * moisture
>                                   Value Std.Error  DF   t-value p-value
> (Intercept)                    4.831965  1.750970 240  2.759594  0.0062
> speciesH                      -4.464197  2.476245 108 -1.802809  0.0742
> speciesP                      -3.986787  2.476245 108 -1.610013  0.1103
> speciesR                      -0.809376  2.476245 108 -0.326856  0.7444
> moisturemoist                  3.505506  2.476245 108  1.415654  0.1598
> moisturewaterlogged           24.766934  2.476245 108 10.001811  0.
> speciesH:moisturemoist        -0.457291  3.501939 108 -0.130582  0.8963
> speciesP:moisturemoist        -2.458125  3.501939 108 -0.701932  0.4842
> speciesR:moisturemoist        -2.555356  3.501939 108 -0.729697  0.4672
> speciesH:moisturewaterlogged  -5.597498  3.501939 108 -1.598400  0.1129
> speciesP:moisturewaterlogged -15.538272  3.501939 108 -4.437048  0.
> speciesR:moisturewaterlogged -10.206874  3.501939 108 -2.914635  0.0043
>
>


Re: [R] Performance enhancement for ave

2010-06-29 Thread Hadley Wickham
On Tue, Jun 29, 2010 at 8:02 AM, Matthew Dowle  wrote:
>
>> dt = data.table(d,key="grp1,grp2")
>> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
>   user  system elapsed
>   3.89    0.00    3.91        # your 7.064 is 12.23 for me though, so this
> 3.9 should be faster for you
>
> However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default
> which then calls .Internal.  Because there are so many groups here, dispatch
> bites.
>
> So ...
>
>> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))),
>> by=list(grp1,grp2)])
>   user  system elapsed
>   0.20    0.00    0.21

Of course, we can perform the same optimisation with ave:

fast_mean <- function(x) .Internal(mean(x))
system.time({
  d$avx <- ave(d$x, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
  d$avy <- ave(d$y, interaction(d$grp1, d$grp2, drop = TRUE), FUN = fast_mean)
})
#  user  system elapsed
# 3.109   0.188   3.302

Regardless, my point is that there's a simple fix available to make
ave much faster, not that it's the fastest thing out there.
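
For readers who want to check the equivalence on a small scale, here is a
minimal sketch in base R. The data size and number of groups are
arbitrary choices for illustration, not the ones used in the timings
above, and no timing result is claimed.

```r
set.seed(1)
n <- 1000
d <- data.frame(x    = rnorm(n),
                grp1 = sample(1:20, n, replace = TRUE),
                grp2 = sample(1:20, n, replace = TRUE))

# Default behaviour: ave() builds interaction(grp1, grp2) itself and
# keeps every combination of levels, including the empty ones.
avg_default <- ave(d$x, d$grp1, d$grp2)

# Suggested fix: drop unused combinations before handing the factor over.
g <- interaction(d$grp1, d$grp2, drop = TRUE)
avg_dropped <- ave(d$x, g)

# Both give the same group means; only the bookkeeping differs.
same <- isTRUE(all.equal(avg_default, avg_dropped))
same
```

The saving grows with the number of empty level combinations, which is
why it matters so much with 750 x 750 possible groups.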

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Performance enhancement for ave

2010-06-29 Thread Matthew Dowle

> dt = data.table(d,key="grp1,grp2")
> system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
   user  system elapsed
   3.89    0.00    3.91    # your 7.064 is 12.23 for me though, so this
                           # 3.9 should be faster for you

However, Rprof() shows that 3.9 is mostly dispatch of mean to mean.default 
which then calls .Internal.  Because there are so many groups here, dispatch 
bites.

So ...

> system.time(ans2 <- dt[ , list(.Internal(mean(x)),.Internal(mean(y))), 
> by=list(grp1,grp2)])
   user  system elapsed
   0.20    0.00    0.21

> identical(ans1,ans2)
TRUE



"Hadley Wickham"  wrote in message 
news:aanlktilh_-3_cycf_fnqmhh6w2og5jj5u0yopx_qa...@mail.gmail.com...
> library(plyr)
>
> n<-10
> grp1<-sample(1:750, n, replace=T)
> grp2<-sample(1:750, n, replace=T)
> d<-data.frame(x=rnorm(n), y=rnorm(n), grp1=grp1, grp2=grp2)
>
> system.time({
>  d$avx1 <- ave(d$x, list(d$grp1, d$grp2))
>  d$avy1 <- ave(d$y, list(d$grp1, d$grp2))
> })
> #   user  system elapsed
> # 39.300   0.279  40.809
> system.time({
>  d$avx2 <- ave(d$x, interaction(d$grp1, d$grp2, drop = T))
>  d$avy2 <- ave(d$y, interaction(d$grp1, d$grp2, drop = T))
> })
> #  user  system elapsed
> # 6.735   0.209   7.064
>
> all.equal(d$avy1, d$avy2)
> # TRUE
> all.equal(d$avx1, d$avx2)
> # TRUE
>
> i.e. ave should use g <- interaction(..., drop = TRUE)
>
> Hadley
>
> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>



[R] Fwd: Fast and simple tool for re-sampling of asynchronous time series ?

2010-06-29 Thread bruno Piguet
[I replied to Gabor only; I think it may be interesting to cc the list, for
the record.]

-- Forwarded message --
From: bruno Piguet 
Date: 2010/6/29
Subject: Re: [R] Fast and simple tool for re-sampling of asynchronous time
series ?
To: Gabor Grothendieck 


2010/6/25 Gabor Grothendieck 

>
> The apply statements often have minimal performance advantage -- they
> are more for eliminating certain bookkeeping operations associated
> with loops and making the code more compact.
>

In this case, the real speed-up came from a change of algorithm. If the
time values are sorted (which is not too unrealistic an assumption), one
doesn't need to re-scan the whole vector to find the values close to the
next sample.

   So, I eventually coded it in R just as I would in C or Fortran:


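Y_sync <- rep(NA_real_, length(Tx))
jmax <- length(Ty)
j0 <- 1

for (i in seq_along(Tx))
{
   T_min <- Tx[i] - w
   T_max <- Tx[i] + w
   # advance j0 to the first sample at or after T_min
   while (j0 <= jmax && Ty[j0] < T_min) j0 <- j0 + 1
   if (j0 <= jmax) {
      # advance j1 to the first sample after T_max
      j1 <- j0
      while (j1 <= jmax && Ty[j1] <= T_max) j1 <- j1 + 1
      # empty window: j0:(j1-1) would count backwards, so leave NA
      if (j1 > j0) Y_sync[i] <- mean(Y[j0:(j1-1)])
   }
}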

which gives something fast enough and simple enough for my current needs.
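
The same two-pointer idea can be wrapped in a function and checked
against a brute-force version on a tiny example. This is a sketch, not
the code actually used: the name `resample_mean` and the sample values
are made up for illustration, and both time vectors are assumed to be
sorted.

```r
resample_mean <- function(Tx, Ty, Y, w) {
  stopifnot(length(Ty) == length(Y), !is.unsorted(Tx), !is.unsorted(Ty))
  out  <- rep(NA_real_, length(Tx))
  jmax <- length(Ty)
  j0   <- 1
  for (i in seq_along(Tx)) {
    # first sample at or after the lower window edge
    while (j0 <= jmax && Ty[j0] < Tx[i] - w) j0 <- j0 + 1
    if (j0 > jmax) break              # no samples left, the rest stays NA
    # first sample beyond the upper window edge
    j1 <- j0
    while (j1 <= jmax && Ty[j1] <= Tx[i] + w) j1 <- j1 + 1
    if (j1 > j0) out[i] <- mean(Y[j0:(j1 - 1)])
  }
  out
}

Ty <- c(0.9, 1.1, 2.0, 3.5)   # asynchronous sample times
Y  <- c(1, 3, 10, 7)          # values observed at Ty
Tx <- c(1, 2, 3)              # target time grid
res <- resample_mean(Tx, Ty, Y, w = 0.2)
res   # 2 10 NA

# Brute-force check: re-scan the whole vector for every target time
brute <- sapply(Tx, function(t) {
  inside <- Ty >= t - 0.2 & Ty <= t + 0.2
  if (any(inside)) mean(Y[inside]) else NA_real_
})
stopifnot(isTRUE(all.equal(res, brute)))
```

Because j0 never moves backwards, the loop touches each element of Ty
only a bounded number of times as long as the windows overlap little,
instead of scanning the whole of Ty for every element of Tx.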


Bruno.



Re: [R] linear predicted values of the index function in an ordered probit model

2010-06-29 Thread David Winsemius


On Jun 28, 2010, at 10:58 AM, Martin Spindler wrote:


Hello,

currently I am estimating an ordered probit model with the function polr
(MASS package).

Is there a simple way to obtain values for the prediction of the index
function ($X*\hat{\beta}$)?

(E.g., in the glm function there is the linear.predictors value for this
purpose.)


Read the help page:

"There are methods for the standard model-fitting functions, including  
predict, "




If not, is there another function / package where this feature is
implemented?


Even though polr has predict, you can also get a proportional odds
model in the rms package (and I would not be surprised if a search
turned up more options).
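
For the index function itself, a fitted polr object also stores the
linear predictor in its lp component. The sketch below uses the housing
data shipped with MASS as a stand-in for the poster's data, and rebuilds
the same quantity by hand as a cross-check.

```r
library(MASS)  # recommended package, ships with R

# Ordered probit on the MASS housing data (stand-in example)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
            method = "probit")

# Linear predictor X %*% beta-hat, one value per row of the data
lp <- fit$lp
head(lp)

# Cross-check: design matrix without the intercept column times the
# estimated coefficients (the cut points zeta are not part of it)
X <- model.matrix(terms(fit), model.frame(fit))
lp_manual <- drop(X[, -1, drop = FALSE] %*% coef(fit))
all.equal(unname(lp), unname(lp_manual))
```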


--
David.




Thank you very much for your answer in advance!



Best,



Martin




David Winsemius, MD
West Hartford, CT



Re: [R] table() of a factor

2010-06-29 Thread Robin Hankin

thanks everyone.

I think the motto should be "always specify the levels of a factor when
you create it, if you possibly can".


best wishes

Robin



On 06/29/2010 12:39 PM, Felix Andrews wrote:

Just use factor(), not levels(); you can pass a factor to factor() too.

x <- factor(c(rep("a",3),"b","d"), levels = letters[1:5])
table(x)
x
a b c d e
3 1 0 1 0

Cheers,
-Felix


On 29 June 2010 20:59, Robin Hankin  wrote:
   

Hi

suppose I have a factor 'x':

x <- as.factor(c(rep("a",3),"b","d"))
table(x)
x
a b d
3 1 1

But this is not what I want because
I need to include the fact that the count of "c" is zero.

I can't just change the levels of x:

levels(x) <- c("a","b","c","d")
table(x)
x
a b c d
3 1 1 0

because this records the single "d" in the original 'x' as a "c".

What I want is:

a b c d
3 1 0 1

How to get this from 'x'?
(my real application has dozens of levels with complicated names).



--
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877



Re: [R] mixed-effects model with two fixed effects: interaction

2010-06-29 Thread Setlhare Lekgatlhamang
When I replied to this message I just hit the reply button. I am
resending it using reply to all, in case it did not go to the list.

Dear Ilona,

Looking at the estimation results you have, I think your regression
equations are correctly specified. Just thinking aloud, I do not think
the results are surprising. Model 2 includes more (relevant) regressors
than model 1. In this greenhouse experiment, one would expect
performance in some cases to be jointly determined by the species and
the level of moisture. In that case, the explanatory power of the
non-interactive terms will drop or vanish when the interactive terms are
also included, while the interactive terms will be significant. I
may be wrong, but that is my initial thought.

Regards
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Ilona Leyer
Sent: Tuesday, June 29, 2010 11:10 AM
To: r-help@r-project.org
Subject: [R] mixed-effects model with two fixed effects: interaction

Dear all,
In a greenhouse experiment we tested the performance of 4 different species
(B, H, P, R) under 3 different water levels in 10 replications. As a response
variable, e.g., the number of emerging sprouts was measured on three
dates. A simple ANOVA considering every measurement date separately
shows a highly significant effect of species and moisture (and partly of the
interaction of both). The mixed-effects model with species and moisture
shows a highly significant effect of species and moisture as well.
However, when I included the interaction, the t-values of the species
dropped strongly, the SEs increased, and the results for the species are
no longer significant. To me this does not seem plausible. Does
anybody have an idea how this can be interpreted, and whether I have made a
mistake in analysing the data?

Thanks in advance for any help!
Ilona


model1<-lme(sprouts~species+moisture,random=~time|ID)
model2<-lme(sprouts~species*moisture,random=~time|ID)


Fixed effects: sprouts ~ species + moisture
                         Value Std.Error  DF   t-value p-value
(Intercept)           7.971267  1.330500 240  5.991180  0.
speciesH             -6.459344  1.536329 114 -4.204400  0.0001
speciesP            -10.063604  1.536329 114 -6.550421  0.
speciesR             -5.051894  1.536329 114 -3.288288  0.0013
moisturemoist         2.228835  1.330500 114  1.675185  0.0966
moisturewaterlogged  17.49      1.330500 114 12.860688  0.


Fixed effects: sprouts ~ species * moisture
                                  Value Std.Error  DF   t-value p-value
(Intercept)                    4.831965  1.750970 240  2.759594  0.0062
speciesH                      -4.464197  2.476245 108 -1.802809  0.0742
speciesP                      -3.986787  2.476245 108 -1.610013  0.1103
speciesR                      -0.809376  2.476245 108 -0.326856  0.7444
moisturemoist                  3.505506  2.476245 108  1.415654  0.1598
moisturewaterlogged           24.766934  2.476245 108 10.001811  0.
speciesH:moisturemoist        -0.457291  3.501939 108 -0.130582  0.8963
speciesP:moisturemoist        -2.458125  3.501939 108 -0.701932  0.4842
speciesR:moisturemoist        -2.555356  3.501939 108 -0.729697  0.4672
speciesH:moisturewaterlogged  -5.597498  3.501939 108 -1.598400  0.1129
speciesP:moisturewaterlogged -15.538272  3.501939 108 -4.437048  0.
speciesR:moisturewaterlogged -10.206874  3.501939 108 -2.914635  0.0043




Re: [R] table() of a factor

2010-06-29 Thread Felix Andrews
Just use factor(), not levels(); you can pass a factor to factor() too.

> x <- factor(c(rep("a",3),"b","d"), levels = letters[1:5])
> table(x)
x
a b c d e
3 1 0 1 0
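
Since the question starts from an existing factor, here is a short
sketch of the "pass a factor to factor()" route, using the example from
the original post. The names `x2` and `counts` are only for
illustration.

```r
# The factor from the question: no "c" among its levels
x <- as.factor(c(rep("a", 3), "b", "d"))

# Re-create the factor with the full set of levels; existing values keep
# their labels, and unused levels are simply added with count zero.
x2 <- factor(x, levels = c("a", "b", "c", "d"))

counts <- table(x2)
counts
# x2
# a b c d 
# 3 1 0 1 
```

Unlike `levels(x) <- ...`, which renames the existing levels by
position, factor() matches values to the new levels by label.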

Cheers,
-Felix


On 29 June 2010 20:59, Robin Hankin  wrote:
> Hi
>
> suppose I have a factor 'x':
>
>> x <- as.factor(c(rep("a",3),"b","d"))
>> table(x)
> x
> a b d
> 3 1 1
>>
>>
>
> But this is not what I want because
> I need to include the fact that the count of "c" is zero.
>
> I can't just change the levels of x:
>
>> levels(x) <- c("a","b","c","d")
>> table(x)
> x
> a b c d
> 3 1 1 0
>>
>
> because this records the single "d" in the original 'x' as a "c".
>
>
> What I want is:
>
> a b c d
> 3 1 0 1
>
>
> How to get this from 'x'?
> (my real application has dozens of levels with complicated names).
>
>
>
> --
> Robin K. S. Hankin
> Uncertainty Analyst
> University of Cambridge
> 19 Silver Street
> Cambridge CB3 9EP
> 01223-764877
>


-- 
Felix Andrews / 安福立
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/



Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2

2010-06-29 Thread Uwe Ligges



On 29.06.2010 12:47, Jason Rupert wrote:

Previously in R 2.9.2 I used the following to convert from an improperly 
formatted NA string into one that is a bit more consistent.


gsub("N\A", "NA", "N\A", fixed=TRUE)

This worked in R 2.9.2, but now in R 2.11.1 it doesn't seem to work and throws
the following error.
Error: '\A' is an unrecognized escape in character string starting "N\A"

I guess my questions are the following:
(1) Is this expected behavior?
(2) If it is expected behavior, what is the proper way to replace "N\A" with "NA" and 
"N\\A" with "NA"?



If your original text "thestring" contains "N\A", then the R
representation is "N\\A", and hence


gsub("N\\A", "NA", thestring, fixed = TRUE)

If you want to try explicitly, you need to write

gsub("N\\A", "NA", "N\\A", fixed = TRUE)

If your original text contains two backslashes, both have to be escaped as in

gsub("N\\\\A", "NA", thestring, fixed = TRUE)
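
A small self-contained check of the escaping rules, as a sketch; the
variable names `one` and `two` are made up for illustration. nchar()
shows how many characters the strings really contain.

```r
one <- "N\\A"      # the three characters  N \ A
two <- "N\\\\A"    # the four characters   N \ \ A
nchar(one)  # 3
nchar(two)  # 4

# fixed = TRUE: the pattern is taken literally, so only R's own string
# escaping applies (one doubled backslash per literal backslash)
res_one <- gsub("N\\A",   "NA", one, fixed = TRUE)   # "NA"
res_two <- gsub("N\\\\A", "NA", two, fixed = TRUE)   # "NA"

# Without fixed = TRUE the pattern is a regular expression, where the
# backslash has to be escaped once more
res_regex <- gsub("N\\\\A", "NA", one)               # "NA"
```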

Uwe Ligges



Thank you again for all the help and insight.



Re: [R] Need help for SVM code for microarray classification

2010-06-29 Thread Aadhithya

Following is the error I am getting:
Error in svm.default(train, cl) : 
  Need numeric dependent variable for regression.
My dataset looks like this in both training and testing:
ALL ALL ALL ALL ALL ALL ALL ALL ALL ALL AML AML AML AML AML AML AML AML AML AML
0.9389671 1.0892019 0.24647887 0.57042253 0.10798122 0.58685446 0.0 0.5399061 0.20422535 0.2488263 0.84976524 0.7910798 0.39906102 0.5633803 1.0938967 0.86384976 1.0633802 0.713615 0.5375587 0.07042254
1.7179487 0.0 0.32051283 0.012820513 0.0 0.0 0.2820513 0.98717946 0.26923078 0.07692308 0.0 0.0 0.24358974 0.0 0.0 0.46153846 0.0 0.20512821 0.20512821 0.0
1.4024506 0.20640905 0.10084826 0.09142318 0.037700284 0.07257304 0.1206409 0.14514609 2.0 0.11310085 0.030160226 0.15834118 0.0028275212 1.1630538 0.14137606 0.31479737 0.2544769 0.12629595 0.24222432 0.0028275212
The first line has the class of each sample (ALL or AML), and from the next
line on I have the expression values.
Is this the right way to give the dataset to R for SVM classification?
Thanks a lot for the immediate reply. I am really grateful.
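
One possible way to prepare data in this layout is sketched below. It is
an assumption about the cause, not a confirmed diagnosis: svm() treats a
non-factor response as a regression target, so character labels would
produce the error above. The sketch uses only four of the columns shown
(two ALL, two AML) and calls svm() only if the e1071 package is
installed.

```r
# Subset of the data above: first row = class labels, following rows =
# one gene per row, one sample per column.
txt <- "ALL ALL AML AML
0.9389671 1.0892019 0.84976524 0.7910798
1.7179487 0.0 0.0 0.0
1.4024506 0.20640905 0.030160226 0.15834118"
raw <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

# Class labels as a factor, so that svm() does classification
cl <- factor(unlist(raw[1, ]))

# Expression values as a numeric matrix with samples in rows
train <- t(sapply(raw[-1, ], as.numeric))
dim(train)   # 4 samples x 3 genes

if (requireNamespace("e1071", quietly = TRUE)) {
  fit <- e1071::svm(train, cl)
}
```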
- Aadhithya
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Need-help-for-SVM-code-for-microarray-classification-tp2271652p2272045.html
Sent from the R help mailing list archive at Nabble.com.
