Re: [R] General help - online statistics courses?

2011-10-10 Thread kanishkporwal
You should try Tutorvista.com (http://math.tutorvista.com/statistics.html).
The site offers online statistics help to anyone who needs it. It is very
interactive and can help you ace your chosen subject.

--
View this message in context: 
http://r.789695.n4.nabble.com/General-help-online-statistics-courses-tp3799327p3889240.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-10 Thread shardman
Hi David,

Apologies again and thank you for your help. I've edited my original post to
clarify what I was asking. What I meant was that the factor had only 1
degree of freedom when it should have had 2 (14 in total), so you're right:
there were 14, but not in the right place.

In SPSS you select one column as a factor and another as the dependent
variable, so this wouldn't happen. It's easy to use but not very versatile,
and it's very expensive. I've been told good things about R, so I'm trying to
teach myself.
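A common cause of this symptom (assumed here, since David's suggestion is not quoted) is a grouping column coded as numbers, which R treats as a single continuous predictor rather than a factor. A minimal sketch with hypothetical data; the column names group and y are made up:

```r
# Hypothetical data: 15 observations in 3 groups, with groups coded 1, 2, 3
set.seed(1)
d <- data.frame(group = rep(1:3, each = 5), y = rnorm(15))

# Numeric codes: "group" is one continuous predictor, so it gets 1 df (13 residual)
numeric_tab <- anova(lm(y ~ group, data = d))
print(numeric_tab)

# As a factor: one df per extra level, so "group" gets 2 df (12 residual)
d$group <- factor(d$group)
factor_tab <- anova(lm(y ~ group, data = d))
print(factor_tab)
```

Both fits use 14 degrees of freedom in total; only the split between the factor and the residuals changes. aov() behaves the same way.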

I followed your suggestion and I now have the results I need.
I'll take more care with posting in future.

Yours,
Sam

--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3888322.html
Sent from the R help mailing list archive at Nabble.com.



[R] HOW TO PASS MY JAVA ARGUMENT INTO RSCRIPT FILE

2011-10-10 Thread janarthanan murugesan
Hi,

I am working in the Eclipse IDE and want to use Rscript to produce statistical
analyses. I tested a sample R script and it works fine in Eclipse, but I don't
know how to pass my Java values into the R script. I need some guidance;
please help.


Thanks,

Janarthanan .M



Re: [R] convert apply to lappy

2011-10-10 Thread Alaios
Thanks a lot for the tip.
It worked :)




From: Joshua Wiley jwiley.ps...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Sunday, October 9, 2011 6:32 PM
Subject: Re: [R] convert apply to lappy

Hi Alex,

If data is a matrix, probably the easiest option would be:

tips <- as.data.frame(data)

mclapply(tips, foo)

By the way, I would recommend not using 'data' (which is also a
function) as the name of the object storing your data.  If your data
set has many columns and performance is an issue I might convert it to
a list instead of a data frame.  Note that if you wanted the
equivalent of apply(tips, 1, foo), you could transpose your matrix
first:  as.data.frame(t(data)).  lapply works on columns of a data
frame because each column is basically an element of a list (list
apply).
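A small sketch of Josh's suggestion. MeanTip() here is a hypothetical stand-in (simply mean()), and the 3 x 4 matrix is made up. parallel::mclapply() takes the same arguments as lapply(); the parallel package ships with R from version 2.14.0, while in 2011 mclapply() came from the multicore package.

```r
# Hypothetical stand-in for the poster's MeanTip(); any function of one vector works
MeanTip <- function(x) mean(x)

m <- matrix(1:12, nrow = 3)  # made-up 3 x 4 matrix

# Column-wise: apply(m, 2, f) versus lapply() over the columns of a data frame
by_apply  <- apply(m, 2, MeanTip)
cols      <- as.data.frame(m)  # each column becomes one list element
by_lapply <- unlist(lapply(cols, MeanTip), use.names = FALSE)

# mclapply() is a drop-in replacement; mc.cores = 1 keeps this runnable on Windows
by_mc <- unlist(parallel::mclapply(cols, MeanTip, mc.cores = 1L), use.names = FALSE)

# Row-wise equivalent of apply(m, 1, f): transpose first
rows   <- as.data.frame(t(m))
by_row <- unlist(lapply(rows, MeanTip), use.names = FALSE)
```

On a multicore machine, raising mc.cores is what actually spreads the work across processes (not supported on Windows).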

Cheers,

Josh


 Dear all, I want to convert an apply() call to lapply(). The reason is that
 there is a function, mclapply, that uses exactly the same format as the
 lapply function.

 My code looks like this:

 mean_power_per_tip <- function(data) {
     return(apply(data, 2, MeanTip))
 }

 where data is an m x n matrix.

 I would like to thank you in advance for your help

 B.R
 Alex






-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] Vector-subsetting with ZERO - Is behavior changeable?

2011-10-10 Thread Johannes Graumann
Thank you very much. Learned something again!

Joh

William Dunlap wrote:

 You can use [1] on the output of FUN to ensure that
 exactly one value (perhaps NA from numeric(0)[1]) is
 returned.  E.g.
 
index <- 1
sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]})
   [1]  2  1 NA
 
 I'll also put in a plug for vapply, which throws an
 error if FUN does not return what you expect it to:
 
vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]},
       FUN.VALUE=numeric(1))
   Error in vapply(list(c(1, 2, 3), c(1, 2), c(1)), function(x) { :
 values must be length 1,
but FUN(X[[3]]) result is length 0
vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]},
       FUN.VALUE=numeric(1))
   [1]  2  1 NA
 
 For long input vectors vapply can save a fair bit of
 memory and time over sapply.
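An alternative sketch that makes the out-of-range case explicit instead of relying on x[0]. The helper name from_end() is hypothetical:

```r
# x[0] selects nothing, and indexing an empty vector with [1] gives NA;
# these two facts are what the [1] trick above relies on.
x <- c(1, 2, 3)
x[0]           # numeric(0)
numeric(0)[1]  # NA

# Hypothetical helper: the element `index` places before the last one
# (index = 0 is the last element), or NA when the vector is too short.
from_end <- function(lst, index = 0) {
  vapply(lst, function(v) {
    if (length(v) > index) v[length(v) - index] else NA_real_
  }, FUN.VALUE = numeric(1))
}

vecs <- list(c(1, 2, 3), c(1, 2), c(1))
from_end(vecs, index = 0)  # 3 2 1
from_end(vecs, index = 1)  # 2 1 NA
```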
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Johannes Graumann
 Sent: Wednesday, October 05, 2011 4:29 AM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] Vector-subsetting with ZERO - Is behavior changeable?
 
 Dear All,
 
 I have trouble generalizing some code.
 
  index <- 0
  sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
 yields the desired vector:
 [1] 3 2 1
 
 But in this case (trying to select the second-to-last element in each
 vector of the list)
  index <- 1
  sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
 I end up with
 [[1]]
 [1] 2
 
 [[2]]
 [1] 1
 
 [[3]]
 numeric(0)
 
 I would (massively) prefer something like
 [1] 2 1 NA
 
 My current implementation looks like this:
  index <- 1
  unlist(
    sapply(
      list(c(1,2,3),c(1,2),c(1)),
      function(x){
        value <- x[max(length(x)-index,0)]
        if(identical(value,numeric(0))){return(NA)} else {return(value)}
      }
    )
  )
 [1]  2  1 NA
 
 Quite the inelegant eyesore.
 
 Any hints on how to do this better?
 
 Thanks, Joh
 


Re: [R] Handling Time in R

2011-10-10 Thread Alaios
Thanks a lot. 

That helped.
One remaining issue: I would like difftime(y,x) to always report seconds. When
the two time stamps fall on different days, the difference is reported in
days. How can I make it always report seconds?

I would like to thank you in advance for your help

B.R
Alex




From: jim holtman jholt...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Friday, October 7, 2011 5:34 PM
Subject: Re: [R] Handling Time in R

?ISOdatetime


 x <- ISOdatetime(2011,10,6,16,23,30.539)
 str(x)
POSIXct[1:1], format: 2011-10-06 16:23:30
 y <- ISOdatetime(2011,10,6,16,23,30.939)
 difftime(y,x)
Time difference of 0.399 secs
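A sketch of how the 6-column matrix from the original question could be converted and compared, assuming each row holds year, month, day, hour, minute and seconds.milliseconds. The matrix tsm is a made-up stand-in for MyStruct$TimeStamps, and tz = "UTC" is an assumption that avoids daylight-saving jumps:

```r
# Made-up stand-in for MyStruct$TimeStamps: one row per time stamp
tsm <- rbind(c(2011, 10, 6, 16, 23, 30.539),
             c(2011, 10, 6, 16, 23, 30.939),
             c(2011, 10, 9, 16, 23, 31.000))

# ISOdatetime() is vectorised, so whole columns can be passed at once
stamps <- ISOdatetime(tsm[, 1], tsm[, 2], tsm[, 3], tsm[, 4], tsm[, 5], tsm[, 6],
                      tz = "UTC")

# Differences between consecutive stamps, always expressed in seconds
gaps <- as.numeric(difftime(stamps[-1], stamps[-length(stamps)], units = "secs"))

more_than_300ms <- gaps > 0.3               # TRUE TRUE
at_least_3_days <- gaps >= 3 * 24 * 60 * 60 # FALSE TRUE
```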




 Dear all,
 I would like to ask for your help with handling time stamps in R. I think I
 first need a reference that explains how they work and how I should handle
 them.

 For example, this is a structure I have:


 str(MyStruct$TimeStamps)
  num [1:100, 1:6] 2011 2011 2011 2011 2011 ...

 MyStruct$TimeStamps[1,]
 [1] 2011.000   10.000    6.000   16.000   23.000   30.539

 The last field contains seconds.milliseconds.

 How can I, for example, do calculations with time stamps, such as checking
 whether MyStruct$TimeStamps[1,] and MyStruct$TimeStamps[2,] differ by more
 than 300 milliseconds, or whether 3 days have passed?

 I would like to thank you in advance for your suggestions

 B.R
 Alex






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] Handling Time in R

2011-10-10 Thread Jim Holtman
Check the help page; there is a parameter ('units', I think) that will let you
specify that.

Sent from my iPad

On Oct 10, 2011, at 2:57, Alaios ala...@yahoo.com wrote:

 Thanks a lot. 
 That helped.
 One remaining issue: I would like difftime(y,x) to always report seconds.
 When the two time stamps fall on different days, the difference is reported
 in days. How can I make it always report seconds?
 
 I would like to thank you in advance for your help
 
 B.R
 Alex
 
 From: jim holtman jholt...@gmail.com
 To: Alaios ala...@yahoo.com
 Cc: R-help@r-project.org R-help@r-project.org
 Sent: Friday, October 7, 2011 5:34 PM
 Subject: Re: [R] Handling Time in R
 
 ?ISOdatetime
 
 
  x <- ISOdatetime(2011,10,6,16,23,30.539)
  str(x)
 POSIXct[1:1], format: 2011-10-06 16:23:30
  y <- ISOdatetime(2011,10,6,16,23,30.939)
  difftime(y,x)
 Time difference of 0.399 secs
 
 
 
 On Fri, Oct 7, 2011 at 11:04 AM, Alaios ala...@yahoo.com wrote:
  Dear all,
  I would like to ask for your help with handling time stamps in R. I think I
  first need a reference that explains how they work and how I should handle
  them.
 
  For example, this is a structure I have:
 
 
  str(MyStruct$TimeStamps)
   num [1:100, 1:6] 2011 2011 2011 2011 2011 ...
 
  MyStruct$TimeStamps[1,]
  [1] 2011.000   10.000    6.000   16.000   23.000   30.539
 
  The last field contains seconds.milliseconds.
 
  How can I, for example, do calculations with time stamps, such as checking
  whether MyStruct$TimeStamps[1,] and MyStruct$TimeStamps[2,] differ by more
  than 300 milliseconds, or whether 3 days have passed?
 
  I would like to thank you in advance for your suggestions
 
  B.R
  Alex
 
 
 
 
 
 
 -- 
 Jim Holtman
 Data Munger Guru
 
 What is the problem that you are trying to solve?
 
 



Re: [R] HOW TO PASS MY JAVA ARGUMENT INTO RSCRIPT FILE

2011-10-10 Thread Marcel.
Hello,

use the rJava package to execute R code from Java or to transfer values from
Java to R:

http://www.rforge.net/rJava/
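A different, simpler route (swapped in here; it is not what the reply above describes) is to start Rscript as an external process and pass the Java values as command-line arguments. On the Java side, java.lang.ProcessBuilder can launch something like Rscript analysis.R 3.5 7 12. Below is a sketch of the R side only, assuming a hypothetical script name analysis.R and purely numeric values; parse_values() is also a hypothetical name:

```r
# analysis.R (hypothetical script name); run as, e.g.:
#   Rscript analysis.R 3.5 7 12
# Command-line arguments always arrive in R as character strings.
parse_values <- function(args) {
  vals <- suppressWarnings(as.numeric(args))  # non-numbers become NA
  if (anyNA(vals)) stop("all arguments must be numeric")
  vals
}

# trailingOnly = TRUE keeps only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  vals <- parse_values(args)
  print(summary(vals))
}
```

Results printed by the script can be read back from the process's output stream on the Java side. For large or structured data, rJava/JRI avoids the text round trip.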





--
View this message in context: 
http://r.789695.n4.nabble.com/HOW-TO-PASS-MY-JAVA-ARGUMENT-INTO-RSCRIPT-FILE-tp3889327p3889457.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] cspade error

2011-10-10 Thread lenp
That's a very late answer, but I just ran into the same problem and thought
someone else (someone browsing the archives, for instance) might appreciate a
tip. There may be empty lines in your data file file_name.txt. If you remove
them, or remove the corresponding transactions in data_ex, it should work (at
least, it works for me).
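If blank lines are the culprit, they can be dropped in R before the file is read. This is a sketch; drop_blank_lines() and the output file name are hypothetical:

```r
# Hypothetical helper: copy a transaction file, skipping blank or
# whitespace-only lines, and return how many lines were dropped
drop_blank_lines <- function(infile, outfile) {
  lines <- readLines(infile)
  keep  <- nzchar(trimws(lines))  # FALSE for "" and for lines of only spaces/tabs
  writeLines(lines[keep], outfile)
  invisible(sum(!keep))
}
```

For example, drop_blank_lines("file_name.txt", "file_name_clean.txt") writes the cleaned copy, which can then be read the same way as before.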

--
View this message in context: 
http://r.789695.n4.nabble.com/cspade-error-tp3774834p3889448.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Handling Time in R

2011-10-10 Thread Alaios
This did the trick:

as.numeric(diff(c(ISOdatetime(2011,6,1,11,59,1.09),ISOdatetime(2011,6,5,11,59,1.09))))
[1] 345600





From: Jim Holtman jholt...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Monday, October 10, 2011 9:26 AM
Subject: Re: [R] Handling Time in R


Check the help page; there is a parameter ('units', I think) that will let you
specify that.

Sent from my iPad






[R] MGARCH BEKK estimation

2011-10-10 Thread upananda pani
Dear All,

I want to estimate a bivariate GARCH model using the mgarchBEKK package, but I
am not able to understand part of this function's syntax:

mvBEKK.est(eps = lrdata, order = c(1,1), params = NULL,
           fixed = NULL, method = "BFGS", verbose = FALSE)

What exactly does eps refer to here? It would be really useful if somebody
could explain its meaning.

With regards,
Upananda

-- 


You may delay, but time will not.


Research Scholar
alternative mail id: up...@iitkgp.ac.in
Department of HSS, IIT KGP
KGP



[R] Odp: help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Petr PIKAL
Hi

I do not understand your equations very well. I think you should look at
Practical Regression and Anova Using R by J. Faraway.

If you have a data frame DF with columns users, groups and results, you could do

fit <- lm(results ~ groups, data = DF)

Regards
Petr




 
 Hi,
 
 I'm a newbie to R, and my knowledge of statistics is mostly self-taught. My
 problem is how to measure the effect of users in groups. I can calculate a
 particular attribute for a user in a group, but my hypothesis is that the
 users' attributes are not independent of each other: a user's attribute
 depends on the group, i.e. a user's behaviour changes based on the group.
 
 Let me give an example:
 
 users*Group 1*Group 2*Group 3
 u1*10*5*n/a
 u2*6*n/a*4
 u3*5*2*3
 
 For example, I want to be able to prove that u1's behaviour is different in
 Group 1 than in the other groups, and that the particular thing about Group 1
 is that its users tend to have a higher value of the attribute being
 measured.
 
 
 Can I use R to test my hypothesis? I'm willing to learn, so if this is very
 simple, just point me to any online resources about it. At the moment I
 don't even know how to define this class of problems; that would be a start.
 
 Regards
 Gawesh
 


Re: [R] Handling Time in R

2011-10-10 Thread Jeff Newmiller
difftime doesn't report things. Unless you give it a units argument, difftime()
picks a human-readable unit (secs, mins, hours or days) when it creates the
object, and printing shows the value in that unit; the time span itself is the
same whichever unit is used. If you must convert to seconds, you can do so using
the as.double generic (as.double.difftime) with a units parameter.
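A short illustration of the units argument, using the dates from this thread (tz = "UTC" is an assumption):

```r
# Example stamps from this thread; tz = "UTC" is an assumption
x <- ISOdatetime(2011, 6, 1, 11, 59, 1.09, tz = "UTC")
y <- ISOdatetime(2011, 6, 5, 11, 59, 1.09, tz = "UTC")

d <- difftime(y, x)  # units = "auto" picks a unit when the object is created
units(d)             # "days"

# Either of these gives the span in seconds, whatever unit d happens to use
secs1 <- as.double(d, units = "secs")
secs2 <- as.numeric(difftime(y, x, units = "secs"))
```

Setting units explicitly, either in difftime() or when converting, gives predictable results regardless of how far apart the two stamps are.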
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Alaios ala...@yahoo.com wrote:

Thanks a lot. 

That helped.
One remaining issue: I would like difftime(y,x) to always report seconds. When
the two time stamps fall on different days, the difference is reported in
days. How can I make it always report seconds?

I would like to thank you in advance for your help

B.R
Alex





Re: [R] Handling Time in R

2011-10-10 Thread Alaios
Do you mean something like that?

as.double(diff(c(ISOdatetime(2011,6,1,11,59,1.09),ISOdatetime(2011,6,5,11,59,1.09))),length=20)
[1] 345600





From: Jeff Newmiller jdnew...@dcn.davis.ca.us

Cc: R-help@r-project.org R-help@r-project.org
Sent: Monday, October 10, 2011 10:42 AM
Subject: Re: [R] Handling Time in R


difftime doesn't report things. Unless you give it a units argument, difftime()
picks a human-readable unit (secs, mins, hours or days) when it creates the
object, and printing shows the value in that unit; the time span itself is the
same whichever unit is used. If you must convert to seconds, you can do so using
the as.double generic (as.double.difftime) with a units parameter.


Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread gj
Hi Petr,

It's not an equation; my mistake. The * characters are meant to be field
separators for the example data. I should have just used blank spaces, as
follows:

users   Group1   Group2   Group3
u1      10       5        N/A
u2      6        N/A      4
u3      5        2        3


Regards
Gawesh

On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 Hi

 I do not understand your equations very well. I think you should look at
 Practical Regression and Anova Using R by J. Faraway.

 If you have a data frame DF with columns users, groups and results, you could do

 fit <- lm(results ~ groups, data = DF)

 Regards
 Petr




 


Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Petr PIKAL
 
 Hi Petr,
 
 It's not an equation; my mistake. The * characters are meant to be field
 separators for the example data. I should have just used blank spaces, as
 follows:
 
 users   Group1   Group2   Group3
 u1      10       5        N/A
 u2      6        N/A      4
 u3      5        2        3
 
 
 Regards
 Gawesh

OK. You should transform your data to long format to use lm:

library(reshape2)   # provides melt()
test <- read.table("clipboard", header=TRUE, na.strings="N/A")
test.m <- melt(test)
Using users as id variables
fit <- lm(value~variable, data=test.m)
summary(fit)

Call:
lm(formula = value ~ variable, data = test.m)

Residuals:
    1    2    3    4    6    8    9 
 3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)       7.000      1.258   5.563  0.00511 **
variableGroup2   -3.500      1.990  -1.759  0.15336   
variableGroup3   -3.500      1.990  -1.759  0.15336   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.179 on 4 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875 
F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256 

There is no significant difference among groups, but I am not sure this is the
correct way to evaluate it.

library(ggplot2)
p <- ggplot(test.m, aes(x=variable, y=value, colour=users))
p + geom_point()

There is some sign that user3 has the lowest value in each group. However,
there is not enough data to include users in the fit.
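A base-R sketch of the same one-way fit, assuming the table above. The data are typed in directly rather than read from the clipboard, and stack() is swapped in for melt(), so no extra packages are needed. The column name group is arbitrary, so the coefficient names differ from the output above:

```r
# The example table from this thread, typed in directly
test <- data.frame(users  = c("u1", "u2", "u3"),
                   Group1 = c(10, 6, 5),
                   Group2 = c(5, NA, 2),
                   Group3 = c(NA, 4, 3))

# stack() turns the three group columns into one "values" column plus a
# factor "ind" naming the source column (all of Group1 first, then Group2, ...)
long <- data.frame(users = rep(test$users, times = 3),
                   stack(test[, c("Group1", "Group2", "Group3")]))
names(long)[names(long) == "ind"] <- "group"
long <- na.omit(long)  # drop the two N/A cells, leaving 7 observations

fit <- lm(values ~ group, data = long)
coef(fit)   # 7, -3.5, -3.5: the Group1 mean, then differences from it
anova(fit)  # F = 2.21 on 2 and 4 df
```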

Regards
Petr 



 


[R] Converting factor into date

2011-10-10 Thread Vikram Bahure
Dear R users,

I have an elementary query.

I have a dataset which I read from a text file with read.csv, but when I load
the data in R the dates are converted into a factor. To fix this, I use
as.Date to convert the Date column of z from factor to date format:

z <- read.csv(data, header = TRUE, sep = "\t")
z$Date <- as.Date(z$Date, format = "%d/%m/%y/")

But during this operation I lose all my dates and get NAs instead.

Your input would be helpful.
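A likely cause, assuming the Date column holds values such as 25/12/2011, is that the format string must match the text exactly. The trailing "/" in "%d/%m/%y/" has nothing to match, and %y reads a two-digit year while %Y reads four digits. A sketch with made-up dates:

```r
# Made-up Date column in day/month/four-digit-year form, stored as a factor
dates <- factor(c("25/12/2011", "01/10/2011"))

# The trailing "/" in the format has nothing to match, so every value becomes NA
bad  <- as.Date(as.character(dates), format = "%d/%m/%y/")

# %Y reads a four-digit year; use %y only for two-digit years such as "25/12/11"
good <- as.Date(as.character(dates), format = "%d/%m/%Y")
```

as.Date() also accepts the factor directly, and read.csv(..., stringsAsFactors = FALSE) keeps the column as character in the first place.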

Regards
Vikram



Re: [R] Handling Time in R

2011-10-10 Thread Jeff Newmiller
No, read ?difftime and look at as.double. There is a units parameter that you 
must set if you want predictable results.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Alaios ala...@yahoo.com wrote:

Do you mean something like that?


as.double(diff(c(ISOdatetime(2011,6,1,11,59,1.09),ISOdatetime(2011,6,5,11,59,1.09))),length=20)
[1] 345600


_
From: Jeff Newmiller jdnew...@dcn.davis.ca.us
To: Alaios ala...@yahoo.com; jim holtman jholt...@gmail.com
Cc: R-help@r-project.org R-help@r-project.org
Sent: Monday, October 10, 2011 10:42 AM
Subject: Re: [R] Handling Time in R

Difftime doesn't report things. When you print it, it automatically selects 
an appropriate human-readable unit to display in, but that does not change its 
internal representation. If you must convert to seconds, you can do so using 
the as.double generic (as.double.difftime) with a units parameter.

Alaios ala...@yahoo.com wrote:

Thanks a lot. 

That helped.
One remaining issue is getting difftime(y,x) to always report seconds. When
the two times fall on different days, the difference is reported in days
instead. How can I make it always report only seconds?

I would like to thank you in advance for your help

B.R
Alex



_

From: jim holtman jholt...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Friday, October 7, 2011 5:34 PM
Subject: Re: [R] Handling Time in R

?ISOdatetime


 x <- ISOdatetime(2011,10,6,16,23,30.539)
 str(x)
POSIXct[1:1], format: "2011-10-06 16:23:30"
 y <- ISOdatetime(2011,10,6,16,23,30.939)
 difftime(y,x)
Time difference of 0.399 secs




 Dear all,
 I would like to ask for your help with handling time stamps in R. I think
 I first need a reference that explains their logic and how I should
 handle them.

 For example, this is a structure I have:


 str(MyStruct$TimeStamps)
  num [1:100, 1:6] 2011 2011 2011 2011 2011 ...

 MyStruct$TimeStamps[1,]
 [1] 2011.000   10.000    6.000   16.000   23.000   30.539

 The last field contains seconds.milliseconds.

 How can I, for example, do calculations with these time stamps, such as
 checking whether MyStruct$TimeStamps[1,] and MyStruct$TimeStamps[2,] differ
 by more than 300 milliseconds, or whether 3 days have passed?

 I would like to thank you in advance for your suggestions

 B.R
 Alex





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
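For Alex's original question, one possible approach (a sketch, not taken from the replies) is to convert each row of the six-column matrix to POSIXct with ISOdatetime(), which is vectorised, and then compare the gaps in seconds. The two rows below are hypothetical stand-ins for MyStruct$TimeStamps, and tz = "UTC" is an assumption; use the time zone the data was recorded in.

```r
# Hypothetical rows in the layout from the question:
# year, month, day, hour, minute, seconds.milliseconds
stamps <- rbind(
  c(2011, 10, 6, 16, 23, 30.539),
  c(2011, 10, 6, 16, 23, 30.939)
)

# ISOdatetime() is vectorised, so one call converts every row
# (tz = "UTC" is an assumption)
times <- ISOdatetime(stamps[, 1], stamps[, 2], stamps[, 3],
                     stamps[, 4], stamps[, 5], stamps[, 6], tz = "UTC")

# Gaps between consecutive rows, always expressed in seconds
gaps <- as.double(diff(times), units = "secs")

over_300ms  <- gaps > 0.3               # more than 300 milliseconds apart
over_3_days <- gaps > 3 * 24 * 60 * 60  # more than 3 days apart
print(data.frame(gaps, over_300ms, over_3_days))
```

With the real data, stamps would be MyStruct$TimeStamps, and gaps would have one entry per pair of consecutive rows.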


Re: [R] Converting factor into date

2011-10-10 Thread Jeff Newmiller
Convert to character first, or use the as.is option of read.csv. The default
is to try to convert the underlying integer form of the factor to a date,
which is not what you intend.
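A short sketch of both routes suggested above, using a small hypothetical tab-separated text in place of the real file (the column names and values are assumptions). It reads from a string via the text argument of read.csv(); with a real file you would pass the file name instead.

```r
# Hypothetical tab-separated content standing in for the real file
txt <- "Date\tValue\n10/10/11\t1.5\n11/10/11\t2.0"

# Option 1: as.is = TRUE keeps the text column as character, not factor
z <- read.csv(text = txt, header = TRUE, sep = "\t", as.is = TRUE)
z$Date <- as.Date(z$Date, format = "%d/%m/%y")

# Option 2: a column that is already a factor is converted to character first
f <- factor(c("10/10/11", "11/10/11"))
d <- as.Date(as.character(f), format = "%d/%m/%y")

print(z)
```

In both cases the format string has to match the text exactly, character for character.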





Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread gj
Thanks Petr. I will try it on the real data.

But that will only show whether the groups differ.
Is there any way I can test whether the users behave differently when they
are in different groups?

Regards
Gawesh
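One way to look at the user question on this toy data is to add users as a second factor next to the group and compare the two fits with an F test. The sketch below rebuilds Petr's long-format data (quoted below) with base R instead of melt(), so it runs without extra packages. With only 7 observations the additive model keeps just 2 residual degrees of freedom, so, as Petr notes, the data is far too small for a trustworthy conclusion; a user-by-group interaction (u1 behaving differently in Group 1 specifically) cannot be estimated at all with one value per cell.

```r
# Gawesh's example (NA = user not observed in that group)
wide <- data.frame(
  users  = c("u1", "u2", "u3"),
  Group1 = c(10, 6, 5),
  Group2 = c(5, NA, 2),
  Group3 = c(NA, 4, 3)
)

# Long format: one row per (user, group) pair, built with base R
long <- data.frame(
  users    = factor(rep(wide$users, times = 3)),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = nrow(wide))),
  value    = c(wide$Group1, wide$Group2, wide$Group3)
)

# Petr's model: group effect only (rows with NA are dropped by lm)
fit1 <- lm(value ~ variable, data = long)

# Additive model: group effect plus one level per user
fit2 <- lm(value ~ variable + users, data = long)

# F test: do the user terms explain anything beyond the groups?
anova(fit1, fit2)
```

fit1 reproduces the coefficients in Petr's summary (intercept 7, both group contrasts -3.5); with real data, many more observations per user would be needed before reading much into the user terms.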

On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

 
  Hi Petr,
 
  It's not an equation. It's my mistake; the * characters are meant to be
  field separators for the example data. I should have just used blank
  spaces, as follows:

  users   Group1   Group2   Group3
  u1      10       5        N/A
  u2      6        N/A      4
  u3      5        2        3
 
 
  Regards
  Gawesh

 OK. You should transform your data to long format to use lm:

 library(reshape2)  # melt(); the older reshape package also provides it
 test <- read.table("clipboard", header = TRUE, na.strings = "N/A")
 test.m <- melt(test)
 Using users as id variables
 fit <- lm(value ~ variable, data = test.m)
 summary(fit)

 Call:
 lm(formula = value ~ variable, data = test.m)

 Residuals:
    1    2    3    4    6    8    9
  3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5

 Coefficients:
                Estimate Std. Error t value Pr(>|t|)
 (Intercept)       7.000      1.258   5.563  0.00511 **
 variableGroup2   -3.500      1.990  -1.759  0.15336
 variableGroup3   -3.500      1.990  -1.759  0.15336
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 2.179 on 4 degrees of freedom
   (2 observations deleted due to missingness)
 Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
 F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256

 No difference among groups, but I am not sure whether this is the correct
 way to evaluate it.

 library(ggplot2)
 p <- ggplot(test.m, aes(x = variable, y = value, colour = users))
 p + geom_point()

 There is some sign that user3 has the lowest value in each group. However,
 there is not enough data to include users in the fit.

 Regards
 Petr


 
 
  On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL petr.pi...@precheza.cz
 wrote:
 
   Hi
  
   I do not understand much about your equations. I think you should look
   at Practical Regression and Anova Using R by J. Faraway.

   Having a data frame DF with columns users, groups and results, you could
   do

   fit <- lm(results ~ groups, data = DF)
  
   Regards
   Petr
  
  
  
  
   
Hi,
   
I'm a newbie to R, and my knowledge of statistics is mostly self-taught. My
problem is how to measure the effect of users in groups. I can calculate a
particular attribute for a user in a group, but my hypothesis is that the
users' attributes are not independent of each other and that a user's
attribute depends on the group, i.e. that a user's behaviour changes based
on the group.

Let me give an example:

users*Group 1*Group 2*Group 3
u1*10*5*n/a
u2*6*n/a*4
u3*5*2*3

For example, I want to be able to prove that u1's behaviour is different in
Group 1 than in the other groups, and that the particular thing about Group 1
is that users in Group 1 tend to have a higher value of the attribute under
measurement.

Hence, can I use R to test my hypothesis? I'm willing to learn, so if this is
very simple, just point me to any online resources about it. At the moment, I
don't even know how to define this class of problems; that would be a start.
   
Regards
Gawesh
   


[R] Type of Graph to use

2011-10-10 Thread Jurgens de Bruin
Hi,

Please advise on what type of graph I can use to display the following
data set.

I have the following:

Name    Class
a       Class1
a       Class4
b       Class2
b       Class1
d       Class3
d       Class5
e       Class4
e       Class2

Each name can belong to more than one class. I want to represent the data
so that I can see where overlaps occur, that is, which names are in the same
class and which names are unique to a class. I thought a Venn diagram would
work, but it can only show counts for each class; I would like each name to
be represented by a dot or *.

Any suggestions, and how to do it, would be appreciated.

-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin



[R] pos in panel.text

2011-10-10 Thread Allan Sikk

Hi,

I need to vary the placement of data labels, but I cannot assign a
vector to the pos option. Vectors work fine with cex, for example.
What could be the problem here?


xyplot(Npop~Narea, data=size,
       scales=list(x=list(log=TRUE), y=list(log=TRUE)),
       xlab=expression(N[A]), ylab=expression(N[P]),
       panel=function(...) {
         panel.lines(..., type="l", col.line="black", lwd=.25)
         panel.xyplot(..., type="p", col="black", cex=.5, pch=20)
         panel.text(..., lab=t, cex=.5, pos=c(4,2))
       })

Many thanks,
Allan



Re: [R] Converting factor into date

2011-10-10 Thread Vikram Bahure
Hi,

I get the correct results with the following:

z$Date <- as.Date(as.character(z$Date), format = "%d/%m/%y")

instead of:

z$Date <- as.Date(as.character(z$Date, format = "%d/%m/%y"))

Thanks again.

Regards
Vikram
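In the non-working line, format is passed to as.character(), which does not use it for factors, so as.Date() falls back to its default formats. The working line also drops the trailing "/" that was in the format string of the original question. A small sketch (hypothetical date values) showing that an unmatched character in the format is, on its own, enough to turn every date into NA:

```r
# Dates that read.csv() turned into a factor (hypothetical values)
f <- factor(c("10/10/11", "11/10/11"))

# Working call: 'format' is an argument of as.Date(), applied to character
ok <- as.Date(as.character(f), format = "%d/%m/%y")

# Format from the original question: the extra trailing "/" has no match
# in the text, so every conversion fails and returns NA
bad <- as.Date(as.character(f), format = "%d/%m/%y/")

print(ok)
print(bad)
```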




Re: [R] Type of Graph to use

2011-10-10 Thread jim holtman
See if this gives you the presentation you want:

x <- read.table(textConnection("Name    Class
a Class1
a Class4
b Class2
b Class1
d Class3
d Class5
e Class4
e Class2"), header = TRUE, stringsAsFactors = TRUE)
closeAllConnections()
# add columns of numeric values of factors
# (stringsAsFactors = TRUE is needed in R >= 4.0.0, where it is no longer the default)
x$name <- as.integer(x$Name)
x$class <- as.integer(x$Class)
# create plot area
plot(0
    , type = 'n'
    , xaxt = 'n'
    , yaxt = 'n'
    , xlab = ''
    , ylab = ''
    , xlim = c(0, max(x$class))
    , ylim = c(0, max(x$name))
    )
# now plot the rectangles
rect(
      xleft = x$class - 1
    , ybottom = x$name - 1
    , xright = x$class
    , ytop = x$name
    , col = x$name
    )
# add the labels
axis(1
    , at = seq(0.5, by = 1, length = length(levels(x$Class)))
    , labels = levels(x$Class)
    )
axis(2
    , at = seq(0.5, by = 1, length = length(levels(x$Name)))
    , labels = levels(x$Name)
    )





Re: [R] Type of Graph to use

2011-10-10 Thread Jurgens de Bruin
Jim, this should work. Would it be possible to plot * instead of large rectangles?



Re: [R] pos in panel.text

2011-10-10 Thread Allan Sikk
Thanks, Carlos,

I tried that, but with no success; I am still getting these warning messages:

Warning messages:
1: In if (pos == 1) { :
   the condition has length > 1 and only the first element will be used
2: In if (pos == 2) { :
   the condition has length > 1 and only the first element will be used

Thanks,
Allan

On 10/10/2011 12:10, Carlos Ortega wrote:
 Hello,

 To check the possible values of the pos parameter, you need to review
 text(), as indicated in the lattice help for panel.text().
 In text() it says:

 pos

 a position specifier for the text. If specified this overrides any
 adj value given. Values of 1, 2, 3 and 4, respectively
 indicate positions below, to the left of, above and to the right of
 the specified coordinates.


 So, the coordinates should be x=4, y=2 for your case.
 Additionally, you can use the ltext() function, which is explained in the
 same panel.text() help.


 Regards,
 Carlos Ortega
 www.qualityexcellence.es




-- 

Dr Allan Sikk

Lecturer in Baltic Politics

University College London, School of Slavonic and East European Studies

16 Taviton St, London WC1H 0BW, United Kingdom

tel: +44 (0)20 7679 4872

http://www.homepages.ucl.ac.uk/~tjmsasi/

Latest research:

- 'Newness as a Winning Formula for New Political Parties', /Party 
Politics/, forthcoming.

- 'Parties and Populism', Centre for European Politics, Security and 
Integration (CEPSI) Working Paper (2010), http://bit.ly/partiespopulism.

- (with Rein Taagepera) 'Parsimonious Model for Predicting Mean Cabinet 
Duration on the Basis of Electoral System', /Party Politics/, 16(2), 
2010, 261-81.

- 'Force Mineure? The Effects of the EU on Party Politics in a Small 
Country: The Case of Estonia,' /Journal of Communist Studies and 
Transition Politics/, 25(4), 2009, 468-90.

- (with Rune Andersen) 'Without a Tinge of Red: The Fall and Rise of 
Estonian Greens, 1987-2007', /Journal of Baltic Studies/, 40(3), 2009, 
349-73.




Re: [R] Type of Graph to use

2011-10-10 Thread jim holtman
Try this version:

x <- read.table(textConnection("Name    Class
a Class1
a Class4
b Class2
b Class1
d Class3
d Class5
e Class4
e Class2"), header = TRUE, stringsAsFactors = TRUE)
closeAllConnections()
# add columns of numeric values of factors
# (stringsAsFactors = TRUE is needed in R >= 4.0.0, where it is no longer the default)
x$name <- as.integer(x$Name)
x$class <- as.integer(x$Class)
# create plot area
plot(0
    , type = 'n'
    , xaxt = 'n'
    , yaxt = 'n'
    , xlab = ''
    , ylab = ''
    , xlim = c(0, max(x$class))
    , ylim = c(0, max(x$name))
    )
# now plot the rectangles
# rect(
#       xleft = x$class - 1
#     , ybottom = x$name - 1
#     , xright = x$class
#     , ytop = x$name
#     , col = x$name
#     )
# plot * instead
points(x$class - .5, x$name - .5, pch = "*", cex = 3)
# add the labels
axis(1
    , at = seq(0.5, by = 1, length = length(levels(x$Class)))
    , labels = levels(x$Class)
    )
axis(2
    , at = seq(0.5, by = 1, length = length(levels(x$Name)))
    , labels = levels(x$Name)
    )





Re: [R] Type of Graph to use

2011-10-10 Thread Gabor Grothendieck
Assuming DF is the data frame from the original question (columns Name and Class):

library(gplots)
with(DF, balloonplot(Name, Class, rep(1, nrow(DF)), label = FALSE))
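The replies in this thread all rest on the same name-by-class incidence matrix. A base-R sketch (an alternative, not code from the thread) that builds it with table(), counts memberships per name and per class, and draws one "*" per membership; DF is recreated here from the posted data:

```r
DF <- data.frame(
  Name  = c("a", "a", "b", "b", "d", "d", "e", "e"),
  Class = c("Class1", "Class4", "Class2", "Class1",
            "Class3", "Class5", "Class4", "Class2")
)

# Incidence matrix: 1 where a name belongs to a class, 0 otherwise
inc <- table(DF$Name, DF$Class)

classes_per_name <- rowSums(inc)  # > 1 means the name is shared
names_per_class  <- colSums(inc)  # 1 means the class has a single name

# One "*" per membership on a name-by-class grid
hits <- which(inc > 0, arr.ind = TRUE)  # columns "row" (name), "col" (class)
plot(hits[, "col"], hits[, "row"], pch = "*", cex = 3,
     xaxt = "n", yaxt = "n", xlab = "", ylab = "",
     xlim = c(0.5, ncol(inc) + 0.5), ylim = c(0.5, nrow(inc) + 0.5))
axis(1, at = seq_len(ncol(inc)), labels = colnames(inc))
axis(2, at = seq_len(nrow(inc)), labels = rownames(inc), las = 1)
```

Printing inc on its own already answers the overlap question in tabular form; the plot is the same information drawn as points.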


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] pos in panel.text

2011-10-10 Thread Allan Sikk
Here's the code. The problem seems to be specific to lattice, as I can
easily use a vector for pos in plot().

trellis.device(,width=600, height = 400)
xyplot(Npop~Narea,
scales=list(x=list(log=TRUE, at=my.at, labels = formatC(my.at, big.mark = ",", format="d")),
y=list(log=TRUE, at=c(1,10,100,1000,1,10,100))),
panel=function(...) {
 panel.xyplot(..., type="p", col="black", cex=.5, pch=20)
 panel.text(x=log10(Narea), y=log10(Npop), lab=t, cex=.5, pos=c(4,2))
 }

)
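The warnings quoted earlier in the thread come from if (pos == 1) inside the label-drawing code, which suggests that in the lattice version used here pos is treated as a single value. A possible workaround (a sketch, not an official lattice feature) is to call panel.text() once per label side, selecting points through subscripts. The data frame size below, with its lab and pos columns, is a hypothetical stand-in for Allan's data:

```r
library(lattice)

# Hypothetical stand-in for Allan's data: one label and one side per point
size <- data.frame(
  Narea = c(10, 100, 1000, 10000),
  Npop  = c(50, 400, 3000, 20000),
  lab   = c("A", "B", "C", "D"),
  pos   = c(4, 2, 4, 2)  # 4 = right of the point, 2 = left of it
)

p <- xyplot(Npop ~ Narea, data = size,
  scales = list(x = list(log = TRUE), y = list(log = TRUE)),
  panel = function(x, y, subscripts, ...) {
    panel.xyplot(x, y, pch = 20, col = "black", cex = 0.5)
    # pos is used as a single value, so draw each label side separately
    for (side in unique(size$pos[subscripts])) {
      k <- size$pos[subscripts] == side
      panel.text(x[k], y[k], labels = size$lab[subscripts][k],
                 pos = side, cex = 0.5)
    }
  })
print(p)
```

Inside the panel function, x and y are already on the log10 scale because of scales = list(log = TRUE), so no log10() call is needed there.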

On 10/10/2011 13:58, Carlos Ortega wrote:
 Hi Allan,

 Could you please send the modified code, where the x and y coordinates
 should now appear?
 I do not fully understand the error message you get.

 Regards,
 Carlos Ortega
 www.qualityexcellence.es



Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Carlos Ortega
Hello,

The qualityTools package offers one way to perform this analysis, through
the gageRR() function. To me, the effect of an operator on the measurement
system (reproducibility) is equivalent to the effect you are trying to
study: the effect of your users when they are in different groups.

Regards,
Carlos Ortega
www.qualityexcellence.es


On Mon, Oct 10, 2011 at 12:48 PM, gj gaw...@gmail.com wrote:

 Thanks Petr. I will try it on the real data.

 But that will only show that the groups are different or not.
 Is there any way I can test if the users are different when they are in
 different groups?

 Regards
 Gawesh

 On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL petr.pi...@precheza.cz
 wrote:

  
   Hi Petr,
  
   It's not an equation. It's my mistake; the * are meant to be field
   separators for the example data. I should have just use blank spaces as
   follows:
  
   users   Group1   Group2   Group3
   u110   5N/A
   u2 6  N/A  4
   u3 5   23
  
  
   Regards
   Gawesh
 
  OK. You shall transform your data to long format to use lm
 
  test - read.table(clipboard, header=T, na.strings=N/A)
  test.m-melt(test)
  Using users as id variables
  fit-lm(value~variable, data=test.m)
  summary(fit)
 
  Call:
  lm(formula = value ~ variable, data = test.m)
 
  Residuals:
1234689
   3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
 
  Coefficients:
Estimate Std. Error t value Pr(|t|)
  (Intercept)   7.000  1.258   5.563 0.00511 **
  variableGroup2   -3.500  1.990  -1.759 0.15336
  variableGroup3   -3.500  1.990  -1.759 0.15336
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  Residual standard error: 2.179 on 4 degrees of freedom
   (2 observations deleted due to missingness)
  Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
  F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256
 
  No difference among groups, but I am not sure if this is the correct way
  to evaluate.
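For readers without reshape or reshape2, here is a sketch of the same long-format fit using only base R's stack(). The data frame below is typed in from the example table, so its name and layout are assumptions.

```r
# Wide table typed in from the example (NA for the N/A cells)
wide <- data.frame(users  = c("u1", "u2", "u3"),
                   Group1 = c(10, 6, 5),
                   Group2 = c(5, NA, 2),
                   Group3 = c(NA, 4, 3))

# stack() gives long format: 'values' plus a factor 'ind' naming the column
long <- stack(wide[, c("Group1", "Group2", "Group3")])
long$users <- rep(wide$users, times = 3)   # stack() goes column by column

fit <- lm(values ~ ind, data = long)       # rows with NA are dropped
coef(fit)
```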
 
  library(ggplot2)
  p <- ggplot(test.m, aes(x = variable, y = value, colour = users))
  p + geom_point()
 
  There is some sign that u3 has the lowest value in each group. However,
  there is not enough data to include users in the fit.
 
  Regards
  Petr
 
 
  
  
   On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL petr.pi...@precheza.cz
  wrote:
  
Hi
   
I do not understand much about your equations. I think you should look at
Practical Regression and Anova Using R by J. Faraway.
   
Having a data frame DF with columns users, groups and results, you could do
   
fit <- lm(results ~ groups, data = DF)
   
Regards
Petr
   
   
   
   

 Hi,

 I'm a newbie to R. My knowledge of statistics is mostly self-taught. My
 problem is how to measure the effect of users in groups. I can calculate a
 particular attribute for a user in a group. But my hypothesis is that the
 users' attributes are not independent of each other and that a user's
 attribute depends on the group, i.e. that a user's behaviour changes
 based on the group.

 Let me give an example:

 users*Group 1*Group 2*Group 3
 u1*10*5*n/a
 u2*6*n/a*4
 u3*5*2*3

 For example, I want to be able to prove that u1's behaviour is different
 in Group 1 than in the other groups, and that the particular thing about
 Group 1 is that users in Group 1 tend to have a higher value of the
 attribute under measurement.


 Hence, can I use R to test my hypothesis? I'm willing to learn, so if this
 is very simple, just point me in the direction of any online resources
 about it. At the moment, I don't even know how to define this class of
 problems; that would be a start.

 Regards
 Gawesh


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   
   
  
 
 





Re: [R] Type of Graph to use

2011-10-10 Thread Jurgens de Bruin
Thanks for all the help,

Would it be possible to use a Venn diagram for this application?

On 10 October 2011 14:49, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 On Mon, Oct 10, 2011 at 6:49 AM, Jurgens de Bruin debrui...@gmail.com
 wrote:
  Hi,
 
  Please advise on what type of graph can be used to display the following
  data set.
 
  I have the following:
 
  Name Class
  a Class1
  a Class4
  b Class2
  b Class1
  d Class3
  d Class5
  e Class4
  e Class2
 
  So each entry in Name can belong to more than one class. I want to
  represent the data so as to see where overlaps occur, that is, which names
  are in the same Class and also which names are unique to a Class. I
  thought a Venn diagram would work, but it can only present numerical
  values for each Class; I would like each name to be represented by a dot
  or *.
 

 Assuming DF is the indicated data.frame:

 library(gplots)
 with(DF, balloonplot(Name, Class, rep(1, nrow(DF)), label = FALSE))


 --
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com




-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin



Re: [R] Type of Graph to use

2011-10-10 Thread Jurgens de Bruin
Please ignore the Venn diagram idea, as it would be too complex to read when more
than 3 categories are present.




-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin



Re: [R] Type of Graph to use

2011-10-10 Thread Gabor Grothendieck
On Mon, Oct 10, 2011 at 8:49 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Mon, Oct 10, 2011 at 6:49 AM, Jurgens de Bruin debrui...@gmail.com wrote:
 Hi,

 Please advise on what type of graph can be used to display the following
 data set.

 I have the following:

 Name    Class
 a             Class1
 a             Class4
 b             Class2
 b             Class1
 d             Class3
 d             Class5
 e             Class4
 e             Class2

 So each entry in Name can belong to more than one class. I want to represent
 the data so as to see where overlaps occur, that is, which names are in the same
 Class and also which names are unique to a Class. I thought a Venn
 diagram would work, but it can only present numerical values for each
 Class; I would like each name to be represented by a dot or *.


 Assuming DF is the indicated data.frame:

 library(gplots)
 with(DF, balloonplot(Name, Class, rep(1, nrow(DF)), label = FALSE))


Here is one additional idea:

 xt <- xtabs(~ Class + Name, DF)
 symnum(xt, cutpoints = 0:2/2, symbols = c(".", "+"))
        Name
Class    a b d e
  Class1 + + . .
  Class2 . + . +
  Class3 . . + .
  Class4 + . . +
  Class5 . . + .
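To list the overlaps themselves rather than plot them, here is a base-R sketch. It builds the class-by-name incidence matrix with table(), then uses crossprod() to count how many classes each pair of names shares. The data frame is typed in from the example, with "Class 1" read as Class1.

```r
# Example data typed in from the post
DF <- data.frame(
  Name  = c("a", "a", "b", "b", "d", "d", "e", "e"),
  Class = c("Class1", "Class4", "Class2", "Class1",
            "Class3", "Class5", "Class4", "Class2")
)

# Class-by-name incidence matrix: 1 if the name belongs to the class
inc <- unclass(table(DF$Class, DF$Name))

# Name-by-name overlap: number of classes each pair of names shares
shared <- crossprod(inc)

# Names that share no class with any other name (only their own diagonal)
alone <- colnames(shared)[rowSums(shared > 0) == 1]

shared
alone
```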



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] Request for moderated posting

2011-10-10 Thread squid-dev-help
This message has been passed onto the list moderators for
approval. This is because you are not a subscriber to this
list or the related squid-users list.

If the message is relevant to the squid-dev mailinglist
one of the moderators will accept the message and it gets
automatically forwarded to the list.

The squid-devel list is restricted to discussions about the
development of Squid only. Configuration and usage questions
are not accepted except on features not yet available in the
current STABLE version of Squid.

If you wish to participate in the squid-dev mailing list, please
first send a presentation of yourself, and of the areas of Squid
development you are interested in helping with, to
squid-...@squid-cache.org, and then send a subscribe request as
described below.

Or alternatively if you are looking for general help in how
to use or configure Squid, subscribe to the squid-users list
instead.

When you have introduced yourself and your intentions
to the developers, you may request a subscription to
the list by sending an email to

squid-dev-subscr...@squid-cache.org

with no subject or body.

If you would like to subscribe an alternate email address
from the one you are posting from or like to see what other
subscription options there are, send an email to

squid-dev-h...@squid-cache.org

to get help on doing this.

Please remember that squid-dev is aimed at squid developers.
If you want to contribute ideas and code, this list is for
you. If you only want to track development please use the public
archives.


Thanks!
The Squid Developers



[R] Multiple imputation on subgroups

2011-10-10 Thread Sarah
Dear R-users,

I want to multiply impute missing scores, but only for some subgroups in my
data (variable 'subgroups': impute only for subgroups 2 and 3).
Does anyone know how to do this in MICE?

This is my script for the multiple imputation:
imp <- mice(data, m = 20, predictorMatrix = pred, post = post,
            method = c("", "", "", "", "", "norm",
                       "norm", "norm", "norm", "norm", "norm"),
            maxit = 20)

The final analysis should be on the dataset as a whole: subgroups 2 and 3
with observed and imputed values, and subgroup 1 with observed values only
(its missing scores left missing).
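Here is a minimal sketch that assumes mice version 3.0 or later, whose mice() has a where argument. As far as I know, older versions such as those current in 2011 do not have it. The sketch builds a logical matrix marking the cells to impute. It starts from all missing cells and then switches off subgroup 1. The toy data frame dat and its values are hypothetical stand-ins for the real data.

```r
# Toy stand-in for the real data (hypothetical values)
dat <- data.frame(
  subgroups = c(1, 1, 2, 2, 3, 3),
  score     = c(NA, 4.1, NA, 3.8, 5.0, NA)
)

# TRUE marks a cell to impute: every missing cell except those in subgroup 1
where <- is.na(dat)
where[dat$subgroups == 1, ] <- FALSE

# Assumption: mice >= 3.0 accepts this matrix through 'where'; check ?mice
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(dat, m = 5, where = where, method = c("", "norm"),
                    printFlag = FALSE, seed = 1)
  done <- mice::complete(imp, 1)   # row 1 (subgroup 1) keeps its NA
}
```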

Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-imputation-on-subgroups-tp3889664p3889664.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] pos in panel.text

2011-10-10 Thread Carlos Ortega
Hello,

To check the possible values of the pos parameter you need to review text(),
as indicated in the lattice help for panel.text().
In  text() it says:

pos

a position specifier for the text. If specified this overrides any adj value
given. Values of 1, 2, 3 and 4, respectively indicate positions below, to
the left of, above and to the right of the specified coordinates.

So, the coordinates should be x=4, y=2 for your case.
Additionally you can use ltext() function which is explained in the same
panel.text() help.


Regards,
Carlos Ortega
www.qualityexcellence.es

2011/10/10 Allan Sikk a.s...@ucl.ac.uk

 Hi,

 I need to vary the placements of data labels but I cannot assign a vector
 to pos option. Any vectors work fine with cex, for example. What could
 be the problem here?

 xyplot(Npop~Narea, data=size,
 scales=list(x=list(log=TRUE), y=list(log=TRUE)),
 xlab=expression(N[A]), ylab=expression(N[P]),
 panel=function( ...) {
panel.lines(..., type="l", col.line="black", lwd=.25)
panel.xyplot(..., type="p", col="black", cex=.5, pch=20)
panel.text(..., lab=t, cex=.5, pos=c(4,2))
 })

 Many thanks,
 Allan



[R] variable scope for deltavar function from emdbook

2011-10-10 Thread adad

Dear all,

I want to use the deltavar() function from emdbook. I can use it 
directly from the command line, but within a function it behaves weirdly.


Working example:
--
library(emdbook)

fn <- function()
{
browser()
y <- 2
print(deltavar(y*b2, meanval=c(b2=3), Sigma=1) )
}

x <- 2
print(deltavar(x*b1, meanval=c(b1=3), Sigma=1) )
y <- 3

fn()


Running this returns 4 for the first deltavar() call (at top level), which is fine.

For the call of deltavar() inside fn(), I get 9, i.e. the function uses y <- 3
instead of the local y <- 2. If y <- 3 is commented out, deltavar() returns an error.


So why is the function not using the local variable and how do I make it 
use it?
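I do not know exactly which environment deltavar() evaluates its expression in. The symptom, though (a global y being picked up), is what happens when an expression is evaluated somewhere other than the caller's frame. The sketch below uses a hand-rolled delta-method helper, delta_var(). It is hypothetical and not emdbook's code. It shows the difference: with env = parent.frame(), a local variable inside fn() is found first, while evaluating in the global environment picks up the global y. It also shows where the 4 and 9 come from. For y * b2, the gradient with respect to b2 is y, so the variance is y^2 * Sigma.

```r
# Hypothetical delta-method helper (NOT emdbook's implementation):
# Var(f) ~ grad' Sigma grad, with the gradient taken symbolically by D().
# 'env' is where free variables (x, y, ...) are looked up; by default
# it is the frame of whoever called delta_var().
delta_var <- function(expr, meanval, Sigma, env = parent.frame()) {
  force(env)
  grad <- vapply(names(meanval), function(v) {
    eval(D(expr, v), as.list(meanval), env)   # derivative evaluated at meanval
  }, numeric(1))
  Sigma <- as.matrix(Sigma)
  drop(t(grad) %*% Sigma %*% grad)
}

assign("y", 3, envir = globalenv())   # a global y, as in the original post

fn <- function() {
  y <- 2                              # this local y is found first
  delta_var(quote(y * b2), c(b2 = 3), Sigma = 1)
}

fn()                                                        # local y = 2
delta_var(quote(y * b2), c(b2 = 3), 1, env = globalenv())   # global y = 3
```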


Many thanks



Re: [R] pos in panel.text

2011-10-10 Thread Carlos Ortega
Hi,

OK.
Have you tried to run your code without the pos parameter?

Based on the help, pos should be just *one* value. pos offers a
finer adjustment of the text, but in your case the first thing to achieve is
that the text label is drawn at the specified coordinates. Besides
pos you can try adj, which takes two values (each between 0 and 1).
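Here is a sketch of one work-around. Judging from the warning text, it assumes panel.text() only handles a single pos value per call. The approach is to split the points by the desired label position and call panel.text() once per position. The panel function name panel.labelled and the data below are hypothetical.

```r
library(lattice)

# Draw points, then labels in batches: one panel.text() call per pos value,
# so each call only ever sees a single 'pos'.
panel.labelled <- function(x, y, labels, pos = 4, ...) {
  panel.xyplot(x, y, type = "p", col = "black", pch = 20, cex = 0.5)
  pos <- rep_len(pos, length(x))        # recycle, e.g. c(4, 2) -> 4 2 4 2
  for (side in unique(pos)) {
    i <- pos == side                    # points labelled on this side
    panel.text(x[i], y[i], labels = labels[i], pos = side, cex = 0.5)
  }
}

# Hypothetical stand-in for the 'size' data frame and label vector
size <- data.frame(Narea = c(10, 100, 1000, 10000),
                   Npop  = c(5, 80, 900, 7000),
                   lab   = c("A", "B", "C", "D"))

# Extra arguments (labels, pos) are passed on to the panel function
plt <- xyplot(Npop ~ Narea, data = size,
              scales = list(x = list(log = TRUE), y = list(log = TRUE)),
              labels = size$lab, pos = c(4, 2),
              panel = panel.labelled)
# print(plt) draws the plot
```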


Regards,
Carlos Ortega
www.qualityexcellence.es

2011/10/10 Allan Sikk a.s...@ucl.ac.uk

 Here's the code. The problem seems to be specific for lattice as I can
 easily use a vector with pos in plot.

 trellis.device(,width=600, height = 400)
 xyplot(Npop~Narea,
 scales=list(x=list(log=TRUE, at=my.at, labels = formatC(my.at, big.mark =
 ",", format="d")),
 y=list(log=TRUE, at=c(1,10,100,1000,1,10,100))),
 panel=function(...) {
  panel.xyplot(..., type="p", col="black", cex=.5, pch=20)
  panel.text(x=log10(Narea), y=log10(Npop), lab=t,  cex=.5, pos=c(4,2))
 }

 )


[R] Importing from Fortran

2011-10-10 Thread Spartina
Hello all,

how do I import a Fortran file (f3.1) into one column in R? I've tried this
(I'm a total beginner as you can see): 

 FortranData <- read.fwf("C:\\Users\\format3_1.txt", rep(3, 20))
Warning message:
In readLines(file, n = thisblock) :
  incomplete final line found on 'C:\Users\format3_1.txt'
 FortranData
   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1 2.2 3.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4

As you can see, each datum gets imported into a separate column, whereas I'd
like to have everything stored under V1. I'm also unsure as to what the
error message means.
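Here is one way to get all values into a single column. The sketch assumes every F3.1 field is exactly 3 characters wide and contains an explicit decimal point. It reads the lines, cuts each into 3-character pieces, and stacks them. The helper name read_f31 and the temporary test file are hypothetical. The warning itself only says that the last line of the file has no end-of-line marker; readLines(warn = FALSE) silences it.

```r
# Hypothetical helper: read fixed-width F3.1 values into one column V1.
# Assumes each field is 3 characters wide and has its own decimal point.
read_f31 <- function(path, width = 3) {
  lines <- readLines(path, warn = FALSE)      # no "incomplete final line" warning
  lines <- lines[nzchar(trimws(lines))]       # drop blank lines
  fields <- unlist(lapply(lines, function(l) {
    starts <- seq(1, nchar(l), by = width)    # 1, 4, 7, ...
    substring(l, starts, starts + width - 1)  # cut the line into fields
  }))
  fields <- trimws(fields)
  data.frame(V1 = as.numeric(fields[nzchar(fields)]))
}

# Usage on a small temporary file without a trailing newline
tmp <- tempfile(fileext = ".txt")
cat("2.23.34.2\n2.13.4", file = tmp)
res <- read_f31(tmp)
res
```

If the data frame has already been read with read.fwf(), data.frame(V1 = as.vector(t(as.matrix(FortranData)))) should give the same single column, row by row.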

Thanks in advance for your help!

Léa

--
View this message in context: 
http://r.789695.n4.nabble.com/Importing-from-Fortan-tp3889947p3889947.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Linear programming problem, RGPLK - no feasible solution.

2011-10-10 Thread Liu Evans, Gareth
In my post at https://stat.ethz.ch/pipermail/r-help/2011-October/292019.html I 
included an undefined term ej.  The problem code should be as follows.  It 
seems like a simple linear programming problem, but for some reason my code is 
not finding the solution.

library(Rglpk)

obj <- c(rep(0,3),1)

col1 <- c(1,0,0,1,0,0,1,-2.330078923,0)
col2 <- c(0,1,0,0,1,0,1,-2.057855981,0)
col3 <- c(0,0,1,0,0,1,1,-1.885177032,0)
col4 <- c(-1,-1,-1,1,1,1,0,0,1)

mat <- cbind(col1, col2, col3, col4)

dir <- c(rep("<=", 3), rep(">=", 3), rep("==", 2), ">=")

rhs <- c(rep(0, 7), 1, 0)

sol <- Rglpk_solve_LP(obj, mat, dir, rhs, types = NULL, max = FALSE,
bounds = c(-100,100), verbose = TRUE)


The R output says there is no feasible solution, but e.g. (-2.3756786,  
0.3297676,  2.0459110, 2.3756786) is feasible.

The output is

GLPK Simplex Optimizer, v4.42
9 rows, 4 columns, 19 non-zeros
  0: obj =  0.0e+000  infeas = 1.000e+000 (2)
PROBLEM HAS NO FEASIBLE SOLUTION
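As far as I know, Rglpk_solve_LP() gives every variable a lower bound of 0 unless told otherwise. It also expects bounds as a list of lower/upper index-value pairs rather than c(-100, 100). With x1..x3 >= 0, row 7 (x1 + x2 + x3 == 0) forces them all to 0, and row 8 can then never equal 1. That would explain the "no feasible solution" message. The sketch assumes the stripped operators in dir were "<=" for rows 1-3 and ">=" for rows 4-6, since that is the only reading under which the quoted point is feasible. It checks that point in base R and only calls Rglpk if the package is installed.

```r
# Constraint data from the post (operators restored as assumed above)
obj <- c(rep(0, 3), 1)
mat <- cbind(c(1, 0, 0, 1, 0, 0, 1, -2.330078923, 0),
             c(0, 1, 0, 0, 1, 0, 1, -2.057855981, 0),
             c(0, 0, 1, 0, 0, 1, 1, -1.885177032, 0),
             c(-1, -1, -1, 1, 1, 1, 0, 0, 1))
dir <- c(rep("<=", 3), rep(">=", 3), rep("==", 2), ">=")
rhs <- c(rep(0, 7), 1, 0)

# Check a candidate point against every row (tolerance for rounding)
is_feasible <- function(x, tol = 1e-6) {
  lhs <- drop(mat %*% x)
  all(ifelse(dir == "<=", lhs <= rhs + tol,
      ifelse(dir == ">=", lhs >= rhs - tol, abs(lhs - rhs) <= tol)))
}

x0 <- c(-2.3756786, 0.3297676, 2.0459110, 2.3756786)
is_feasible(x0)   # TRUE, but x0[1] < 0 violates a default lower bound of 0

# Let all four variables range over [-100, 100]
bnds <- list(lower = list(ind = 1:4, val = rep(-100, 4)),
             upper = list(ind = 1:4, val = rep(100, 4)))

if (requireNamespace("Rglpk", quietly = TRUE)) {
  sol <- Rglpk::Rglpk_solve_LP(obj, mat, dir, rhs, bounds = bnds, max = FALSE)
  sol$status   # 0 indicates an optimal solution was found
}
```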

One other thing, a possible bug - if I run this code with dir shorter than it 
should be, R crashes.  My version of R is 2.131.56322.0, and I'm running it on 
Windows 7.  

Regards,
Gareth



Re: [R] pos in panel.text

2011-10-10 Thread Carlos Ortega
Hi Allan,

Could you please send the modified code, in which the x and y coordinates
should now appear?
I do not fully understand the error message you get.

Regards,
Carlos Ortega
www.qualityexcellence.es

2011/10/10 Allan Sikk a.s...@ucl.ac.uk

 Thanks, Carlos,

 Tried that, but no success, still getting this error message:

 Warning messages:
 1: In if (pos == 1) { :
   the condition has length > 1 and only the first element will be used
 2: In if (pos == 2) { :
   the condition has length > 1 and only the first element will be used

 Thanks,
 Allan

 
 

 --

 Dr Allan Sikk

 Lecturer in Baltic Politics

 University College London, School of Slavonic and East European Studies

 16 Taviton St, London WC1H 0BW, United Kingdom

 tel: +44 (0)20 7679 4872

 http://www.homepages.ucl.ac.uk/~tjmsasi/

 Latest research:

 - 'Newness as a Winning Formula for New Political Parties', /Party
 Politics/, forthcoming.

 - 'Parties and Populism', Centre for European Politics, Security and
 Integration (CEPSI) Working Paper (2010), http://bit.ly/partiespopulism.

 - (with Rein Taagepera) 'Parsimonious Model for Predicting Mean Cabinet
 Duration on the Basis of Electoral System', /Party Politics/, 16(2),
 2010, 261-81.

 - 'Force Mineure? The Effects of the EU on Party Politics in a Small
 Country: The Case of Estonia,' /Journal of Communist Studies and
 Transition Politics/, 25(4), 2009, 468-90.

 - (with Rune Andersen) 'Without a Tinge of Red: The Fall and Rise of
 Estonian Greens, 1987-2007', /Journal of Baltic Studies/, 40(3), 2009,
 349-73.




Re: [R] Handling Time in R

2011-10-10 Thread Alaios
Thanks a lot for your answer.
Something I really do not understand in R is how the wrapper functions work.
As an example, the code below

TimeDiffInSeconds <- diff((ISOdate(timeMatrix[,1],timeMatrix[,2],timeMatrix[,3],timeMatrix[,4],timeMatrix[,5],timeMatrix[,6])),units="secs");

returns an error, even when I change it to

TimeDiffInSeconds <- difftime((ISOdate(timeMatrix[,1],timeMatrix[,2],timeMatrix[,3],timeMatrix[,4],timeMatrix[,5],timeMatrix[,6])),units="secs");

I would like to thank you in advance for your help

B.R
Alex




From: Jeff Newmiller jdnew...@dcn.davis.ca.us

Cc: R-help@r-project.org R-help@r-project.org
Sent: Monday, October 10, 2011 12:23 PM
Subject: Re: [R] Handling Time in R


No, read ?difftime and look at as.double. There is a units parameter that you 
must set if you want predictable results.
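Here is a sketch for a matrix laid out like MyStruct$TimeStamps (year, month, day, hour, minute, seconds.milliseconds). The three rows below are made up. difftime() needs two time vectors, which is why calling it with only one fails. The units are best fixed when converting to a number, with as.numeric(..., units = "secs").

```r
# Hypothetical stand-in for timeMatrix: year, month, day, hour, min, sec
timeMatrix <- rbind(c(2011, 10,  6, 16, 23, 30.539),
                    c(2011, 10,  6, 16, 23, 30.939),
                    c(2011, 10, 10, 16, 23, 30.939))

# POSIXct times; tz = "UTC" avoids daylight-saving jumps
tt <- ISOdatetime(timeMatrix[, 1], timeMatrix[, 2], timeMatrix[, 3],
                  timeMatrix[, 4], timeMatrix[, 5], timeMatrix[, 6], tz = "UTC")

# diff() returns a difftime in automatically chosen units;
# fix the units when converting to a plain number
secs <- as.numeric(diff(tt), units = "secs")

# Equivalent: difftime() with two time vectors plus 'units'
secs2 <- as.numeric(difftime(tt[-1], tt[-length(tt)], units = "secs"))

secs
secs > 0.3              # more than 300 ms apart?
secs >= 3 * 24 * 3600   # three days or more apart?
```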
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.



Do you mean something like that?


as.double(diff(c(ISOdatetime(2011,6,1,11,59,1.09),ISOdatetime(2011,6,5,11,59,1.09))),length=20)
[1] 345600






From: Jeff Newmiller jdnew...@dcn.davis.ca.us

Cc: R-help@r-project.org R-help@r-project.org
Sent: Monday, October 10, 2011 10:42 AM
Subject: Re: [R] Handling Time in R


Difftime doesn't report things. When you print it, it automatically selects 
an appropriate human-readable unit to display in, but that does not change its 
internal representation. If you must convert to seconds, you can do so using 
the as.double generic (as.double.difftime) with a units parameter.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



Thanks a lot. 

That helped.
One thing I still need is for difftime(y,x) to always report seconds. Sometimes
the day changes between the two times, and the difference is then reported as a
few days. How can I make it always report seconds only?

I would like to thank you in advance for your help

B.R
Alex





From: jim holtman jholt...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org
Sent: Friday, October 7, 2011 5:34 PM
Subject: Re: [R] Handling Time in R

?ISOdatetime


 x <- ISOdatetime(2011,10,6,16,23,30.539)
 str(x)
POSIXct[1:1], format: "2011-10-06 16:23:30"
 y <- ISOdatetime(2011,10,6,16,23,30.939)
 difftime(y,x)
Time difference of 0.399 secs




 Dear all,
 I would like to ask your help regarding handling time stamps in R. I think
 first I need a reference to read about their logic and how I should handle
 them.

 For example, this is a struct I have:


 str(MyStruct$TimeStamps)
  num [1:100, 1:6] 2011 2011 2011 2011 2011 ...

 MyStruct$TimeStamps[1,]
 [1] 2011.000   10.000    6.000   16.000   23.000   30.539

 The last field contains seconds.milliseconds.

 How can I, for example, make calculations with time stamps, such as
 checking whether MyStruct$TimeStamps[1,] and MyStruct$TimeStamps[2,]
 differ by more than 300 milliseconds, or whether 3 days have passed?

 I would like to thank you in advance for your suggestions

 B.R
 Alex






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Bert Gunter
Assuming your data are in a data frame, yourdat,  as:

User   Group   Value
u1     1       10
u2     2       5
u3     3       NA
...(etc)

where Group is **explicitly coerced to be a factor,**
then you want the User x Group interaction, obtained from

lm(Value ~ Group * User, data = yourdat)

However, you'll get some kind of warning message if

a) Not all Group x User combinations are present in the data

b) Moreover, no statistics can be calculated if there are no replicates of
User x Group combinations.

If you do not know why either of these are the case, get local help or study
any linear models (regression) text or online tutorial, as these last issues
have nothing to do with R.
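Point b) can be seen on the example data. With one value per User x Group cell, the interaction model uses up every degree of freedom. The data frame below is the example table typed in long format. The additive model at the end is only an illustration, not a fit suggested in the thread; it is estimable but assumes each user's effect is the same in every group.

```r
# Example data in long format: one value per observed User x Group cell
yourdat <- data.frame(
  User  = factor(c("u1", "u2", "u3", "u1", "u3", "u2", "u3")),
  Group = factor(c(1, 1, 1, 2, 2, 3, 3)),
  Value = c(10, 6, 5, 5, 2, 4, 3)
)

# Interaction model: 9 coefficients, 7 observations, no replicates
fit_int <- lm(Value ~ Group * User, data = yourdat)
df.residual(fit_int)        # no residual df left, so nothing can be tested
sum(is.na(coef(fit_int)))   # coefficients that cannot be estimated

# Additive model (illustration only): 5 coefficients, 2 residual df
fit_add <- lm(Value ~ Group + User, data = yourdat)
anova(fit_add)
```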

-- Bert


On Mon, Oct 10, 2011 at 3:48 AM, gj gaw...@gmail.com wrote:

 Thanks Petr. I will try it on the real data.

 But that will only show that the groups are different or not.
 Is there any way I can test if the users are different when they are in
 different groups?

 Regards
 Gawesh

 On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL petr.pi...@precheza.cz
 wrote:

  
   Hi Petr,
  
   It's not an equation. It's my mistake; the * are meant to be field
   separators for the example data. I should have just use blank spaces as
   follows:
  
   users   Group1   Group2   Group3
   u110   5N/A
   u2 6  N/A  4
   u3 5   23
  
  
   Regards
   Gawesh
 
  OK. You shall transform your data to long format to use lm
 
  test - read.table(clipboard, header=T, na.strings=N/A)
  test.m-melt(test)
  Using users as id variables
  fit-lm(value~variable, data=test.m)
  summary(fit)
 
  Call:
  lm(formula = value ~ variable, data = test.m)
 
  Residuals:
1234689
   3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
 
  Coefficients:
Estimate Std. Error t value Pr(|t|)
  (Intercept)   7.000  1.258   5.563 0.00511 **
  variableGroup2   -3.500  1.990  -1.759 0.15336
  variableGroup3   -3.500  1.990  -1.759 0.15336
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  Residual standard error: 2.179 on 4 degrees of freedom
   (2 observations deleted due to missingness)
  Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
  F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256
 
  No difference among groups, but I am not sure if this is the correct way
  to evaluate.
 
  library(ggplot2)
  p-ggplot(test.m, aes(x=variable, y=value, colour=users))
  p+geom_point()
 
  There is some sign that user3 has lowest value in each group. However for
  including users to fit there is not enough data.
 
  Regards
  Petr
 
 
  
  
   On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL petr.pi...@precheza.cz
  wrote:
  
Hi
   
I do not understand much about your equations. I think you shall look
  to
Practical Regression and Anova Using R from J.Faraway.
   
Having data frame DF with columns - users, groups, results you could
  do
   
fit - lm(results~groups, data = DF)
   
Regards
Petr
   
   
   
   

 Hi,

 I'm a newbie to R. My knowledge of statistics is mostly self-taught. My
 problem is how to measure the effect of users in groups. I can calculate
 a particular attribute for a user in a group. But my hypothesis is that
 the users' attributes are not independent of each other, and that a
 user's attribute depends on the group, i.e. that a user's behaviour
 changes based on the group.

 Let me give an example:

 users*Group 1*Group 2*Group 3
 u1*10*5*n/a
 u2*6*n/a*4
 u3*5*2*3

 For example, I want to be able to prove that u1's behaviour is different
 in group 1 than in the other groups, and that the particular thing about
 Group 1 is that users in Group 1 tend to have a higher value of the
 attribute under measurement.

 Hence, can I use R to test my hypothesis? I'm willing to learn, so if
 this is very simple, just point me in the direction of any online
 resources about it. At the moment, I don't even know how to define this
 class of problems. That would be a start.

 Regards
 Gawesh



 

Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Bert Gunter
I should have added...

If your design is not nearly balanced, main effects and interactions will
not have any natural interpretation because they will be (partially)
confounded. (I realize "nearly" is not a very useful characterization, but I
do not know a better one, as it probably depends on the scientific context
of your data.)

Again, if you do not know what this means, get statistical help as I
previously suggested. Or you might want to try the stats.stackexchange.com
website.

-- Bert
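
Bert's point about balance can be seen on the small example from this thread. This sketch assumes the seven observed cells from Gawesh's table. Because two cells are missing, the design is unbalanced. As a result, the sequential (Type I) sums of squares that anova() reports for an lm fit depend on the order in which the terms are listed.

```r
# Seven observed cells from the example (N/A cells omitted)
dat <- data.frame(
  users    = factor(c("u1", "u2", "u3", "u1", "u3", "u2", "u3")),
  variable = factor(c("Group1", "Group1", "Group1", "Group2", "Group2",
                      "Group3", "Group3")),
  value    = c(10, 6, 5, 5, 2, 4, 3)
)

# anova() on a single lm fit reports sequential (Type I) sums of squares:
# each term is adjusted only for the terms listed before it
ss_group_first <- anova(lm(value ~ variable + users, data = dat))["variable", "Sum Sq"]
ss_group_last  <- anova(lm(value ~ users + variable, data = dat))["variable", "Sum Sq"]

ss_group_first  # 21:   groups, ignoring users
ss_group_last   # 18.1: groups, adjusted for users
```

In a balanced layout, where every user appears in every group with the same number of replicates, the two numbers would coincide.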

On Mon, Oct 10, 2011 at 7:06 AM, Bert Gunter bgun...@gene.com wrote:

 Assuming your data are in a data frame, yourdat,  as:

 User   Group   Value
 u1     1       10
 u2     2       5
 u3     3       NA
 ...(etc)

 where Group is **explicitly coerced to be a factor,**
 then you want the User x Group interaction, obtained from

 lm( Value ~ Group*User,data = yourdat)

 However, you'll get some kind of warning message if

 a) Not all Group x User combinations are present in the data

 b) Moreover, no statistics can be calculated if there are no replicates of
 User x Group combinations.

 If you do not know why either of these is the case, get local help or
 study any linear models (regression) text or online tutorial, as these last
 issues have nothing to do with R.

 -- Bert
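
Points (a) and (b) can be checked directly. The sketch below uses the same seven observed cells, and the data frame is rebuilt so the block stands alone. With one observation per User x Group cell, the interaction model has as many estimable parameters as there are observations. It therefore fits the data exactly and leaves no residual degrees of freedom, and the coefficients that would need the empty cells come back NA.

```r
yourdat <- data.frame(
  User  = factor(c("u1", "u2", "u3", "u1", "u3", "u2", "u3")),
  Group = factor(c(1, 1, 1, 2, 2, 3, 3)),   # Group explicitly coerced to a factor
  Value = c(10, 6, 5, 5, 2, 4, 3)
)

fit_int <- lm(Value ~ Group * User, data = yourdat)

length(coef(fit_int))       # 9 columns in the model matrix
sum(is.na(coef(fit_int)))   # 2 of them cannot be estimated
df.residual(fit_int)        # 0: nothing is left to estimate the error variance
```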


 On Mon, Oct 10, 2011 at 3:48 AM, gj gaw...@gmail.com wrote:

 Thanks Petr. I will try it on the real data.

 But that will only show that the groups are different or not.
 Is there any way I can test if the users are different when they are in
 different groups?

 Regards
 Gawesh
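
One reading of this question is whether users differ once the group effect is accounted for. That reading assumes no User x Group interaction, which these data could not support anyway. Under that assumption, a sketch compares the one-way model with a model that also includes users, using anova() on the two nested fits. Only two residual degrees of freedom remain, so the test has very little power, which matches Petr's remark that there is not enough data.

```r
dat <- data.frame(
  users    = factor(c("u1", "u2", "u3", "u1", "u3", "u2", "u3")),
  variable = factor(c("Group1", "Group1", "Group1", "Group2", "Group2",
                      "Group3", "Group3")),
  value    = c(10, 6, 5, 5, 2, 4, 3)
)

fit_groups <- lm(value ~ variable, data = dat)          # groups only
fit_both   <- lm(value ~ variable + users, data = dat)  # groups and users

# Extra-sum-of-squares F test for users, adjusted for groups
cmp <- anova(fit_groups, fit_both)
cmp   # F = 16.8125 on 2 and 2 df, p about 0.056
```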

 On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL petr.pi...@precheza.cz
 wrote:

  
   Hi Petr,

   It's not an equation. It's my mistake; the * are meant to be field
   separators for the example data. I should have just used blank spaces
   as follows:

   users   Group1   Group2   Group3
   u1      10       5        N/A
   u2      6        N/A      4
   u3      5        2        3


   Regards
   Gawesh
 
  OK. You should transform your data to long format to use lm:

  test <- read.table("clipboard", header = TRUE, na.strings = "N/A")
  test.m <- melt(test)
  Using users as id variables
  fit <- lm(value ~ variable, data = test.m)
  summary(fit)

  Call:
  lm(formula = value ~ variable, data = test.m)

  Residuals:
      1    2    3    4    6    8    9
    3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5

  Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
  (Intercept)       7.000      1.258   5.563  0.00511 **
  variableGroup2   -3.500      1.990  -1.759  0.15336
  variableGroup3   -3.500      1.990  -1.759  0.15336
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard error: 2.179 on 4 degrees of freedom
    (2 observations deleted due to missingness)
  Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
  F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256

  No difference among groups, but I am not sure if this is the correct way
  to evaluate it.

  library(ggplot2)
  p <- ggplot(test.m, aes(x = variable, y = value, colour = users))
  p + geom_point()

  There is some sign that user3 has the lowest value in each group. However,
  there is not enough data to include users in the fit.

  Regards
  Petr
 
 
  
  

Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Anupam
Groups are different treatments given to Users for your Outcome
(measurement) of interest. Take this idea forward and you will have an
answer. 

Anupam.
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bert Gunter
Sent: Monday, October 10, 2011 7:36 PM
To: gj
Cc: r-help@r-project.org
Subject: Re: [R] help with statistics in R - how to measure the effect of
users in groups

Assuming your data are in a data frame, yourdat,  as:

User   Group   Value
u1     1       10
u2     2       5
u3     3       NA
...(etc)

where Group is **explicitly coerced to be a factor,** then you want the User
x Group interaction, obtained from

lm( Value ~ Group*User,data = yourdat)

However, you'll get some kind of warning message if

a) Not all Group x User combinations are present in the data

b) Moreover, no statistics can be calculated if there are no replicates of
User x Group combinations.

If you do not know why either of these is the case, get local help or study
any linear models (regression) text or online tutorial, as these last issues
have nothing to do with R.

-- Bert



Re: [R] Importing from Fortan

2011-10-10 Thread David Winsemius


On Oct 10, 2011, at 7:45 AM, Spartina wrote:


Hello all,

how do I import a Fortran file (f3.1) into one column in R? I've tried
this (I'm a total beginner, as you can see):


FortranData <- read.fwf("C:\\Users\\format3_1.txt", rep(3, 20))

Warning message:
In readLines(file, n = thisblock) :
 incomplete final line found on 'C:\Users\format3_1.txt'

FortranData
   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1 2.2 3.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4 2.3 2.3 4.2 2.1 3.4

As you can see, each datum gets imported into a separate column, whereas
I'd like to have everything stored under V1.


Have you considered transposing the result? If this is a multiple line  
file that you want to straighten out completely, you could first  
convert to matrix, transpose, and then wrap in c() to get a vector of  
values.
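
Here is a minimal sketch of David's suggestion. A small temporary file stands in for Léa's, and its contents and the four fields per line are made up for illustration. The data frame is transposed before it is flattened, so each input line becomes a column and the values keep their order in the file.

```r
# Hypothetical stand-in for format3_1.txt: two lines of f3.1-style values
tmp <- tempfile(fileext = ".txt")
writeLines(c("2.23.34.22.1", "3.42.32.34.2"), tmp)

FortranData <- read.fwf(tmp, widths = rep(3, 4))

# t() makes each input line a column, so c() reads the values line by line
V1 <- c(t(as.matrix(FortranData)))
OneColumn <- data.frame(V1 = V1)
OneColumn$V1   # 2.2 3.3 4.2 2.1 3.4 2.3 2.3 4.2
```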



I'm also unsure as to what the
error message means.


It's not an error. It's only a warning. It just means there is no
end-of-line marker on the last line.




Thanks in advance for your help!

Léa

--
View this message in context: 
http://r.789695.n4.nabble.com/Importing-from-Fortan-tp3889947p3889947.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] Multiple imputation on subgroups

2011-10-10 Thread Weidong Gu
An ad hoc method is to impute missing scores for the whole data set,
including subgroup 1, and then set the imputed scores in subgroup 1 back to NA.

Weidong Gu
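
Weidong's approach can be sketched as a small helper applied to each completed data set, for example one obtained from mice with complete(imp, i). The helper name, column names and toy data below are hypothetical. The helper resets only the values that were originally missing in rows you did not want imputed. The imputation model itself still used those rows as predictors.

```r
# Hypothetical helper: put NA back into cells that were missing in
# `original` and lie in rows where `keep_missing` is TRUE
restore_missing <- function(original, completed, cols, keep_missing) {
  for (col in cols) {
    was_missing <- is.na(original[[col]]) & keep_missing
    completed[[col]][was_missing] <- NA
  }
  completed
}

# Toy data: 'score' is missing in subgroups 1 and 2
original  <- data.frame(subgroup = c(1, 1, 2, 3), score = c(NA, 4, NA, 7))
completed <- data.frame(subgroup = c(1, 1, 2, 3), score = c(5.1, 4, 6.3, 7))  # pretend imputed

restored <- restore_missing(original, completed, "score",
                            keep_missing = original$subgroup == 1)
restored$score   # NA 4.0 6.3 7.0
```

The restored data sets are ordinary data frames. They would therefore have to be analysed one by one, with the results combined afterwards, rather than through the mids object.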

On Mon, Oct 10, 2011 at 5:35 AM, Sarah s1327...@student.rug.nl wrote:
 Dear R-users,

 I want to multiply impute missing scores, but only for a few subgroups in my
 data (variable 'subgroups': only impute for subgroups 2 and 3).
 Does anyone know how to do this in MICE?

 This is my script for the multiple imputation:
 imp <- mice(data, m = 20, predictorMatrix = pred, post = post,
        method = c("", "", "", "", "", "norm",
                   "norm", "norm", "norm", "norm", "norm"),
        maxit = 20)

 The final analysis should be on the dataset as a whole, so with subgroups 2
 and 3 with observed and imputed values, and for subgroup 1 with observed
 values only (and missing scores).

 Thanks.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Multiple-imputation-on-subgroups-tp3889664p3889664.html
 Sent from the R help mailing list archive at Nabble.com.



[R] calculate multiple means of one vector

2011-10-10 Thread Martin Batholdy
Dear R-Users,


I have the following two vectors:

data <- rnorm(40, 0, 2)

positions <- c(3, 4, 5,  8, 9, 10,  20, 21, 22,  30, 31, 32)


Now I would like to calculate the mean of every chunk of data points (of the
data vector) as defined by the positions vector.


So I would like to get a vector with the mean of elements 3 to 5 of the
data vector, 8 to 10, 20 to 22 and so on.


The gaps between the chunks are arbitrary; there is no pattern in them (the
gaps from 5 to 8, 10 to 20, 22 to 30, etc.).
But the chunks are always of length n (in this case 3).


Is there a convenient way to do this without using a for-loop?


thanks!


Re: [R] fast or space-efficient lookup?

2011-10-10 Thread Steve Lianoglou
Hi Ivo,

On Mon, Oct 10, 2011 at 10:58 AM, ivo welch ivo.we...@gmail.com wrote:
 hi steve---agreed...but is there any other computer language in which
 an expression in a [ . ] is anything except a tensor index selector?

Sure, it's a type specifier in scala generics:
http://www.scala-lang.org/node/113

Something similar to scale-eez in haskell.

Also, in MATLAB (ugh) it's not even a tensor selector (they use normal
parens there).

But I'm not sure what that has to do w/ the price of tea in china.

With data.table, [ is still tensor-selector-like, though. You can
just pass in another data.table to use as the keys to do your
selection through the `i` argument (like selecting rows), which I
guess will likely be your most common use case if you're moving to
data.table (presumably you are trying to take advantage of its
speed on big table-like objects).

You can use the `j` param to further manipulate columns. If you pass
in a data.table as `i`, it will add its columns to `j`.

I'll grant you that it is different than your standard rectangular
object selection in R, but the motivation isn't so strange as both
i,j params in normal calls to 'xxx[i,j]' are for selecting (ok not
manipulating) rows and columns on other rectangular like objects,
too.

-steve
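
Here is a small sketch of the keyed lookup Steve describes. It assumes the data.table package is installed, and the table and column names are made up. When another data.table is passed as i, data.table joins on the key and returns the matching rows in the order given by i.

```r
library(data.table)

# A keyed table to look values up from (hypothetical data)
DT <- data.table(id = c("a", "b", "c"), x = 1:3, key = "id")

# The lookup keys, passed as `i`
lookup <- data.table(id = c("c", "a"))

DT[lookup]   # rows for "c" then "a"; x is 3, 1
```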

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] how to extent the improveProb for survival data

2011-10-10 Thread zhu yao
Dear R users

The function improveProb (in the Hmisc package, which rms loads) calculates
the NRI and IDI for predictions of a binary outcome.
Has anyone extended its use to survival data?

Many thanks.


*Yao Zhu*
*Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China*



[R] How to draw 4 random weights that sum up to 1?

2011-10-10 Thread Alexander Engelhardt

Hey list,

This might be a more general question and not that R-specific. Sorry for 
that.


I'm trying to draw a random vector of 4 numbers that sum up to 1.
My first approach was something like

  a <- runif(1)
  b <- runif(1, max = 1 - a)
  c <- runif(1, max = 1 - a - b)
  d <- 1 - a - b - c

but this kind of distorts the results, right?
Would the following be a good approach?

  w <- sample(1:100, 4, replace = TRUE)
  w <- w / sum(w)

I'd prefer a general algorithm-kind of answer to a specific R function 
(if there is any). Although a function name would help too, if I can 
sourcedive.


Thanks in advance,
 Alex



[R] Superposing mean line to xyplot

2011-10-10 Thread Niccolò Bassani
Dear R-users,
I'm using the lattice package and the function xyplot for the first time, so
please excuse my inexperience. I'm facing quite a simple problem, but I'm
having trouble solving it. I've read tons of old mails in the archives and
looked at some slides from Deepayan Sarkar but still cannot get the point.

This is the context. I've got data on 9 microRNAs; each miRNA has been
measured on three different arrays, and on each array I have 4 replicates
for each miRNA, which gives a total of 108 measurements. I suspect
that measurements on the first array are systematically lower than the
others, so I wanted to draw a line plot where each panel corresponds to a
miRNA and each line corresponds to one of the four replicates (that is, the
first replicate of miRNA A on array 1 must be connected to the first
replicate of miRNA A on array 2, and so on), so that each panel shows 4
series of three points connected by line segments. I've done this easily
with lattice:

library(lattice)
array = rep(c("A", "B", "C"), each = 36) # array replicate
spot = rep(1:4, 27) # miRNA replicate on each array
miRNA = rep(rep(paste("miRNA", 1:9, sep = "."), each = 4), 3) # miRNA label
exprs = rnorm(mean = 2.8, n = 108) # intensity
data = data.frame(miRNA, array, spot, exprs)
xyplot(exprs ~ array | miRNA, data = data, type = "b", groups = spot,
       xlab = "Array", ylab = "Intensity", col = "black", lty = 2:5,
       scales = list(y = list(relation = "free")))

Now I want to superpose on each panel another series of three points
connected by a line, where each point represents the mean of the four
replicates of the miRNA on that array, a sort of mean line. I've tried
the following, but it's not working as expected:

xyplot(exprs ~ array | miRNA, data = data, type = "b", groups = spot,
       xlab = "Array", ylab = "Intensity", col = "black", lty = 2:5,
       scales = list(y = list(relation = "free")),
       panel = function(x, y, groups, subscripts) {
           panel.xyplot(x, y, groups = groups, subscripts = subscripts)
           panel.superpose(x, y, panel.groups = panel.average,
                           groups = groups, subscripts = subscripts)
       })

This may be a silly question, and possibly there's a trivial way to
do it, but I cannot figure it out.

Thanks for any help.

niccolò
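
Here is one way to get a single mean line per panel. It is a sketch on simulated data, not checked against the original measurements. The replicate lines are drawn with panel.xyplot(), which handles the groups itself. panel.average() is then called once on all points in the panel, instead of through panel.superpose(), which averages within each group (spot). Setting horizontal = FALSE tells panel.average() that x is the factor whose levels y is averaged over. The seed, the red colour and the line width are arbitrary choices.

```r
library(lattice)

set.seed(42)  # reproducible fake data
array <- rep(c("A", "B", "C"), each = 36)                      # array replicate
spot  <- rep(1:4, 27)                                          # miRNA replicate on each array
miRNA <- rep(rep(paste("miRNA", 1:9, sep = "."), each = 4), 3) # miRNA label
exprs <- rnorm(108, mean = 2.8)                                # intensity
data  <- data.frame(miRNA, array, spot, exprs)

p <- xyplot(exprs ~ array | miRNA, data = data, type = "b", groups = spot,
            xlab = "Array", ylab = "Intensity", col = "black", lty = 2:5,
            scales = list(y = list(relation = "free")),
            panel = function(x, y, ...) {
              # one line per spot; groups and subscripts arrive via ...
              panel.xyplot(x, y, ...)
              # one extra line through the mean of all replicates per array
              panel.average(x, y, fun = mean, horizontal = FALSE,
                            col = "red", lwd = 2)
            })
print(p)
```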



Re: [R] fast or space-efficient lookup?

2011-10-10 Thread Matthew Dowle
Ivo,

Also, perhaps FAQ 2.14 helps: "Can you explain further why
data.table is inspired by A[B] syntax in base?"

http://datatable.r-forge.r-project.org/datatable-faq.pdf

And, 2.15 and 2.16.

Matthew

Steve Lianoglou mailinglist.honey...@gmail.com wrote in message 
news:CAHA9McPQ4P-a2imjm=szgjfxyx0faw0j79fwq2e87dqkf9j...@mail.gmail.com...



Re: [R] calculate multiple means of one vector

2011-10-10 Thread Dennis Murphy
Hi:

Here's one approach:

dat <- rnorm(40, 0, 2)
positions <- matrix(c(3, 4, 5,  8, 9, 10,  20, 21, 22,  30, 31, 32),
                    ncol = 3, byrow = TRUE)
# Subdata
t(apply(positions, 1, function(x) dat[x]))
  [,1]  [,2]   [,3]
[1,] 0.5679765  1.429396  2.9050931
[2,] 4.0878845 -2.569012  2.1209280
[3,] 4.0295221 -2.659358 -1.3566887
[4,] 1.3109707 -1.745255 -0.9462857
# means of subdata
 apply(positions, 1, function(x) mean(dat[x]))
[1]  1.63415529  1.21326693  0.00449164 -0.46019018
# or
colMeans(apply(positions, 1, function(x) dat[x]))

HTH,
Dennis
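
Every chunk has the same length n, so a loop-free alternative is to pick all positions at once and reshape them into a matrix with one chunk per column. This sketch assumes that the chunks are listed in order and that n = 3, as in the question. The data vector here is deterministic so that the result can be checked.

```r
dat <- seq(0.5, 20, by = 0.5)    # 40 values; dat[i] equals i / 2
positions <- c(3, 4, 5,  8, 9, 10,  20, 21, 22,  30, 31, 32)
n <- 3                           # chunk length

# matrix() fills column by column, so each column holds one chunk
chunk_means <- colMeans(matrix(dat[positions], nrow = n))
chunk_means   # 2.0  4.5 10.5 15.5
```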

On Mon, Oct 10, 2011 at 7:56 AM, Martin Batholdy
batho...@googlemail.com wrote:


Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread gj
Hi Bert,

The real situation is like what you suggested: user x group interactions.
The users can be in more than one group.
In fact, the data that I am trying to analyse consist of users, online
forums as groups, and the attribute under measurement is the number of posts
made by each user in a particular forum.

My hypothesis is that the number of posts a user makes to a forum depends
on the forum. For example, if a user is in a forum that is active, he
contributes more than when he is in a forum that is less active. I
guess there will be some users who contribute the same irrespective of the
forum.

I hope this makes sense.

Regards
Gawesh

On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Yes, of course. But then one gets into additional problems with carryover
 effects, etc.
 Also, one then has a repeated measures problem (User is the experimental
 unit) and my previous advice is nonsense.

 Like you, I have no idea what his real situation is.

 -- Bert


 On Mon, Oct 10, 2011 at 8:39 AM, Anupam anupa...@gmail.com wrote:

 It is possible to give multiple treatments, one at a time, to same pool of
 patients. You are correct that interactions may be important in this
 problem. I am only trying to help him frame the problem using an analogy.
 

 Anupam.

 From: Bert Gunter [mailto:gunter.ber...@gene.com]
 Sent: Monday, October 10, 2011 8:21 PM
 To: Anupam
 Cc: gj
 Subject: Re: [R] help with statistics in R - how to measure the effect
 of users in groups

 If that is the case, and each user can appear in only one group, there is
 no group x user interaction, the poster's question was nonsense, and one
 analyzes the group effect only, as originally shown.

 -- Bert


Re: [R] How to draw 4 random weights that sum up to 1?

2011-10-10 Thread Greg Snow
You probably want to generate data from a Dirichlet distribution.  There are 
some functions in packages that will do this and give you more background, or 
you can just generate 4 numbers from an exponential (or gamma) distribution and 
divide them by their sum.
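
Here is a minimal base-R sketch of Greg's suggestion; the function name is hypothetical. Normalising independent Exponential(1) draws gives a Dirichlet(1, 1, 1, 1) vector. The weights are then spread uniformly over the set of non-negative 4-vectors that sum to 1. For other Dirichlet parameters alpha, rexp(k) would be replaced by rgamma(k, shape = alpha).

```r
# Draw k weights that sum to 1, uniformly over the simplex
# (Dirichlet with all parameters equal to 1)
rweights <- function(k) {
  e <- rexp(k)      # independent Exponential(1) draws
  e / sum(e)        # normalise so the weights sum to 1
}

set.seed(1)
w <- rweights(4)
sum(w)   # 1, up to floating point

# By symmetry each weight has mean 1/4; a quick simulation check
draws <- t(replicate(10000, rweights(4)))
colMeans(draws)
```

By contrast, in the sequential runif() scheme from the question, the first weight a alone already has mean 1/2.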

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Alexander Engelhardt
 Sent: Monday, October 10, 2011 10:11 AM
 To: r-help
 Subject: [R] How to draw 4 random weights that sum up to 1?
 
 Hey list,
 
 This might be a more general question and not that R-specific. Sorry
 for that.
 
 I'm trying to draw a random vector of 4 numbers that sum up to 1.
 My first approach was something like
 
a <- runif(1)
b <- runif(1, max=1-a)
c <- runif(1, max=1-a-b)
d <- 1-a-b-c
 
 but this kind of distorts the results, right?
 Would the following be a good approach?
 
w <- sample(1:100, 4, replace=TRUE)
w <- w/sum(w)
 
 I'd prefer a general algorithm-kind of answer to a specific R function
 (if there is any). Although a function name would help too, if I can
 sourcedive.
 
 Thanks in advance,
   Alex
 



Re: [R] How to draw 4 random weights that sum up to 1?

2011-10-10 Thread Uwe Ligges



On 10.10.2011 18:10, Alexander Engelhardt wrote:

Hey list,

This might be a more general question and not that R-specific. Sorry for
that.

I'm trying to draw a random vector of 4 numbers that sum up to 1.
My first approach was something like

a <- runif(1)
b <- runif(1, max=1-a)
c <- runif(1, max=1-a-b)
d <- 1-a-b-c

but this kind of distorts the results, right?
Would the following be a good approach?

w <- sample(1:100, 4, replace=TRUE)
w <- w/sum(w)


Yes, although better combine both ways to

w <- runif(4)
w <- w / sum(w)

Uwe Ligges
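A quick simulation shows why the sequential approach in the question distorts the results (base R; the sample size is arbitrary). Under that scheme the four components do not even share the same mean: on average they come out near 1/2, 1/4, 1/8 and 1/8. The normalized draws, by contrast, are exchangeable with mean 1/4 each.

```r
set.seed(123)
n <- 100000

# Sequential scheme from the question, vectorised over n replications
a  <- runif(n)
b  <- runif(n, max = 1 - a)
c3 <- runif(n, max = 1 - a - b)   # named c3 to avoid masking c()
d  <- 1 - a - b - c3
seq_means <- colMeans(cbind(a, b, c3, d))

# Normalised independent uniforms (the runif(4) / sum approach)
u <- matrix(runif(4 * n), ncol = 4)
u <- u / rowSums(u)
norm_means <- colMeans(u)

round(rbind(sequential = seq_means, normalised = norm_means), 2)
```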




I'd prefer a general algorithm-kind of answer to a specific R function
(if there is any). Although a function name would help too, if I can
sourcedive.







Thanks in advance,
Alex





Re: [R] help with statistics in R - how to measure the effect of users in groups

2011-10-10 Thread Bert Gunter
OK. So my original advice and warnings are correct.

However, now there is an additional wrinkle because your response is a
count, which is not a continuous measurement. For this, you'll need glm(...,
family = poisson) instead of lm(...), where the ... is the stuff I gave
you before. A backup approach, if there aren't too many small counts (below
about 10, say), is to take the square root of the counts and analyze that via
lm().
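Here is a minimal sketch of the glm() route for counts. It assumes one post count per user/forum pair with every combination observed, and the data frame and its numbers are made up for illustration. With no replicates the full User x Group model is saturated. One common option in that case is to read the residual deviance of the additive model as a test of the user-by-forum interaction (the usual log-linear test of independence). This assumes the counts really are Poisson, and overdispersed post counts often are not.

```r
# Hypothetical post counts: 4 users x 3 forums, one count per combination
posts <- data.frame(
  user  = factor(rep(c("u1", "u2", "u3", "u4"), times = 3)),
  forum = factor(rep(c("f1", "f2", "f3"), each = 4)),
  n     = c(12, 30, 5, 8,   20, 41, 9, 10,   3, 15, 2, 4)
)

# Additive log-linear model: user and forum effects, no interaction
fit_add <- glm(n ~ user + forum, family = poisson, data = posts)

# Residual deviance vs. the saturated (user x forum) model, on (4-1)*(3-1) df
dev    <- deviance(fit_add)
df_res <- df.residual(fit_add)
p_interaction <- pchisq(dev, df_res, lower.tail = FALSE)

# The square-root backup approach, analysed with lm()
fit_sqrt <- lm(sqrt(n) ~ user + forum, data = posts)
```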

In either approach, your interpretation becomes more difficult -- e.g. have
you any experience with GLMs (generalized linear models)? Moreover, if
there are large numbers of users -- e.g. dozens (and you may have hundreds
or thousands) -- of course the interaction will be significant, but so what?
For this you'll need to re-frame the question.

So given all this and what appears to be your relative ignorance of
statistics, I strongly recommend that you get local statistical help. Or
just forget about formal statistical analysis altogether and do some
sensible plotting.

Finally, that's it for me on this. I will offer you no more advice.

-- Bert

On Mon, Oct 10, 2011 at 9:40 AM, gj gaw...@gmail.com wrote:

 Hi Bert,

 The real situation is like what you suggested, user x group interactions.
 The users can be in more than one group.
 In fact, the data that I am trying to analyse consist of users, online
 forums as groups, and the attribute being measured is the number of posts
 made by each user in a particular forum.

 My hypothesis is that the number of posts a user makes to a forum is
 dependent on the forum. For example if the user is in a forum that is active
 he contributes more compared to when he is in a forum that is less active. I
 guess there will be some users who contribute the same irrespective of the
 forum.

 I hope this makes sense.

 Regards
 Gawesh

 On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter gunter.ber...@gene.comwrote:

 Yes, of course. But then one gets into additional problems with carryover
 effects, etc.
 Also, one then has a repeated measures problem (User is the experimental
 unit) and my previous advice is nonsense.

 Like you, I have no idea what his real situation is.

 -- Bert


 On Mon, Oct 10, 2011 at 8:39 AM, Anupam anupa...@gmail.com wrote:

 It is possible to give multiple treatments, one at a time, to the same pool
 of patients. You are correct that interactions may be important in this
 problem. I am only trying to help him frame the problem using an analogy.
 


 Anupam.

 *From:* Bert Gunter [mailto:gunter.ber...@gene.com]
 *Sent:* Monday, October 10, 2011 8:21 PM
 *To:* Anupam
 *Cc:* gj
 *Subject:* Re: [R] help with statistics in R - how to measure the effect
 of users in groups


 If that is the case, and each user can appear in only one group, there is
 no group x user interaction, the poster's question was nonsense, and one
 analyzes the group effect only, as originally shown.

 -- Bert

 On Mon, Oct 10, 2011 at 7:43 AM, Anupam anupa...@gmail.com wrote:

 Groups are different treatments given to Users for your Outcome
 (measurement) of interest. Take this idea forward and you will have an
 answer.

 Anupam.
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Bert Gunter
 Sent: Monday, October 10, 2011 7:36 PM
 To: gj
 Cc: r-help@r-project.org
 Subject: Re: [R] help with statistics in R - how to measure the effect of
 users in groups

 Assuming your data are in a data frame, yourdat,  as:

 User   Group   Value
 u1     1       10
 u2     2       5
 u3     3       NA
 ...(etc)

 where Group is **explicitly coerced to be a factor,** then you want the
 User x Group interaction, obtained from

 lm(Value ~ Group*User, data = yourdat)

 However, you'll get some kind of warning message if

 a) Not all Group x User combinations are present in the data

 b) Moreover, no statistics can be calculated if there are no replicates
 of User x Group combinations.

 If you do not know why either of these is the case, get local help or
 study any linear models (regression) text or online tutorial, as these
 last issues have nothing to do with R.

 -- Bert


 On Mon, Oct 10, 2011 at 3:48 AM, gj gaw...@gmail.com wrote:

  Thanks Petr. I will try it on the real data.
 
  But that will only show that the groups are different or not.
  Is there any way I can test if the users are different when they are
  in different groups?
 
  Regards
  Gawesh
 
  On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL petr.pi...@precheza.cz
  wrote:
 
   
Hi Petr,
   
It's not an equation. It's my mistake; the * are meant to be field
separators for the example data. I should have just used blank
spaces as follows:
   
users   Group1   Group2   Group3
u1      10       5        N/A
u2      6        N/A      4
u3      5        2        3
   
   
Regards
Gawesh
  
   OK. You shall transform your data to long format to 

Re: [R] Superposing mean line to xyplot

2011-10-10 Thread Dennis Murphy
Hi:

Here's one way to do it, adding the latticeExtra package:

array = rep(c("A","B","C"),each = 36) # array replicate
spot =  rep(1:4,27) # miRNA replicate on each array
miRNA = rep(rep(paste("miRNA",1:9,sep="."),each=4),3) # miRNA label
exprs = rnorm(mean=2.8,n = 108) # intensity
dat = data.frame(miRNA,array,spot,exprs)

library(latticeExtra)
p0 <- xyplot(exprs ~ array|miRNA, data=dat, type="b", groups = spot,
  xlab="Array", ylab = "Intensity", col="black", lty = 2:5,
  scales = list(y = list(relation = "free"))
  )
p1 <- xyplot(exprs ~ array|miRNA, data=dat, type="a",
  xlab="Array", ylab = "Intensity", col="red", lty = 1,
  lwd = 2, scales = list(y = list(relation = "free"))
  )

p0 + p1

You can also write a panel function to do this if you wish.
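Here is a sketch of the panel-function alternative, using only lattice (a recommended package shipped with R). The simulated data only mimic the shape described in the question. The detail worth flagging is horizontal = FALSE in panel.average(). Its default (horizontal = TRUE) averages x within each y value, which is one likely reason the attempt in the question did not produce a mean line.

```r
library(lattice)

# Simulated data in the shape described in the question
set.seed(1)
dat <- data.frame(
  miRNA = factor(rep(rep(paste("miRNA", 1:9, sep = "."), each = 4), 3)),
  array = factor(rep(c("A", "B", "C"), each = 36)),
  spot  = rep(1:4, 27),
  exprs = rnorm(108, mean = 2.8)
)

p <- xyplot(exprs ~ array | miRNA, data = dat, groups = spot, type = "b",
            col = "black", lty = 2:5, xlab = "Array", ylab = "Intensity",
            scales = list(y = list(relation = "free")),
            panel = function(x, y, ...) {
              # one series per replicate (spot); groups/subscripts arrive via ...
              panel.xyplot(x, y, ...)
              # mean of all replicates at each array position, drawn in red
              panel.average(x, y, fun = mean, horizontal = FALSE,
                            col = "red", lwd = 2)
            })
print(p)
```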

HTH,
Dennis

2011/10/10 Niccolò Bassani biostatist...@gmail.com:
 Dear R-users,
 I'm using lattice package and function xyplot for the first time so
 you will excuse me for my inexperience. I'm facing quite a simple
 problem but I'm having troubles on how to solve it, I've read tons of
 old mails in the archives and looked at some slides from Deepayan
 Sarkar but still can not get the point.

 This is the context. I've got data on 9 microRNAs, each miRNA has been
 measured on three different arrays and on each array I have 4
 replicates for each miRNA, which sums up to a total of 108
 measurements. I suspect that measurements on the first array are
 systematically lower than the others, so I wanted to draw a line
 plot where each panel correspond to a miRNA, and each line correspond
 to one of the four replicates (that is: first replicate of miRNA A on
 array 1 must be connected to first replicate of miRNA A on array 2 and
 so on), so that for each panel there are 4 series of three points
 connected by a line/segment. I've done this easily with lattice doing
 this:

 array = rep(c("A","B","C"),each = 36) # array replicate
 spot =  rep(1:4,27) # miRNA replicate on each array
 miRNA = rep(rep(paste("miRNA",1:9,sep="."),each=4),3) # miRNA label
 exprs = rnorm(mean=2.8,n = 108) # intensity
 data = data.frame(miRNA,array,spot,exprs)
 xyplot(exprs ~ array|miRNA,data=data,type="b",groups=spot,xlab="Array",ylab
 = "Intensity",col="black",lty=2:5,scales = list(y = list(relation =
 "free")))

 Now, I want to superpose to each panel an other series of three points
 connected by a line, where each point represent the mean of the four
 replicates of the miRNA on each array, a sort of mean line. I've tried
 using the following, but it's not working as expected:

 xyplot(exprs ~ array|miRNA,data=array,type="b",groups=spot,xlab="Array",ylab
 = "Intensity",col="black",lty=2:5,scales = list(y = list(relation =
 "free")), panel = function(x,y,groups,subscripts){
        panel.xyplot(x,y,groups=groups,subscripts=subscripts)
        panel.superpose(x,y,panel.groups=panel.average,groups=groups,subscripts=subscripts)
 })

 This is maybe a silly question and possibly there's a trivial way to
 do it, but I can not figure it out.

 Thanx for any help.

 niccolò





[R] Merge Data by time stamps

2011-10-10 Thread Alaios
Dear all,
I have some device measurements and the time stamps I get from it have the 
below format:

MyStruct$TimeStamps[1,]
 [1] 2011.000   10.000    6.000   16.000   23.000   30.539

I can convert them easily with ISOdate() to a number and do the calculations I 
need.

One of my problems is that I want to gather my measurements into piles of
(let's say) 5 minutes' duration.
Afterwards I will apply a function to these piles.
As the device is not super-precise, the time needed for one operation to
complete varies; please find some examples below (in seconds):

1.10
1.90
1.34
1.23
1.56
1.22
1.34


As you can see, I cannot say that 5 minutes of measurements correspond to a
fixed number X of consecutive measurements; the number varies. How can I ask
R to do the summation and, whenever 5 minutes' worth of data has accumulated,
split it off so that I can apply a function to it?
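Here is a sketch in base R that starts from the timestamps. It assumes each row of MyStruct$TimeStamps holds year, month, day, hour, minute and second, as in the row printed above, and that piles are counted in 5-minute blocks from the first measurement. The timestamp matrix and measurement values below are hypothetical stand-ins.

```r
# Hypothetical timestamp matrix: one row per measurement,
# columns = year, month, day, hour, minute, second
ts_mat <- rbind(c(2011, 10, 6, 16, 23, 30.539),
                c(2011, 10, 6, 16, 25, 10.000),
                c(2011, 10, 6, 16, 27, 45.200),
                c(2011, 10, 6, 16, 29, 59.000),
                c(2011, 10, 6, 16, 31, 12.500))
values <- c(4.1, 3.9, 4.4, 5.0, 4.8)   # hypothetical measurements

# ISOdate() is vectorised, so all rows are converted in one call
ts <- ISOdate(ts_mat[, 1], ts_mat[, 2], ts_mat[, 3],
              ts_mat[, 4], ts_mat[, 5], ts_mat[, 6])

# Seconds elapsed since the first measurement, then 300-second (5-minute) piles
elapsed <- as.numeric(difftime(ts, ts[1], units = "secs"))
pile <- floor(elapsed / 300)

# Apply any summary function to each pile
tapply(values, pile, mean)
```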

I would like to thank you in advance for your help

B.R
Alex




[R] multicore by(), like mclapply?

2011-10-10 Thread ivo welch
dear r experts---Is there a multicore equivalent of by(), just like
mclapply() is the multicore equivalent of lapply()?

if not, is there a fast way to convert a data.table into a list based
on a column that lapply and mclapply can consume?

advice appreciated...as always.

regards,

/iaw

Ivo Welch (ivo.we...@gmail.com)



Re: [R] package dlm: dlmForecast()

2011-10-10 Thread Giovanni Petris
I haven't tried this, but I am pretty confident that using dlmFilter()
with fictitious future values of the observations set to NA should do
the job. 

Hope this helps,
Giovanni Petris

On Sat, 2011-10-08 at 13:21 +, YuHong wrote:
 
 May I ask a question about the dlmForecast() function in the package 'dlm'?
 
 This function 'dlmForecast()' currently only deals with constant
 models. Can anyone suggest how to predict using a non-constant
 model? Thanks a lot!
  
 Regards,
  
 Hong Yu
  
 
 



Re: [R] How to draw 4 random weights that sum up to 1?

2011-10-10 Thread David Winsemius


On Oct 10, 2011, at 12:44 PM, Uwe Ligges wrote:




On 10.10.2011 18:10, Alexander Engelhardt wrote:

Hey list,

This might be a more general question and not that R-specific.
Sorry for that.

I'm trying to draw a random vector of 4 numbers that sum up to 1.
My first approach was something like

a <- runif(1)
b <- runif(1, max=1-a)
c <- runif(1, max=1-a-b)
d <- 1-a-b-c

but this kind of distorts the results, right?
Would the following be a good approach?

w <- sample(1:100, 4, replace=TRUE)
w <- w/sum(w)


Yes, although better combine both ways to

w <- runif(4)
w <- w / sum(w)


For the non-statisticians in the audience like myself who didn't know  
what that distribution might look like (it being difficult to  
visualize densities on your 3-dimensional manifold in 4-space),  here  
is my effort to get an appreciation:


 M4 <- matrix(runif(4 * 10000), ncol=4)  # 10000 rows is an assumed size
 M4 <- M4/rowSums(M4)
# just a larger realization of Ligges' advice
 colMeans(M4)
[1] 0.2503946 0.2499594 0.2492118 0.2504342
 plot(density(M4[,1]))
 lines(density(M4[,2]),col="red")
 lines(density(M4[,3]),col="blue")
 lines(density(M4[,4]),col="green")

plot(density(rowSums(M4[,1:2])))

 plot(density(rowSums(M4[,1:3])))
plot(density(rowSums(M4[,2:4])))

# rather kewl results, noting that these are a reflection around 0.5 of
# the single vector densities.




Uwe Ligges



I'd prefer a general algorithm-kind of answer to a specific R function
(if there is any). Although a function name would help too, if I can
sourcedive.


--

David Winsemius, MD
West Hartford, CT



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Joshua Wiley
Hi Ivo,

My suggestion would be to only pass lapply (or mclapply) the indices.
That should be fast, subsetting with data table should also be fast,
and then you do whatever computations you will.  For example:

require(data.table)
DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
setkey(DT, x)

lapply(as.character(unique(DT[,x])), function(i) DT[i])

the DT[i] object is the subset of the data table you want.  You can
pass this to whatever function for computations you need.

Hope this helps,

Josh


On Mon, Oct 10, 2011 at 10:41 AM, ivo welch ivo.we...@gmail.com wrote:
 dear r experts---Is there a multicore equivalent of by(), just like
 mclapply() is the multicore equivalent of lapply()?

 if not, is there a fast way to convert a data.table into a list based
 on a column that lapply and mclapply can consume?

 advice appreciated...as always.

 regards,

 /iaw
 
 Ivo Welch (ivo.we...@gmail.com)





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Matthew Dowle
Package plyr has .parallel.

Searching datatable-help for multicore, say on Nabble here,

http://r.789695.n4.nabble.com/datatable-help-f2315188.html

yields three relevant posts and examples.

Please check the wiki's do's and don'ts to make sure you didn't
fall into one of those traps, though (we don't know your data or task, so
this is just a guess):

http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

HTH
Matthew

ivo welch ivo.we...@gmail.com wrote in message 
news:CAPr7RtUroPQtQvoh5uBuT60OYkwGR+ufGr_Z=g5g+vljeoj...@mail.gmail.com...
 dear r experts---Is there a multicore equivalent of by(), just like
 mclapply() is the multicore equivalent of lapply()?

 if not, is there a fast way to convert a data.table into a list based
 on a column that lapply and mclapply can consume?

 advice appreciated...as always.

 regards,

 /iaw
 
 Ivo Welch (ivo.we...@gmail.com)




Re: [R] Merge Data by time stamps

2011-10-10 Thread David Winsemius


On Oct 10, 2011, at 1:28 PM, Alaios wrote:


Dear all,
I have some device measurements and the time stamps I get from it  
have the below format:


MyStruct$TimeStamps[1,]

[1] 2011.000   10.000    6.000   16.000   23.000   30.539


I can convert them easily with ISOdate() to a number and do the  
calculations I need.


One of my problems is that I want to gather my measurements to piles  
of duration (let's say) 5 minutes.

Afterwards I will apply a function to these piles.
As the device is not super-precise please find below the time needed  
for one operation to complete (in seconds)


1.10
1.90
1.34
1.23
1.56
1.22
1.34



Assuming I understand your presentation and lacking  R-coded examples  
and desired output on which to test:


?cumsum
?cut
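Following the ?cumsum / ?cut hint, here is a minimal sketch for when only the per-operation durations are known. cumsum() gives the elapsed time at the end of each operation, and cut() assigns each one to a fixed-width window. The durations are the ones listed in the question, and the measurement values are hypothetical. A 5-second window is used only so that this short series spans more than one pile; use 300 for 5 minutes.

```r
durations <- c(1.10, 1.90, 1.34, 1.23, 1.56, 1.22, 1.34)  # seconds, from the question
values    <- c(10, 12, 11, 13, 9, 8, 10)                   # hypothetical measurements

win     <- 5                      # pile width in seconds (300 for 5 minutes)
elapsed <- cumsum(durations)      # time at which each operation finished

# Breaks every `win` seconds, extended past the last elapsed time
breaks <- seq(0, (floor(max(elapsed) / win) + 1) * win, by = win)
pile   <- cut(elapsed, breaks = breaks, right = FALSE)   # intervals [0,5), [5,10), ...

# split() gives one vector per pile; apply any function to each
piles <- split(values, pile)
sapply(piles, mean)
```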



as you understand I can not say that 5 minutes measurements are  
specific to X consecutive measurements but differ. How I can ask  
from R to do the summation and whenever there is a 5 minute data set  
to split it so to apply it into a function?


I would like to thank you in advance for your help


--

David Winsemius, MD
West Hartford, CT



Re: [R] How to draw 4 random weights that sum up to 1?

2011-10-10 Thread Greg Snow
As an interesting extension to David's post, try:

M4.e <- matrix(rexp(4 * 10000, 1), ncol=4)  # 10000 rows is an assumed size

instead of the uniform draws, and rerun the rest of the code (note the limits
on the x-axis).

With 3 dimensions and the sum-to-1 restriction, we can plot in 2 dimensions to
compare:

library(TeachingDemos)

m3.unif <- matrix(runif(3000),  ncol=3)
m3.unif <- m3.unif/rowSums(m3.unif)

m3.exp  <- matrix(rexp(3000,1), ncol=3)
m3.exp  <- m3.exp/rowSums(m3.exp)


dev.new()
triplot(m3.unif)

dev.new()
triplot(m3.exp)

Now compare the two plots for the density of points near the corners.


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of David Winsemius
 Sent: Monday, October 10, 2011 12:05 PM
 To: Uwe Ligges
 Cc: r-help; Alexander Engelhardt
 Subject: Re: [R] How to draw 4 random weights that sum up to 1?
 
 
 On Oct 10, 2011, at 12:44 PM, Uwe Ligges wrote:
 
 
 
  On 10.10.2011 18:10, Alexander Engelhardt wrote:
  Hey list,
 
  This might be a more general question and not that R-specific.
  Sorry for
  that.
 
  I'm trying to draw a random vector of 4 numbers that sum up to 1.
  My first approach was something like
 
  a <- runif(1)
  b <- runif(1, max=1-a)
  c <- runif(1, max=1-a-b)
  d <- 1-a-b-c
 
  but this kind of distorts the results, right?
  Would the following be a good approach?
 
  w <- sample(1:100, 4, replace=TRUE)
  w <- w/sum(w)
 
  Yes, although better combine both ways to
 
  w <- runif(4)
  w <- w / sum(w)
 
 For the non-statisticians in the audience like myself who didn't know
 what that distribution might look like (it being difficult to
 visualize densities on your 3-dimensional manifold in 4-space),  here
 is my effort to get an appreciation:
 
   M4 <- matrix(runif(4 * 10000), ncol=4)  # 10000 rows is an assumed size
   M4 <- M4/rowSums(M4)
 # just a larger realization of Ligges' advice
   colMeans(M4)
 [1] 0.2503946 0.2499594 0.2492118 0.2504342
   plot(density(M4[,1]))
   lines(density(M4[,2]),col="red")
   lines(density(M4[,3]),col="blue")
   lines(density(M4[,4]),col="green")
 
 plot(density(rowSums(M4[,1:2])))
 
   plot(density(rowSums(M4[,1:3])))
 plot(density(rowSums(M4[,2:4])))
 
 # rather kewl results, noting that these are a reflection around 0.5 of
 # the single vector densities.
 
 
  Uwe Ligges
 
 
 
  I'd prefer a general algorithm-kind of answer to a specific R
  function
  (if there is any). Although a function name would help too, if I can
  sourcedive.
 
 --
 
 David Winsemius, MD
 West Hartford, CT
 



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread ivo welch
hi josh---thx.  I had a different version of this, and discarded it
because I think it was very slow.  the reason is that on each
application, your version has to scan my (very long) data vector.  (I
have many thousand different cases, too.)  I presume that by() has one
scan through the vector that makes all splits.
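Here is one way to get the single pass over the data that by() does, sketched with base R. split() partitions the row indices by the grouping column in one go, and the resulting list can be handed to mclapply(). This assumes the parallel package, which is bundled with R from 2.14.0 (in 2011 the same function lived in the multicore package), and uses a toy data frame. On Windows, mclapply() only supports mc.cores = 1.

```r
library(parallel)

# Toy data standing in for the (much larger) real table
dat <- data.frame(x = rep(c("a", "b", "c"), each = 3), v = 1:9)

# One pass over the grouping column: a named list of row indices per group
idx <- split(seq_len(nrow(dat)), dat$x)

# Fork-based parallelism is unavailable on Windows, so fall back to 1 core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(idx, function(i) sum(dat$v[i]), mc.cores = n_cores)
unlist(res)
```

Splitting row indices rather than the data frame itself keeps the list small, and forked workers can read dat without an explicit copy.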

regards,

/iaw

Ivo Welch (ivo.we...@gmail.com)




On Mon, Oct 10, 2011 at 11:07 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi Ivo,

 My suggestion would be to only pass lapply (or mclapply) the indices.
 That should be fast, subsetting with data table should also be fast,
 and then you do whatever computations you will.  For example:

 require(data.table)
 DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
 setkey(DT, x)

 lapply(as.character(unique(DT[,x])), function(i) DT[i])

 the DT[i] object is the subset of the data table you want.  You can
 pass this to whatever function for computations you need.

 Hope this helps,

 Josh


 On Mon, Oct 10, 2011 at 10:41 AM, ivo welch ivo.we...@gmail.com wrote:
 dear r experts---Is there a multicore equivalent of by(), just like
 mclapply() is the multicore equivalent of lapply()?

 if not, is there a fast way to convert a data.table into a list based
 on a column that lapply and mclapply can consume?

 advice appreciated...as always.

 regards,

 /iaw
 
 Ivo Welch (ivo.we...@gmail.com)





 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 Programmer Analyst II, ATS Statistical Consulting Group
 University of California, Los Angeles
 https://joshuawiley.com/




[R] pmml for random forest rules

2011-10-10 Thread Patrick McCann
Hi,

I am having some trouble using R 2.13.1 to generate a pmml object
from an object of class c('randomForest.formula', 'randomForest')

I see that these methods are available:
 methods(pmml)
 [1] pmml.coxph*    pmml.hclust*   pmml.itemsets* pmml.kmeans*   pmml.ksvm*     pmml.lm*       pmml.multinom* pmml.nnet*     pmml.rpart*
[10] pmml.rsf*      pmml.rules*    pmml.survreg*


However, the R journal 1/1 pg 64 says there should be a method
available ( 
http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf
):

Random Forest (and randomSurvivalForest)
— randomForest (Breiman and Cutler. R
port by A. Liaw and M. Wiener, 2009) and randomSurvivalForest
(Ishwaran and Kogalur ,
2009): PMML export of a randomSurvivalForest rsf object. This
function gives the user
the ability to export PMML containing the geometry of a forest.

However, if I run these lines of code:

library(randomForest)
(iris.rf <- randomForest(Species ~ ., data=iris))
pmml(iris.rf)

I get this error:

Error in UseMethod("pmml") :
  no applicable method for 'pmml' applied to an object of class
c('randomForest.formula', 'randomForest')



Also, if I run these lines of code:
library(arules)   # provides the Adult data and apriori()
data(Adult)
## Mine association rules.
rules <- apriori(Adult,
 parameter = list(supp = 0.5, conf = 0.9,
  target = "rules"))
 pmml(rules)


I get this error:
 pmml(rules)
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "size", for
signature "itemMatrix"


With this traceback:

 traceback()
5: stop("unable to find an inherited method for function \"", fdef@generic,
   "\", for signature ", cnames)
4: function (classes, fdef, mtable)
   {
   methods <- .findInheritedMethods(classes, fdef, mtable)
   if (length(methods) == 1L)
   return(methods[[1L]])
   else if (length(methods) == 0L) {
   cnames <- paste("\"", sapply(classes, as.character),
   "\"", sep = "", collapse = ", ")
   stop("unable to find an inherited method for function \"",
   fdef@generic, "\", for signature ", cnames)
   }
   else stop("Internal error in finding inherited methods; didn't
return a unique method")
   }(list("itemMatrix"), function (object)
   standardGeneric("size"), <environment>)
3: size(is.unique)
2: pmml.rules(rules)
1: pmml(rules)

Thanks,
Patrick McCann



Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-10 Thread shardman
Thanks bbolker, that's really helpful. I'll look out for the book too, it
could be helpful!

Yours,
Sam

--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3891246.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Question about ggplot2 and stat_smooth

2011-10-10 Thread Dylan Beaudette
Hi Tom,

Just wanted to chime-in and let you know that the linked figures are really 
cool! Keep up the good work.

On an un-related note, any talk of future GRASS training sessions?

Cheers,
Dylan

On Tuesday, October 04, 2011, thomas.ad...@noaa.gov wrote:
 Hadley,
 
 Thanks for responding. No, not smoothed quantile regression. If you go here: 
http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored 
squares, you can see we have 'boxplots'. What I want to express is the 
uncertainty as depicted in the example from my previous email, where I can
specify the limits calculated for the 'boxplots', i.e. the 5%, 25%, 75% and 95%
limits we use in the 'boxplots'.
 
 Tom
 
 - Original Message -
 From: Hadley Wickham had...@rice.edu
 Date: Tuesday, October 4, 2011 10:23 am
 Subject: Re: [R] Question about ggplot2 and stat_smooth
 To: Thomas Adams thomas.ad...@noaa.gov
 Cc: R-help forum r-help@r-project.org
 
 
  On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov 
  wrote:
I'm interested in creating a graphic -like- this:
  
   c <- ggplot(mtcars, aes(qsec, wt))
   c + geom_point() + stat_smooth(fill="blue", colour="darkblue",
   size=2, alpha = 0.2)
  
   but I need to show 2 sets of bands (with different shading) using
   5%, 25%, 75%, 95% limits that I specify and where the heavy blue line is
   the median.
   I don't understand how to do this with ggplot2.
  
  Exactly what sort of limits do you want?  It sounds like maybe you are
  looking for smoothed quantile regression.
  
  Hadley
  
  -- 
  Assistant Professor / Dobelman Family Junior Chair
  Department of Statistics / Rice University
 
 
 


-- 
Dylan E. Beaudette
USDA-NRCS Soil Scientist
California Soil Resource Lab
http://casoilresource.lawr.ucdavis.edu/



[R] question about string to boor?

2011-10-10 Thread song_gpqg
Hello!
So I am handling this problem with some arrays grp1-grp7. I want to write a
loop to avoid tedious work, but I don't know how to turn a string into the
variable it names.
For example, I used
i=1
paste("grp", i, sep="")
but I only got the string "grp1" instead of the object grp1, which can't be
manipulated using mean() or other functions.
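Here is a sketch of two base-R options (R FAQ 7.21 covers this question). It assumes the seven objects grp1 ... grp7 already exist; here they are created with toy values. get() returns the object named by a string, and mget() collects several at once into a named list, which is usually easier to loop over.

```r
# Toy stand-ins for the poster's arrays grp1 ... grp7
for (i in 1:7) assign(paste("grp", i, sep = ""), (1:5) * i)

# get() looks up the object whose name is given as a string
grp_means <- numeric(7)
for (i in 1:7) {
  grp_means[i] <- mean(get(paste("grp", i, sep = "")))
}

# Often tidier: collect all of them into one named list and work on that
grps <- mget(paste("grp", 1:7, sep = ""), envir = environment())
sapply(grps, mean)
```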

I am not sure if I make myself clear...
THANKS

Nellie

 

--
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-to-boor-tp3890983p3890983.html
Sent from the R help mailing list archive at Nabble.com.



[R] Dealing with missing data in ave() functions

2011-10-10 Thread evelin.comper
Dear all,

I would be grateful if you could help me in some way! I have a dataset
already loaded into R with some missing data. My dataset is made of 19
columns, one of which is the sector (naf code). There are many rows, one for
each enterprise. What I want is the mean of each variable (column) by sector
(naf code), so I type, for example:

col_5 = data.frame(ave(x[,5], naf, na.rm = TRUE))

col_5 refers to the first variable of interest. But since there are some
missing data, the result is NA all the way down the column (all the rows are
NA). How can I discard these missing values and not take them into account?
Or how could I replace missing values with a simple zero just for this
analysis?

Thanks so much friends, bye!
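ave() has the signature ave(x, ..., FUN = mean), and everything passed through ... is treated as a grouping variable, so na.rm = TRUE in the call above never reaches mean(). A minimal sketch, using a made-up toy data frame whose column names naf and v5 stand in for the poster's data, passes na.rm inside FUN instead and also shows the replace-with-zero variant asked about:

```r
# Toy stand-in for the poster's data: 'naf' is the sector code,
# 'v5' plays the role of column 5 and contains one missing value.
x <- data.frame(naf = c("A", "A", "A", "B", "B"),
                v5  = c(1, NA, 3, 4, 10))

# ave(x, ..., FUN = mean) treats everything in '...' as a grouping variable,
# so na.rm has to be passed inside FUN.
col_5 <- ave(x$v5, x$naf, FUN = function(v) mean(v, na.rm = TRUE))

# Variant asked about in the post: treat missing values as zero.
# This changes the group means (group A becomes 4/3 instead of 2).
v5_zero <- replace(x$v5, is.na(x$v5), 0)
col_5_zero <- ave(v5_zero, x$naf)
```

For all 19 columns at once, aggregate(x[, -1], by = list(naf = x$naf), FUN = mean, na.rm = TRUE) returns one row of means per sector, assuming the sector code sits in the first column (the post does not say which column it is).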

--
View this message in context: 
http://r.789695.n4.nabble.com/Dealing-with-missing-data-in-ave-functions-tp3891001p3891001.html
Sent from the R help mailing list archive at Nabble.com.



[R] correlation matrix

2011-10-10 Thread 1Rnwb
Hello Gurus
I have two correlation matrices 'xa' and 'xb'
set.seed(100)
d=cbind(x=rnorm(20)+1,
x1=rnorm(20)+1,
x2=rnorm(20)+1)

 
d1=cbind(x=rnorm(20)+2,
x1=rnorm(20)+2,
x2=rnorm(20)+2)

xa=cor(d,use='complete')  

xb=cor(d1,use='complete')



I want to combine these two into a third matrix that takes half of its values
from 'xa' and half from 'xb', for example:
          x          x1          x2
x  1.0000000 -0.15157123 -0.23085308
x1 0.3466155  1.00000000 -0.01061675
x2 0.1234507  0.01775527  1.00000000

I would like to generate a heatmap of the correlation values for the disease
and non-disease phenotypes.

I would appreciate it if someone could point me in the right direction.
Thanks
sharad

--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891085.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to calculate the statistics of a yearly window with a rolling step as 1 day?

2011-10-10 Thread ecoc
Hope someone can help me here.

I have a daily time series, say

2003-02-01  2003-02-03  2003-02-07  2003-02-09  2003-02-14  ...  2004-02-01  2004-02-04
 0.4914798  -1.1857653  -1.6982844  -0.3559572  -0.2333087  ...     0.44553    -0.45222

I need to calculate statistics for overlapping yearly windows that roll
forward one day at a time, i.e. for each of the intervals
(2003-02-01 ~ 2004-02-01), (2003-02-03 ~ 2004-02-04), ... I need to calculate
some statistics.

Could you please help me extract these intervals? Right now I am using
index(), but since the dates don't match exactly, I have to do it like this:

a[index(a) >= index(b) & index(a) <= index(b) + 365]

which is very time-consuming since it's a long time series.

Could someone help me? Really appreciate it!
 



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-calculate-the-statistics-of-a-yearly-window-with-a-rolling-step-as-1-day-tp3891404p3891404.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] question about string to boor?

2011-10-10 Thread David Winsemius


On Oct 10, 2011, at 12:52 PM, song_gpqg wrote:


Hello!
So I am handling this problem with some arrays grp1-grp7, I want to write a
loop to avoid tedious work, but I don't know how to transform string to
boor?
For example I used
i=1
paste("grp",i, sep="")


?get

e.g.
get( paste("grp",i, sep="") )

I only got "grp1" instead of grp1, which can't be manipulate using mean() or
other function.

I am not sure if I make myself clear...
THANKS

Nellie



--
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-to-boor-tp3890983p3890983.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] round() and negative digits

2011-10-10 Thread Michael Friendly

On 10/9/2011 6:18 AM, Prof Brian Ripley wrote:


Sometimes it is better not to document things than try to give precise
details which may get changed *and* there will be useRs who misread (and
maybe even file bug reports on their misreadings). The source is the
ultimate documentation.


I couldn't agree with this less.  The source does the computation. The
documentation says how to use it and what it should do.  Corner cases
can be trapped in code or mentioned in Notes.  But the source is
only useful if you can easily find it and then understand what it is
doing, particularly for a .Primitive like round().
The source is only the documentation of last resort.

-Michael



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Thomas Lumley
On Tue, Oct 11, 2011 at 7:54 AM, ivo welch ivo.we...@gmail.com wrote:
 hi josh---thx.  I had a different version of this, and discarded it
 because I think it was very slow.  the reason is that on each
 application, your version has to scan my (very long) data vector.  (I
 have many thousand different cases, too.)  I presume that by() has one
 scan through the vector that makes all splits.

 by.data.frame() is basically a wrapper for tapply(), and the key line
in tapply() is
   ans <- lapply(split(X, group), FUN, ...)
which should be easy to adapt for mclapply.
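Following the suggestion above, a minimal sketch of a by()-like helper built on split() plus mclapply(). The name mc_by and the toy data are hypothetical; the gc() between splitting and forking follows the ordering Thomas suggests later in this thread, and mc.cores > 1 is not supported on Windows:

```r
library(parallel)  # part of R since 2.14.0; in 2011 mclapply() lived in package 'multicore'

# Hypothetical helper: split once (as tapply() does), collect garbage,
# then hand the pieces to mclapply(), which forks worker processes.
mc_by <- function(data, group, FUN, ..., mc.cores = 2L) {
  pieces <- split(data, group)
  invisible(gc())
  mclapply(pieces, FUN, ..., mc.cores = mc.cores)
}

df <- data.frame(g = rep(c("a", "b", "c"), each = 4), v = 1:12)
res <- mc_by(df, df$g, function(d) mean(d$v), mc.cores = 1L)
unlist(res)
```

With mc.cores = 1L mclapply() simply runs lapply(), which makes the helper easy to check before using more cores.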

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] how to calculate the statistics of a yearly window with a rolling step as 1 day?

2011-10-10 Thread Gabor Grothendieck
On Mon, Oct 10, 2011 at 2:55 PM, ecoc liting...@gmail.com wrote:
 Hope someone can help me here.

 I have a daily time series, say

 2003-02-01 2003-02-03 2003-02-07 2003-02-09 2003-02-14 .. 2004-02-01
 2004-02-04
  0.4914798 -1.1857653 -1.6982844 -0.3559572 -0.2333087  ...
 0.44553    -0.45222

 I need to calculate the statistics for the overlapping rolling yearly window
 with rolling step as 1 day

 so for each of the intervals: (2003-02-01 ~ 2004-02-01), (2003-02-03 ~
 2004-02-04), 
 i need to calculate some statistics.

 Could you please help me out how to extract these intervals? Right now I am
 using index. But since the dates doesn't match exactly, I have to do it
 like:

 a[index(a) >= index(b) & index(a) <= index(b) + 365], which is very time-consuming
 since it's a long time series.

 Could someone help me? Really appreicate!!!


Fill in the missing days with NAs using zoo FAQ 15 or otherwise and
then use rollapply(z, 365, f, ...whatever...)  such that your function
f first removes any NAs.
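Gabor's suggestion relies on zoo's rollapply() over an NA-filled daily series. As a dependency-free alternative (a swapped-in technique, not what he describes), the repeated scan of the whole series per window can be avoided by using findInterval() on the sorted dates to locate where each window ends. The function name rolling_window_stat and the toy data are hypothetical:

```r
# For each observation i, apply FUN to all observations whose dates fall in
# [dates[i], dates[i] + width_days]. Windows near the end of the series are
# partial; drop them afterwards if only full-year windows are wanted.
rolling_window_stat <- function(dates, x, width_days = 365, FUN = mean) {
  stopifnot(length(dates) == length(x), !is.unsorted(dates))
  d <- as.numeric(dates)
  # findInterval() returns, for each window end, the index of the last date <= it
  ends <- findInterval(d + width_days, d)
  vapply(seq_along(x), function(i) FUN(x[i:ends[i]]), numeric(1))
}

dates <- as.Date(c("2003-02-01", "2003-02-03", "2003-02-07",
                   "2004-02-01", "2004-02-04"))
vals <- c(1, 2, 3, 4, 5)
res <- rolling_window_stat(dates, vals)
```

Each window is located by a binary search rather than by comparing every date, so the cost grows with the total window sizes instead of with the series length per window.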


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Joshua Wiley
I could be way off base here, but my concern about presplitting the data is
that you will have your data and a second copy of your data, something like a
list where each element contains the portion of the data for that split: good
speed-wise, bad memory-wise. My hope with the technique I showed (again, I may
not have accomplished it) was to have, at any one time, only the original data
and a copy of the particular elements being worked with. Of course, this is
not an issue if you have plenty of memory.

On Oct 10, 2011, at 12:19, Thomas Lumley tlum...@uw.edu wrote:

 On Tue, Oct 11, 2011 at 7:54 AM, ivo welch ivo.we...@gmail.com wrote:
 hi josh---thx.  I had a different version of this, and discarded it
 because I think it was very slow.  the reason is that on each
 application, your version has to scan my (very long) data vector.  (I
 have many thousand different cases, too.)  I presume that by() has one
 scan through the vector that makes all splits.
 
 by.data.frame() is basically a wrapper for tapply(), and the key line
 in tapply() is
   ans <- lapply(split(X, group), FUN, ...)
 which should be easy to adapt for mclapply.
 
 -- 
 Thomas Lumley
 Professor of Biostatistics
 University of Auckland



Re: [R] correlation matrix

2011-10-10 Thread Daniel Malter
What a pleasant post to respond to - with self-contained code. :)

heat <- matrix(0, nrow = dim(xa)[1], ncol = dim(xa)[2])

heat[lower.tri(heat)] <- xa[lower.tri(xa)]
heat[upper.tri(heat)] <- xb[upper.tri(xb)]
diag(heat) <- 1

heat

HTH,
Daniel


1Rnwb wrote:
 
 Hello Gurus
 I have two correlation matrices 'xa' and 'xb'
 set.seed(100)
 d=cbind(x=rnorm(20)+1,
 x1=rnorm(20)+1,
 x2=rnorm(20)+1)
 
  
 d1=cbind(x=rnorm(20)+2,
 x1=rnorm(20)+2,
 x2=rnorm(20)+2)
 
 xa=cor(d,use='complete')  
 
 xb=cor(d1,use='complete')
 
 
 
 I want to combine these two to get a third matrix which should have half
 values from 'xa' and half values from 'xb'
x x1 x2
 x  1.000  -0.15157123 -0.23085308
 x1 0.3466155 1.  -0.01061675
 x2 0.1234507 0.01775527 1.
 
 I would like to generate a heatmap for correlation values in disease and
 non disease phenotype
 
 I would appreciate if someone can point me in correct direction.
 Thanks
 sharad
 

--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891685.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] correlation matrix

2011-10-10 Thread Peter Alspach
Tena koe Sharad

If I understand you correctly, you want the lower triangle of your combined 
matrix to be the lower triangle of one of the correlation matrices, and the 
upper triangle to be the upper triangle from the other.  If so, check 
lower.tri() and upper.tri().

HTH 

Peter Alspach

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of 1Rnwb
 Sent: Tuesday, 11 October 2011 6:20 a.m.
 To: r-help@r-project.org
 Subject: [R] correlation matrix
 
 Hello Gurus
 I have two correlation matrices 'xa' and 'xb'
 set.seed(100)
 d=cbind(x=rnorm(20)+1,
 x1=rnorm(20)+1,
 x2=rnorm(20)+1)
 
 
 d1=cbind(x=rnorm(20)+2,
 x1=rnorm(20)+2,
 x2=rnorm(20)+2)
 
 xa=cor(d,use='complete')
 
 xb=cor(d1,use='complete')
 
 
 
 I want to combine these two to get a third matrix which should have
 half
 values from 'xa' and half values from 'xb'
x x1 x2
 x  1.000  -0.15157123 -0.23085308
 x1 0.3466155 1.  -0.01061675
 x2 0.1234507 0.01775527 1.
 
 I would like to generate a heatmap for correlation values in disease
 and non
 disease phenotype
 
 I would appreciate if someone can point me in correct direction.
 Thanks
 sharad
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891085.html
 Sent from the R help mailing list archive at Nabble.com.
 

The contents of this e-mail are confidential and may be subject to legal privilege.
If you are not the intended recipient you must not use, disseminate, distribute or
reproduce all or any part of this e-mail or attachments.  If you have received this
e-mail in error, please notify the sender and delete all material pertaining to this
e-mail.  Any opinion or views expressed in this e-mail are those of the individual
sender and may not represent those of The New Zealand Institute for Plant and
Food Research Limited.



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Thomas Lumley
This is the sort of thing that should be measured, rather than
speculated about, but if you're using multicore all those subsets can
be made at the same time, not sequentially, so they add up to a copy
of the whole data.   Using data.table rather than a data.frame would
help, of course.

I would guess that splitting, garbage collecting, and then forking
would be most efficient -- reducing the chance that all the separate
processes end up separately garbage collecting the results of the
split.

It's a pity that forking messes up the profilers; makes it harder to
measure these things.

-thomas


On Tue, Oct 11, 2011 at 9:14 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 I could be waay off base here, but my concern about presplitting the data is 
 that you will have your data, and a second copy of our data that is something 
 like a list where each element contains the portion of the data for that 
 split.  Good speed wise, bad memory wise.  My hope with the technique I 
 showed (again I may not have accomplished it) was to only have at anyone 
 time, the original data and a copy of the particular elements being worked 
 with.  Of course  this is not an issue if you have plenty of memory.

 On Oct 10, 2011, at 12:19, Thomas Lumley tlum...@uw.edu wrote:

 On Tue, Oct 11, 2011 at 7:54 AM, ivo welch ivo.we...@gmail.com wrote:
 hi josh---thx.  I had a different version of this, and discarded it
 because I think it was very slow.  the reason is that on each
 application, your version has to scan my (very long) data vector.  (I
 have many thousand different cases, too.)  I presume that by() has one
 scan through the vector that makes all splits.

 by.data.frame() is basically a wrapper for tapply(), and the key line
 in tapply() is
   ans <- lapply(split(X, group), FUN, ...)
 which should be easy to adapt for mclapply.

 --
 Thomas Lumley
 Professor of Biostatistics
 University of Auckland




-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] correlation matrix

2011-10-10 Thread Dénes TÓTH

And you might also consider packages like corrplot, corrgram etc. for
other plotting options of a correlation matrix.
They can be more informative than simply invoking image(heat)



 What a pleasant post to respond to - with self-contained code. :)

 heat <- matrix(0,nrow=dim(xa)[1],ncol=dim(xa)[2])

 heat[lower.tri(heat)] <- xa[lower.tri(xa)]
 heat[upper.tri(heat)] <- xb[upper.tri(xb)]
 diag(heat) <- 1

 heat

 HTH,
 Daniel


 1Rnwb wrote:

 Hello Gurus
 I have two correlation matrices 'xa' and 'xb'
 set.seed(100)
 d=cbind(x=rnorm(20)+1,
 x1=rnorm(20)+1,
 x2=rnorm(20)+1)


 d1=cbind(x=rnorm(20)+2,
 x1=rnorm(20)+2,
 x2=rnorm(20)+2)

 xa=cor(d,use='complete')

 xb=cor(d1,use='complete')



 I want to combine these two to get a third matrix which should have half
 values from 'xa' and half values from 'xb'
x x1 x2
 x  1.000  -0.15157123 -0.23085308
 x1 0.3466155 1.  -0.01061675
 x2 0.1234507 0.01775527 1.

 I would like to generate a heatmap for correlation values in disease and
 non disease phenotype

 I would appreciate if someone can point me in correct direction.
 Thanks
 sharad


 --
 View this message in context:
 http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891685.html
 Sent from the R help mailing list archive at Nabble.com.





[R] How to test if two C statistics are significantly different?

2011-10-10 Thread Yujie Wang
Hey all,

In order to test whether a marker is a risk factor, I built two Cox
proportional hazards models: one includes this marker and the other does not.

Then I used the R package risksetROC to assess how much predictive value the
marker adds to the model. I got two C statistics by passing the linear
predictors of the two models to this package.

The question is: how can I test whether two C statistics are significantly
different?

Your help will be greatly appreciated!

Yujie



Re: [R] correlation matrix

2011-10-10 Thread 1Rnwb
Okay, so I did what I needed this way:

finit=0
for(ri in 1:dim(xa)[1])
{
finit=finit+1
xc[ri,1:finit] <- xa[ri,1:finit]
xc[1:finit,ri] <- xb[1:finit,ri]
}

but I am getting an error in heatmap.2:

> mycol <- colorpanel(n=40, low="red", mid="white", high="blue")
> heatmap.2(xc, breaks=pairs.breaks, col=mycol, Rowv=FALSE, symm=TRUE,
+  key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5,
+  scale = "none", dendrogram="none")
Error in heatmap.2(xc, breaks = pairs.breaks, col = mycol, Rowv = FALSE,  :
  `x' must be a numeric matrix
Any pointers are appreciated.
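The post does not show how xc was created, so the following rests on an assumption: heatmap.2() (gplots) raises "`x' must be a numeric matrix" when its input is not both a matrix and numeric, for example a data frame. A minimal sketch that builds the combined matrix from a copy of xa, so it is a numeric matrix with the right dimnames from the start, and runs the same check before plotting:

```r
set.seed(100)
d  <- cbind(x = rnorm(20) + 1, x1 = rnorm(20) + 1, x2 = rnorm(20) + 1)
d1 <- cbind(x = rnorm(20) + 2, x1 = rnorm(20) + 2, x2 = rnorm(20) + 2)
xa <- cor(d,  use = "complete.obs")
xb <- cor(d1, use = "complete.obs")

# Start from a copy of xa: xc is then already a numeric matrix with the
# right dimensions and dimnames; lower triangle and diagonal come from xa.
xc <- xa
xc[upper.tri(xc)] <- xb[upper.tri(xb)]   # upper triangle from xb

# The same condition heatmap.2() checks, done up front for a clearer failure point.
stopifnot(is.matrix(xc), is.numeric(xc))
```

For the plotting call itself, heatmap.2() expects length(breaks) to be one more than the number of colours, so with the 40 colours from colorpanel(n = 40, ...) a choice such as pairs.breaks <- seq(-1, 1, length.out = 41) would be consistent. pairs.breaks is not defined in the post; this value is only an illustration.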

--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891584.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] question about string to boor?

2011-10-10 Thread Christoph Molnar
Hi Nellie,

hope I got you right. I guess you want something like that.

for (i in 1:7) {
   oneOfNelliesArray <- eval(parse(text=paste("grp",i, sep="")))
   anyFunction(oneOfNelliesArray)
}

paste() just returns a string, but you want R to evaluate the
expression, so you have to parse it and tell R to evaluate it.

Christoph

2011/10/10 song_gpqg song_g...@126.com

 Hello!
 So I am handling this problem with some arrays grp1-grp7, I want to write a
 loop to avoid tedious work, but I don't know how to transform string to
 boor?
 For example I used
 i=1
 paste("grp",i, sep="")
 I only got "grp1" instead of grp1, which can't be manipulate using mean()
 or
 other function.

 I am not sure if I make myself clear...
 THANKS

 Nellie



 --
 View this message in context:
 http://r.789695.n4.nabble.com/question-about-string-to-boor-tp3890983p3890983.html
 Sent from the R help mailing list archive at Nabble.com.





[R] invalid or not-yet-implemented 'Matrix' subsetting

2011-10-10 Thread collegegurl69
I have this error and I can't figure out what's wrong:

invalid or not-yet-implemented 'Matrix' subsetting


It pops up when I try to run this line of code:

 S <- B[indices.mod,union(mir.e.nc,mir.negatives.nc)]
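The post does not show what indices.mod, mir.e.nc and mir.negatives.nc contain, so the cause cannot be pinned down here. As a first diagnostic step, a minimal base-R sketch with a hypothetical helper check_index() reports NAs, out-of-range positions and unknown names in an index vector; such indices are a plausible, not confirmed, source of this error when B is a Matrix-package object:

```r
# Hypothetical helper: summarise an index vector before subsetting with it.
# Negative (exclusion) indices are counted as out of range here.
check_index <- function(idx, n, nms = NULL) {
  ok <- idx[!is.na(idx)]
  list(
    class         = class(idx),
    n_missing     = sum(is.na(idx)),
    out_of_range  = if (is.numeric(ok)) sum(ok < 1 | ok > n) else NA_integer_,
    unknown_names = if (is.character(ok)) setdiff(ok, nms) else character(0)
  )
}

m <- matrix(1:12, nrow = 3, dimnames = list(c("r1", "r2", "r3"), NULL))
check_index(c(1, NA, 5), nrow(m))
check_index(c("r1", "zz"), nrow(m), rownames(m))
```

Applied to the post, the calls would be check_index(indices.mod, nrow(B), rownames(B)) and check_index(union(mir.e.nc, mir.negatives.nc), ncol(B), colnames(B)).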


--
View this message in context: 
http://r.789695.n4.nabble.com/invalid-or-not-yet-implemented-Matrix-subsetting-tp3891550p3891550.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Linear programming problem, RGPLK - no feasible solution.

2011-10-10 Thread Hans W Borchers
Liu Evans, Gareth Gareth.Liu-Evans at liverpool.ac.uk writes:

 In my post at https://stat.ethz.ch/pipermail/r-help/2011-October/292019.html
 I included an undefined term ej.  The problem code should be as follows.
 It seems like a simple linear programming problem, but for some reason my
 code is not finding the solution.
 
 obj <- c(rep(0,3),1)
 
 col1 <- c(1,0,0,1,0,0,1,-2.330078923,0)
 col2 <- c(0,1,0,0,1,0,1,-2.057855981,0)
 col3 <- c(0,0,1,0,0,1,1,-1.885177032,0)
 col4 <- c(-1,-1,-1,1,1,1,0,0,1)
 
 mat <- cbind(col1, col2, col3, col4)
 
 dir <- c(rep("<=", 3), rep(">=", 3), rep("==", 2), ">=")
 
 rhs <- c(rep(0, 7), 1, 0)
 
 sol <- Rglpk_solve_LP(obj, mat, dir, rhs, types = NULL, max = FALSE,
 bounds = c(-100,100), verbose = TRUE)
 
 The R output says there is no feasible solution, but e.g.
 (-2.3756786,  0.3297676,  2.0459110, 2.3756786) is feasible.
 
 The output is
 
 GLPK Simplex Optimizer, v4.42
 9 rows, 4 columns, 19 non-zeros
   0: obj =  0.0e+000  infeas = 1.000e+000 (2)
 PROBLEM HAS NO FEASIBLE SOLUTION

Please have a closer look at the help page ?Rglpk_solve_LP. The way to
define the bounds is a bit clumsy, but then it works:

sol <- Rglpk_solve_LP(obj, mat, dir, rhs, types = NULL, max = FALSE,
bounds = list(lower=list(ind=1:4, val=rep(-100,4)),
  upper=list(ind=1:4, val=rep(100,4))),
verbose=TRUE)

GLPK Simplex Optimizer, v4.42
9 rows, 4 columns, 19 non-zeros
  0: obj =  -1.0e+02  infeas =  1.626e+03 (2)
*10: obj =   1.0e+02  infeas =  0.000e+00 (0)
*13: obj =   2.247686558e+00  infeas =  0.000e+00 (0)
OPTIMAL SOLUTION FOUND

 sol
$optimum
[1] 2.247687
$solution
[1] -2.247687e+00 -6.446292e-31  2.247687e+00  2.247687e+00
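Before blaming the solver, it can help to check a candidate point against the constraints directly. A minimal base-R sketch: the comparison operators were lost from the archived post, so the directions below are reconstructed (rows 1-6 encode |x_j| <= x4, rows 7-8 are equalities, row 9 is x4 >= 0) on the basis that they make Gareth's quoted point feasible; is_feasible is a hypothetical helper. A plausible, unconfirmed explanation of the original failure: if Rglpk's default variable bounds of [0, Inf) were in effect, row 7 forces x1 = x2 = x3 = 0 and row 8 can then never equal 1.

```r
# Constraint data from Gareth's post (directions reconstructed, see above).
col1 <- c(1, 0, 0, 1, 0, 0, 1, -2.330078923, 0)
col2 <- c(0, 1, 0, 0, 1, 0, 1, -2.057855981, 0)
col3 <- c(0, 0, 1, 0, 0, 1, 1, -1.885177032, 0)
col4 <- c(-1, -1, -1, 1, 1, 1, 0, 0, 1)
mat <- cbind(col1, col2, col3, col4)
dir <- c(rep("<=", 3), rep(">=", 3), rep("==", 2), ">=")
rhs <- c(rep(0, 7), 1, 0)

# Check a candidate point row by row, allowing a small numerical tolerance.
is_feasible <- function(mat, dir, rhs, x, tol = 1e-6) {
  lhs <- drop(mat %*% x)
  ok <- ifelse(dir == "<=", lhs <= rhs + tol,
        ifelse(dir == ">=", lhs >= rhs - tol,
                            abs(lhs - rhs) <= tol))
  all(ok)
}

is_feasible(mat, dir, rhs, c(-2.3756786, 0.3297676, 2.0459110, 2.3756786))
is_feasible(mat, dir, rhs, c(-2.247687, 0, 2.247687, 2.247687), tol = 1e-5)
```

The looser tolerance in the second call accounts for the solution being printed to only seven significant digits.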

 One other thing, a possible bug - if I run this code with dir shorter than
 it should be, R crashes.  My version of R is 2.131.56322.0, and I'm running
 it on Windows 7.  

If you can reproduce that R crashes -- which it should never do -- inform the
maintainer of this package. On Mac it doesn't crash; it goes into an infinite
loop with "Execution aborted. Error detected in file glplib03.c at line 83."

Regards, Hans Werner

 Regards,
 Gareth



Re: [R] multicore by(), like mclapply?

2011-10-10 Thread Hadley Wickham
On Mon, Oct 10, 2011 at 4:14 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 I could be waay off base here, but my concern about presplitting the data is 
 that you will have your data, and a second copy of our data that is something 
 like a list where each element contains the portion of the data for that 
 split.  Good speed wise, bad memory wise.  My hope with the technique I 
 showed (again I may not have accomplished it) was to only have at anyone 
 time, the original data and a copy of the particular elements being worked 
 with.  Of course  this is not an issue if you have plenty of memory.

That's exactly what plyr does behind the scenes.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] question about string to boor?

2011-10-10 Thread Rolf Turner

On 11/10/11 08:21, Christoph Molnar wrote:

Hi Nellie,

hope I got you right. I guess you want something like that.

for (i in 1:7) {
oneOfNelliesArray <- eval(parse(text=paste("grp",i, sep="")))
anyFunction(oneOfNelliesArray)
}

Paste() just returns you a string. But you want R to evaluate the
expression. So you have to parse it and tell R to evaluate it.


But using get() is so much simpler and safer.
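A minimal sketch of the get() approach for the loop Nellie describes, with toy vectors standing in for her grp1 ... grp7 (their contents are made up). mget() is shown as a way to fetch all of them at once as a named list:

```r
# Toy stand-ins for the poster's arrays grp1 ... grp7
for (i in 1:7) assign(paste0("grp", i), seq_len(i))

# get() looks up an object by its name, given as a character string
grp_means <- numeric(7)
for (i in 1:7) {
  grp_means[i] <- mean(get(paste("grp", i, sep = "")))
}

# mget() fetches several objects at once as a named list;
# paste0(...) is shorthand for paste(..., sep = "")
grps <- mget(paste0("grp", 1:7))
sapply(grps, mean)
```

A common further step is to keep such related vectors in one list from the start (like grps above), so that lapply() or sapply() can work on them directly without any name lookups.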

cheers,

Rolf Turner



[R] Fwd: WHO Anthro growth curve macros and R

2011-10-10 Thread Gustaf Rydevik
Hi all,
some years ago, I sent a question to the mailing list regarding the WHO
Anthro macros. Since I've now received three mails asking how I solved it, I
thought I'd cc R-help for future reference. I am attaching a zip file with
the relevant code parts that I used, though I'm not sure it will get through
(if anyone has recommendations on how to manage such files for the list, I'd
be grateful).
What I ended up doing was importing the data in SPSS format and adapting
the S-PLUS function igrowup.standard slightly. igrowup.standard2.R is the
adapted function, while the .ssc files are the original S-PLUS functions.
Let me know if anyone has problems figuring out how to use the files.

best regards,
Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +44(0)704 253 760 42
address:St John's hill 18/5  EH8 9UQ Edinburgh, UK
skype:gustaf_rydevik


Re: [R] Scheirer-Ray-Hare

2011-10-10 Thread bhvonkorff
I have been trying to use this test recently, following the text from this
link: 
http://books.google.com/books?id=1eTyuMDND94C&pg=PA145&lpg=PA145&dq=nonparam#v=onepage&q&f=false

I ordered my data based on ranks, and ran a type III ANOVA from the car
package - something like
Anova(lm(var1~var2*var3,contrasts=list(var2='contr.sum',var3='contr.sum')),type='III').
Then I calculate SStotal (sum of the SS for the factors, interaction, and
residual). I calculate MStotal, which is the SStotal divided by the degrees
of freedom (add up the DF for factors, interaction, and residual). Then,
calculate SS/MStotal for each factor and combination of factors. The p value
for each is calculated as follows: 1-pchisq(the SS/MStotal, the degrees of
freedom). I'm pretty sure there is an error in the text, as the first
example they give calculates the SS as 1496, which includes the
intercept-SS, and their math doesn't work out then (1496/16 is not = to 22).
The second example makes more sense, and they don't include the
intercept-SS. Anyhow, this seems like a useful test, but I think it should
be used with caution. Hopefully, this helps, and if I'm doing something
wrong here, that would be great to know (:
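Following the procedure described above (rank the response, fit the two-way ANOVA, divide each effect's SS by MS_total, compare with a chi-squared distribution), a minimal base-R sketch for a two-factor design. It assumes a balanced layout, so that the sequential sums of squares from anova() coincide with the type III sums of squares the post obtains from car::Anova(); the function name scheirer_ray_hare is hypothetical:

```r
# Scheirer-Ray-Hare sketch for a balanced two-factor design.
scheirer_ray_hare <- function(y, a, b) {
  r <- rank(y)                              # ties get average ranks
  tab <- anova(lm(r ~ factor(a) * factor(b)))
  # MS_total: SS of factors + interaction + residual over their summed df
  # (the intercept is not part of an anova() table)
  ms_total <- sum(tab[["Sum Sq"]]) / sum(tab[["Df"]])
  effects <- tab[rownames(tab) != "Residuals", ]
  H <- effects[["Sum Sq"]] / ms_total
  data.frame(term = rownames(effects), Df = effects[["Df"]], H = H,
             p.value = pchisq(H, effects[["Df"]], lower.tail = FALSE))
}

y <- (1:12)^2                               # ranks are simply 1..12
a <- rep(c("p", "q"), each = 6)
b <- rep(c("u", "v"), times = 6)
res <- scheirer_ray_hare(y, a, b)
```

Here MS_total equals var(rank(y)), which is 13 for ranks 1 to 12, so H for factor a is 108/13. For unbalanced data the type III sums of squares from car::Anova(), as in the post, would be needed instead of anova().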


--
View this message in context: 
http://r.789695.n4.nabble.com/Scheirer-Ray-Hare-tp3818476p3891860.html
Sent from the R help mailing list archive at Nabble.com.



[R] Text Mining with Facebook Reviews (XML and FQL)

2011-10-10 Thread Kenneth Zhang
Hello,

I am trying to use XML package to download Facebook reviews in the following
way:

require(XML)
mydata.vectors <- character(0)
Qword <- URLencode('#IBM')
QUERY <- paste('SELECT review_id, message, rating from review where message LIKE "%', Qword, '%"', sep='')
Facebook_url <- paste('https://api.facebook.com/method/fql.query?query=', QUERY, sep='')
mydata.xml <- xmlParseDoc(Facebook_url, asText=F)
mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue,
namespaces =c('s'='http://www.w3.org/2005/Atom'))

mydata.xml is NULL, so no further steps can be executed. I am not very
familiar with XML or FQL. Any suggestions will be appreciated. Thank you!

Best regards,
Kenneth



Re: [R] Fwd: WHO Anthro growth curve macros and R

2011-10-10 Thread David Winsemius


On Oct 10, 2011, at 4:48 PM, Gustaf Rydevik wrote:


Hi all,
some years ago, I sent a question to the mailing list regarding the WHO
anthro macros. Since I've now received three mails asking how I solved it, I
thought I'd cc R-help in for future reference. Attaching a zip file
with  the relevant code parts that
I used that I'm not sure gets through (if anyone has recommendations on how
to manage such files for the list, I'd be grateful.
 What I ended up doing was importing the data in SPSS format, and
adapting the Splus function igrowup.standard slightly.
igrowup.standard2.R is the adapted function, while the ssc files are
original splus functions. Let me know if anyone gets problems in figuring
out how to use the files.



The only files that reach the readership are .pdf and .txt files. I do  
not know how carefully these get inspected, so it is possible that a  
zip file named something.txt might make it through.




best regards,
Gustaf


David Winsemius, MD
West Hartford, CT



[R] SLOW split() function

2011-10-10 Thread ivo welch
dear R experts:  apologies for all my speed and memory questions.  I
have a bet with my coauthors that I can make R reasonably efficient
through R-appropriate programming techniques.  this is not just for
kicks, but for work.  for benchmarking, my [3 year old] Mac Pro has
2.8GHz Xeons, 16GB of RAM, and R 2.13.1.

right now, it seems that 'split()' is why I am losing my bet.  (split
is an integral component of *apply() and by(), so I need split() to be
fast.  its resulting list can then be fed, e.g., to mclapply().)  I
made up an example to illustrate my ills:

library(data.table)
N <- 1000
T <- N*10
d <- data.table(data.frame( key= rep(1:T, rep(N,T)), val=rnorm(N*T) ))
setkey(d, key); gc() ## force a garbage collection
cat("N=", N, ".  Size of d=", object.size(d)/1024/1024, "MB\n")
print(system.time( s <- split(d, d$key) ))

My ordered input data table (or data frame; doesn't make a difference)
is 114MB in size.  it takes about a second to create.  split() only
needs to reshape it.  this simple operation takes almost 5 minutes on
my computer.

with a data set that is larger, this explodes further.

am I doing something wrong?  is there an alternative to split()?

sincerely,

/iaw


Ivo Welch (ivo.we...@gmail.com)
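split.data.frame() works by first splitting the row indices and then subsetting the data once per group, so with 10,000 groups it builds 10,000 sub-tables up front. A minimal sketch of stopping after the cheap first step: split only the row indices, then pull out each group's rows when they are needed. The helper name split_rows and the toy data are hypothetical, and this has not been timed on data of the size in the post:

```r
# Hypothetical helper: group the row indices once; this only splits an
# integer vector, so no sub-tables are materialised at this point.
split_rows <- function(key) split(seq_along(key), key)

set.seed(1)
n_per <- 5L; n_groups <- 4L
d <- data.frame(key = rep(seq_len(n_groups), each = n_per),
                val = rnorm(n_per * n_groups))

idx <- split_rows(d$key)
# Subset one group at a time, only when it is used
group_means <- vapply(idx, function(i) mean(d$val[i]), numeric(1))
```

The index list can be passed to mclapply() in the same way, with each worker subsetting d itself.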



[R] Rolling optimization

2011-10-10 Thread Darius H

Hello everyone,

I would like assistance with a snippet I have written to do a recursive
portfolio optimization given time-varying return forecasts.

In my case, I have forecast the monthly returns for nearly 55 years out on 8
asset classes. I need to calculate the weights for the optimal (tangency)
portfolio based on my monthly forecasts and an arbitrary covariance matrix.
Getting these weights has proven difficult.
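For reference, here is the textbook closed form for the tangency weights, assuming short sales are allowed, there are no other constraints, and a risk-free rate $r_f$ that the message does not specify. Here $\boldsymbol{\mu}_t$ is the 8-vector of forecast returns for month $t$ (row $t$ of asset_forecast), $\Sigma$ is the covariance matrix (cov_matrix), and $\mathbf{1}$ is a vector of ones. As far as I can tell from its documentation, portfolio.optim() instead returns the minimum-variance portfolio for a target mean return (its pm argument), so this is a related but different calculation.

$$
\mathbf{w}_t^{*} = \frac{\Sigma^{-1}\left(\boldsymbol{\mu}_t - r_f\,\mathbf{1}\right)}{\mathbf{1}^{\top}\,\Sigma^{-1}\left(\boldsymbol{\mu}_t - r_f\,\mathbf{1}\right)}, \qquad t = 1, \dots, 648
$$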

library(tseries)  # portfolio.optim() comes from the tseries package

# these are forecast (out-of-sample) returns; each is a 648x1 matrix
cash_forecast2=as.ts(cash_forecast)
larg_forecast2=as.ts(larg_forecast)
valu_forecast2=as.ts(valu_forecast)
grow_forecast2=as.ts(grow_forecast)
smal_forecast2=as.ts(smal_forecast)
tres_forecast2=as.ts(tres_forecast)
cred_forecast2=as.ts(cred_forecast)
comm_forecast2=as.ts(comm_forecast)

# make a matrix of all expected returns
# each line corresponds to forecast monthly returns for each asset class; this is a 648x8 matrix
asset_forecast=ts.intersect(cash_forecast2, larg_forecast2, valu_forecast2,
                            grow_forecast2, smal_forecast2, tres_forecast2,
                            cred_forecast2, comm_forecast2)

# make a covariance matrix based on the entire data
actual_ret=cbind(cash_ret, larg_ret, valu_ret, grow_ret, smal_ret, tres_ret, cred_ret, comm_ret)
cov_matrix=cov(actual_ret)

opt_port = ts(matrix(,nrow=648,ncol=8))

for (i in 1:648) opt_port[i,]= portfolio.optim(asset_forecast[i,],
    riskless=FALSE, shorts=TRUE, covmat = cov_matrix,
    by.column = FALSE, by=1, align="right")


I get the following error message:

Error in portfolio.optim.default(asset_forecast[i, ], shorts = TRUE, covmat = cov_matrix,  :
  x is not a matrix

So clearly, asset_forecast[i,] is not a matrix, and I need another way to
do this. Can anyone suggest a solution that would allow me to set sail in the
right direction?

Many thanks,
Bond, James... sorry, that's my screen name. Darius :)


  



Re: [R] Rolling optimization

2011-10-10 Thread R. Michael Weylandt
Not having played with portfolio.optim() much, I can't guarantee this
will fix it, but if it requires a matrix rather than a vector and you
are sure about the rest of the syntax, this might do the trick:

asset_forecast[i, , drop = FALSE]

This is because:

R> x = matrix(1:9, 3)
R> is.matrix(x[,1])
[1] FALSE
R> is.matrix(x[,1,drop=FALSE])
[1] TRUE
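A minimal sketch of the row-by-row loop using the closed-form tangency weights (shorts allowed) in place of portfolio.optim(); this is a swapped-in technique, not what the thread proposes. The data, the risk-free rate `rf`, and the helper `tangency_weights()` are all hypothetical. Note that solve() wants the plain vector `asset_forecast[i, ]`, so `drop = FALSE` only matters when a function insists on a matrix, as portfolio.optim() does.

```r
# Hypothetical stand-ins for the poster's objects (random numbers, not real returns)
set.seed(42)
n_months <- 12
n_assets <- 3
asset_forecast <- matrix(rnorm(n_months * n_assets, mean = 0.01, sd = 0.02),
                         nrow = n_months, ncol = n_assets)
actual_ret <- matrix(rnorm(120 * n_assets, mean = 0.01, sd = 0.04), ncol = n_assets)
cov_matrix <- cov(actual_ret)
rf <- 0.002  # assumed monthly risk-free rate

# Hypothetical helper: closed-form tangency weights with shorts allowed,
# w = S^{-1} (mu - rf) / sum(S^{-1} (mu - rf))
tangency_weights <- function(mu, covmat, rf = 0) {
  z <- solve(covmat, mu - rf)   # S^{-1} (mu - rf) without forming the inverse
  z / sum(z)                    # rescale so the weights sum to 1
}

opt_port <- matrix(NA_real_, nrow = n_months, ncol = n_assets)
for (i in seq_len(n_months)) {
  # asset_forecast[i, ] is a plain vector, which is what solve() needs here
  opt_port[i, ] <- tangency_weights(asset_forecast[i, ], cov_matrix, rf)
}

# drop = FALSE keeps a row as a 1 x n_assets matrix (what portfolio.optim() wants)
row_as_matrix <- asset_forecast[1, , drop = FALSE]
stopifnot(is.matrix(row_as_matrix), all(dim(row_as_matrix) == c(1, n_assets)))
```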

Michael

On Mon, Oct 10, 2011 at 9:33 PM, Darius H xeno...@hotmail.com wrote:


Re: [R] SLOW split() function

2011-10-10 Thread Jim Holtman
Instead of splitting the entire data frame, split the indices and then use them
to access your data. Try:

system.time(s <- split(seq(nrow(d)), d$key))

This should be faster and less memory-intensive. You can then use the indices
to access each subset:

result <- lapply(s, function(.indx){
    doSomething <- sum(d$someCol[.indx])  # someCol stands for whichever column you need
})
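A small self-contained illustration of the same idea, using a hypothetical plain data.frame whose `val` column stands in for `someCol`, and checking that the per-group sums match those obtained by splitting the data frame itself. At this size the timings are meaningless; the point is only that both routes give the same result.

```r
# Hypothetical small data frame; `val` plays the role of `someCol`
set.seed(1)
d <- data.frame(key = rep(1:4, each = 5), val = rnorm(20))

# Split only the row numbers, not the whole data frame
s <- split(seq(nrow(d)), d$key)

# Use each index vector to reach into the original column
result <- lapply(s, function(.indx) sum(d$val[.indx]))

# Cross-check against splitting the data frame itself
ref <- lapply(split(d, d$key), function(sub) sum(sub$val))
stopifnot(isTRUE(all.equal(result, ref)))
```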


On Oct 10, 2011, at 21:01, ivo welch ivo.we...@gmail.com wrote:

 dear R experts:  apologies for all my speed and memory questions.  I
 have a bet with my coauthors that I can make R reasonably efficient
 through R-appropriate programming techniques.  this is not just for
 kicks, but for work.  for benchmarking, my [3 year old] Mac Pro has
 2.8GHz Xeons, 16GB of RAM, and R 2.13.1.
 
 right now, it seems that 'split()' is why I am losing my bet.  (split
 is an integral component of *apply() and by(), so I need split() to be
 fast.  its resulting list can then be fed, e.g., to mclapply().)  I
 made up an example to illustrate my ills:
 
library(data.table)
N - 1000
T - N*10
d - data.table(data.frame( key= rep(1:T, rep(N,T)), val=rnorm(N*T) ))
setkey(d, key); gc() ## force a garbage collection
cat(N=, N, .  Size of d=, object.size(d)/1024/1024, MB\n)
print(system.time( s-split(d, d$key) ))
 
 My ordered input data table (or data frame; doesn't make a difference)
 is 114MB in size.  it takes about a second to create.  split() only
 needs to reshape it.  this simple operation takes almost 5 minutes on
 my computer.
 
 with a data set that is larger, this explodes further.
 
 am I doing something wrong?  is there an alternative to split()?
 
 sincerely,
 
 /iaw
 
 
 Ivo Welch (ivo.we...@gmail.com)
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

