Re: [R] Rstudio Error in View : 'wildcard' is missing

2013-04-29 Thread Jeff Newmiller
a) RStudio has its own support forum on its website. If your problem only 
happens in RStudio, then your question belongs there. If not, demonstrate the 
sequence of steps it takes to obtain your error using plain R and re-post.

b) This kind of thing can happen when you corrupt your workspace. Beware of 
auto saving your workspace... instead, build scripts that analyze your data 
from raw input to in-memory analysis results.
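
One way to check for the situation described in (b) is to see where R finds `View` and to remove any copy sitting in the global environment. This is only a sketch: it assumes the culprit is an object restored from a saved workspace that masks `View`, which the thread does not confirm.

```r
# List every place on the search path where an object named "View" lives.
# ".GlobalEnv" in the result would mean a workspace object is masking
# the function provided by the utils package.
where <- find("View")
print(where)

# Remove a masking copy from the global environment, if there is one.
if (exists("View", envir = globalenv(), inherits = FALSE)) {
  rm("View", envir = globalenv())
}
```

Starting R with `R --vanilla` skips restoring a saved `.RData` altogether, which is a quick way to test whether the workspace is involved.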
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Ellen Sebastian elle...@stanford.edu wrote:

Hello,

Whenever I try to view anything (matrix, data frame, etc) using View() in
RStudio, I get the error:
Error in View : 'wildcard' is missing.

Google hasn't returned any relevant help...

Does anyone have an idea as to how I can fix this??

Thanks!!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] quesion about model g1

2013-04-29 Thread Achim Zeileis

On Mon, 29 Apr 2013, meng wrote:


Hello Achim:
Sorry for another question about the model g1 in the last mail.

As to model g2 and g3:
g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
g3 <- glm(Freq ~ age * drug * case, data = df, family = poisson)
anova(g2, g3, test = "Chisq")

I know clearly that the only difference between g2 and g3 is that g2 has
no 3-way interaction while g3 has, and anova tests whether this only
difference (i.e. the 3-way interaction) is significant or not.

But as to g1 and g3:
g1 <- glm(Freq ~ age/(drug + case), data = df, family = poisson)

I can't find the only difference between g1 and g3, so I don't know
what anova(g1, g3, test = "Chisq") tests for. Also, what does the / sign
following age in g1 refer to?


The "/" could be replaced by a "*" here and the fitted values and
corresponding log-likelihood would not change. Only the coefficients
change: "/" induces a nested coding while "*" employs the interaction
coding.


Breaking everything down to main and interaction effects (and ignoring the 
particular coding of the coefficients), the three models are


g1: a + d + c + a:d + a:c
g2: a + d + c + a:d + a:c + d:c
g3: a + d + c + a:d + a:c + d:c + a:d:c

with interpretations:

g1: conditional independence of drug and case given age
g2: no three-way interaction (case depends on drug but in the same way for 
different levels of age)

g3: saturated model
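
The breakdown above can be checked directly in R. The sketch below rebuilds the eight counts from the thread (the age labels "1" and "2" are placeholders), prints the term labels R derives from the nested and the crossed formula, and confirms that both codings of g1 give the same fitted values. The residual degrees of freedom also show what each comparison against g3 tests: g2 against g3 tests a:d:c alone (1 df), while g1 against g3 tests d:c and a:d:c jointly (2 df), that is, whether drug and case are associated within any age group.

```r
# Counts from the thread; drug varies fastest, then case, then age.
df <- expand.grid(drug = c("yes", "no"),
                  case = c("patient", "healthy"),
                  age  = c("1", "2"))        # placeholder age labels
df$Freq <- c(21, 26, 17, 59, 18, 88, 7, 95)

# Term labels R derives from the nested and the crossed formula
terms_nested <- attr(terms(~ age/(drug + case)), "term.labels")
terms_cross  <- attr(terms(~ age*(drug + case)), "term.labels")
print(terms_nested)
print(terms_cross)

g1_nested <- glm(Freq ~ age/(drug + case), data = df, family = poisson)
g1_cross  <- glm(Freq ~ age*(drug + case), data = df, family = poisson)
g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
g3 <- glm(Freq ~ age * drug * case, data = df, family = poisson)

# Same model, different coding: the fitted values agree
same_fit <- isTRUE(all.equal(fitted(g1_nested), fitted(g1_cross),
                             tolerance = 1e-6))

# Residual df: how many parameters each model is short of the saturated one
df_resid <- c(g1 = df.residual(g1_nested),
              g2 = df.residual(g2),
              g3 = df.residual(g3))
print(same_fit)
print(df_resid)
```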



Many thanks and sorry for the many questions.

Best

At 2013-04-24 22:22:55,Achim Zeileis achim.zeil...@uibk.ac.at wrote:

On Wed, 24 Apr 2013, meng wrote:


Hi,Achim:
Can all the analysis you mentioned via loglm be performed via
glm(...,family=poisson)?


Yes.

## transform table back to data.frame
df <- as.data.frame(tab)

## fit models: conditional independence, no three-way interaction,
## and saturated
g1 <- glm(Freq ~ age/(drug + case), data = df, family = poisson)
g2 <- glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
g3 <- glm(Freq ~ age * drug * case, data = df, family = poisson)

## likelihood-ratio tests (against saturated)
anova(g1, g3, test = "Chisq")
anova(g2, g3, test = "Chisq")

## compare fitted frequencies (which are essentially equal)
all.equal(as.numeric(fitted(g1)),
  as.data.frame(as.table(fitted(m1)))$Freq)
all.equal(as.numeric(fitted(g2)),
  as.data.frame(as.table(fitted(m2)))$Freq)

The difference is mainly that loglm() has a specialized user interface and
that it uses a different optimizer (iterative proportional fitting rather
than iterative reweighted least squares).

Best,
Z


Many thanks.







At 2013-04-24 19:37:10,Achim Zeileis achim.zeil...@uibk.ac.at wrote:

On Wed, 24 Apr 2013, meng wrote:


Hi all:
For stratified count data, how to perform regression analysis?

My data:
age case oc count
1   1    1   21
1   1    2   26
1   2    1   17
1   2    2   59
2   1    1   18
2   1    2   88
2   2    1    7
2   2    2   95

age:
1: <40y
2: >40y

case:
1: patient
2: healthy

oc:
1: use drug
2: not use drug

My purpose:
Analyse whether case and oc are correlated, with age as the stratifying
variable.

My solution:
1. Mantel-Haenszel test using the function mantelhaen.test
2. Loglinear regression using glm(count ~ case*oc, family = poisson).
But I don't know how to handle the variable age, which is the stratifying
variable.


Instead of using glm(family = poisson) it is also convenient to use
loglm() from package MASS for the associated contingency table.

The code below shows how to set up the contingency table, visualize it
using package vcd, and then fit two models using loglm. The models
considered are conditional independence of case and drug given age and the
no three-way interaction already suggested by Peter. Both models are
also accompanied by visualizations of the residuals.

## contingency table with nice labels
tab <- expand.grid(drug = 1:2, case = 1:2, age = 1:2)
tab$count <- c(21, 26, 17, 59, 18, 88, 7, 95)
tab$age <- factor(tab$age, levels = 1:2, labels = c("<40", ">40"))
tab$case <- factor(tab$case, levels = 1:2, labels = c("patient",
"healthy"))
tab$drug <- factor(tab$drug, levels = 1:2, labels = c("yes", "no"))
tab <- xtabs(count ~ age + drug + case, data = tab)

## visualize case explained by drug given age
library(vcd)
mosaic(case ~ drug | age, data = tab,
  split_vertical = c(TRUE, TRUE, FALSE))

## test whether drug and case are independent given age
library(MASS)  # provides loglm()
m1 <- loglm(~ age/(drug + case), data = tab)
m1

## visualize corresponding residuals from independence model
mosaic(case ~ drug | age, data = tab,
  split_vertical = c(TRUE, TRUE, FALSE),
  residuals_type = "deviance",
  gp = shading_hcl, gp_args = list(interpolate = 1.2))
mosaic(case ~ drug | age, data = tab,
  split_vertical = c(TRUE, TRUE, FALSE),
  residuals_type = "pearson",
  gp = shading_hcl, gp_args = list(interpolate = 1.2))

## test whether there is no three-way interaction
## (i.e., dependence of case on drug is the same for both age groups)
m2 <- loglm(~ (age + drug + case)^2, data = tab)
m2

## 

[R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-29 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
I have a dataset which for the sake of simplicity has two endpoints.  We would 
like to test if two different end-points have the same eventual meaning.  To 
try and take an example that people might understand better:

Let's assume we had a group of subjects who all received a treatment.  They could 
stop treatment for any reason (side effects, treatment stops working etc).  
Getting that data is very easy.  Measuring if treatment stops working is very 
hard to capture... so we would like to test if duration on treatment (easy) is 
the same as time to treatment failure (hard).

My data might look like this:

A = c(9.77,  0.43,  0.03,  3.50,  7.07,  6.57,  8.57,  2.30,  6.17,  3.27,  
2.57,  0.77)
B = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = yes (censored)
C = c( 9.80,  0.43,  5.93,  8.43,  6.80,  2.60,  8.93,  8.37, 12.23,  5.83, 
13.17,  0.77)
D = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) # 1 = yes (censored)
myData = data.frame (TimeOnTx = A, StillOnTx = B, TimeToFailure = C, NotFailed 
= D)

We can do a survival analysis on those individually:
library(survival)
OnTxFit = survfit (Surv ( TimeOnTx, StillOnTx==0 ) ~ 1 , data = myData)

FailedFit = survfit (Surv ( TimeToFailure , NotFailed==0 ) ~ 1 , data = myData)

plot(OnTxFit)
lines(FailedFit)  # overlay the second curve on the first

But how can I do a survdiff type of comparison between the two?  Do I have to 
restructure the data so that the times are all in one column, the event in another, 
and then a group to indicate what type of event it is?  Seems a complex way to do 
it (especially as the dataset is of course more complex than I've just 
shown)... so I thought maybe I'm missing something...







[R] Counting number of consecutive occurrences per rows

2013-04-29 Thread zuzana zajkova
Hi,

I would appreciate it if somebody could help me with the following calculation.
I have a data frame with records at 10-minute intervals, covering most of one
year. This is a small example:

> dput(test)
structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

Looks like this:

> test
      jul                time act day
510 14655 2010-02-15 18:25:54 130   1
512 14655 2010-02-15 18:35:54  23   1
514 14655 2010-02-15 18:45:54  45   1
516 14655 2010-02-15 18:55:54 200   1
518 14655 2010-02-15 19:05:54 200   1
520 14655 2010-02-15 19:15:54 200   1
522 14655 2010-02-15 19:25:54 199   1
524 14655 2010-02-15 19:35:54 150   1
526 14655 2010-02-15 19:45:54   0   1
528 14655 2010-02-15 19:55:54   0   1
530 14655 2010-02-15 20:05:54   0   0
532 14655 2010-02-15 20:15:54   0   0
534 14655 2010-02-15 20:25:54  34   0
536 14655 2010-02-15 20:35:54 200   0
538 14655 2010-02-15 20:45:54 200   0
540 14655 2010-02-15 20:55:54 145   0


What I would like to calculate is the number of consecutive occurrences of
the values 200 and 0, and of the values from 1 to 199 taken together (that
is, all values other than 200 and 0), in column act.

I would like to get something like this (result$res):

> result
      jul                time act day res res2
510 14655 2010-02-15 18:25:54 130   1   3    3
512 14655 2010-02-15 18:35:54  23   1   3    3
514 14655 2010-02-15 18:45:54  45   1   3    3
516 14655 2010-02-15 18:55:54 200   1   3    3
518 14655 2010-02-15 19:05:54 200   1   3    3
520 14655 2010-02-15 19:15:54 200   1   3    3
522 14655 2010-02-15 19:25:54 199   1   2    2
524 14655 2010-02-15 19:35:54 150   1   2    2
526 14655 2010-02-15 19:45:54   0   1   4    2
528 14655 2010-02-15 19:55:54   0   1   4    2
530 14655 2010-02-15 20:05:54   0   0   4    2
532 14655 2010-02-15 20:15:54   0   0   4    2
534 14655 2010-02-15 20:25:54  34   0   1    1
536 14655 2010-02-15 20:35:54 200   0   2    2
538 14655 2010-02-15 20:45:54 200   0   2    2
540 14655 2010-02-15 20:55:54 145   0   1    1

And if possible, distinguish between day==1 and day==0 (see the act values
of 0 for example), with results as in result$res2.

After that I would like to make a summary table per day (jul), where:
maxres is max(result$res) for the act value
minres is min(result$res) for the act value
sumres is sum(result$res) for the act value (for example, if the value 200
occurs consecutively 3, 5, 1, 6 and 7 times at different times of the day
(jul), sumres would be 3+5+1+6+7 = 22)

Something like this (these are made-up numbers):

jul    act    maxres minres sumres
14655  0      4      1      25
14655  200    3      2      48
14655  1-199  3      1      71
14656  0      8      2      38
14656  200    15     3      60
14656  1-199  11     4      46
...
(theoretically the sum of sumres per day (jul) should be 144)


 sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)


I hope my explanation is sufficient. I appreciate any hint.
Thank you,

Zuzana



[R] How to call an object given a string?

2013-04-29 Thread Rui Esteves
Hello,

This is very basic and very frustrating.

Suppose this:
A=5
B=5
C=10

 ls()
A
B
C

I would like this
xpto()
5
5
10

How can I do xpto()?

Thanks
Rui



Re: [R] Convert continuous variable into discrete variable

2013-04-29 Thread Frank Harrell
It is important to check for lack of fit of the categorized variable.  One
way to do this is to test for the additional predictive ability of the
original continuous variable after adjusting for its categorized version. 
It is very uncommon for a categorized continuous variable to fit well,
because its assumed discontinuities seldom exist in nature and most
relationships are not piecewise flat.
Frank
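
The check described above can be sketched on simulated data. Everything in this example is hypothetical (the sample size, the linear relationship and the cut points); it only illustrates the comparison of a model using the categorized variable against one that also includes the original continuous variable.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 100)
y <- 0.1 * x + rnorm(n)   # simulated outcome, truly linear in x

# Hypothetical cut points for the categorized version of x
x_cat <- cut(x, breaks = c(0, 25, 50, 75, 100), include.lowest = TRUE)

fit_cat  <- lm(y ~ x_cat)       # categorized predictor only
fit_both <- lm(y ~ x_cat + x)   # add the original continuous variable

# F test for the extra predictive ability of x after adjusting for x_cat;
# a small p-value indicates lack of fit of the categorized variable
lof <- anova(fit_cat, fit_both)
p_value <- lof[["Pr(>F)"]][2]
print(lof)
```

In this simulation the step function cannot capture the slope within each category, so the added continuous term is expected to be strongly significant.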

levanovd wrote
 Or even simpler (no need to specify labels):
 
 x <- runif(100, 0, 100)
 u <- cut(x, breaks = c(0, 3, 4.5, 6, 8, Inf), labels = FALSE)





-
Frank Harrell
Department of Biostatistics, Vanderbilt University


Re: [R] How to call an object given a string?

2013-04-29 Thread Eik Vettorazzi
Hi Rui,
how about this

sapply(ls(),get)

cheers
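
A variant of the same idea uses `mget()`, which returns the values as a named list and so also copes with objects of different types. The sketch below wraps it in a function named `xpto` after the question, and uses a separate environment (an assumption made here so that the example does not depend on what else is in the workspace).

```r
# Put the three objects from the question into their own environment
env <- new.env()
assign("A", 5, envir = env)
assign("B", 5, envir = env)
assign("C", 10, envir = env)

# Return the values of all objects in an environment as a named list
xpto <- function(envir = globalenv()) {
  mget(ls(envir), envir = envir)
}

vals <- xpto(env)
print(vals)

# The sapply()/get() form from the reply, restricted to the same environment
as_vec <- sapply(ls(env), get, envir = env)
print(as_vec)
```

Calling `xpto()` without arguments looks in the global environment, which matches the situation in the question.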



-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790


Re: [R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-29 Thread Andrews, Chris
It isn't that complex:

myDataLong <- data.frame(Time=c(A, C), Censored=c(B, D), group=rep(0:1, 
times=c(length(A), length(C))))
Fit = survfit(Surv(Time, Censored==0) ~ group, data=myDataLong)
plot(Fit, col=1:2)
survdiff(Surv(Time, Censored==0) ~ group, data=myDataLong)


However, your approach (a 'wide' data frame) suggests that there are equal 
numbers in the two survival studies.  Are they even the same people?  Is it 
even the same study?  If so, this is a competing risks question and would have 
to be approached differently.

And, of course, absence of evidence is not evidence of absence.  Failing to 
reject the null hypothesis that the distributions are equal is not proof 
that the distributions are equal.

Chris




Re: [R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-29 Thread Polwart Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION TRUST)
 It isn't that complex:

 myDataLong <- data.frame(Time=c(A, C), Censored=c(B, D), group=rep(0:1, 
 times=c(length(A), length(C))))
 Fit = survfit(Surv(Time, Censored==0) ~ group, data=myDataLong)
 plot(Fit, col=1:2)
 survdiff(Surv(Time, Censored==0) ~ group, data=myDataLong)

Yes - for the example it's not complex - but once we get down to having more 
data columns I think it may be...  Maybe I ignore those and just build 
'myDataLong' for this specific test.
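
With more columns, the long data frame can still be built in a few lines by naming the endpoint columns once and carrying everything else along. The following is only a sketch: the `id` column and the helper names `endpoint_cols` and `other` are additions for illustration, and the log-rank test it ends with treats the two groups as independent samples, which is questionable when both endpoints come from the same subjects.

```r
library(survival)  # recommended package, normally installed with R

# Data from the post, plus a subject id added for illustration
A <- c(9.77, 0.43, 0.03, 3.50, 7.07, 6.57, 8.57, 2.30, 6.17, 3.27, 2.57, 0.77)
B <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = censored
C <- c(9.80, 0.43, 5.93, 8.43, 6.80, 2.60, 8.93, 8.37, 12.23, 5.83, 13.17, 0.77)
D <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # 1 = censored
myData <- data.frame(id = seq_along(A), TimeOnTx = A, StillOnTx = B,
                     TimeToFailure = C, NotFailed = D)

# Columns that describe the two endpoints; all other columns are carried along
endpoint_cols <- c("TimeOnTx", "StillOnTx", "TimeToFailure", "NotFailed")
other <- myData[setdiff(names(myData), endpoint_cols)]

# Stack the two endpoints on top of each other
myDataLong <- rbind(
  cbind(other, Time = myData$TimeOnTx, Censored = myData$StillOnTx,
        Endpoint = "OnTx"),
  cbind(other, Time = myData$TimeToFailure, Censored = myData$NotFailed,
        Endpoint = "Failure")
)

cmp <- survdiff(Surv(Time, Censored == 0) ~ Endpoint, data = myDataLong)
print(cmp)
```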

 However, your approach (a 'wide' data frame) suggests that there are equal 
 numbers in the two survival
 studies.  Are they even the same people?  Is it even the same study?  If so, 
 this is a competing risks question
 and would have to be approached differently.

Yes, it's the same patients. The two events are technically independent of each 
other but the hope is that the easier outcome measure would predict the 
other...  I'm not familiar with competing risks and so will have to read up on 
it, but it isn't a scenario where A or B happens: A happens and B happens, and 
you might expect A happened because B happened...

 And, of course, absence of evidence is not evidence of absence.  Failing to 
 reject the null hypothesis that the
 distributions are equal is not proof that the distributions are equal.

Yes absolutely - however I'm half expecting to detect a difference and so then 
dismiss using A as a surrogate of B...

Thanks





Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman
Forgot the last part of the question:

test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

# add key to separate data
test$key <- ifelse(test$act == 0
    , 1L  # 0
    , ifelse(test$act == 200
        , 3L  # 200
        , 2L  # 1-199
        )
    )
# mark changes in sequence
test$resChange <- cumsum(c(1L, abs(diff(test$key))))
test$res <- ave(test$resChange, test$resChange, FUN = length)

test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)

require(data.table)  # use this for aggregation
test <- data.table(test)
testResume <- test[
    , list(maxres = max(res)
        , minres = min(res)
        , sumres = length(unique(resChange))
        )
    , keyby = c('day', 'key')
    ]
# change 'key'
testResume$key <- c('0', '1-199', '200')[testResume$key]
testResume
   day   key maxres minres sumres
1:   0 0  4  4  1
2:   0 1-199  1  1  2
3:   0   200  2  2  1
4:   1 0  4  4  1
5:   1 1-199  3  2  2
6:   1   200  3  3  1





[R] Hi

2013-04-29 Thread Fatos Baruti
What is the R code to compute the autocovariance and autocorrelation for
these data?

a <- c(2,3.5,3.5,2.2,2.2,3.3,2.5,2.5,3.2,2.5,2.5,2.7,1.7,2.7,2.9,2.3,2.7,3,1.8,2.5,3.1,2.5,2.5,3.2,2.7,1.9,2.6,2.3,2.7,3.2,2.2,1.5,2.3,2.6,2.5,2.9,2,2.5,2.6,2.4,2.6,2.8,2.5,2.6,3.2,1.8,2.7,3.4,2.2,2.9,3.2)
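
A possible starting point, not taken from the thread, is the `acf()` function in the stats package, which computes both quantities. The sketch assumes the data line above defines a numeric vector and calls it `a`.

```r
a <- c(2, 3.5, 3.5, 2.2, 2.2, 3.3, 2.5, 2.5, 3.2, 2.5, 2.5, 2.7, 1.7, 2.7,
       2.9, 2.3, 2.7, 3, 1.8, 2.5, 3.1, 2.5, 2.5, 3.2, 2.7, 1.9, 2.6, 2.3,
       2.7, 3.2, 2.2, 1.5, 2.3, 2.6, 2.5, 2.9, 2, 2.5, 2.6, 2.4, 2.6, 2.8,
       2.5, 2.6, 3.2, 1.8, 2.7, 3.4, 2.2, 2.9, 3.2)

# Sample autocovariance and autocorrelation, without drawing the plot
acv <- acf(a, type = "covariance", plot = FALSE)
acr <- acf(a, type = "correlation", plot = FALSE)

print(acv)
print(acr)
```

The values themselves are in `acv$acf` and `acr$acf`, with the first element corresponding to lag 0.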



[R] BCP utility

2013-04-29 Thread Stephane.COLAS
Hello,

Currently we load data with the bulk-load facility in SAS, which uses the BCP 
utility instead of the T-SQL command BULK INSERT to copy data from a file 
to a SQL table.
As far as I can see, the RODBC package uses only the T-SQL command BULK INSERT.

It would be interesting to know whether R can use the BCP utility instead of 
the T-SQL command BULK INSERT.
Using BCP would avoid the need for the high-privilege Bulkadmin role required 
by the T-SQL command BULK INSERT.

Does anyone know whether the BCP utility is usable with R?

Stef.




Re: [R] quesion about model g1

2013-04-29 Thread meng
Thanks for your reply.
 
As to g2 and g3:
g2: a + d + c + a:d + a:c + d:c
g3: a + d + c + a:d + a:c + d:c + a:d:c
The only difference between g2 and g3 is a:d:c, which refers to whether case 
depends on drug in the same way for different levels of age. And anova tests 
whether this only difference is significant.

But as to g1 and g3:
g1: a + d + c + a:d + a:c
g3: a + d + c + a:d + a:c + d:c + a:d:c
The only difference between g1 and g3 is d:c + a:d:c.
What does d:c + a:d:c refer to?
 
Thanks.

 
 





 



At 2013-04-29 14:33:25,Achim Zeileis achim.zeil...@uibk.ac.at wrote:
On Mon, 29 Apr 2013, meng wrote:

 Hello Achim:
 Sorry for another question about the model g1 in the last mail.

 As to model g2 and g3:
 g2 - glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
 g3 - glm(Freq ~ age * drug * case, data = df, family = poisson)
 anova(g2, g3, test = Chisq)

 I know clearly that the only difference between g2 and g3 is that g2 has 
 no 3-way interaction while g3 has,and anova tests whether this only 
 difference(i.e.  3-way interaction) is significant or not.

 But as to g1 and g3:
 g1 - glm(Freq ~ age/(drug + case), data = df, family = poisson)

 I can't find out the only difference between g1 and g3,so I don't know 
 what anova(g1, g3, test = Chisq) tests for. Also, what / sign 
 following age in g1 refers to?

The / could be replaced by a * here and the fitted values and 
corresponding log-likelihood would not change. Only the coefficients 
change: / induces a nested coding while * employs the interaction 
coding.

Breaking everything down to main and interaction effects (and ignoring the 
particular coding of the coefficients), the three models are

g1: a + d + c + a:d + a:c
g2: a + d + c + a:d + a:c + d:c
g3: a + d + c + a:d + a:c + d:c + a:d:c

with interpretations:

g1: conditional independence of drug and case given age
g2: no three-way interaction (case depends on drug but in the same way for 
different levels of age)
g3: saturated model


 Many thanks and sorry for many quesions.


 Best














 At 2013-04-24 22:22:55,Achim Zeileis achim.zeil...@uibk.ac.at wrote:
 On Wed, 24 Apr 2013, meng wrote:

 Hi,Achim:
 Can all the analysis you mentioned via loglm be performed via
 glm(...,family=poisson)?

 Yes.

 ## transform table back to data.frame
 df - as.data.frame(tab)

 ## fit models: conditional independence, no-three way interaction,
 ## and saturated
 g1 - glm(Freq ~ age/(drug + case), data = df, family = poisson)
 g2 - glm(Freq ~ (age + drug + case)^2, data = df, family = poisson)
 g3 - glm(Freq ~ age * drug * case, data = df, family = poisson)

 ## likelihood-ratio tests (against saturated)
 anova(g1, g3, test = Chisq)
 anova(g2, g3, test = Chisq)

 ## compare fitted frequencies (which are essentially equal)
 all.equal(as.numeric(fitted(g1)),
   as.data.frame(as.table(fitted(m1)))$Freq)
 all.equal(as.numeric(fitted(g2)),
   as.data.frame(as.table(fitted(m2)))$Freq)

 The difference is mainly that loglm() has a specialized user interface and
 that it uses a different optimizer (iterative proportional fitting rather
 than iterative reweighted least squares).

 Best,
 Z

 Many thanks.







 At 2013-04-24 19:37:10,Achim Zeileis achim.zeil...@uibk.ac.at wrote:
 On Wed, 24 Apr 2013, meng wrote:

 Hi all:
 For stratified count data,how to perform regression analysis?

 My data:
 age case oc count
 1   1 121
 1   1 226
 1   2 117
 1   2 259
 2   1 118
 2   1 288
 2   2 1 7
 2   2 2   95

 age:
 1:40y
 2:40y

 case:
 1:patient
 2:health

 oc:
 1:use drug
 2:not use drug

 My purpose:
 Analyze whether case and oc are correlated, with age as the stratification
 variable.

 My solution:
 1. Mantel-Haenszel test, using the function mantelhaen.test.
 2. Log-linear regression, using glm(count~case*oc,family=poisson). But I
 don't know how to handle the variable age, which is the stratification
 variable.

 Instead of using glm(family = poisson) it is also convenient to use
 loglm() from package MASS for the associated convenience table.

 The code below shows how to set up the contingency table, visualize it
 using package vcd, and then fit two models using loglm. The models
 considered are conditional independence of case and drug given age and the
 no three-way interaction already suggested by Peter. Both models are
 also accompanied by visualizations of the residuals.

 ## contingency table with nice labels
 tab <- expand.grid(drug = 1:2, case = 1:2, age = 1:2)
 tab$count <- c(21, 26, 17, 59, 18, 88, 7, 95)
 tab$age <- factor(tab$age, levels = 1:2, labels = c(40, 40))
 tab$case <- factor(tab$case, levels = 1:2, labels = c("patient",
 "healthy"))
 tab$drug <- factor(tab$drug, levels = 1:2, labels = c("yes", "no"))
 tab <- xtabs(count ~ age + drug + case, data = tab)

 ## visualize case explained by drug given age
 library(vcd)
 mosaic(case ~ drug | age, data = tab,
   split_vertical = c(TRUE, TRUE, FALSE))

 ## test 

[R] all.vars for nested expressions

2013-04-29 Thread flxms
Dear R fellows,
 
Assume I define

a <- expression(fn+tp)
sen <- expression(tp/a)

Now I'd like to know which variables are necessary for calculating sen:

all.vars(sen)

This results in a vector c("tp","a"). But I'd like all.vars to evaluate the
sen object down to the ground level, which would result in a vector
c("tp","fn") (because a was defined as fn+tp). In other words, I'd like
all.vars to expand the a object (and all other downstream objects). I am
looking for a solution that works with many more levels. This is just a
very simple example.

I'd appreciate any suggestions how to do that very much!
Thanks in advance,

Felix
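
One possible approach to the question above, sketched under assumptions: the helper expand_vars() below is hypothetical (it is not part of base R). It assumes every intermediate object is an expression or call stored in a single environment, and that the definitions are not circular. Any variable that does not resolve to such an object is treated as a ground-level variable.

```r
a   <- expression(fn + tp)
sen <- expression(tp / a)

# expand_vars() is a hypothetical helper, not a base R function.
# It replaces each variable that is itself an expression (or call)
# in 'env' by the variables of that expression, recursively.
expand_vars <- function(expr, env = parent.frame()) {
  out <- character(0)
  for (v in all.vars(expr)) {
    if (exists(v, envir = env, inherits = FALSE)) {
      obj <- get(v, envir = env, inherits = FALSE)
    } else {
      obj <- NULL
    }
    if (is.expression(obj) || is.call(obj)) {
      out <- c(out, expand_vars(obj, env))   # go one level down
    } else {
      out <- c(out, v)                       # ground-level variable
    }
  }
  unique(out)
}

expand_vars(sen)
```

With the two definitions above, the call returns the variables tp and fn. A circular definition (a defined via sen and sen via a) would recurse without end, so a real implementation would need a guard for that.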




Re: [R] How to call an object given a string?

2013-04-29 Thread arun
Hi,

res <- unlist(mget(ls()))
names(res) <- NULL
res
#[1]  5  5 10
A.K.
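
A note on wrapping this in a function, as a hedged sketch: once a function such as xpto() is defined in the same workspace, ls() also lists the function itself, so unlist(mget(ls())) no longer returns a plain numeric vector. The version below assumes the values live in a dedicated environment (here called env, a name chosen only for this illustration) and keeps only numeric objects.

```r
# Store the values in their own environment (an assumption made for
# this sketch; the original post keeps them in the global workspace).
env <- new.env()
assign("A", 5,  envir = env)
assign("B", 5,  envir = env)
assign("C", 10, envir = env)

# Look up every object name in 'envir', keep the numeric ones,
# and return their values without names.
xpto <- function(envir) {
  objs <- mget(ls(envir = envir), envir = envir)
  nums <- Filter(is.numeric, objs)
  unname(unlist(nums))
}

xpto(env)
```

ls() returns names in alphabetical order, so the values come back in the order A, B, C.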



- Original Message -
From: Rui Esteves ruimax...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 7:07 AM
Subject: [R] How to call an object given a string?

Hello,

This is very basic and very frustrating.

Suppose this:
A=5
B=5
C=10

 ls()
A
B
C

I would like this
xpto()
5
5
10

How can I do xpto()?

Thanks
Rui

    [[alternative HTML version deleted]]



Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman
try this:

test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

# add key to separate data
test$key <- ifelse(test$act == 0
    , 1L  # 0
    , ifelse(test$act == 200
        , 3L  # 200
        , 2L  # 1-199
        )
    )
# mark changes in sequence
test$resChange <- cumsum(c(1L, abs(diff(test$key)) > 0))
test$res <- ave(test$resChange, test$resChange, FUN = length)

test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)

test
      jul                time act day key resChange res res2
510 14655 2010-02-15 18:25:54 130   1   2         1   3    3
512 14655 2010-02-15 18:35:54  23   1   2         1   3    3
514 14655 2010-02-15 18:45:54  45   1   2         1   3    3
516 14655 2010-02-15 18:55:54 200   1   3         2   3    3
518 14655 2010-02-15 19:05:54 200   1   3         2   3    3
520 14655 2010-02-15 19:15:54 200   1   3         2   3    3
522 14655 2010-02-15 19:25:54 199   1   2         3   2    2
524 14655 2010-02-15 19:35:54 150   1   2         3   2    2
526 14655 2010-02-15 19:45:54   0   1   1         4   4    2
528 14655 2010-02-15 19:55:54   0   1   1         4   4    2
530 14655 2010-02-15 20:05:54   0   0   1         4   4    2
532 14655 2010-02-15 20:15:54   0   0   1         4   4    2
534 14655 2010-02-15 20:25:54  34   0   2         5   1    1
536 14655 2010-02-15 20:35:54 200   0   3         6   2    2
538 14655 2010-02-15 20:45:54 200   0   3         6   2    2
540 14655 2010-02-15 20:55:54 145   0   2         7   1    1




On Mon, Apr 29, 2013 at 6:44 AM, zuzana zajkova zuzu...@gmail.com wrote:

 Hi,

 I would appreciate if somebody could help me with following calculation.
 I have a dataframe, by 10 minutes time, for mostly one year data. This is
 small example:

  dput(test)
 structure(list(jul = structure(c(14655, 14655, 14655, 14655,
 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
 14655, 14655, 14655), origin = structure(0, class = Date)),
 time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
 1266266754, 1266267354), class = c(POSIXct, POSIXt), tzone =
 GMT),
 act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 0, 0, 0, 0, 0, 0)), .Names = c(jul, time, act, day
 ), class = data.frame, row.names = c(510L, 512L, 514L, 516L,
 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
 540L))

 Looks like this:

  test
       jul                time act day
 510 14655 2010-02-15 18:25:54 130   1
 512 14655 2010-02-15 18:35:54  23   1
 514 14655 2010-02-15 18:45:54  45   1
 516 14655 2010-02-15 18:55:54 200   1
 518 14655 2010-02-15 19:05:54 200   1
 520 14655 2010-02-15 19:15:54 200   1
 522 14655 2010-02-15 19:25:54 199   1
 524 14655 2010-02-15 19:35:54 150   1
 526 14655 2010-02-15 19:45:54   0   1
 528 14655 2010-02-15 19:55:54   0   1
 530 14655 2010-02-15 20:05:54   0   0
 532 14655 2010-02-15 20:15:54   0   0
 534 14655 2010-02-15 20:25:54  34   0
 536 14655 2010-02-15 20:35:54 200   0
 538 14655 2010-02-15 20:45:54 200   0
 540 14655 2010-02-15 20:55:54 145   0


 What I would like to calculate is the number of consecutive occurrences of
 values 200,  0 and together values from 1 til 199 (in fact the values that
 differ from 200 and 0) in column act.

 I would like to get something like this (result$res)

  result
       jul                time act day res res2
 510 14655 2010-02-15 18:25:54 130   1   3    3
 512 14655 2010-02-15 18:35:54  23   1   3    3
 514 14655 2010-02-15 18:45:54  45   1   3    3
 516 14655 2010-02-15 18:55:54 200   1   3    3
 518 14655 2010-02-15 19:05:54 200   1   3    3
 520 14655 2010-02-15 19:15:54 200   1   3    3
 522 14655 2010-02-15 19:25:54 199   1   2    2
 524 14655 2010-02-15 19:35:54 150   1   2    2
 526 14655 2010-02-15 19:45:54   0   1   4    2
 528 14655 2010-02-15 19:55:54   0   1   4    2
 530 14655 2010-02-15 20:05:54   0   0   4    2
 532 14655 2010-02-15 20:15:54   0   0   4    2
 534 14655 2010-02-15 

[R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame

2013-04-29 Thread Katherine Gobin
Dear R forum

I have a data.frame as

cashflow_df = data.frame(instrument = 
c("ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC",
 "ABC", "PQR", "PQR", "PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR", "PQR", 
"PQR", "PQR","PQR", "PQR","PQR","PQR","PQR", "PQR","PQR","UVWXYZ","UVWXYZ", 
"UVWXYZ", "UVWXYZ", "UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ", "UVWXYZ", "UVWXYZ"),

id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5, 
1,1,2,2,3,3,4,4, 5,5),

cashflow = c(5000,5000,505000,5000,5000,505000,5000,5000,505000, 5000,5000, 
505000, 
5000,5000,505000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,8000,808000,8000,808000,8000,808000,8000,808000,8000,808000),

cashflows_pv = c(4931.054, 4479.1116, 431160.8529,4931.9604, 4485.6393, 
432064.0228, 
4932.5438,4489.8451,432646.2398,4932.1548,4487.0404,432257.9551,4932.6087,4490.3129,432711.0084,493.6326,474.0524,455.2489,82252.0304,493.8083,474.7543,456.4356,82744.9157,493.6003,473.9235,455.031,82161.7368,493.8175,474.7913,456.4982,82770.9849,493.8592,474.9581,456.7804,82888.4556,7451.3118,681810.5522,7462.0148,684153.4992,7441.1294,679585.9186,7426.6407,676427.7274,7427.1225,676532.6262))

#  __

 cashflow_df
   instrument id cashflow cashflows_pv
1 ABC  1 5000    4931.0540
2 ABC  1 5000    4479.1116
3 ABC  1   505000  431160.8529
4 ABC  2 5000    4931.9604
5 ABC  2 5000    4485.6393
6 ABC  2   505000  432064.0228
7 ABC  3 5000    4932.5438
8 ABC  3 5000    4489.8451
9 ABC  3   505000  432646.2398
10    ABC  4 5000    4932.1548
11    ABC  4 5000    4487.0404
12    ABC  4   505000  432257.9551
13    ABC  5 5000    4932.6087
14    ABC  5 5000    4490.3129
15    ABC  5   505000  432711.0084
16    PQR  1  500 493.6326
17    PQR  1  500 474.0524
18    PQR  1  500 455.2489
19    PQR  1   102000   82252.0304
20    PQR  2  500 493.8083
21    PQR  2  500 474.7543
22    PQR  2  500 456.4356
23    PQR  2   102000   82744.9157
24    PQR  3  500 493.6003
25    PQR  3  500 473.9235
26    PQR  3  500 455.0310
27    PQR  3   102000   82161.7368
28    PQR  4  500 493.8175
29    PQR  4  500 474.7913
30    PQR  4  500 456.4982
31    PQR  4   102000   82770.9849
32    PQR  5  500 493.8592
33    PQR  5  500 474.9581
34    PQR  5  500 456.7804
35    PQR  5   102000   82888.4556
36 UVWXYZ  1 8000    7451.3118
37 UVWXYZ  1   808000  681810.5522
38 UVWXYZ  2 8000    7462.0148
39 UVWXYZ  2   808000  684153.4992
40 UVWXYZ  3 8000    7441.1294
41 UVWXYZ  3   808000  679585.9186
42 UVWXYZ  4 8000    7426.6407
43 UVWXYZ  4   808000  676427.7274
44 UVWXYZ  5 8000    7427.1225
45 UVWXYZ  5   808000  676532.6262

# ===

# My PROBLEM


For each instrument and id, I need the totals of cashflow and cashflows_pv, 
and also the change in total_cashflow_pv relative to the first ID of the same 
instrument (that is, total_cashflow_pv for each ID minus total_cashflow_pv for 
the first ID of that instrument), as shown in the cashflow_change column of the 
following output.

output

   instrument id   total_cashflow   total_cashflow_pv
1 ABC  1 515000 440571.02
2 ABC  2 515000 441481.62
3 ABC  3 515000 442068.63
4 ABC  4 515000 441677.15
5 ABC  5 515000 442133.93
6 PQR  1 103500  83674.96
7 PQR  2 103500  84169.91
8 PQR  3 103500  83584.29
9 PQR  4 103500  84196.09
10    PQR  5 103500  84314.05
11 UVWXYZ  1 816000 689261.86
12 UVWXYZ  2 816000 691615.51
13 UVWXYZ  3 816000 687027.05
14 UVWXYZ  4 816000 683854.37
15 UVWXYZ  5 816000 683959.75
 

   cashflow_change
1           0.0000   # This is (440571.02 - 440571.02): 1st ID value - 1st ID value for ABC
2         910.6040   # This is (441481.62 - 440571.02): 2nd ID value - 1st ID value for ABC
3        1497.6102   # This is (442068.63 - 440571.02): 3rd ID value - 1st ID value for ABC
4        1106.1318
5        1562.9115
6           0.0000   # This is (83674.96 - 83674.96): 1st ID value - 1st ID value for PQR
7         494.9496
8         -90.6727
9         521.1276
10        639.0890
11          0.0000
12       2353.6500
13      -2234.8160
14      -5407.4959
15      -5302.1153   # This is (683959.75 - 689261.86): 5th ID value - 1st ID value for UVWXYZ


Kindly guide

Regards

Katherine




Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread PIKAL Petr
Hi

rrr <- rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=TRUE)))
test$res <- rep(rrr$lengths, rrr$lengths)

If you put it in a function

fff <- function(x, limits=c(0,1,199,200)) {
  rrr <- rle(as.numeric(cut(x, limits, include.lowest=TRUE)))
  res <- rep(rrr$lengths, rrr$lengths)
  res
}

you can use the split/lapply approach

test$res2 <- unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))

Beware of the ordering of days in the output: unless the factor levels are set 
explicitly, 0 precedes 1.
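
The run-length idea above can be checked on the act values from the original post with the following self-contained sketch. It classifies values with nested ifelse() instead of cut(), so that a value of exactly 1 counts as "1 to 199" rather than being grouped with 0; that substitution and the names act, grp and runs are choices made only for this illustration.

```r
# act values copied from the example data in this thread
act <- c(130, 23, 45, 200, 200, 200, 199, 150,
         0, 0, 0, 0, 34, 200, 200, 145)

# three classes: exactly 0, exactly 200, everything else
grp <- ifelse(act == 0, "zero", ifelse(act == 200, "max", "mid"))

# rle() gives the length of each run of identical classes;
# rep() spreads each run length back over the rows of that run
runs <- rle(grp)
res  <- rep(runs$lengths, runs$lengths)
res
```

The result matches the res column requested in the original post (3, 3, 3, 3, 3, 3, 2, 2, 4, 4, 4, 4, 1, 2, 2, 1).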

And for the last part probably aggregate can be the way.

aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200),
  include.lowest=TRUE)), max)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 3
3   14655 (199,200] 3
aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200),
  include.lowest=TRUE)), min)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 1
3   14655 (199,200] 2

Regards
Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of zuzana zajkova
 Sent: Monday, April 29, 2013 12:45 PM
 To: r-help@r-project.org
 Subject: [R] Counting number of consecutive occurrences per rows
 
 Hi,
 
 I would appreciate if somebody could help me with following
 calculation.
 I have a dataframe, by 10 minutes time, for mostly one year data. This
 is small example:
 
  dput(test)
 structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655,
 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
 14655), origin = structure(0, class = Date)),
 time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
 1266266754, 1266267354), class = c(POSIXct, POSIXt), tzone =
 GMT),
 act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 0, 0, 0, 0, 0, 0)), .Names = c(jul, time, act, day
 ), class = data.frame, row.names = c(510L, 512L, 514L, 516L, 518L,
 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
 540L))
 
 Looks like this:
 
  test
       jul                time act day
 510 14655 2010-02-15 18:25:54 130   1
 512 14655 2010-02-15 18:35:54  23   1
 514 14655 2010-02-15 18:45:54  45   1
 516 14655 2010-02-15 18:55:54 200   1
 518 14655 2010-02-15 19:05:54 200   1
 520 14655 2010-02-15 19:15:54 200   1
 522 14655 2010-02-15 19:25:54 199   1
 524 14655 2010-02-15 19:35:54 150   1
 526 14655 2010-02-15 19:45:54   0   1
 528 14655 2010-02-15 19:55:54   0   1
 530 14655 2010-02-15 20:05:54   0   0
 532 14655 2010-02-15 20:15:54   0   0
 534 14655 2010-02-15 20:25:54  34   0
 536 14655 2010-02-15 20:35:54 200   0
 538 14655 2010-02-15 20:45:54 200   0
 540 14655 2010-02-15 20:55:54 145   0
 
 
 What I would like to calculate is the number of consecutive occurrences
 of values 200,  0 and together values from 1 til 199 (in fact the
 values that differ from 200 and 0) in column act.
 
 I would like to get something like this (result$res)
 
  result
       jul                time act day res res2
 510 14655 2010-02-15 18:25:54 130   1   3    3
 512 14655 2010-02-15 18:35:54  23   1   3    3
 514 14655 2010-02-15 18:45:54  45   1   3    3
 516 14655 2010-02-15 18:55:54 200   1   3    3
 518 14655 2010-02-15 19:05:54 200   1   3    3
 520 14655 2010-02-15 19:15:54 200   1   3    3
 522 14655 2010-02-15 19:25:54 199   1   2    2
 524 14655 2010-02-15 19:35:54 150   1   2    2
 526 14655 2010-02-15 19:45:54   0   1   4    2
 528 14655 2010-02-15 19:55:54   0   1   4    2
 530 14655 2010-02-15 20:05:54   0   0   4    2
 532 14655 2010-02-15 20:15:54   0   0   4    2
 534 14655 2010-02-15 20:25:54  34   0   1    1
 536 14655 2010-02-15 20:35:54 200   0   2    2
 538 14655 2010-02-15 20:45:54 200   0   2    2
 540 14655 2010-02-15 20:55:54 145   0   1    1
 
 And if possible, distinguish among day==1 and day==0 (see the act
 values of 0 for example), results as in result$res2.
 
 After it I would like to make a resume table per days (jul):
 where maxres is max(result$res) for the act value where minres is
 min(result$res) for the act value where sumres is sum(result$res) for
 the act value (for example, if the 200 value ocurrs in different
 times per day(jul) consecutively 3, 5, 1, 6 and 7 times the sumres
 would be 3+5+1+6+7= 22)
 
 something like this (this are made up numbers):
 
 julact maxres  minres sumres
 146550  4   1   25
 14655 200 32  48
 146551-199   3171
 146560   8238
 14656 200 15360
 146561-199   114 46
 ...
 (theoretically the sum of sumres per 

[R] Arma - estimate of variance of white noise variables

2013-04-29 Thread Preetam Pal
Hi all,

Suppose I am fitting an arma(p,q) model to a time series y_t.
So, my model should contain (q+1) white noise variables.
As far as I know, each of them should have the same variance.
How do I get the estimate of this variance by running the arma(y) function
(or is there any other way)?

Appreciate your help.

Thanks,
Preetam
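
A minimal sketch of one way to obtain such an estimate, assuming the arima() function from the base stats package is an acceptable substitute for arma(): the fitted object has a sigma2 component holding the estimated innovation variance. Note that the q+1 noise terms in an ARMA(p,q) model are lagged values of a single white noise process, so there is only one variance to estimate. The series below is simulated purely for illustration, with a true innovation variance of 1.

```r
set.seed(1)

# Simulated ARMA(1,1) series; the coefficients are arbitrary choices
# for this illustration, and the innovations have variance 1.
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)

# Fit an ARMA(1,1), i.e. ARIMA with order (p, 0, q) = (1, 0, 1)
fit <- arima(y, order = c(1, 0, 1))

# Estimated variance of the white noise (innovation) process
fit$sigma2

# Cross-check: sample variance of the residuals should be close
var(residuals(fit))
```

The two numbers differ slightly because sigma2 is the maximum-likelihood estimate while var() divides by n - 1.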

-- 
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division,   C.V.Raman
Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.



Re: [R] Hi

2013-04-29 Thread Ben Bolker
Fatos Baruti fatosbaruti at gmail.com writes:

 
 What is the code to compute the autocovariance and autocorrelation in R
 for these data?
 
 a <- c(2,3.5,3.5,2.2,2.2,3.3,2.5,2.5,3.2,2.5,2.5,2.7,1.7,2.7,2.9,2.3,
   2.7,3,1.8,2.5,3.1,2.5,2.5,3.2,2.7,1.9,2.6,2.3,2.7,3.2,
   2.2,1.5,2.3,2.6,2.5,2.9,2,2.5,2.6,2.4,2.6,2.8,2.5,2.6,3.2,1.8,
   2.7,3.4,2.2,2.9,3.2)

  How hard did you look for an answer before posting to the list ... ?

?acf
acf(a)
acf(a)$acf
acf(a)$acf*var(a)
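
As a small addition to the lines above: acf() can also return the autocovariance directly via type = "covariance". The sketch below uses the data from the question and shows that acf() divides by n, whereas var() divides by n - 1, so acf(a)$acf*var(a) is larger than the acf()-style autocovariance by a factor of n/(n - 1). The object names acov and acor are chosen only for this illustration.

```r
a <- c(2,3.5,3.5,2.2,2.2,3.3,2.5,2.5,3.2,2.5,2.5,2.7,1.7,2.7,2.9,2.3,
       2.7,3,1.8,2.5,3.1,2.5,2.5,3.2,2.7,1.9,2.6,2.3,2.7,3.2,
       2.2,1.5,2.3,2.6,2.5,2.9,2,2.5,2.6,2.4,2.6,2.8,2.5,2.6,3.2,1.8,
       2.7,3.4,2.2,2.9,3.2)
n <- length(a)

# autocovariance and autocorrelation as plain vectors (lag 0 first)
acov <- acf(a, type = "covariance", plot = FALSE)$acf[, 1, 1]
acor <- acf(a, plot = FALSE)$acf[, 1, 1]

# lag-0 autocovariance uses divisor n, var() uses n - 1
all.equal(acov[1], var(a) * (n - 1) / n)

# autocorrelation is autocovariance scaled by its lag-0 value
all.equal(acor, acov / acov[1])
```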



[R] cannot compile R on Cray XE6 HLRS HERMIT

2013-04-29 Thread Martin Ivanov
 Dear All,

I am trying to compile R-3.0 on Cray xe6 (HLRS) HERMIT, no success so far.
Here is my experience:

I use this to configure and make R:

CC=cc \
CXX=CC \
F77=ftn \
FC=ftn \
CPPFLAGS=-I$PREFIX/include \
LDFLAGS=-L$PREFIX/lib${LIBDIRSUFFIX} \
./configure --prefix=$PREFIX \
--exec-prefix=$PREFIX \
--bindir=$PREFIX/bin \
--sbindir=$PREFIX/sbin \
--sysconfdir=$PKG/etc \
--localstatedir=$PKG/var \
--libdir=$PREFIX/lib${LIBDIRSUFFIX} \
--datarootdir=$PREFIX/share \
--datadir=$PREFIX/share \
--infodir=$PREFIX/info \
--mandir=$PREFIX/man \
--docdir=$PREFIX/doc/$PRGNAM-$VERSION \
rdocdir=$PREFIX/doc/$PRGNAM-$VERSION \
rincludedir=$PREFIX/include \
rsharedir=$PREFIX/share \
--disable-BLAS-shlib \
--with-blas \
--with-lapack \
--without-x \
|| exit 1

make || exit 1

My environment is as follows:

  1) modules/3.2.6.7
  2) xtpe-network-gemini
  3) xtpe-interlagos
  4) cray-mpich2/5.6.4
  5) eswrap/1.0.9
  6) torque/2.5.9
  7) moab/6.1.5.s1992
  8) system/ws_tools
  9) system/hlrs-defaults
 10) xt-asyncpe/5.19
 11) gcc/4.7.2
 12) xt-libsci/12.0.01
 13) udreg/2.3.2-1.0401.5929.3.3.gem
 14) ugni/4.0-1.0401.5928.9.5.gem
 15) pmi/4.0.1-1..9421.73.3.gem
 16) dmapp/3.2.1-1.0401.5983.4.5.gem
 17) gni-headers/2.1-1.0401.5675.4.4.gem
 18) xpmem/0.1-2.0401.36790.4.3.gem
 19) job/1.5.5-0.1_2.0401.35380.1.10.gem
 20) csa/3.0.0-1_2.0401.37452.4.50.gem
 21) dvs/1.8.6_0.9.0-1.0401.1401.1.120
 22) rca/1.0.0-2.0401.38656.2.2.gem
 23) audit/1.0.0-1.0401.37969.2.32.gem
 24) ccm/2.2.0-1.0401.37254.2.142
 25) configuration/1.0-1.0401.35391.1.2.gem
 26) hosts/1.0-1.0401.35364.1.115.gem
 27) lbcd/2.1-1.0401.35360.1.2.gem
 28) nodehealth/5.0-1.0401.38460.12.18.gem
 29) pdsh/2.26-1.0401.37449.1.1.gem
 30) shared-root/1.0-1.0401.37253.3.50.gem
 31) switch/1.0-1.0401.36779.2.72.gem
 32) xe-sysroot/4.1.40
 33) atp/1.6.2

1. PrgEnv-gnu/4.1.40
checking for C libraries of cc -std=gnu99...  
-L/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/system/usr/lib64 
-L/opt/cray/udreg/2.3.2-1.0401.5929.3.3.gem/lib64 
-L/opt/cray/ugni/4.0-1.0401.5928.9.5.gem/lib64 
-L/opt/cray/pmi/4.0.1-1..9421.73.3.gem/lib64 
-L/opt/cray/dmapp/3.2.1-1.0401.5983.4.5.gem/lib64 
-L/opt/cray/xpmem/0.1-2.0401.36790.4.3.gem/lib64 
-L/opt/cray/rca/1.0.0-2.0401.38656.2.2.gem/lib64 
-L/opt/cray/mpt/5.6.4/gni/mpich2-gnu/47/lib 
-L/opt/cray/libsci/12.0.01/gnu/47/interlagos/lib 
-L/opt/cray/xe-sysroot/4.1.40/usr/lib64 -L/opt/cray/xe-sysroot/4.1.40/lib64 
-L/opt/cray/xe-sysroot/4.1.40/usr/lib/alps -L/usr/lib/alps 
-L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2 
-L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../../../lib64 
-L/lib/../lib64 -L/usr/lib/../lib64 
-L/opt/gcc/4.7.2/snos/lib/gcc/x86_64-suse-linux/4.7.2/../../.. -lrca 
-L/opt/cray/atp/1.6.2/lib/ -lAtpSigHCommData -lAtpSigHandler -lgfortran 
-lscicpp_gnu -lsci_gnu_mp -lstdc++ -lmpich_gnu_47 -lmpl -lrt 
 -lxpmem -ldmapp -lugni -lpmi -lalpslli -lalpsutil -ludreg -lpthread -lm -lgomp 
-lgcc_eh
checking for dummy main to link with Fortran 77 libraries... none
checking for Fortran 77 name-mangling scheme... lower case, underscore, no 
extra underscore
checking whether ftn appends underscores to external names... yes
checking whether ftn appends extra underscores to external names... no
checking whether mixed C/Fortran code can be run... yes
checking whether ftn and cc -std=gnu99 agree on int and double... configure: 
WARNING: ftn and cc -std=gnu99 disagree on int and double
configure: error: Maybe change CFLAGS or FFLAGS?

2. PrgEnv-gnu/4.1.40 + craype-target-native

R is now configured for x86_64-unknown-linux-gnu

  Source directory:  .
  Installation directory:/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/system/usr

  C compiler:cc -std=gnu99  -g -O2
  Fortran 77 compiler:   gfortran  -g -O2

  C++ compiler:  CC  -g -O2
  Fortran 90/95 compiler:ftn -g -O2
  Obj-C compiler:

  Interfaces supported:
  External libraries:readline
  Additional capabilities:   JPEG
  Options enabled:   shared R library, shared BLAS, R profiling, memory 
profiling, strict barrier, static HTML

  Recommended packages:  yes

nmath/*.o` ../extra/zlib/libz.a ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a 
../extra/tre/libtre.a  ../extra/xz/liblzma.a  -L../../lib -lRblas -lgfortran 
-lm -lquadmath   -lreadline -lncurses  -lrt -ldl -lm
make[4]: Entering directory 
`/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/src/main'
mkdir -p -- /univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/bin/exec
make[4]: Leaving directory 
`/univ_2/ws3/ws/ipmiva-WRF_331_CORDEX-0/build/tmp/R-3.0.0/src/main'
make[3]: Leaving 

Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame

2013-04-29 Thread Bert Gunter
If this is a homework problem, there is a no homework policy on this list.

-- Bert



On Mon, Apr 29, 2013 at 5:24 AM, Katherine Gobin
katherine_go...@yahoo.com wrote:
 Dear R forum

 I have a data.frame as

 cashflow_df = data.frame(instrument = 
 c("ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC",
  "ABC", "PQR", "PQR", "PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR", "PQR", 
 "PQR", "PQR","PQR", "PQR","PQR","PQR","PQR", "PQR","PQR","UVWXYZ","UVWXYZ", 
 "UVWXYZ", "UVWXYZ", "UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ", "UVWXYZ", "UVWXYZ"),

 id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5, 
 1,1,2,2,3,3,4,4, 5,5),

 cashflow = c(5000,5000,505000,5000,5000,505000,5000,5000,505000, 5000,5000, 
 505000, 
 5000,5000,505000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,8000,808000,8000,808000,8000,808000,8000,808000,8000,808000),

 cashflows_pv = c(4931.054, 4479.1116, 431160.8529,4931.9604, 4485.6393, 
 432064.0228, 
 4932.5438,4489.8451,432646.2398,4932.1548,4487.0404,432257.9551,4932.6087,4490.3129,432711.0084,493.6326,474.0524,455.2489,82252.0304,493.8083,474.7543,456.4356,82744.9157,493.6003,473.9235,455.031,82161.7368,493.8175,474.7913,456.4982,82770.9849,493.8592,474.9581,456.7804,82888.4556,7451.3118,681810.5522,7462.0148,684153.4992,7441.1294,679585.9186,7426.6407,676427.7274,7427.1225,676532.6262))

 #  __

 cashflow_df
    instrument id cashflow cashflows_pv
 1         ABC  1     5000    4931.0540
 2         ABC  1     5000    4479.1116
 3         ABC  1   505000  431160.8529
 4         ABC  2     5000    4931.9604
 5         ABC  2     5000    4485.6393
 6         ABC  2   505000  432064.0228
 7         ABC  3     5000    4932.5438
 8         ABC  3     5000    4489.8451
 9         ABC  3   505000  432646.2398
 10        ABC  4     5000    4932.1548
 11        ABC  4     5000    4487.0404
 12        ABC  4   505000  432257.9551
 13        ABC  5     5000    4932.6087
 14        ABC  5     5000    4490.3129
 15        ABC  5   505000  432711.0084
 16        PQR  1      500     493.6326
 17        PQR  1      500     474.0524
 18        PQR  1      500     455.2489
 19        PQR  1   102000   82252.0304
 20        PQR  2      500     493.8083
 21        PQR  2      500     474.7543
 22        PQR  2      500     456.4356
 23        PQR  2   102000   82744.9157
 24        PQR  3      500     493.6003
 25        PQR  3      500     473.9235
 26        PQR  3      500     455.0310
 27        PQR  3   102000   82161.7368
 28        PQR  4      500     493.8175
 29        PQR  4      500     474.7913
 30        PQR  4      500     456.4982
 31        PQR  4   102000   82770.9849
 32        PQR  5      500     493.8592
 33        PQR  5      500     474.9581
 34        PQR  5      500     456.7804
 35        PQR  5   102000   82888.4556
 36     UVWXYZ  1     8000    7451.3118
 37     UVWXYZ  1   808000  681810.5522
 38     UVWXYZ  2     8000    7462.0148
 39     UVWXYZ  2   808000  684153.4992
 40     UVWXYZ  3     8000    7441.1294
 41     UVWXYZ  3   808000  679585.9186
 42     UVWXYZ  4     8000    7426.6407
 43     UVWXYZ  4   808000  676427.7274
 44     UVWXYZ  5     8000    7427.1225
 45     UVWXYZ  5   808000  676532.6262

 # ===

 # My PROBLEM


 For a given instrument and id, I need the totals of cashflow and cashflows_pv 
  and also the difference of (total_cashflow_pv pertaining to the first ID for 
 the given instrument from total_cashflow_pv for the same instrument) as shown 
 in the fourth column of following output.

 output

    instrument id total_cashflow total_cashflow_pv
 1         ABC  1         515000         440571.02
 2         ABC  2         515000         441481.62
 3         ABC  3         515000         442068.63
 4         ABC  4         515000         441677.15
 5         ABC  5         515000         442133.93
 6         PQR  1         103500          83674.96
 7         PQR  2         103500          84169.91
 8         PQR  3         103500          83584.29
 9         PQR  4         103500          84196.09
 10        PQR  5         103500          84314.05
 11     UVWXYZ  1         816000         689261.86
 12     UVWXYZ  2         816000         691615.51
 13     UVWXYZ  3         816000         687027.05
 14     UVWXYZ  4         816000         683854.37
 15     UVWXYZ  5         816000         683959.75


    cashflow_change
 1           0.0000   # This is (440571.02 - 440571.02): 1st ID value - 1st ID value for ABC
 2         910.6040   # This is (441481.62 - 440571.02): 2nd ID value - 1st ID value for ABC
 3        1497.6102   # This is (442068.63 - 440571.02): 3rd ID value - 1st ID value for ABC
 4        1106.1318
 5        1562.9115
 6           0.0000   # This is (83674.96 - 83674.96): 1st ID value - 1st ID value for PQR
 7         494.9496
 8         -90.6727
 9         521.1276
 10        639.0890
 11          0.0000
 12 

[R] Need help on matrix calculation

2013-04-29 Thread Christofer Bogaso
Hello again,

Let's say I have a matrix:

Mat <- matrix(1:12, 4, 3)
rownames(Mat) <- letters[1:4]

Now I want to subset the rows of Mat in the following way:

Subscript_Vec <- c("a", "e", "b", "c")

However, when I use this vector, I get the following error:

Mat[Subscript_Vec, ]
Error: subscript out of bounds

Basically, I want my final matrix to look like this:
Basically I want to get my final matrix in following way:

  V1 V2 V3
a  1  5  9
e NA NA NA
b  2  6 10
c  3  7 11

i.e. if some of the element(s) in 'Subscript_Vec' is not in 'Mat' then
that row would be filled by NA, WITHOUT altering the sequence of
'Subscript_Vec'

Is there any direct way to achieve that?

Thanks and regards,



Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame

2013-04-29 Thread arun


Hi Katherine,
res1 <- aggregate(cbind(cashflow,cashflows_pv)~instrument+id,data=cashflow_df,sum)
res2 <- res1[order(res1$instrument),]
res2$cashflow_change <- with(res2,ave(cashflows_pv,instrument,FUN=function(x) 
x-head(x,1)))
names(res2)[3:4] <- paste0("total_",names(res2)[3:4])
res2
#   instrument id total_cashflow total_cashflows_pv cashflow_change
#1         ABC  1         515000          440571.02          0.0000
#4         ABC  2         515000          441481.62        910.6040
#7         ABC  3         515000          442068.63       1497.6102
#10        ABC  4         515000          441677.15       1106.1318
#13        ABC  5         515000          442133.93       1562.9115
#2         PQR  1         103500           83674.96          0.0000
#5         PQR  2         103500           84169.91        494.9496
#8         PQR  3         103500           83584.29        -90.6727
#11        PQR  4         103500           84196.09        521.1276
#14        PQR  5         103500           84314.05        639.0890
#3      UVWXYZ  1         816000          689261.86          0.0000
#6      UVWXYZ  2         816000          691615.51       2353.6500
#9      UVWXYZ  3         816000          687027.05      -2234.8160
#12     UVWXYZ  4         816000          683854.37      -5407.4959
#15     UVWXYZ  5         816000          683959.75      -5302.1153

 A.K.
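
To make the two steps in the code above easier to follow, here is a small sketch on made-up numbers (the data frame toy and its values are invented for illustration and are not Katherine's data). aggregate() sums within each instrument/id pair, and ave() then subtracts, within each instrument, the total of the first id from every total.

```r
# Toy data, invented for this illustration only
toy <- data.frame(
  instrument = c("ABC", "ABC", "ABC", "ABC", "PQR", "PQR", "PQR"),
  id         = c(1, 1, 2, 2, 1, 2, 2),
  pv         = c(10, 20, 15, 25, 5, 4, 3)
)

# Step 1: one total per instrument/id combination
totals <- aggregate(pv ~ instrument + id, data = toy, FUN = sum)
totals <- totals[order(totals$instrument, totals$id), ]

# Step 2: within each instrument, subtract the first id's total
totals$change <- ave(totals$pv, totals$instrument,
                     FUN = function(x) x - x[1])
totals
```

Sorting by id as well as instrument before calling ave() ensures that x[1] really is the total for the first id of each instrument.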

- Original Message -
From: Katherine Gobin katherine_go...@yahoo.com
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 8:24 AM
Subject: [R] Adding elements in data.frame subsets and also subtracting an
element from the rest elements in data.frame

Dear R forum

I have a data.frame as

cashflow_df = data.frame(instrument = 
c("ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC",
 "ABC", "PQR", "PQR", "PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR", "PQR", 
"PQR", "PQR","PQR", "PQR","PQR","PQR","PQR", "PQR","PQR","UVWXYZ","UVWXYZ", 
"UVWXYZ", "UVWXYZ", "UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ", "UVWXYZ", "UVWXYZ"),

id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5, 
1,1,2,2,3,3,4,4, 5,5),

cashflow = c(5000,5000,505000,5000,5000,505000,5000,5000,505000, 5000,5000, 
505000, 
5000,5000,505000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,8000,808000,8000,808000,8000,808000,8000,808000,8000,808000),

cashflows_pv = c(4931.054, 4479.1116, 431160.8529,4931.9604, 4485.6393, 
432064.0228, 
4932.5438,4489.8451,432646.2398,4932.1548,4487.0404,432257.9551,4932.6087,4490.3129,432711.0084,493.6326,474.0524,455.2489,82252.0304,493.8083,474.7543,456.4356,82744.9157,493.6003,473.9235,455.031,82161.7368,493.8175,474.7913,456.4982,82770.9849,493.8592,474.9581,456.7804,82888.4556,7451.3118,681810.5522,7462.0148,684153.4992,7441.1294,679585.9186,7426.6407,676427.7274,7427.1225,676532.6262))

#  __

 cashflow_df
   instrument id cashflow cashflows_pv
1 ABC  1 5000    4931.0540
2 ABC  1 5000    4479.1116
3 ABC  1   505000  431160.8529
4 ABC  2 5000    4931.9604
5 ABC  2 5000    4485.6393
6 ABC  2   505000  432064.0228
7 ABC  3 5000    4932.5438
8 ABC  3 5000    4489.8451
9 ABC  3   505000  432646.2398
10    ABC  4 5000    4932.1548
11    ABC  4 5000    4487.0404
12    ABC  4   505000  432257.9551
13    ABC  5 5000    4932.6087
14    ABC  5 5000    4490.3129
15    ABC  5   505000  432711.0084
16    PQR  1  500 493.6326
17    PQR  1  500 474.0524
18    PQR  1  500 455.2489
19    PQR  1   102000   82252.0304
20    PQR  2  500 493.8083
21    PQR  2  500 474.7543
22    PQR  2  500 456.4356
23    PQR  2   102000   82744.9157
24    PQR  3  500 493.6003
25    PQR  3  500 473.9235
26    PQR  3  500 455.0310
27    PQR  3   102000   82161.7368
28    PQR  4  500 493.8175
29    PQR  4  500 474.7913
30    PQR  4  500 456.4982
31    PQR  4   102000   82770.9849
32    PQR  5  500 493.8592
33    PQR  5  500 474.9581
34    PQR  5  500 456.7804
35    PQR  5   102000   82888.4556
36 UVWXYZ  1 8000    7451.3118
37 UVWXYZ  1   808000  681810.5522
38 UVWXYZ  2 8000    7462.0148
39 UVWXYZ  2   808000  684153.4992
40 UVWXYZ  3 8000    7441.1294
41 UVWXYZ  3   808000  679585.9186
42 UVWXYZ  4 8000    7426.6407
43 UVWXYZ  4   808000  676427.7274
44 UVWXYZ  5 8000    7427.1225
45 UVWXYZ  5   808000  676532.6262

# ===

# My PROBLEM


For a given instrument and id, I need the totals of cashflow and cashflows_pv  
and also the difference of (total_cashflow_pv pertaining to the first ID for 
the given instrument from total_cashflow_pv for the same instrument) as shown 
in the fourth 

Re: [R] Need help on matrix calculation

2013-04-29 Thread Jorge I Velez
Christofer,

The following should get you started:

r <- Mat[match(rownames(Mat), Subscript_Vec),]
rownames(r) <- Subscript_Vec
r

HTH,
Jorge.-



On Mon, Apr 29, 2013 at 11:38 PM, Christofer Bogaso 
bogaso.christo...@gmail.com wrote:

 Hello again,

 Let's say I have one matrix:

 Mat <- matrix(1:12, 4, 3)
 rownames(Mat) <- letters[1:4]

 Now I want to subscript Mat in the following way:

 Subscript_Vec <- c("a", "e", "b", "c")

 However, when I use this vector, I get the following error:

 Mat[Subscript_Vec, ]
 Error: subscript out of bounds

 Basically, I want my final matrix to look like this:

   V1 V2 V3
 a  1  5  9
 e NA NA NA
 b  2  6 10
 c  3  7 11

 i.e. if some element(s) of 'Subscript_Vec' are not in 'Mat', then
 that row should be filled with NA, WITHOUT altering the sequence of
 'Subscript_Vec'.

 Is there any direct way to achieve that?

 Thanks and regards,

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Adding elements in data.frame subsets and also subtracting an element from the rest elements in data.frame

2013-04-29 Thread arun
You can also use:
library(plyr)

res <- mutate(ddply(cashflow_df, .(instrument, id), numcolwise(sum)),
              cashflow_change = ave(cashflows_pv, instrument,
                                    FUN = function(x) x - head(x, 1)))
names(res)[3:4] <- paste0("total_", names(res)[3:4])
res
#   instrument id total_cashflow total_cashflows_pv cashflow_change
#1         ABC  1         515000          440571.02          0.0000
#2         ABC  2         515000          441481.62        910.6040
#3         ABC  3         515000          442068.63       1497.6102
#4         ABC  4         515000          441677.15       1106.1318
#5         ABC  5         515000          442133.93       1562.9115
#6         PQR  1         103500           83674.96          0.0000
#7         PQR  2         103500           84169.91        494.9496
#8         PQR  3         103500           83584.29        -90.6727
#9         PQR  4         103500           84196.09        521.1276
#10        PQR  5         103500           84314.05        639.0890
#11     UVWXYZ  1         816000          689261.86          0.0000
#12     UVWXYZ  2         816000          691615.51       2353.6500
#13     UVWXYZ  3         816000          687027.05      -2234.8160
#14     UVWXYZ  4         816000          683854.37      -5407.4959
#15     UVWXYZ  5         816000          683959.75      -5302.1153
A.K.
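
For readers without plyr, the same two steps (totals per instrument and id, then each total minus the total of the first id within the same instrument) can be sketched in base R. This is only a minimal sketch on a made-up toy frame: `toy`, `tot`, the column `pv` and all values are assumptions for illustration, not the poster's data.

```r
# Toy data (hypothetical values): two instruments, two ids each
toy <- data.frame(
  instrument = c("A", "A", "A", "A", "B", "B", "B", "B"),
  id         = c(1, 1, 2, 2, 1, 1, 2, 2),
  pv         = c(10, 20, 15, 25, 1, 2, 3, 4)
)

# Step 1: total pv per instrument and id, sorted so the first id comes first
tot <- aggregate(pv ~ instrument + id, data = toy, FUN = sum)
tot <- tot[order(tot$instrument, tot$id), ]

# Step 2: within each instrument, subtract the total of the first id
tot$change <- ave(tot$pv, tot$instrument, FUN = function(x) x - x[1])
tot
```

The sort before `ave()` matters: `x[1]` is "the first id" only if the rows of each instrument are already ordered by id.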



- Original Message -
From: arun smartpink...@yahoo.com
To: Katherine Gobin katherine_go...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Monday, April 29, 2013 9:43 AM
Subject: Re: [R] Adding elements in data.frame subsets and also subtracting an  
element from the rest elements in data.frame



Hi Katherine,
res1 <- aggregate(cbind(cashflow,cashflows_pv)~instrument+id,data=cashflow_df,sum)
res2 <- res1[order(res1$instrument),]
res2$cashflow_change <- with(res2,ave(cashflows_pv,instrument,FUN=function(x) x-head(x,1)))
names(res2)[3:4] <- paste0("total_",names(res2)[3:4])
res2
#   instrument id total_cashflow total_cashflows_pv cashflow_change
#1         ABC  1         515000          440571.02          0.0000
#4         ABC  2         515000          441481.62        910.6040
#7         ABC  3         515000          442068.63       1497.6102
#10        ABC  4         515000          441677.15       1106.1318
#13        ABC  5         515000          442133.93       1562.9115
#2         PQR  1         103500           83674.96          0.0000
#5         PQR  2         103500           84169.91        494.9496
#8         PQR  3         103500           83584.29        -90.6727
#11        PQR  4         103500           84196.09        521.1276
#14        PQR  5         103500           84314.05        639.0890
#3      UVWXYZ  1         816000          689261.86          0.0000
#6      UVWXYZ  2         816000          691615.51       2353.6500
#9      UVWXYZ  3         816000          687027.05      -2234.8160
#12     UVWXYZ  4         816000          683854.37      -5407.4959
#15     UVWXYZ  5         816000          683959.75      -5302.1153

 A.K.

- Original Message -
From: Katherine Gobin katherine_go...@yahoo.com
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 8:24 AM
Subject: [R] Adding elements in data.frame subsets and also subtracting an
    element from the rest elements in data.frame

Dear R forum

I have a data.frame as

cashflow_df = data.frame(instrument = 
c("ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC","ABC",
"ABC","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR",
"PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","PQR","UVWXYZ","UVWXYZ",
"UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ","UVWXYZ"),

id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5, 
1,1,2,2,3,3,4,4, 5,5),

cashflow = c(5000,5000,505000,5000,5000,505000,5000,5000,505000, 5000,5000, 
505000, 
5000,5000,505000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,500,500,500,102000,8000,808000,8000,808000,8000,808000,8000,808000,8000,808000),

cashflows_pv = c(4931.054, 4479.1116, 431160.8529,4931.9604, 4485.6393, 
432064.0228, 
4932.5438,4489.8451,432646.2398,4932.1548,4487.0404,432257.9551,4932.6087,4490.3129,432711.0084,493.6326,474.0524,455.2489,82252.0304,493.8083,474.7543,456.4356,82744.9157,493.6003,473.9235,455.031,82161.7368,493.8175,474.7913,456.4982,82770.9849,493.8592,474.9581,456.7804,82888.4556,7451.3118,681810.5522,7462.0148,684153.4992,7441.1294,679585.9186,7426.6407,676427.7274,7427.1225,676532.6262))

#  __

 cashflow_df
   instrument id cashflow cashflows_pv
1 ABC  1 5000    4931.0540
2 ABC  1 5000    4479.1116
3 ABC  1   505000  431160.8529
4 ABC  2 5000    4931.9604
5 ABC  2 5000    4485.6393
6 ABC  2   505000  432064.0228
7 ABC  3 5000    4932.5438
8 ABC  3 5000    4489.8451
9 ABC  3   505000  432646.2398
10    ABC  4 5000    4932.1548
11    ABC  4 5000    4487.0404
12    ABC  4   

Re: [R] Need help on matrix calculation

2013-04-29 Thread arun

r
#  [,1] [,2] [,3]
#a    1    5    9
#e    3    7   11
#b    4    8   12
#c   NA   NA   NA
I guess you meant: 
r1 <- Mat[match(Subscript_Vec,rownames(Mat)),]
rownames(r1) <- Subscript_Vec
r1
#  [,1] [,2] [,3]
#a    1    5    9
#e   NA   NA   NA
#b    2    6   10
#c    3    7   11
A.K.
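
Why the argument order matters: match(x, table) returns, for each element of x, its position in table, or NA when it is absent. The wanted names therefore go first, and indexing a matrix row by NA produces a row of NA. A minimal, self-contained sketch of that idea (the name `idx` is just an illustrative helper):

```r
Mat <- matrix(1:12, 4, 3)
rownames(Mat) <- letters[1:4]
Subscript_Vec <- c("a", "e", "b", "c")

# Position of each requested name in rownames(Mat); "e" is absent, so NA
idx <- match(Subscript_Vec, rownames(Mat))

# An NA row index yields a row filled with NA; drop = FALSE keeps a matrix
r1 <- Mat[idx, , drop = FALSE]
rownames(r1) <- Subscript_Vec
r1
```

Keeping `drop = FALSE` means the result stays a matrix even when `Subscript_Vec` has length one.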



- Original Message -
From: Jorge I Velez jorgeivanve...@gmail.com
To: Christofer Bogaso bogaso.christo...@gmail.com
Cc: r-help r-help@r-project.org
Sent: Monday, April 29, 2013 9:45 AM
Subject: Re: [R] Need help on matrix calculation



Re: [R] all.vars for nested expressions

2013-04-29 Thread Eik Vettorazzi
Hi Felix,
I thought this could be an easy task for substitute, and the following
works as expected:

all.vars(substitute(expression(tp/a),list(a=expression(fn+tp))))
# [1] "tp" "fn"

But (of course)
all.vars(substitute(sen,list(a=a)))
does not yield the desired result, and I can't figure out how to set up
as.name, bquote, eval, deparse etc. to do the task properly.

Instead, my approach is a recursive call to all.vars:

xall.help<-function(x){
  #check if there is an object with name x
  if(exists(x)) lapply(all.vars(get(x)),xall.help) else x}

xall.vars<-function(x){
   if (!is.character(x)) x<-paste(substitute(x))
   #for convenience put in a single vector
   #xall.help returns a 'parsed tree'
   unique(unlist(xall.help(x)))
  }

#example
fn<-expression(n1+n2)
a <- expression(fn+tp)
sen <- expression(tp/a)

xall.vars(sen)
# [1] "tp" "n1" "n2"

cheers.
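
A variant of the same recursive idea, sketched under a few assumptions: the expressions live in one known environment, only language objects (expressions or calls) should be expanded, and a self-referencing definition should not recurse forever. The function name `expand_vars` and its arguments `env` and `seen` are hypothetical, introduced only for this illustration.

```r
expand_vars <- function(name, env = parent.frame(), seen = character()) {
  # A name already being expanded is returned as-is (guards against cycles)
  if (name %in% seen) return(name)
  # Look only in the given environment, so base functions are never matched
  if (!exists(name, envir = env, inherits = FALSE)) return(name)
  obj <- get(name, envir = env, inherits = FALSE)
  # Ordinary data objects are leaves; only expressions and calls are expanded
  if (!is.expression(obj) && !is.call(obj)) return(name)
  unique(unlist(lapply(all.vars(obj), expand_vars,
                       env = env, seen = c(seen, name))))
}

e <- new.env()
assign("fn",  expression(n1 + n2), envir = e)
assign("a",   expression(fn + tp), envir = e)
assign("sen", expression(tp / a),  envir = e)

expand_vars("sen", env = e)
```

Restricting the lookup with `inherits = FALSE` is a design choice: a variable that happens to share its name with an object on the search path is then treated as a plain leaf variable.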

Am 29.04.2013 13:33, schrieb flxms:
 Dear R fellows,

 Assume I define

 a <- expression(fn+tp)
 sen <- expression(tp/a)

 Now I'd like to know which variables are necessary for calculating sen:

 all.vars(sen)

 This results in a vector c("tp","a"). But I'd like all.vars to evaluate the
 sen object down to the ground level, which would result in a vector
 c("tp","fn") (because a was defined as fn+tp). In other words, I'd like
 all.vars to expand the a object (and all other downstream objects). I am
 looking for a solution that works with many more levels. This is just a
 very simple example.

 I'd very much appreciate any suggestions on how to do that!
 Thanks in advance,

 Felix
 
 


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790


[R] rbinding some elements from a list and obtain another list

2013-04-29 Thread De Castro Pascual, Montserrat
Hi everybody,

I have a list where every element is a data frame.

An example:

Mylist<-list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I want to rbind some elements of this list.

As an example:

Output<-list(AB=data.frame, CD=data.frame)

where

AB=rbind(A,B)
CD=rbind(C,D)

I've tried:

f<-function(x){
  for (i in seq(1,length(names(x)),2)){
    aa<-do.call(rbind,x[i:i+1])
    aa
  }}
bb<-f(mylist)

or

f<-function(x){
  for (i in seq(1,length(names(x)),2)){
    aa[i]<-do.call(rbind,x[i:i+1])
    list(aa[i])
  }}
bb<-f (mylist)

but it doesn't work:

> f<-function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa<-do.call(rbind,x[i:i+1])
+     aa
+   }}
> bb<-f(mylist)
> bb
NULL

> f<-function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa[i]<-do.call(rbind,x[i:i+1])
+     list(aa[i])
+   }}
> bb<-f(mylist)
Warning messages:
1: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
2: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
3: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
4: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
5: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
6: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length

Thanks!



Montserrat
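
Two things in the attempts above are worth spelling out. First, `i:i+1` parses as `(i:i) + 1`, which is just `i + 1`, so only one element is selected; the pair needs `i:(i + 1)`. Second, a for loop evaluates to NULL, so a function whose last expression is a loop returns NULL. A minimal sketch of one way to bind consecutive pairs, assuming a list with an even number of named data frames (the tiny frames and the names `starts` and `out` are made up for illustration):

```r
# Hypothetical stand-in for the real list of data frames
mylist <- list(
  A = data.frame(x = 1:2),
  B = data.frame(x = 3:4),
  C = data.frame(x = 5:6),
  D = data.frame(x = 7:8)
)

# First index of each pair: 1, 3, ...
starts <- seq(1, length(mylist), by = 2)

# lapply() collects one rbind-ed data frame per pair into a new list
out <- lapply(starts, function(i) do.call(rbind, mylist[i:(i + 1)]))

# Name each result after the two elements it was built from
names(out) <- sapply(starts, function(i) {
  paste(names(mylist)[i:(i + 1)], collapse = "")
})
names(out)
```

Because `lapply()` returns its results, nothing has to be accumulated by hand inside a loop.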






Re: [R] prcomp( and cmdscale( not equivalent?

2013-04-29 Thread David Carlson
I may not understand completely, but it seems you have a 45x45 distance
matrix of stimuli and you want to use it to determine which stimuli are
similar. Wouldn't hierarchical clustering be a more straightforward
approach?

?hclust
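
A minimal sketch of that suggestion, using simulated points as a stand-in for the real 45 stimuli. The data, the "average" linkage and the choice of four groups are all assumptions made for illustration; with a ready-made square dissimilarity matrix `m` one would pass `as.dist(m)` instead of calling `dist()`.

```r
set.seed(1)
# Hypothetical stand-in: 45 stimuli described by 3 made-up coordinates
pts <- matrix(rnorm(45 * 3), nrow = 45)
d   <- dist(pts)                      # pairwise dissimilarities

hc <- hclust(d, method = "average")   # agglomerative clustering
groups <- cutree(hc, k = 4)           # cut the tree into 4 groups
table(groups)
```

`plot(hc)` draws the dendrogram, which is often the quickest way to see which stimuli group together.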

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bob Wiley
Sent: Friday, April 26, 2013 4:33 PM
To: r-help@r-project.org
Subject: [R] prcomp( and cmdscale( not equivalent?

Hello,

I have a dilemma that I'm hoping the R gurus will be able to help resolve.
For background:
My data is in the form of a (dis)similarity matrix created from taking the
inverse of normalized reaction times. That is, each cell of the matrix
represents how long it took to distinguish two stimuli from one another-- a
square matrix of 45X45 where the diagonal values are all zero (since this
represents two identical stimuli).

I have been using cmdscale with this matrix as the input. So:

X = cmdscale(mydata,k=44,add=FALSE,eig=TRUE)$points returns a 45x34 matrix
because only 34 of the eigenvalues are > 0.

I then run prcomp on the transpose of this matrix:
prcomp(t(X),scale.=TRUE)

The goal is to take the original matrix of inverse reaction times and
transform that data such that we have PCs that show how stimuli are grouping
together: high absolute value loadings/coordinates on a given dimension
should reflect how similar the stimuli are to one another.

My concern is that I'm not fully understanding the mathematics behind
cmdscale() and prcomp(), and that I may just be losing a lot of information or
introducing noise. Or is my approach theoretically sound? I've read a TON
on this now but I can't see exactly what R is doing with these two
functions.
Thank you!

-bob
JHU
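
On how the two functions relate: when the dissimilarities are Euclidean distances between the rows of a data matrix, classical MDS (cmdscale) reproduces the principal component scores of that matrix up to the sign of each axis. The sketch below checks this on simulated data; `X` is made up, and the equivalence is not guaranteed for dissimilarities that are not Euclidean, such as ones derived from reaction times.

```r
set.seed(42)
X <- matrix(rnorm(10 * 4), nrow = 10)   # hypothetical raw data, 10 items

mds <- cmdscale(dist(X), k = 2)         # classical MDS on Euclidean distances
pca <- prcomp(X)$x[, 1:2]               # PCA scores (centred, not scaled)

# The two sets of coordinates agree up to the sign of each column
max(abs(abs(mds) - abs(pca)))
```

Seen this way, the points returned by cmdscale already play the role of principal coordinates, so a further prcomp on them mainly rotates or rescales what is already there.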





Robert (Bob) Wiley



[R] Function for Data Frame

2013-04-29 Thread Sparks, John James
Dear R Helpers,

I have about 20 data frames that I need to do a series of data scrubbing
steps to.  I have the list of data frames in a list so that I can use
lapply.  I am trying to build a function that will do the data scrubbing
that I need.  However, I am new to functions and there is something
fundamental that I am not understanding.  I use the return function at the
end of the function and this completes the data processing specified in
the function, but leaves the data frame that I want changed unaffected. 
How do I get my function to apply its results to the data frame in
question instead of simply displaying the results to the screen?

Any helpful guidance would be most appreciated.

--John Sparks


x=as.data.frame(matrix(c(1,2,3,
                         1,2,3,
                         1,2,2,
                         1,2,2,
                         1,1,1),ncol=3,byrow=T))


myfunc<-function(DF){
 DF<-subset(DF,select=-c(V1))
 return(DF)
}

myfunc(x)

#How to get this change to data frame x?
#And preferably not send the results to the screen?
x



Re: [R] Need help on matrix calculation

2013-04-29 Thread Jorge I Velez
Sorry, the first line should have been

Mat[match( Subscript_Vec, rownames(Mat)),]

and the rest remains the same.

Best,
Jorge.-


On Mon, Apr 29, 2013 at 11:45 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:



Re: [R] all.vars for nested expressions

2013-04-29 Thread William Dunlap
Try poking around in the codetools package.  E.g., you can do things like the
following:

   expr1 <- quote(a <- fn + tp) # put 'a' in the expression
   expr2 <- quote( tp / a + fn)
   expr12 <- call("{", expr1, expr2)
   expr12
   # {
   #     a <- fn + tp
   #     tp/a + fn
   # }
   library(codetools)
   findLocals(expr12) # from codetools
   # [1] "a"
   setdiff(all.vars(expr12), findLocals(expr12))
   # [1] "fn" "tp"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of flxms
 Sent: Monday, April 29, 2013 4:34 AM
 To: r-help@r-project.org
 Subject: [R] all.vars for nested expressions
 


[R] expanding a presence only dataset into presence/absence

2013-04-29 Thread Matthew Venesky
Hello,

I'm working with a very large dataset (250,000+ lines in its current form)
that includes presence only data on various species (which is nested within
different sites and sampling dates). I need to convert this into a dataset
with presence/absence for each species. For example, I would like to expand
My current data to Desired data:

My current data

Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3

Desired data

Species Present Site Date
a 1 1 1
b 1 1 1
c 0 1 1
a 0 2 2
b 1 2 2
C 0 2 2
a 0 3 3
b 0 3 3
c 1 3 3

I've scoured the web, including Rseek and haven't found a resolution (and
note that a similar question was asked sometime in 2011 without an answer).
Does anyone have any thoughts? Thank you in advance.
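
One possible approach, sketched in base R: cross every observed sampling event with every species, then flag the combinations that actually occur in the presence-only data. The sketch assumes that an "event" is an observed Site/Date pair and reuses the small example from the post; the names `obs`, `events`, `full` and `key` are made up for illustration.

```r
# Presence-only records from the example above
obs <- data.frame(
  Species = c("a", "b", "b", "c"),
  Site    = c(1, 1, 1, 1),
  Date    = c(1, 1, 2, 3)
)

# Every observed (Site, Date) event crossed with every species;
# merge() with by = NULL returns the Cartesian product
events <- unique(obs[, c("Site", "Date")])
full <- merge(events, data.frame(Species = unique(obs$Species)), by = NULL)

# A combination is present if it appears in the original records
key <- function(d) paste(d$Species, d$Site, d$Date, sep = "|")
full$Present <- as.integer(key(full) %in% key(obs))

full <- full[order(full$Site, full$Date, full$Species), ]
full
```

If absences should also be recorded for Site/Date pairs where nothing at all was found, `events` would have to come from a separate list of sampling visits rather than from the presence records.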

--

Matthew D. Venesky, Ph.D.

Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/



Re: [R] Function for Data Frame

2013-04-29 Thread arun
Hi,
If it is for the list:
lst1 <- list(x,x,x)
lst1 <- lapply(lst1,myfunc)



- Original Message -
From: arun smartpink...@yahoo.com
To: Sparks, John James jspa...@uic.edu
Cc: R help r-help@r-project.org
Sent: Monday, April 29, 2013 12:13 PM
Subject: Re: [R] Function for Data Frame



Hi,
If I understand it correctly, 
x <- myfunc(x)
x
#  V2 V3
#1  2  3
#2  2  3
#3  2  2
#4  2  2
#5  1  1
A.K.
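
The underlying point: an R function works on a copy of its argument and hands back a new object, so the caller has to assign the result, and an assignment prints nothing. A minimal sketch, with a hypothetical helper `drop_v1` standing in for the real scrubbing steps and a made-up two-element list `frames`:

```r
x <- as.data.frame(matrix(c(1, 2, 3,
                            1, 2, 2,
                            1, 1, 1), ncol = 3, byrow = TRUE))

# Returns a modified copy; the data frame passed in is left untouched
drop_v1 <- function(DF) {
  DF[, setdiff(names(DF), "V1"), drop = FALSE]
}

# For a list of data frames: apply to each element and keep the results
frames <- list(first = x, second = x)
frames <- lapply(frames, drop_v1)

# For a single data frame: reassign, which also suppresses printing
x <- drop_v1(x)
names(x)
```

Calling `drop_v1(x)` on its own line only prints the result; nothing is stored unless it is assigned.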

- Original Message -
From: Sparks, John James jspa...@uic.edu
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 10:23 AM
Subject: [R] Function for Data Frame



Re: [R] how to add new rows in a dataframe?

2013-04-29 Thread arun
Hi,
dat1 <- read.table(text="
id t scores
2 0 1.2
2 2 2.3
2 3 3.6
2 4 5.6
2 6 7.8
3 0 1.6
3 1 1.2
3 4 1.5
",sep="",header=TRUE)
library(zoo)
res1 <- do.call(rbind, lapply(split(dat1, dat1$id), function(x) {
  t1 <- seq(min(x$t), max(x$t))
  scores1 <- na.locf(x$scores[match(t1, x$t)])
  data.frame(id = rep(unique(x$id), length(t1)), t1, scores1)
}))
row.names(res1) <- 1:nrow(res1)

 res1
#   id t1 scores1
#1   2  0 1.2
#2   2  1 1.2
#3   2  2 2.3
#4   2  3 3.6
#5   2  4 5.6
#6   2  5 5.6
#7   2  6 7.8
#8   3  0 1.6
#9   3  1 1.2
#10  3  2 1.2
#11  3  3 1.2
#12  3  4 1.5
library(plyr)
dat2 <- ddply(dat1, .(id), summarize, t = seq(min(t), max(t)))
res2 <- mutate(join(dat2, dat1, type = "full"), scores = na.locf(scores))
identical(res1,res2)
#[1] TRUE
 res2
#   id t scores
#1   2 0    1.2
#2   2 1    1.2
#3   2 2    2.3
#4   2 3    3.6
#5   2 4    5.6
#6   2 5    5.6
#7   2 6    7.8
#8   3 0    1.6
#9   3 1    1.2
#10  3 2    1.2
#11  3 3    1.2
#12  3 4    1.5

A.K.
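
The same carry-forward fill can also be sketched without extra packages. The sketch assumes that t is sorted in increasing order within each id; `dat`, `fill_id` and the values are made up for illustration. findInterval() returns, for each time point, the index of the last observed time that is not later than it.

```r
# Hypothetical small example with gaps in t
dat <- data.frame(id     = c(2, 2, 2, 3, 3),
                  t      = c(0, 2, 3, 0, 2),
                  scores = c(1.2, 2.3, 3.6, 1.6, 1.5))

fill_id <- function(d) {
  t_full <- seq(min(d$t), max(d$t))          # every time point in the range
  last_obs <- findInterval(t_full, d$t)      # index of last observation <= t
  data.frame(id = d$id[1], t = t_full, scores = d$scores[last_obs])
}

res <- do.call(rbind, lapply(split(dat, dat$id), fill_id))
rownames(res) <- NULL
res
```

Each id is handled separately, so a value is never carried across from one id to the next.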


Hello, dear experts,
I have my data like this:

id  t  scores
2   0  1.2
2   2  2.3
2   3  3.6
2   4  5.6
2   6  7.8
3   0  1.6
3   1  1.2
3   4  1.5

I want to fill in the missing values of t, so I want to add rows that carry
over the data from the previous time point (t-1),

to get another data frame like this:

id  t  scores
2   0  1.2
2   1  1.2
2   2  2.3
2   3  3.6
2   4  5.6
2   5  5.6
2   6  7.8
3   0  1.6
3   1  1.2
3   2  1.2
3   4  1.5

How can I get a result like this? In reality, I have 4000 observations, so
it's difficult to add the lines manually.

Thank you so much.



Re: [R] rbinding some elements from a list and obtain another list

2013-04-29 Thread David Winsemius

On Apr 29, 2013, at 6:54 AM, De Castro Pascual, Montserrat wrote:

 Hi everybody,
 
 
 
 I have a list, where every element of this list is a data frame.
 
 
 
 An example:
 
 
 
 Mylist<-list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I'm looking at this apparently malformed command and wondering if that is the 
root of all your problems. Do you know how to make a simple successful example 
of a list of dataframes?

-- 
David.


David Winsemius
Alameda, CA, USA



Re: [R] Stratified Random Sampling Proportional to Size

2013-04-29 Thread William Dunlap
This problem in sampling::strata() comes from calling cbind on a zero-row
data.frame with a scalar number.

   library(sampling)
   strata(mtcars[,c("mpg","hp","gear")], strat="gear", size=c(5,5,0))
  Error in data.frame(..., check.names = FALSE) :
    arguments imply differing number of rows: 0, 1
  In addition: Warning message:
  In strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,  :
    the method is not specified; by default, the method is srswor
   traceback()
  5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
         collapse = ", "))
  4: data.frame(..., check.names = FALSE)
  3: cbind(deparse.level, ...)
  2: cbind(r, i)
  1: strata(mtcars[, c("mpg", "hp", "gear")], strat = "gear", size = c(5,
         5, 0))

Changing that cbind call from cbind(r, i) to
cbind(r, rep(i, length.out=nrow(r))) would fix it up.

cbind is not entirely consistent in what it does with a 0-row rectangular
input and a scalar.

With a matrix you get a 0-row result and a warning:
   m <- matrix(numeric(), nrow=0, ncol=3, dimnames=list(NULL,paste0("Col",1:3)))
   str(cbind(m, 666))
   num[0 , 1:4] 
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr [1:4] "Col1" "Col2" "Col3" ""
  Warning message:
  In cbind(m, 666) :
    number of rows of result is not a multiple of vector length (arg 2)

With a data.frame you get an error:
   str(cbind(data.frame(m), 666))
  Error in data.frame(..., check.names = FALSE) : 
    arguments imply differing number of rows: 0, 1
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
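
Until the package handles empty strata, proportional allocation that tolerates a sample size of zero can be sketched in base R. This is not the sampling package's method; the population `pop`, the target of 10 and all names below are assumptions made for illustration.

```r
set.seed(123)
# Hypothetical population with one very small stratum
pop <- data.frame(id = 1:100,
                  stratum = rep(c("A", "B", "C"), times = c(60, 38, 2)))

# Allocation proportional to stratum size for a target sample of 10;
# stratum "C" rounds down to 0
n_h <- round(table(pop$stratum) / nrow(pop) * 10)

# Row numbers of each stratum, then a simple random sample without
# replacement inside each one; a size of 0 simply selects no rows
rows <- split(seq_len(nrow(pop)), pop$stratum)
picked <- unlist(Map(function(r, n) r[sample.int(length(r), n)],
                     rows, as.integer(n_h[names(rows)])),
                 use.names = FALSE)

smp <- pop[picked, ]
table(smp$stratum)
```

Because of rounding, the allocated sizes need not add up exactly to the target in general, so the total is worth checking before drawing the sample.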


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Thomas Lumley
 Sent: Sunday, April 28, 2013 1:31 PM
 To: Jeff Newmiller
 Cc: R help (r-help@r-project.org)
 Subject: Re: [R] Stratified Random Sampling Proportional to Size
 
 It looks as though you can't sample zero observations from a stratum.  If
 you take the example on the help page and change one of the sample sizes to
 zero you get exactly the same error.
 
 From the fact that there isn't a more explicit error message, I would guess
 that the author just never considered the possibility that someone would
 have a population stratum and not sample from it.
 
 -thomas
 
 
 On Sun, Apr 28, 2013 at 7:14 PM, Jeff Newmiller 
 jdnew...@dcn.davis.ca.uswrote:
 
  a) Please post plain text
 
  b) Please make reproducible examples (e.g. telling us how you accessed a
  database that we have no access to is not helpful). See ?head, ?dput and [1]
 
  c) I don't know anything about the sampling package or the strata
  function, but I would recommend eliminating the rows that have zeros from
  the input data. E.g.:
 
  stratum_cp <- stratum_cp[ 0 < stratum_cp$stratp, ]
 
  [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 
  On Fri, 26 Apr 2013, Lopez, Dan wrote:
 
   Hello R Experts,
 
  I kindly request your assistance on figuring out how to get a stratified
  random sampling proportional to 100.
 
  Below is my r code showing what I did and the error I'm getting with
  sampling::strata
 
  # FIRST I summarized count of records by the two variables I want to use
  as strata
 
  library(RODBC)
  library(sqldf)
  library(sampling)
  #After establishing connection I query the data and sort it by strata
  #APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
  CURRPOP<-sqlQuery(ch,"SELECT APPT_TYP_CD_LL,
  EMPL_TYPE,ASOFDATE,EMPLID,
  NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,RET_TYP_CD_LL FROM
  PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T') ORDER BY
  APPT_TYP_CD_LL,
  EMPL_TYPE")
  #ROWID is a dummy ID I added and repositioned after the strat columns for
  #later use
  CURRPOP$ROWID<-seq(nrow(CURRPOP))
  CURRPOP<-CURRPOP[,c(1:2,11,3:10)]

  # My strata.  stratp is how many I want to sample from each stratum. NOTE
  # THERE ARE SOME 0's which just means I won't sample from that group.
  stratum_cp<-sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM
  CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
  stratum_cp$stratp<-round(stratum_cp$HC/nrow(CURRPOP)*100)
 
   stratum_cp

     APPT_TYP_CD_LL EMPL_TYPE   HC stratp
  1              FA         S    1      0
  2              FC         S    5      0
  3              FP         S  173      3
  4              FR         H  170      3
  5              FX         H   49      1
  6              FX         S   57      1
  7              IN         H 1589     25
  8              IN         S 3987     63
  9              IP         H    7      0
  10             IP         S   53      1
  11             SA         H    8      0
  12             SE         S   43      1
  13             SF         H   14      0
  14             SF         S    1      0
  15             SG         S   10      0
  16             ST         H  107      2
  17             ST         S    6      0
 
  #THEN I attempted to use 

[R] plspm error: singular matrix 'a' in 'solve'

2013-04-29 Thread Mitch Hunter
Hello,

I am running a simple plspm for a class project due later today and I am
receiving the following error despite following along exactly with Gaston
Sanchez's directions in PLS Path Modeling with R:

Error in solve.qr(qr(X.blok), Z[, j]) : singular matrix 'a' in 'solve'

I would greatly appreciate any help resolving this matter.  I got the same
error after changing the inner model matrix to be the same as the model
Sanchez uses in his first example.  The package seems to be working because
innerplot() has worked.  My code is below.

Thanks very much!
-Mitch Hunter


Early = c(0, 0, 0)
Late = c(0, 0, 0)
Weediness = c(1, 1, 0)
wd.inner = rbind(Early, Late, Weediness)
colnames(wd.inner) = rownames(wd.inner)

innerplot(wd.inner, box.size = 0.1)

wd.outer = list(2:5,6:9,10)

wd.modes = c("B", "B", "A")

wd.pls1 = plspm(md, wd.inner, wd.outer, wd.modes)



Re: [R] Function for Data Frame

2013-04-29 Thread arun


Hi,
If I understand it correctly, 
x<-myfunc(x)
x
#  V2 V3
#1  2  3
#2  2  3
#3  2  2
#4  2  2
#5  1  1
A.K.

- Original Message -
From: Sparks, John James jspa...@uic.edu
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 10:23 AM
Subject: [R] Function for Data Frame

Dear R Helpers,

I have about 20 data frames that I need to do a series of data scrubbing
steps to.  I have the list of data frames in a list so that I can use
lapply.  I am trying to build a function that will do the data scrubbing
that I need.  However, I am new to functions and there is something
fundamental that I am not understanding.  I use the return function at the
end of the function and this completes the data processing specified in
the function, but leaves the data frame that I want changed unaffected. 
How do I get my function to apply its results to the data frame in
question instead of simply displaying the results to the screen?

Any helpful guidance would be most appreciated.

--John Sparks


x=as.data.frame(matrix(c(1,2,3,
        1,2,3,
        1,2,2,
        1,2,2,
       1,1,1),ncol=3,byrow=T))


myfunc<-function(DF){
DF<-subset(DF,select=-c(V1))
return(DF)
}

myfunc(x)

#How to get this change to data frame x?
#And preferably not send the results to the screen?
x



Re: [R] expanding a presence only dataset into presence/absence

2013-04-29 Thread Daniel Nordlund
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Matthew Venesky
 Sent: Monday, April 29, 2013 8:13 AM
 To: r-help@r-project.org
 Subject: [R] expanding a presence only dataset into presence/absence
 
 Hello,
 
 I'm working with a very large dataset (250,000+ lines in its current form)
 that includes presence only data on various species (which is nested within
 different sites and sampling dates). I need to convert this into a dataset
 with presence/absence for each species. For example, I would like to expand
 My current data to Desired data:
 
 My current data
 
 Species Site Date
 a 1 1
 b 1 1
 b 1 2
 c 1 3
 
 Desired data
 
 Species Present Site Date
 a 1 1 1
 b 1 1 1
 c 0 1 1
 a 0 2 2
 b 1 2 2
 C 0 2 2
 a 0 3 3
 b 0 3 3
 c 1 3 3
 
 I've scoured the web, including Rseek and haven't found a resolution (and
 note that a similar question was asked sometime in 2011 without an
 answer).
 Does anyone have any thoughts? Thank you in advance.
 

Matthew,

You need to clarify your requirements before anyone can help you.  Your 
presence-only data only contains one site, but your desired data has three.  
How are we to know how many sites there are?  Also, your presence-only data has 
species c present at site 1 on date 3, but it is not present in your desired 
data.  It is not at all clear (nor is it deducible) how you get from your 
example data to your desired data.  If you clarify your requirements, maybe 
someone will be able to help.

Dan 

Daniel Nordlund
Bothell, WA USA



Re: [R] Function for Data Frame

2013-04-29 Thread Ben Tupper
Hi,

On Apr 29, 2013, at 10:23 AM, Sparks, John James wrote:

 Dear R Helpers,
 
 I have about 20 data frames that I need to do a series of data scrubbing
 steps to.  I have the list of data frames in a list so that I can use
 lapply.  I am trying to build a function that will do the data scrubbing
 that I need.  However, I am new to functions and there is something
 fundamental that I am not understanding.  I use the return function at the
 end of the function and this completes the data processing specified in
 the function, but leaves the data frame that I want changed unaffected. 
 How do I get my function to apply its results to the data frame in
 question instead of simply displaying the results to the screen?
 
 Any helpful guidance would be most appreciated.
 
 --John Sparks
 
 
 x=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
   1,1,1),ncol=3,byrow=T))
 
 
 myfunc<-function(DF){
 DF<-subset(DF,select=-c(V1))
 return(DF)
 }
 
 myfunc(x)
 
 #How to get this change to data frame x?
 #And preferably not send the results to the screen?
 x
 

Good question!  In your example, x is passed into myfunc by value (a copy of 
the value of x) rather than by reference (like passing in the social security 
number of x).  So your scrubbing within the function is done on a copy of x, 
which you call DF. To update the value of x outside of your function, you have 
to assign the returned value of myfunc to x

x <- myfunc(x)

See more at ... 
http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Writing-your-own-functions
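
Since the original question mentions roughly 20 data frames held in a list, the same reassignment idea can be sketched with lapply. This is only an illustration: the list dfs, its element names, and the function name scrub are made up for the example, and the scrubbing step is the same column drop used in myfunc.

```r
# Hypothetical scrubbing function: drop column V1 and return the result.
scrub <- function(DF) {
  subset(DF, select = -c(V1))
}

# Two toy data frames standing in for the real list of ~20.
dfs <- list(first  = data.frame(V1 = 1:2, V2 = 3:4),
            second = data.frame(V1 = 5:6, V2 = 7:8))

# lapply returns a new list; assigning it back to dfs is what
# "applies" the change, and nothing is printed by the assignment.
dfs <- lapply(dfs, scrub)
```

As with x <- myfunc(x), the originals only change because the returned list is assigned back to the same name.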

Cheers,
Ben






Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org



Re: [R] rbinding some elements from a list and obtain another list

2013-04-29 Thread arun
Hi,
Try this:
set.seed(24)
lst1<-lapply(1:4,function(x) as.data.frame(matrix(sample(1:20,20,replace=TRUE),ncol=5)))
names(lst1)<- LETTERS[1:4]

res<-lapply(list(c("A","B"),c("C","D")), function(x) do.call(rbind,lst1[x]))
res
#[[1]]
 #   V1 V2 V3 V4 V5
#A.1  6 14 17 14  4
#A.2  5 19  6 14  1
#A.3 15  6 13  7 11
#A.4 11 16  8 19  3
#B.1  2  5 13  8 15
#B.2 12 14  1  3 13
#B.3 15  2  7 19 14
#B.4  3 12  5  5 20
#
#[[2]]
 #   V1 V2 V3 V4 V5
#C.1 10  1  6 10 10
#C.2  8  2  7 15  6
#C.3  6  8 10 11  4
#C.4  5  8 18 20  3
#D.1 10 15 15  1 12
#D.2  5  7 10 20 17
#D.3  6 19  3 13  1
#D.4  3 20  5  7 15
A.K.
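
A side note on why the loop in the original question returned NULL, offered as a sketch rather than as part of the answer above. In R, i:i+1 is parsed as (i:i)+1, so x[i:i+1] selects a single element, and a for loop itself evaluates to NULL, so the function has nothing to return. The toy list and the helper name bind_pairs are assumptions; the sketch also assumes the list has an even number of elements.

```r
i <- 1
i:i + 1      # 2: the colon operator binds tighter than +
i:(i + 1)    # 1 2: the pair of indices that was intended

# Toy stand-in for the list of data frames.
lst <- list(A = data.frame(v = 1:2), B = data.frame(v = 3:4),
            C = data.frame(v = 5:6), D = data.frame(v = 7:8))

# Bind consecutive pairs and collect the results in a list,
# instead of relying on the value of a for loop.
bind_pairs <- function(x) {
  starts <- seq(1, length(x), by = 2)
  out <- lapply(starts, function(i) do.call(rbind, x[i:(i + 1)]))
  names(out) <- sapply(starts, function(i) paste(names(x)[i:(i + 1)], collapse = ""))
  out
}

res2 <- bind_pairs(lst)
names(res2)
```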





- Original Message -
From: De Castro Pascual, Montserrat mdecas...@creal.cat
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 9:54 AM
Subject: [R] rbinding some elements from a list and obtain another list

Hi everybody,



I have a list, where every element of this list is a data frame.



An example:



mylist<-list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I want to rbind some elements of this list.

As an example:

Output<-list(AB=data.frame, CD=data.frame)



Where

AB=rbind(A,B)

CD=rbind(C,D)





I've tried:

f<-function(x){
  for (i in seq(1,length(names(x)),2)){
    aa<-do.call(rbind,x[i:i+1])
    aa
  }}
bb<-f(mylist)

or

f<-function(x){
  for (i in seq(1,length(names(x)),2)){
    aa[i]<-do.call(rbind,x[i:i+1])
    list(aa[i])
    }}
bb<-f(mylist)

but it doesn't work:

> f<-function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa<-do.call(rbind,x[i:i+1])
+     aa
+   }}
> bb<-f(mylist)
> bb
NULL

> f<-function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa[i]<-do.call(rbind,x[i:i+1])
+     list(aa[i])
+   }}
> bb<-f(mylist)
Lost warning messages
1: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
2: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
3: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
4: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
5: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
6: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length





Thanks!



Montserrat






Re: [R] expanding a presence only dataset into presence/absence

2013-04-29 Thread arun
Hi,

Your output dataset is bit confusing as it contains Sites that were not in the 
input.
Using your input dataset, I am getting this:


dat1<- read.table(text="
Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present<- 1
dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
colnames(dat2)<- colnames(dat1)
res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)]<- 0
res<-res[order(res$Date),]
res
#  Species Site Date Present
#1   a    1    1   1
#4   b    1    1   1
#7   c    1    1   0
#2   a    1    2   0
#5   b    1    2   1
#8   c    1    2   0
#3   a    1    3   0
#6   b    1    3   0
#9   c    1    3   1
A.K.





- Original Message -
From: Matthew Venesky mvene...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Monday, April 29, 2013 11:12 AM
Subject: [R] expanding a presence only dataset into presence/absence

Hello,

I'm working with a very large dataset (250,000+ lines in its current form)
that includes presence only data on various species (which is nested within
different sites and sampling dates). I need to convert this into a dataset
with presence/absence for each species. For example, I would like to expand
My current data to Desired data:

My current data

Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3

Desired data

Species Present Site Date
a 1 1 1
b 1 1 1
c 0 1 1
a 0 2 2
b 1 2 2
C 0 2 2
a 0 3 3
b 0 3 3
c 1 3 3

I've scoured the web, including Rseek and haven't found a resolution (and
note that a similar question was asked sometime in 2011 without an answer).
Does anyone have any thoughts? Thank you in advance.

--

Matthew D. Venesky, Ph.D.

Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/



[R] lavaan and semTools warning message

2013-04-29 Thread Duarte Viana
Hello all,

I am running a simple path analysis with the function sem.mi (of semTools)
after doing multiple imputation in my (missing) data. However, depending on
the option to combine the chi-square, I get the following warning messages:

Warning messages:
1: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats,  ... :
  lavaan WARNING: could not compute standard errors!

2: In pchisq(chisq, df) : NaNs produced
3: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats,  ... :
  lavaan WARNING: could not compute standard errors!

4: In pchisq(chisq, df) : NaNs produced
5: In estimateVCOV(lavaanModel, samplestats = lavaanSampleStats,  ... :
  lavaan WARNING: could not compute standard errors!

and so forth.

The options chi="mr" and "mplus" result in these warning messages, but
options chi="lmrr" and "none" run fine (no warning messages). Even when I
get these warning messages, all estimates (including se and chi) are
printed out in the results (using summary, for example).

Also, using the function sem (of lavaan package) directly (with one of the
replicated datasets) runs fine.


Here is the code I'm using:

# start code

# model syntax
model1 <- '
# regressions

r1 ~ p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11
r2 ~ r1+p2+p4+p5+p6+p8+p9+p10+p11+p12+p13+p14
'

# run sem (N=124); data are already imputed
out1 <- sem.mi(model1,imputedData,m=20,chi="mr",fixed.x=T,std.ov=T)
summary(out1)
inspect(out1, "imputed") # the combined chi is presented

# end code


Can someone tell me why I am getting these warning messages?

Thanks,

Duarte



Re: [R] Stratified Random Sampling Proportional to Size

2013-04-29 Thread Lopez, Dan
Hi Jeff,
a & b) points taken. Thanks for the reference too.
c) taking the zeros out did the trick.
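
For anyone hitting the same error, a rough sketch of what "taking the zeros out" can look like. It uses base R only and made-up names (pop, sizes, grp, n) rather than sampling::strata and the real tables, so treat it as an assumption-laden illustration of the idea: drop the strata whose requested size is 0, drop the matching population rows, then sample the rest.

```r
# Toy population with three strata and the requested sample size per stratum.
pop   <- data.frame(grp = c("a", "a", "a", "b", "b", "c"), id = 1:6,
                    stringsAsFactors = FALSE)
sizes <- data.frame(grp = c("a", "b", "c"), n = c(2, 1, 0),
                    stringsAsFactors = FALSE)

# Keep only strata that are actually sampled, in both tables.
keep     <- sizes[sizes$n > 0, ]
pop_keep <- pop[pop$grp %in% keep$grp, ]

# Simple random sampling without replacement within each remaining stratum.
set.seed(1)
picked <- do.call(rbind, lapply(seq_len(nrow(keep)), function(j) {
  rows <- pop_keep[pop_keep$grp == keep$grp[j], ]
  rows[sample(nrow(rows), keep$n[j]), , drop = FALSE]
}))
table(picked$grp)
```

Filtering the population as well as the size vector keeps the two aligned, which matters for any function that matches sizes to strata by position.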

Dan

-Original Message-
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us] 
Sent: Sunday, April 28, 2013 12:15 AM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a 
database that we have no access to is not helpful). See ?head, ?dput and [1]

c) I don't know anything about the sampling package or the strata function, but 
I would recommend eliminating the rows that have zeros from the input data. 
E.g.:

stratum_cp <- stratum_cp[ 0 < stratum_cp$stratp, ]

[1]
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

On Fri, 26 Apr 2013, Lopez, Dan wrote:

 Hello R Experts,

 I kindly request your assistance on figuring out how to get a 
 stratified random sampling proportional to 100.

 Below is my r code showing what I did and the error I'm getting with 
 sampling::strata

 # FIRST I summarized count of records by the two variables I want to 
 use as strata

 library(RODBC)
 library(sqldf)
 library(sampling)
 #After establishing connection I query the data and sort it by strata
 #APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
 CURRPOP<-sqlQuery(ch,"SELECT APPT_TYP_CD_LL,
 EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,
 RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T')
 ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
 #ROWID is a dummy ID I added and repositioned after the strat columns
 #for later use
 CURRPOP$ROWID<-seq(nrow(CURRPOP))
 CURRPOP<-CURRPOP[,c(1:2,11,3:10)]

 # My strata.  stratp is how many I want to sample from each stratum. NOTE
 # THERE ARE SOME 0's which just means I won't sample from that group.
 stratum_cp<-sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM
 CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
 stratum_cp$stratp<-round(stratum_cp$HC/nrow(CURRPOP)*100)

 stratum_cp
    APPT_TYP_CD_LL EMPL_TYPE   HC stratp
 1              FA         S    1      0
 2              FC         S    5      0
 3              FP         S  173      3
 4              FR         H  170      3
 5              FX         H   49      1
 6              FX         S   57      1
 7              IN         H 1589     25
 8              IN         S 3987     63
 9              IP         H    7      0
 10             IP         S   53      1
 11             SA         H    8      0
 12             SE         S   43      1
 13             SF         H   14      0
 14             SF         S    1      0
 15             SG         S   10      0
 16             ST         H  107      2
 17             ST         S    6      0

 #THEN I attempted to use sampling::strata using the instructions in 
 that package and got an error


 #I use stratum_cp$stratp for my sizes.



 s<-strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
 Error in data.frame(..., check.names = FALSE) :
   arguments imply differing number of rows: 0, 1

 traceback()
 5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
        collapse = ", "))
 4: data.frame(..., check.names = FALSE)
 3: cbind(deparse.level, ...)
 2: cbind(r, i)
 1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
        method = "srswor")



 #In lieu of a reproducible sample here is some info regarding most of 
 my data
 dim(CURRPOP)
 [1] 6280   11
 #Cols w/ personal info have been removed in this output

 str(CURRPOP[,c(1:3,7:11)])
 'data.frame':  6280 obs. of  8 variables:
  $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
  $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
  $ ROWID         : int  1 2 3 4 5 6 7 8 9 10 ...
  $ DEPTID        : int  9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
  $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
  $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
  $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
  $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...

 Daniel Lopez
 Workforce Analyst
 HRIM - Workforce Analytics & Metrics
 Strategic Human Resources Management
 wf-analytics-metr...@lists.llnl.govmailto:wf-analytics-metrics@lists.
 llnl.gov
 (925) 422-0814



Re: [R] cannot compile R on Cray XE6 HLRS HERMIT

2013-04-29 Thread Ben Bolker
Martin Ivanov tramni at abv.bg writes:

  Dear All, I am trying to compile R-3.0 on Cray xe6 (HLRS) HERMIT,
  no success so far.  Here is my experience:

  You might be better off posting this to the r-de...@r-project.org
mailing list (the list is for developer queries: technically this
isn't development, but queries about compilation on exotic systems
appear more often there, and often require input from R-core members
...)



Re: [R] expanding a presence only dataset into presence/absence

2013-04-29 Thread arun



I am sorry.  I forgot to update the code:

dat1<- read.table(text="
Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present<- 1
dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
colnames(dat2)<- colnames(dat1)[-4] #changed here
res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)]<- 0
res<-res[order(res$Date),]

row.names(res)<- 1:nrow(res)
res
#  Species Site Date Present
#1   a    1    1   1
#2   b    1    1   1
#3   c    1    1   0
#4   a    1    2   0
#5   b    1    2   1
#6   c    1    2   0
#7   a    1    3   0
#8   b    1    3   0
#9   c    1    3   1
A.K.
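
As a cross-check on the expand.grid/merge route, here is a shorter base-R sketch that may give the same presence/absence table by cross-tabulating the three columns. It is an alternative technique, not the code above; the object names dat and full are made up, and on the real 250,000-line data the full Species x Site x Date grid could become large.

```r
dat <- data.frame(Species = c("a", "b", "b", "c"),
                  Site    = c(1, 1, 1, 1),
                  Date    = c(1, 1, 2, 3),
                  stringsAsFactors = FALSE)

# table() counts every Species/Site/Date combination, including those
# that never occur; as.data.frame() turns the counts into long format.
full <- as.data.frame(table(Species = dat$Species, Site = dat$Site, Date = dat$Date),
                      stringsAsFactors = FALSE)

# A combination is "present" when it was observed at least once.
full$Present <- as.integer(full$Freq > 0)
full$Freq <- NULL

# table() stores the grouping values as text, so convert them back.
full$Site <- as.numeric(full$Site)
full$Date <- as.numeric(full$Date)
full <- full[order(full$Date, full$Species), ]
```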



From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Monday, April 29, 2013 1:58 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence



The output that you prepared (for Site 1) looks good... however, I can't get 
that code to work. I get the following error:

> dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames(dat2)<- colnames(dat1)
Error: unexpected symbol in "dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames"






--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 1:44 PM, arun smartpink...@yahoo.com wrote:

Hi Matthew,

So, do you think the output I gave is different from what you expected?
Thanks,
Arun







From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Monday, April 29, 2013 1:15 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence




I see what you are confused about. 

I'm sorry. I gave extra sites as examples in my table called Desired Data 
such that there are 3 sites in the Desired Data and only 1 site in the My 
current data. Ignore sites 2 and 3; you should see what I am trying to do 
using only site 1.




--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 1:11 PM, Matthew Venesky mvene...@gmail.com wrote:

That is part of the difficulty. If Species C was present only on Date 3, we 
need to have the code manually add Species C as absent (i.e., assign it a 
value of 0) at that site on the previous sampling dates. 


Or, is there something else that is confusing you that I am not explaining?




--


Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620
 
Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 12:47 PM, arun smartpink...@yahoo.com wrote:

Hi,

Your output dataset is bit confusing as it contains Sites that were not in 
the input.
Using your input dataset, I am getting this:


dat1<- read.table(text="

Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present<- 1
dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
colnames(dat2)<- colnames(dat1)
res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)]<- 0
res<-res[order(res$Date),]
res
#  Species Site Date Present
#1   a    1    1   1
#4   b    1    1   1
#7   c    1    1   0
#2   a    1    2   0
#5   b    1    2   1
#8   c    1    2   0
#3   a    1    3   0
#6   b    1    3   0
#9   c    1    3   1
A.K.






- Original Message -
From: Matthew Venesky mvene...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Monday, April 29, 2013 11:12 AM
Subject: [R] expanding a presence only dataset into presence/absence

Hello,

I'm working with a very large dataset (250,000+ lines in its current form)
that includes presence only data on various species (which is nested within
different sites and sampling dates). I need to convert this into a dataset
with presence/absence for each species. For example, I would like to expand
My current data to Desired data:

My current data

Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3

Desired data

Species Present Site Date
a 1 1 1
b 1 1 1
c 0 1 1
a 0 2 2
b 1 2 2
C 0 2 2
a 0 3 3
b 0 3 3
c 1 3 3

I've scoured the web, including Rseek and haven't found a resolution (and
note that a similar question was asked sometime in 2011 without an answer).
Does anyone have any thoughts? Thank you in advance.

--

Matthew D. Venesky, Ph.D.

Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


Re: [R] parSapply can't find function

2013-04-29 Thread Kaiyin Zhong (Victor Chung)
Hi, Uwe.

I still don't get how this can be done correctly. Here is what I tried.

In the file funcs.R, define these functions:

library('modeest')
x = vector(length=500)
x = sapply(x, function(i) i=sample(c(1,0), 1))
pastK = function(n, x, k) {
    if (n>k) { return(x[(n-k):(n-1)]) }
    else {return(NA)}
}
predR = function(x, k) {
    pastList = lapply(1:length(x), function(n) pastK(n, x, k))
    pred = sapply(pastList, function(v) mfv(v)[1])
    ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
}



Then do the following:

library('snow')
cl = makeCluster(rep('localhost', 12), 'SOCK')
clusterSetupRNG(cl)
clusterEvalQ(cl, 'source("funcs.R")')
testK = function() {
    k = seq(3, 25, 2)
    r = parSapply(cl, k, function(i) predR(x, i))
    print(r)
}
testK()
stopCluster(cl)


The error still pops up:

Error in checkForRemoteErrors(val) :
  12 nodes produced errors; first error: could not find function "predR"





Best regards,

Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com


On Tue, Apr 23, 2013 at 3:44 PM, Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:



 On 23.04.2013 15:00, Kaiyin Zhong (Victor Chung) wrote:

 Thanks for the reply.

 How can i make the functions known to all nodes?


 See ?clusterEvalQ

 you may also want to try the parallel packages.

 Best,
 Uwe Ligges




 Best regards,

 Kaiyin ZHONG
 --
 FMB, Erasmus MC
 k.zh...@erasmusmc.nl
 kindlych...@gmail.com


 On Tue, Apr 23, 2013 at 2:43 PM, Uwe Ligges
 lig...@statistik.tu-dortmund.de wrote:



 On 18.04.2013 11:11, Kaiyin Zhong (Victor Chung) wrote:

 Here is the code, assuming 8 cores in the cpu.

 library('modeest')
 library('snow')

 cl = makeCluster(rep('localhost', 8), 'SOCK')
 x = vector(length=50)
 x = sapply(x, function(i) i=sample(c(1,0), 1))

 pastK = function(n, x, k) {
   if (n>k) { return(x[(n-k):(n-1)]) }
   else {return(NA)}
 }

 predR = function(x, k) {
   pastList = lapply(1:length(x), function(n) pastK(n, x, k))
   pred = sapply(pastList, function(v) mfv(v)[1])
   ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
 }

 testK = function() {
   k = seq(3, 25, 2)
   r = parSapply(cl, k, function(i) predR(x, i))
 #r = sapply(k, function(i) predR(x, i))
 }

 r = testK()
 stopCluster(cl)

 Here is the error:
 Error in checkForRemoteErrors(val) :
 8 nodes produced errors; first error: could not find function "predR"



 predR is not yet known on all nodes, just on the master. You have to
 tell the nodes about the definition first.

 Best,
 Uwe Ligges






 Best regards,

 Kaiyin ZHONG
 --

 FMB, Erasmus MC
 k.zh...@erasmusmc.nl
 kindlych...@gmail.com



Re: [R] parSapply can't find function

2013-04-29 Thread David Winsemius

On Apr 29, 2013, at 11:16 AM, Kaiyin Zhong (Victor Chung) wrote:

 Hi, Uwe.
 
 I still don't get how this can be done correctly. Here is what I tried.
 
 In the file funcs.R, define these functions:
 
 library('modeest')
 x = vector(length=500)
 x = sapply(x, function(i) i=sample(c(1,0), 1))
 pastK = function(n, x, k) {
    if (n>k) { return(x[(n-k):(n-1)]) }
    else {return(NA)}
 }
 predR = function(x, k) {
    pastList = lapply(1:length(x), function(n) pastK(n, x, k))
    pred = sapply(pastList, function(v) mfv(v)[1])
    ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
 }
 
 
 
 Then do the following:
 
 library('snow')
 cl = makeCluster(rep('localhost', 12), 'SOCK')
 clusterSetupRNG(cl)
 clusterEvalQ(cl, 'source("funcs.R")')

Are you sure those outer single quote marks are not the problem?
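
To make that concrete, here is a small sketch using the parallel package (which Uwe suggested trying) instead of snow. It assumes two local worker processes can be started; the function square_plus and the variable offset are made up for the example and stand in for predR and x. A quoted string handed to clusterEvalQ is just a character constant, so each worker returns the string and never runs source(); an unquoted expression, or clusterExport, is what actually defines objects on the workers.

```r
library(parallel)

cl <- makeCluster(2)   # assumption: two local workers are available

# Hypothetical stand-ins for the user's function and data.
offset <- 1
square_plus <- function(v) v^2 + offset

# A quoted expression is only a string: each worker evaluates the
# constant and hands it back, so nothing gets sourced or defined.
quoted <- clusterEvalQ(cl, 'source("funcs.R")')

# Copy the needed objects from the master to every worker.
clusterExport(cl, c("square_plus", "offset"))
res <- parSapply(cl, 1:4, function(i) square_plus(i))

stopCluster(cl)
```

With a real funcs.R on disk, the unquoted form clusterEvalQ(cl, source("funcs.R")) should have the same effect as the export, provided every worker can see the file in its working directory.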

-- 
David.


 testK = function() {
k = seq(3, 25, 2)
r = parSapply(cl, k, function(i) predR(x, i))
print(r)
 }
 testK()
 stopCluster(cl)
 
 
 The error still pops up:
 
 Error in checkForRemoteErrors(val) :
  12 nodes produced errors; first error: could not find function "predR"
 
 
 
 
 
 Best regards,
 
 Kaiyin ZHONG
 --
 FMB, Erasmus MC
 k.zh...@erasmusmc.nl
 kindlych...@gmail.com
 
 
 On Tue, Apr 23, 2013 at 3:44 PM, Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:
 
 
 
 On 23.04.2013 15:00, Kaiyin Zhong (Victor Chung) wrote:
 
 Thanks for the reply.
 
 How can i make the functions known to all nodes?
 
 
 See ?clusterEvalQ
 
 you may also want to try the parallel packages.
 
 Best,
 Uwe Ligges
 
 
 
 
 Best regards,
 
 Kaiyin ZHONG
 --
 FMB, Erasmus MC
 k.zh...@erasmusmc.nl
 kindlych...@gmail.com
 
 
 On Tue, Apr 23, 2013 at 2:43 PM, Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:
 
 
 
On 18.04.2013 11:11, Kaiyin Zhong (Victor Chung) wrote:
 
Here is the code, assuming 8 cores in the cpu.
 
library('modeest')
library('snow')
 
cl = makeCluster(rep('localhost', 8), 'SOCK')
x = vector(length=50)
x = sapply(x, function(i) i=sample(c(1,0), 1))
 
pastK = function(n, x, k) {
  if (n > k) { return(x[(n-k):(n-1)]) }
  else {return(NA)}
}

predR = function(x, k) {
  pastList = lapply(1:length(x), function(n) pastK(n, x, k))
  pred = sapply(pastList, function(v) mfv(v)[1])
  ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
}
 
testK = function() {
  k = seq(3, 25, 2)
  r = parSapply(cl, k, function(i) predR(x, i))
#r = sapply(k, function(i) predR(x, i))
}
 
r = testK()
stopCluster(cl)
 
Here is the error:
Error in checkForRemoteErrors(val) :
8 nodes produced errors; first error: could not find
function predR
 
 
 
predR is not yet known on all nodes, just on the master. You have to
tell the nodes about the definition first.
 
Best,
Uwe Ligges
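
One possible way to "tell the nodes about the definition" is sketched below. It is only an illustration under assumptions: it uses the parallel package that ships with R instead of snow, and past_mean is a hypothetical stand-in for predR so that the example needs no extra packages.

```r
library(parallel)  # assumption: parallel (bundled with R) instead of snow

# Hypothetical stand-in for predR: mean of the last k values of x
past_mean <- function(x, k) mean(x[(length(x) - k + 1):length(x)])
x <- c(1, 0, 1, 1, 0, 1)

cl <- makeCluster(2)

# Copy the function and the data from the master to every worker
clusterExport(cl, c("past_mean", "x"), envir = environment())

# The workers can now resolve past_mean and x by name
r <- parSapply(cl, 2:4, function(k) past_mean(x, k))

stopCluster(cl)
```

Without the clusterExport() call, the workers would be expected to fail with the same "could not find function" error reported above.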
 
 
 
 
 
 
Best regards,
 
Kaiyin ZHONG
--
 
FMB, Erasmus MC
k.zh...@erasmusmc.nl mailto:k.zh...@erasmusmc.nl
kindlych...@gmail.com mailto:kindlych...@gmail.com
 
 [[alternative HTML version deleted]]
 

David Winsemius
Alameda, CA, USA



Re: [R] parSapply can't find function

2013-04-29 Thread Kaiyin Zhong (Victor Chung)
Oh, indeed, that IS the problem. Thank you!!!

Best regards,

Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com


On Mon, Apr 29, 2013 at 8:22 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Apr 29, 2013, at 11:16 AM, Kaiyin Zhong (Victor Chung) wrote:

  clusterEvalQ(cl, 'source("funcs.R")')

 Are you sure those outer single quote marks are not the problem?

 --
 David.



Re: [R] parSapply can't find function

2013-04-29 Thread Duncan Murdoch

On 29/04/2013 2:16 PM, Kaiyin Zhong (Victor Chung) wrote:

Hi, Uwe.

I still don't get how this can be done correctly. Here is what I tried.

In the file funcs.R, define these functions:

library('modeest')
x = vector(length=500)
x = sapply(x, function(i) i=sample(c(1,0), 1))
pastK = function(n, x, k) {
     if (n > k) { return(x[(n-k):(n-1)]) }
 else {return(NA)}
}
predR = function(x, k) {
 pastList = lapply(1:length(x), function(n) pastK(n, x, k))
 pred = sapply(pastList, function(v) mfv(v)[1])
 ratio = sum(pred==x, na.rm=T)/(length(pred) - sum(is.na(pred)))
}



Then do the following:

library('snow')
cl = makeCluster(rep('localhost', 12), 'SOCK')
clusterSetupRNG(cl)
clusterEvalQ(cl, 'source("funcs.R")')


The expression being evaluated there is a string,

'source("funcs.R")'

You want an expression, e.g.

clusterEvalQ(cl, source("funcs.R"))

Duncan Murdoch
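
The difference between a string and an expression can be seen in a small sketch. It assumes the parallel package bundled with R (its clusterEvalQ has the same interface as the snow one), and square is a hypothetical helper standing in for the functions in funcs.R.

```r
library(parallel)  # assumption: parallel (bundled with R) instead of snow

cl <- makeCluster(2)

# A quoted string is a character constant: each worker "evaluates" it by
# returning the string itself, so nothing gets defined on the workers.
as_string <- clusterEvalQ(cl, 'square <- function(v) v^2')
defined_before <- unlist(clusterEvalQ(cl, exists("square")))

# An unquoted expression is actually evaluated on every worker.
invisible(clusterEvalQ(cl, square <- function(v) v^2))
defined_after <- unlist(clusterEvalQ(cl, exists("square")))

# square is now known on the workers (hypothetical helper, not in funcs.R)
res <- parSapply(cl, 1:4, function(i) square(i))

stopCluster(cl)
```

Here as_string holds the literal text on each worker, while only the second call makes square available for parSapply().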



testK = function() {
 k = seq(3, 25, 2)
 r = parSapply(cl, k, function(i) predR(x, i))
 print(r)
}
testK()
stopCluster(cl)


The error still pops up:

Error in checkForRemoteErrors(val) :
   12 nodes produced errors; first error: could not find function predR





Best regards,

Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com




Re: [R] parSapply can't find function

2013-04-29 Thread Kaiyin Zhong (Victor Chung)
Sorry, I now get a new error:

Error in cut.default(i, breaks) : 'breaks' are not unique
> traceback()
20: stop("'breaks' are not unique")
19: cut.default(i, breaks)
18: cut(i, breaks)
17: split.default(i, cut(i, breaks))
16: split(i, cut(i, breaks))
15: structure(split(i, cut(i, breaks)), names = NULL)
14: splitIndices(length(x), ncl)
13: lapply(splitIndices(length(x), ncl), function(i) x[i])
12: splitList(x, length(cl))
11: staticClusterApply(cl, fun, length(x), argfun)
10: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
9: lapply(args, enquote)
8: do.call(fun, lapply(args, enquote))
7: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,
   fun, ...))
6: parLapply(cl, as.list(X), FUN, ...)
5: parSapply(cl, eff, pow_error) at testing.R#5
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("testing.R")

The sequential run was ok.

Best regards,

Kaiyin ZHONG
--
FMB, Erasmus MC
k.zh...@erasmusmc.nl
kindlych...@gmail.com



Re: [R] speed of a vector operation question

2013-04-29 Thread Mikhail Umorin

Thank you all very much for your time and suggestions. The link to 
stackoverflow was very helpful. Here are some timings in case someone wants to 
know. (I noticed that microbenchmark results vary, depending on how many 
functions one tries to benchmark at a time. However, the min stays about the 
same)

# just to refresh, most of the code is from the stackoverflow link provided by
# Martin Morgan:
# http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match

library(compiler)        # cmpfun
library(inline)          # cfunction
library(microbenchmark)  # microbenchmark

f0 <- function(v) length(which(v < 0))

f1 <- function(v) sum(v < 0)

f2 <- function(v) which.min(v < 0) - 1L

f3 <- function(x) { # binary search implemented in R
    imin <- 1L
    imax <- length(x)
    while (imax >= imin) {
        imid <- as.integer(imin + (imax - imin) / 2)
        if (x[imid] >= 0)
            imax <- imid - 1L
        else
            imin <- imid + 1L
    }
    imax
}

f3.c <- cmpfun(f3) # pre-compiled

# binary search in C
f4 <- cfunction(c(x = "numeric"), "
    int imin = 0, imax = Rf_length(x) - 1, imid;
    while (imax >= imin) {
        imid = imin + (imax - imin) / 2;
        if (REAL(x)[imid] >= 0)
            imax = imid - 1;
        else
            imin = imid + 1;
    }
    return ScalarInteger(imax + 1);
")

# this one is a separate suggestion by William Dunlap:
f5 <- function(v) {
  tabulate(findInterval(v, c(-Inf, 0, 1, Inf)))[1]
}

vec - c(seq(-100,-1,length.out=1e6), rep(0,20), seq(1,100,length.out=1e6))
# the identity of results was verified

microbenchmark(f1(vec), f2(vec), f3(vec), f3.c(vec), f4(vec), f5(vec))
Unit: microseconds
      expr       min         lq    median         uq       max neval
   f1(vec) 17054.233 17831.1385 18514.305 19512.4705 54603.435   100
   f2(vec) 23624.353 25026.4265 26034.785 29322.1150 60014.458   100
   f3(vec)    76.902    93.2340   111.834   116.8370   129.888   100
 f3.c(vec)    21.883    30.7530    37.757    54.1250    62.939   100
   f4(vec)     6.575    10.5885    30.389    31.9385    37.610   100
   f5(vec) 35365.088 36767.6175 38317.103 40671.2000 69209.425   100


So, I'll try to go with the inline binary search and see if I can precompile
complex conditions.

Thank you, again, for your help!

Mikhail.
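
For the two-sided case from the original question (counting the elements between two scalars), the same binary search idea might be extended as in the sketch below. This is only an illustration: count_below, count_at_most and count_between are hypothetical names, and the code assumes the vector is sorted in increasing order.

```r
# Number of elements of the sorted vector x that are strictly below threshold
count_below <- function(x, threshold) {
  lo <- 1L
  hi <- length(x)
  while (hi >= lo) {
    mid <- lo + (hi - lo) %/% 2L
    if (x[mid] >= threshold) hi <- mid - 1L else lo <- mid + 1L
  }
  hi
}

# Number of elements of the sorted vector x that are <= threshold
count_at_most <- function(x, threshold) {
  lo <- 1L
  hi <- length(x)
  while (hi >= lo) {
    mid <- lo + (hi - lo) %/% 2L
    if (x[mid] > threshold) hi <- mid - 1L else lo <- mid + 1L
  }
  hi
}

# Number of elements strictly inside the open interval (c1, c2)
count_between <- function(x, c1, c2) {
  max(0L, count_below(x, c2) - count_at_most(x, c1))
}

v <- c(seq(-5, -1), 0, 0, 0, seq(1, 5))  # small sorted example with duplicates
n_neg <- count_below(v, 0)
n_mid <- count_between(v, -3, 2)
```

Each count needs only about log2(length(v)) comparisons, whereas sum(v > c1 & v < c2) touches every element.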




On Friday, April 26, 2013 20:52:27 Suzen, Mehmet wrote:
 Hello Mikhail,
 
 I could suggest you to use ff package for fast access to large data
 structures:
 
 http://cran.r-project.org/web/packages/ff/index.html
 http://wsopuppenkiste.wiso.uni-goettingen.de/ff/ff_1.0/inst/doc/ff.pdf
 
 Best
 
 Mehmet
 
 On 26 April 2013 18:12, Mikhail Umorin mike...@gmail.com wrote:
  Hello,
  
  I am dealing with numeric vectors 10^5 to 10^6 elements long. The values
  are sorted (with duplicates) in the vector (v). I am obtaining the length
  of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some
  scalar variables. What is the most efficient way to do this?
  
  I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems
  to me more efficient than length(which(v < c)), but, please, correct me
  if I'm wrong. So, is there anything faster than what I already use?
  
  I'm running R 2.14.2 on Linux kernel 3.4.34.
  
  I appreciate your time,
  
  Mikhail
  
  [[alternative HTML version deleted]]
  


Re: [R] Function for Data Frame

2013-04-29 Thread MacQueen, Don
Just to add a little, don't get distracted by the return() function.
Functions return the value of their final expression (invisibly, if that
expression is an assignment).

For your example, this will do the job:

myfunc <- function(DF) subset(DF, select=-V1)

If you want to modify the data frames in place, one way is to use a loop
instead of lapply.

mydfs <- list(DF1, DF2, DF3)

for (il in 1:3) mydfs[[il]] <- myfunc(mydfs[[il]])


But this should work just as well:
  mydfs <- lapply(mydfs,myfunc)

I doubt very much you'll see any performance difference between using
lapply() and using an explicit loop.
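
As a small sketch of the point about assigning results back (the list element names first and second are made up for illustration), the function's return value has to be stored, either in the original name or in the list:

```r
# Example data frame from the question
x <- as.data.frame(matrix(c(1, 2, 3,
                            1, 2, 3,
                            1, 2, 2,
                            1, 2, 2,
                            1, 1, 1), ncol = 3, byrow = TRUE))

myfunc <- function(DF) subset(DF, select = -V1)

# Assigning the result back changes x and prints nothing
x <- myfunc(x)

# The same idea for a list of data frames (hypothetical example data)
mydfs <- list(first  = data.frame(V1 = 1:2, V2 = 3:4),
              second = data.frame(V1 = 5:6, V2 = 7:8))
mydfs <- lapply(mydfs, myfunc)
```

Calling myfunc(x) on its own line only displays the modified copy; x stays as it was until the result is assigned.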

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 4/29/13 7:23 AM, Sparks, John James jspa...@uic.edu wrote:

Dear R Helpers,

I have about 20 data frames that I need to do a series of data scrubbing
steps to.  I have the list of data frames in a list so that I can use
lapply.  I am trying to build a function that will do the data scrubbing
that I need.  However, I am new to functions and there is something
fundamental that I am not understanding.  I use the return function at the
end of the function and this completes the data processing specified in
the function, but leaves the data frame that I want changed unaffected.
How do I get my function to apply its results to the data frame in
question instead of simply displaying the results to the screen?

Any helpful guidance would be most appreciated.

--John Sparks


x=as.data.frame(matrix(c(1,2,3,
                         1,2,3,
                         1,2,2,
                         1,2,2,
                         1,1,1),ncol=3,byrow=T))


myfunc <- function(DF){
  DF <- subset(DF,select=-c(V1))
  return(DF)
}

myfunc(x)

#How to get this change to data frame x?
#And preferably not send the results to the screen?
x



Re: [R] rbinding some elements from a list and obtain another list

2013-04-29 Thread MacQueen, Don
In addition to the other responses, consider this:

> i <- 3
> i:i+1
[1] 4
> i:(i+1)
[1] 3 4

-Don
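
With the parenthesised index, the pairwise rbind could be sketched as follows. This is an illustration, not the only approach: the data frames and the column name id are invented, and the list is assumed to have an even number of elements.

```r
# Hypothetical example list of four small data frames
mylist <- list(A = data.frame(id = 1:2),
               B = data.frame(id = 3:4),
               C = data.frame(id = 5:6),
               D = data.frame(id = 7:8))

# First index of each pair: 1, 3, ...
starts <- seq(1, length(mylist), by = 2)

# rbind elements i and i + 1; note the parentheses in i:(i + 1)
out <- lapply(starts, function(i) do.call(rbind, unname(mylist[i:(i + 1)])))

# Name each result after the pair it came from: "AB", "CD"
names(out) <- vapply(starts,
                     function(i) paste(names(mylist)[i:(i + 1)], collapse = ""),
                     character(1))
```

Using lapply() instead of a for loop also avoids the problem that a for loop returns NULL, which is why bb was NULL in the attempts quoted below.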


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 4/29/13 6:54 AM, De Castro Pascual, Montserrat mdecas...@creal.cat
wrote:

Hi everybody,

I have a list, where every element of this list is a data frame.

An example:

Mylist <- list(A=data.frame, B=data.frame, C=data.frame, D=data.frame)

I want to rbind some elements of this list.
As an example:

Output <- list(AB=data.frame, CD=data.frame)

Where
AB=rbind(A,B)
CD=rbind(C,D)

I've tried:

f <- function(x){
  for (i in seq(1,length(names(x)),2)){
    aa <- do.call(rbind,x[i:i+1])
    aa
  }}
bb <- f(mylist)

or

f <- function(x){
  for (i in seq(1,length(names(x)),2)){
    aa[i] <- do.call(rbind,x[i:i+1])
    list(aa[i])
  }}
bb <- f(mylist)

but it doesn't work:

> f <- function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa <- do.call(rbind,x[i:i+1])
+     aa
+   }}
> bb <- f(mylist)
> bb
NULL
> f <- function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa <- do.call(rbind,x[i:i+1])
+     aa
+   }}
> bb <- f(mylist)

> f <- function(x){
+   for (i in seq(1,length(names(x)),2)){
+     aa[i] <- do.call(rbind,x[i:i+1])
+     list(aa[i])
+   }}
> bb <- f(mylist)
Lost warning messages
1: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
2: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
3: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
4: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
5: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length
6: In aa[i] <- do.call(rbind, x[i:i + 1]) :
  number of items to replace is not a multiple of replacement length

Thanks!

Montserrat




   [[alternative HTML version deleted]]




Re: [R] expanding a presence only dataset into presence/absence

2013-04-29 Thread arun


HI,
Check if this is what you wanted.  I am not sure the Hopeful outcome includes 
all the possible combinations.

dat1 <- read.csv("Matthewdat.csv", sep=",", header=TRUE, stringsAsFactors=FALSE)
dat1
#                  Species CallingIndex Site      Date
#1     Pseudacris crucifer            2 3608 3/31/2001
#2        Anaxyrus fowleri            2 3638 4/13/2001
#3     Pseudacris crucifer            3 3641 3/23/2001
#4        Pseudacris kalmi            1 3641 3/23/2001
#5 Lithobates catesbeianus            1 3641 4/27/2001
#6     Pseudacris crucifer            2 3641 4/27/2001
#7     Pseudacris crucifer            3 3663  4/5/2001
#8     Pseudacris crucifer            2 3663  5/2/2001
#9    Lithobates clamitans            1 3663  6/6/2001



dat1New <- do.call(rbind, lapply(split(dat1, dat1$Site), function(x) {x$Present <- 1; x}))
row.names(dat1New) <- 1:nrow(dat1New)
dat2 <- do.call(rbind, lapply(split(dat1, dat1$Site), function(x)
    expand.grid(unique(x$Species), unique(x$Site), unique(x$Date))))
row.names(dat2) <- 1:nrow(dat2)
colnames(dat2) <- colnames(dat1)[c(1,3,4)]
res <- merge(dat1New, dat2, by=c("Species","Site","Date"), all=TRUE)
res[is.na(res)] <- 0
res
#                   Species Site      Date CallingIndex Present
#1         Anaxyrus fowleri 3638 4/13/2001            2       1
#2  Lithobates catesbeianus 3641 3/23/2001            0       0
#3  Lithobates catesbeianus 3641 4/27/2001            1       1
#4     Lithobates clamitans 3663  4/5/2001            0       0
#5     Lithobates clamitans 3663  5/2/2001            0       0
#6     Lithobates clamitans 3663  6/6/2001            1       1
#7      Pseudacris crucifer 3608 3/31/2001            2       1
#8      Pseudacris crucifer 3641 3/23/2001            3       1
#9      Pseudacris crucifer 3641 4/27/2001            2       1
#10     Pseudacris crucifer 3663  4/5/2001            3       1
#11     Pseudacris crucifer 3663  5/2/2001            2       1
#12     Pseudacris crucifer 3663  6/6/2001            0       0
#13        Pseudacris kalmi 3641 3/23/2001            1       1
#14        Pseudacris kalmi 3641 4/27/2001            0       0
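
For readers without the CSV attachment, the same idea can be tried on the nine rows printed above. The sketch below is a self-contained variant, not the code as posted: it types the data in directly, builds the species-by-date grid within each site, and derives Present from the merge instead of adding it beforehand.

```r
# The nine observations shown above, entered by hand
dat1 <- data.frame(
  Species = c("Pseudacris crucifer", "Anaxyrus fowleri", "Pseudacris crucifer",
              "Pseudacris kalmi", "Lithobates catesbeianus", "Pseudacris crucifer",
              "Pseudacris crucifer", "Pseudacris crucifer", "Lithobates clamitans"),
  CallingIndex = c(2, 2, 3, 1, 1, 2, 3, 2, 1),
  Site = c(3608, 3638, 3641, 3641, 3641, 3641, 3663, 3663, 3663),
  Date = c("3/31/2001", "4/13/2001", "3/23/2001", "3/23/2001", "4/27/2001",
           "4/27/2001", "4/5/2001", "5/2/2001", "6/6/2001"),
  stringsAsFactors = FALSE
)

# Within each site: every species seen there crossed with every sampling date
grid <- do.call(rbind, lapply(split(dat1, dat1$Site), function(d) {
  expand.grid(Species = unique(d$Species), Site = unique(d$Site),
              Date = unique(d$Date), stringsAsFactors = FALSE)
}))

# Combinations that were never observed come back with NA after the merge
res <- merge(grid, dat1, by = c("Species", "Site", "Date"), all.x = TRUE)
res$Present <- as.integer(!is.na(res$CallingIndex))
res$CallingIndex[is.na(res$CallingIndex)] <- 0
```

Because the grid is built per site, a species never recorded at a site (for example Anaxyrus fowleri at 3608) gets no added row.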


A.K.




From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Monday, April 29, 2013 3:54 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence



Arun,

Thanks again for your time on this. We are getting very close but not quite 
there. The problem is that I only gave you a very simple example because I 
didn't want to bog any of the readers of the blog down. If you have any 
interest or time, I was wondering if you could consider the full example and 
some actual data (attached CSV).

As you'll see, there is an additional column titled CallingIndex, which is an 
estimate of the species abundance (range of 1-3). If they were present, they 
were given a value that ranged from 1-3; if they were absent, they were not 
given any value. Editing your code to reflect this wasn't a problem.

However, what I didn't explain in enough detail to you is the specific contexts 
when we want to add zeros to the data. Essentially, we want to nest species 
within site and date and add zeros accordingly. If a species is never found at 
a site, we do not want to make any adjustments to the data. For example, 
Anaxyrus fowleri was not found at site 3608, so we do not want the code to add 
a row with Anaxyrus fowleri to site 3608 (in your code, it would add this). 
What we do want, however, is to add a zero for a species that was found on one 
date at a site but never found again on other dates. For example, Lithobates 
clamitans was found at site 3663 on 6/6/2001 but not observed on the other 2 
sampling dates, so we want to assign a calling index of 0 for Lithobates 
clamitans on sampling date 4/5/2001 and 5/2/2001 for site 3663 (and also make 
the appropriate addition for Pseudacris crucifer on the appropriate sampling 
dates).

You should be able to visualize what I am looking to do in the CSV file 
attached to this email. 

Does this make sense? Do you know of any code to do this task? 








--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 2:11 PM, arun smartpink...@yahoo.com wrote:



Hi Matthew,
No problem.
Regards,

Arun

From: Matthew Venesky mvene...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Monday, April 29, 2013 2:09 PM

Subject: Re: [R] expanding a presence only dataset into presence/absence



This, my friend, is a stroke of genius.

I'll give it a try on the real data and I will keep you posted.

Many, many, thanks.






--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 2:05 PM, arun 

Re: [R] Arma - estimate of variance of white noise variables

2013-04-29 Thread Rui Barradas

Hello,

Em 29-04-2013 13:49, Preetam Pal escreveu:

Hi all,

Suppose I am fitting an arma(p,q) model to a time series y_t.
So, my model should contain (q+1) white noise variables.


Why? How on earth can you say this?


As far as I know, each of them should have the same variance.
How do I get the estimate of this variance by running the arma(y) function
(or is there any other way)?


I'm not certain that the following is what you're looking for.

library(tseries)
fit <- arma(y, ...etc...)
var(resid(fit))
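
A runnable variant of the same idea is sketched below under stated assumptions: it uses arima() from the stats package that ships with R (not arma() from tseries), a simulated series, and an ARMA(1,1) order chosen only for illustration. Note that an ARMA(p,q) model has a single white noise process, so there is one innovation variance to estimate.

```r
set.seed(1)

# Simulated ARMA(1,1) series with unit innovation variance (illustration only)
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)

# Fit with stats::arima (assumption: used here in place of tseries::arma)
fit <- arima(y, order = c(1, 0, 1))

sigma2_ml  <- fit$sigma2            # ML estimate of the innovation variance
sigma2_res <- var(residuals(fit))   # sample variance of the residuals
```

The two estimates should be close to each other; they differ mainly in the divisor used.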


Hope this helps,

Rui Barradas


Appreciate your help.

Thanks,
Preetam





[R] biplot for principal componens analysis

2013-04-29 Thread capricy gao


I ran a PCA on my data, which has dimensions 19000 x 4, using princomp:
pca2=princomp((data), cor=F)

The resulting biplot shows 19000 labels and is very cluttered. How can I
show just the 19000 points without labels?
biplot(pca2)

Thanks a lot:))


-data

             A1     A2     L1     L2
E_6        0.23   4.05  13.35  11.86
E_00011  118.74 177.87 144.20 136.05
E_00062    8.50   0.60  73.11  45.81
E_00070    1.31   4.92   0.98   1.23
E_00071   97.41  39.90  31.15 150.77
E_00104    0.00   0.43  18.93  31.28
...
E_18586    0.00    0.0   0.00   0.95


Re: [R] getting started in parallel computing on a windows OS

2013-04-29 Thread Benjamin Caldwell
Martin,

This worked, thanks again!

*Ben Caldwell*

Graduate Fellow
University of California, Berkeley
130 Mulford Hall #3114
Berkeley, CA 94720
Office 223 Mulford Hall
(510)859-3358


On Thu, Apr 25, 2013 at 10:04 PM, Benjamin Caldwell btcaldw...@berkeley.edu
 wrote:

 Thanks for this martin. I'll start retooling and let you know how it goes.

 Ben Caldwell
 Graduate fellow
 On Apr 24, 2013 4:34 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 04/24/2013 02:50 PM, Benjamin Caldwell wrote:

 Dear R help,

 I've what I think is a fairly simple parallel problem, and am getting
 bogged down in documentation and packages for much more complex
 situations.

 I have a big matrix  (30^5,5]. I have a function that will act on each
 row
 of that matrix sequentially and output the 'best' result from the whole
 matrix (it compares the result from each row to the last and keeps the
 'better' result). I would like to divide that first large matrix into
 chunks equal to the number of cores I have available to me, and work
 through each chunk, then output the results from each chunk.

 I'm really having trouble making head or tail of how to do this on a
 windows machine - lots of different false starts on several different
 packages now. Basically, I have the function, and I can of course easily
 divide the matrix into chunks. I just need a way to process each chunk
 in parallel (other than opening new R sessions for each core manually).

 Any help much appreciated - after two days of trying to get this to work
 I'm pretty burnt out.


 Hi Ben -- in your code from this morning you had a function

 fitting <- function(ndx.grd=two, dt.grd=one, ind.vr='ind', rsp.vr='res') {
     ## ... setup
     for (i in 1:length(ndx.grd[,1])) {
         ## ... do work
     }
     ## ... collate results
 }

 that you're trying to run in parallel. Obviously the ## ... represent
 lines I've removed. When you say something like

 y <- foreach(icount(length(two))) %dopar% fitting()

 it's saying that you want to run fitting() length(two) times. So you're
 actually doing the same thing length(two) times, whereas you really want to
 divide the work that's inside fitting() into chunks, and do those on
 separate cores!

 Conceptually what you'd like to do is

 fit_one <- function(idx, ndx.grd, dt.grd, ind.vr, rsp.vr) {
     ## ... do work on row idx _ONLY_
 }

 and then evaluate with

 ## ... setup
 y <- foreach(idx = icount(nrow(two))) %dopar%
     fit_one(idx, two, one, 'ind', 'res')
 ## ... collate

 so that fit_one fits just one of your combinations. foreach will worry
 about distributing the work. Make sure that fit_one works first, before
 trying to run this in parallel; your use of try(), trying to fit different
 data types (character, integer, numeric) into a matrix rather than
 data.frame, and the type coercions all indicate that you're fighting with R
 rather than working with it.

 Hope that helps,

 Martin
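
 The chunking idea from the original question can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the thread: it uses the base parallel package (PSOCK clusters, which also run on Windows) instead of foreach, the grid is a toy matrix, and fit_one here is a hypothetical scoring function in which a smaller score counts as "better".

```r
library(parallel)

# Hypothetical stand-in for the real per-row work: score one row of the grid.
fit_one <- function(idx, grid) {
  sum((grid[idx, ] - 3)^2)   # smaller is "better" in this toy example
}

# Work through one chunk of row indices and return the best row within it.
best_in_chunk <- function(rows, grid, fit_one) {
  scores <- vapply(rows, fit_one, numeric(1), grid = grid)
  list(row = rows[which.min(scores)], score = min(scores))
}

grid <- as.matrix(expand.grid(1:5, 1:5, 1:5))   # toy 125 x 3 matrix
n_workers <- 2                                  # assumed number of cores
chunks <- splitIndices(nrow(grid), n_workers)   # one set of row indices per worker

cl <- makeCluster(n_workers)
per_chunk <- parLapply(cl, chunks, best_in_chunk, grid = grid, fit_one = fit_one)
stopCluster(cl)

# Collate: keep the best of the per-chunk winners.
chunk_scores <- vapply(per_chunk, function(x) x$score, numeric(1))
best <- per_chunk[[which.min(chunk_scores)]]
```

 Each worker only reports the winner of its own chunk, so the final comparison is over n_workers results rather than over every row.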


 Thanks

 *Ben Caldwell*




 --
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793





[R] interesting behavior from aaply

2013-04-29 Thread Benjamin Caldwell
Dear helpers,

I'm using plyr to process a large matrix for the first time. My code is set
up to work with matrixes, since I learned the hard way that dataframes
are considerably slower to process.

I started using aaply(), but the data was rearranged from a flat matrix to
a [, , 4] array for larger input matrixes. I'm sure something clever is
happening that I'm just not seeing - anyone have any insight? I can provide
the code for index.frames() if you like, but it's pretty turgid stuff.

For now I'm just using adply(), since it gives an output that I'd expect.

Best


toy.mat <- all.combinations[10:30,]
aaply(toy.mat, 1, index.frames)

Var1         1        2        3        4
  46  3599.848 3665.454 12946.41 12946.41
  51  3600.020 3666.424 12946.41 12946.41
  56  3600.167 3667.301 12946.41 12946.41
  61  3600.291 3668.058 12946.41 12946.41
  66  3600.404 3668.766 12946.41 12946.41
  71  3600.563 3669.779 12946.41 12946.41
  76  3600.563 3669.779 12946.41 12946.41
  81  3600.563 3669.779 12946.41 12946.41
  86  3600.563 3669.779 12946.41 12946.41
  91  3600.563 3669.779 12946.41 12946.41
  96  3600.563 3669.779 12946.41 12946.41
  101 3600.563 3669.779 12946.41 12946.41
  106 3600.563 3669.779 12946.41 12946.41
  111 3600.563 3669.779 12946.41 12946.41
  116 3600.563 3669.779 12946.41 12946.41
  121 3600.563 3669.779 12946.41 12946.41
  126 3600.563 3669.779 12946.41 12946.41
  131 3600.563 3669.779 12946.41 12946.41
  136 3600.563 3669.779 12946.41 12946.41
  141 3600.563 3669.779 12946.41 12946.41
  146 3600.563 3669.779 12946.41 12946.41

toy.mat <- all.combinations[10:100,]
aaply(toy.mat, 1, index.frames, .progress="win")
, ,  = 1

     Var2
Var1         2        7       12       17
  1         NA 3632.275 3630.730 3627.652
  6         NA 3638.913 3638.271 3635.214
  11        NA 3592.933 3593.322 3595.973
  16        NA 3588.024 3588.232 3589.256
  21        NA 3593.917 3594.088 3594.834
  26        NA 3596.888 3597.051 3597.752
  31        NA 3597.896 3598.056 3598.741
  36        NA 3598.994 3599.153 3599.837
  41        NA 3599.571 3599.729 3600.413
  46  3599.848 3599.848 3600.006 3600.689
  51  3600.020 3600.020 3600.178   NA
  56  3600.167 3600.167 3600.325   NA
  61  3600.291 3600.291 3600.448   NA
  66  3600.404 3600.404 3600.561   NA
  71  3600.563 3600.563 3600.721   NA
  76  3600.563 3600.563 3600.721   NA
  81  3600.563 3600.563 3600.721   NA
  86  3600.563 3600.563 3600.721   NA
  91  3600.563 3600.563 3600.721   NA
  96  3600.563 3600.563 3600.721   NA
  101 3600.563 3600.563 3600.721   NA
  106 3600.563 3600.563 3600.721   NA
  111 3600.563 3600.563 3600.721   NA
  116 3600.563 3600.563 3600.721   NA
  121 3600.563 3600.563 3600.721   NA
  126 3600.563 3600.563 3600.721   NA
  131 3600.563 3600.563 3600.721   NA
  136 3600.563 3600.563 3600.721   NA
  141 3600.563 3600.563 3600.721   NA
  146 3600.563 3600.563 3600.721   NA

, ,  = 2

     Var2
Var1         2        7       12       17
  1         NA 3681.001 3698.490 3688.247
  6         NA 3664.453 3676.527 3666.970
  11        NA 3662.162 3662.919 3668.211
  16        NA 3661.484 3661.476 3661.975
  21        NA 3648.731 3647.986 3650.290
  26        NA 3649.497 3648.367 3653.130
  31        NA 3652.778 3651.586 3656.638
  36        NA 3660.082 3659.050 3662.755
  41        NA 3663.944 3663.025 3665.933
  46  3665.454 3665.454 3664.572 3667.226
  51  3666.424 3666.424 3665.571   NA
  56  3667.301 3667.301 3666.477   NA
  61  3668.058 3668.058 3667.261   NA
  66  3668.766 3668.766 3667.998   NA
  71  3669.779 3669.779 3669.056   NA
  76  3669.779 3669.779 3669.056   NA
  81  3669.779 3669.779 3669.056   NA
  86  3669.779 3669.779 3669.056   NA
  91  3669.779 3669.779 3669.056   NA
  96  3669.779 3669.779 3669.056   NA
  101 3669.779 3669.779 3669.056   NA
  106 3669.779 3669.779 3669.056   NA
  111 3669.779 3669.779 3669.056   NA
  116 3669.779 3669.779 3669.056   NA
  121 3669.779 3669.779 3669.056   NA
  126 3669.779 3669.779 3669.056   NA
  131 3669.779 3669.779 3669.056   NA
  136 3669.779 3669.779 3669.056   NA
  141 3669.779 3669.779 3669.056   NA
  146 3669.779 3669.779 3669.056   NA

, ,  = 3

     Var2
Var1         2        7       12       17
  1         NA 12946.41 12946.41 12946.41
  6         NA 12946.41 12946.41 12946.41
  11        NA 12946.41 12946.41 12946.41
  16        NA 12946.41 12946.41 12946.41
  21        NA 12946.41 12946.41 12946.41
  26        NA 12946.41 12946.41 12946.41
  31        NA 12946.41 12946.41 12946.41
  36        NA 12946.41 12946.41 12946.41
  41        NA 12946.41 12946.41 12946.41
  46  12946.41 12946.41 12946.41 12946.41
  51  12946.41 12946.41 12946.41   NA
  56  12946.41 12946.41 12946.41   NA
  61  12946.41 12946.41 12946.41   NA
  66  12946.41 12946.41 12946.41   NA
  71  12946.41 12946.41 12946.41   NA
  76  

[R] bigmemory and R 3.0

2013-04-29 Thread Benjamin Caldwell
Dear helpers,

Does anyone have information on the status of bigmemory and R3.0? Will it
just take time for the devs to re-code for the new environment? Or is there
an alternative for this new version?

Thanks

Ben Caldwell



[R] R help - bootstrap with survival analysis

2013-04-29 Thread Fayaaz Khatri
Hi,

I'm not sure if this is the proper way to ask questions, sorry if not.  But
here's my problem:

I'm trying to do a bootstrap estimate of the mean for some survival data.
Is there a way to specifically call upon the rmean value, in order to store
it in an object? I've used print(...,print.rmean=T) to print the summary of
survfit, but I'm not sure how to access only rmean because it does not show
up under attributes for survfit.

Thanks for any help in advance!

Fayaaz Khatri
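
A possible way to get at the restricted mean, offered as a sketch under assumptions rather than a definitive recipe: summary() of a survfit object returns a list whose table component holds the same summary numbers that print() shows, including the restricted mean when the rmean argument is supplied. The name of that entry has differed between versions of the survival package (for example "*rmean" in older releases), so the helper below matches it by pattern. The aml data set shipped with survival, the helper name rmean_of, the upper limit tau, and the 200 replicates are all illustrative choices.

```r
library(survival)

# Restricted mean survival time up to tau for one data set.
# The table entry is matched by pattern because its exact name
# ("rmean", "*rmean", ...) depends on the survival version.
rmean_of <- function(d, tau) {
  fit <- survfit(Surv(time, status) ~ 1, data = d)
  tab <- summary(fit, rmean = tau)$table
  unname(tab[grep("^[*]?rmean[*]?$", names(tab))])
}

tau <- max(aml$time)            # common upper limit for every resample
rmean_hat <- rmean_of(aml, tau) # estimate on the original data

# Nonparametric bootstrap: resample rows with replacement, recompute rmean.
set.seed(1)
boot_rmeans <- replicate(200, {
  rows <- sample(nrow(aml), replace = TRUE)
  rmean_of(aml[rows, ], tau)
})

boot_se <- sd(boot_rmeans)      # bootstrap standard error of the restricted mean
```

Using one fixed tau for all resamples keeps the bootstrap replicates comparable, since the restricted mean depends on the upper limit.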



Re: [R] bigmemory and R 3.0

2013-04-29 Thread Dirk Eddelbuettel

On 29 April 2013 at 15:46, Benjamin Caldwell wrote:
| Dear helpers,
| 
| Does anyone have information on the status of bigmemory and R3.0? Will it
| just take time for the devs to re-code for the new environment? Or is there
| an alternative for this new version?

It just works, with R 3.0.0 and other versions (see below).  Did you maybe
forget to reinstall any of the relevant packages?

Dirk

edd@max:~$ R

R version 3.0.0 (2013-04-03) -- "Masked Marvel"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.  

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

R> library(bigmemory)
Loading required package: bigmemory.sri
Loading required package: BH

bigmemory >= 4.0 is a major revision since 3.1.2; please see packages
biganalytics and and bigtabulate and http://www.bigmemory.org for more 
information.

R> 

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com



[R] Is there a function that print a string vertically (by adding \n)?

2013-04-29 Thread jpm miao
Hi,

  I'd like to print a string vertically. For example, I would like to print
"abcd" as "a\nb\nc\nd"

  Is there a function in R such that

Input: "abcd"
Output: "a\nb\nc\nd"?

   Thanks,

Miao



Re: [R] Is there a function that print a string vertically (by adding \n)?

2013-04-29 Thread arun
Hi,
Maybe this helps:

 cat(paste(strsplit("abcd","")[[1]],collapse="\n"))
#a
#b
#c
#d 
A.K.




Re: [R] Looping Over Data Frames

2013-04-29 Thread David Winsemius

On Apr 29, 2013, at 6:03 PM, Sparks, John James wrote:

 Dear R Helpers,
 
 I am re-phrasing a question that I put forth earlier today due to some
 particulars in the solution that I am searching for.  Many thanks to those
 who answered the previous post and to any who would be willing to answer
 this one.
 
 I have a set of data frames.  I need to perform some data scrubbing on
 each of them.  I am trying to figure out how to perform the same steps on
 each data frame using some sort of loop or something along those lines.
 
 Because my actual data frames are quite large and the steps I am taking
 moderately complicated, I would very much prefer not to put them all
 together in a list because when I get an error, I can't determine which
 part of the list is the source of the error.  So, I would really
 appreciate it if someone could post a way to perform the following sub
 setting function on the three simple data frames in the example below with
 some sort of loop or something along those lines, which would work
 directly on the data frames in question.
 
 Many thanks in advance.  Please let me know if there is anything I can do
 to make the question more clear.
 
 --John Sparks
 
 
 x=as.data.frame(matrix(c(1,2,3,
                          1,2,3,
                          1,2,2,
                          1,2,2,
                          1,1,1),ncol=3,byrow=T))
 
 y=as.data.frame(matrix(c(1,2,3,
                          1,2,3,
                          1,2,2,
                          1,2,2,
                          1,1,1),ncol=3,byrow=T))
 
 z=as.data.frame(matrix(c(1,2,3,
                          1,2,3,
                          1,2,2,
                          1,2,2,
                          1,1,1),ncol=3,byrow=T))
 
 #Want to build some sort of loop for this.
 x<-subset(x,select=-c(V1))
 y<-subset(y,select=-c(V1))
 z<-subset(z,select=-c(V1))


 for(i in letters[24:26] ) assign( i, subset(get(i), select=-c(V1))  )
 x
  V2 V3
1  2  3
2  2  3
3  2  2
4  2  2
5  1  1
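
A variation on the same get()/assign() loop, sketched here as one possible way to address the concern about not knowing which data frame caused an error: wrap the scrubbing step in tryCatch() and re-raise the error with the data frame's name attached. The data frames below are small made-up copies of the example, and the env variable is only there so the loop reads and writes in whatever environment the code is run from.

```r
# Three small data frames with the same layout as the example.
x <- data.frame(V1 = 1, V2 = c(2, 2, 2, 2, 1), V3 = c(3, 3, 2, 2, 1))
y <- x
z <- x

env <- environment()   # the environment that holds x, y and z

for (nm in c("x", "y", "z")) {
  cleaned <- tryCatch(
    subset(get(nm, envir = env), select = -V1),
    error = function(e) {
      # Re-raise with the name of the data frame that failed.
      stop("scrubbing failed for data frame '", nm, "': ", conditionMessage(e))
    }
  )
  assign(nm, cleaned, envir = env)   # overwrite the original in place
}
```

The same tryCatch() wrapper works inside lapply() over a named list, so keeping the data frames in a list need not hide the source of an error either.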


-- 

David Winsemius
Alameda, CA, USA



[R] Question regarding error x and y lengths differ

2013-04-29 Thread Sean Doyle
Hello, I'm a first semester statistics student and I am using R for roughly
the third time ever. I am following a tutorial and yet I still get the error
"x and y lengths differ". I am very new to this program, and I have searched
for solutions, but because I do not understand the program too well, I am
not sure which solution may apply to me. Any help is much appreciated!

The question is as follows:

Do this given your population standard deviation. If we pick a confidence
interval, say α = 0.1 (90% confidence), we can compute a confidence interval
for our measure of the population mean for each one of our samples. Now
let's compute and plot the confidence intervals for the 50 samples:

 m = 50; n = 40; mu = mean(pop); sigma = sd(pop);
 SE = sigma/sqrt(n) # Standard error in mean
 alpha = 0.10 ; zstar = qnorm(1-alpha/2); # Find z for 90% confidence
 matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),
         rbind(1:m,1:m), type="l", lty=1);
 abline(v=mu)

I am receiving the error "Error in xy.coords(x, y, xlabel, ylabel, log =
log) : 'x' and 'y' lengths differ" when inputting

 matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),
         rbind(1:m,1:m), type="l", lty=1);

Thank you very much!



Re: [R] Is there a function that print a string vertically (by adding \n)?

2013-04-29 Thread David Winsemius

On Apr 29, 2013, at 6:41 PM, jpm miao wrote:

 Hi,
 
  I'd like to print a string vertically. For example, I would like to print
 "abcd" as "a\nb\nc\nd"
 
  Is there a function in R such that
 
 Input: "abcd"
 Output: "a\nb\nc\nd"?

 do.call( paste, list( strsplit("abcd", "")[[1]] , collapse="\\n"))
[1] "a\\nb\\nc\\nd"

Notice that I am refusing to acquiesce to your request because I do not think 
you understand how escaped characters are represented in R. (In programming the 
customer is not always right.)

-- 

David Winsemius
Alameda, CA, USA



Re: [R] Is there a function that print a string vertically (by adding \n)?

2013-04-29 Thread David Winsemius

On Apr 29, 2013, at 6:50 PM, David Winsemius wrote:

 
 On Apr 29, 2013, at 6:41 PM, jpm miao wrote:
 
 Hi,
 
 I'd like to print a string vertically. For example, I would like to print
 "abcd" as "a\nb\nc\nd"
 
 Is there a function in R such that
 
 Input: "abcd"
 Output: "a\nb\nc\nd"?
 
 do.call( paste, list( strsplit("abcd", "")[[1]] , collapse="\\n"))
 [1] "a\\nb\\nc\\nd"
 
 Notice that I am refusing to acquiesce to your request because I do not think 
 you understand how escaped characters are represented in R. (In programming 
 the customer is not always right.)

Nor is the programmer. I see that:
cat("a\nb\nc\nd")
... is probably what you wanted and my answer was not. Apologies for the noise.
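
The difference between the two answers in this thread comes down to how R stores and displays escapes, which can be sketched as follows (the variable names are illustrative). In R source, "\n" is a single newline character, while "\\n" is two characters, a backslash followed by n. print() shows a string in its escaped form; cat() writes the characters themselves.

```r
letters_abcd <- strsplit("abcd", "")[[1]]      # "a" "b" "c" "d"

with_newlines <- paste(letters_abcd, collapse = "\n")   # real newline characters
with_literal  <- paste(letters_abcd, collapse = "\\n")  # backslash followed by n

nchar(with_newlines)   # 7: four letters plus three one-character newlines
nchar(with_literal)    # 10: four letters plus three two-character separators

print(with_newlines)      # displayed in escaped form, on one line
cat(with_newlines, "\n")  # written out with each letter on its own line
```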
-- 

David Winsemius
Alameda, CA, USA



[R] R/3.0.0 serialize limits

2013-04-29 Thread Sam.Moskwa
Hello,

I am wondering if I am misinterpreting something from R/3.0.0 NEWS


LONG VECTORS:

  This section applies only to 64-bit platforms.
...
o serialize() to a raw vector is unlimited in size (except by
  resources).


However when I try the following it fails:

 foo <- raw(25)
 print(object.size(foo),units="auto")
2.3 Gb
 bar <- serialize(foo, NULL)
Error: serialization is too large to store in a raw vector

However this works:

 foo <- raw(20)
 print(object.size(foo),units="auto")
1.9 Gb
 bar <- serialize(foo, NULL)


So it appears there may be a 2GB limit, which I've read should only be the case 
for 32-bit or pre-R/3.0.0 installations.

 sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


I've tested this on a few different hosts with no success - Ubuntu Raring 
Ringtail (apt-get), SLES 11.2 (building R from source), and Windows binaries



Regards,
Sam



Re: [R] Question regarding error x and y lengths differ

2013-04-29 Thread Jim Lemon

On 04/30/2013 11:38 AM, Sean Doyle wrote:

Hello, I'm a first semester statistics student and I am using R for roughly
the third time ever. I am following a tutorial and yet I still get the error
"x and y lengths differ". I am very new to this program, and I have searched
for solutions, but because I do not understand the program too well, I am
not sure which solution may apply to me. Any help is much appreciated!

The question is as follows:

Do this given your population standard deviation. If we pick a confidence
interval, say α = 0.1 (90% confidence), we can compute a confidence interval
for our measure of the population mean for each one of our samples. Now
let's compute and plot the confidence intervals for the 50 samples:

 m = 50; n = 40; mu = mean(pop); sigma = sd(pop);
 SE = sigma/sqrt(n) # Standard error in mean
 alpha = 0.10 ; zstar = qnorm(1-alpha/2); # Find z for 90% confidence
 matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),
         rbind(1:m,1:m), type="l", lty=1);
 abline(v=mu)

I am receiving the error "Error in xy.coords(x, y, xlabel, ylabel, log =
log) : 'x' and 'y' lengths differ" when inputting

 matplot(rbind( samp_mean - zstar*SE, samp_mean + zstar*SE),
         rbind(1:m,1:m), type="l", lty=1);


Hi Sean,
Without doing your homework for you, I would suggest trying this:

length(samp_mean)
length(zstar*SE)
length(1:m)

I would be very surprised if you got the same answer for all three. Also 
you might want to read the help page for matplot carefully to decide 
which is the x and which is the y you want to plot.
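
To illustrate the length check, here is a sketch with made-up data. It assumes that samp_mean is meant to hold one mean per sample, so that it has length m; pop is simulated here and is not the population from the tutorial. When both arguments to matplot() are 2 x m matrices, each column is drawn as one horizontal interval.

```r
set.seed(42)
pop <- rnorm(10000, mean = 100, sd = 15)   # made-up population

m <- 50; n <- 40
mu <- mean(pop); sigma <- sd(pop)

# One mean per sample, so samp_mean has length m.
samp_mean <- replicate(m, mean(sample(pop, n)))

SE <- sigma / sqrt(n)           # standard error of the mean
zstar <- qnorm(1 - 0.10 / 2)    # z for 90% confidence

lower_upper <- rbind(samp_mean - zstar * SE, samp_mean + zstar * SE)  # 2 x m
sample_id   <- rbind(1:m, 1:m)                                        # 2 x m

# The dimensions must agree, otherwise matplot() reports differing lengths.
stopifnot(identical(dim(lower_upper), dim(sample_id)))

matplot(lower_upper, sample_id, type = "l", lty = 1,
        xlab = "confidence interval", ylab = "sample")
abline(v = mu)
```

If samp_mean has any length other than m (for instance because it holds a single mean), the first matrix has a different number of columns from the second and the error from the question appears.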


Jim



Re: [R] bigmemory and R 3.0

2013-04-29 Thread Prof Brian Ripley

On 29/04/2013 23:46, Benjamin Caldwell wrote:

Dear helpers,

Does anyone have information on the status of bigmemory and R3.0? Will it
just take time for the devs to re-code for the new environment? Or is there
an alternative for this new version?


What are you asking about?  'bigmemory' has been available for R 3.0.0 
(sic) for a long time for all OSes bar Solaris and Windows, where the 
maintainers excluded it a long time ago (not just for R 3.0.0).




Thanks

Ben Caldwell

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Please see what it has to say about mis-reading R version numbers and 
HTML mail, and asking maintainers about their packages.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
