Re: [R] vectorized sub, gsub, grep, etc.

2015-07-31 Thread John Thaden
Adam,   
You reopened an old thread noting its age, but did you begin at its beginning?
> Subject: vectorized sub, gsub, grep, etc.
> Date: Oct 7, 2008
> R pattern-matching and replacement functions are
> vectorized: they can operate on vectors of targets.
> However, they can only use one pattern and replacement.
> Here is code to apply a different pattern and replacement
> for every target.  My question: can it be done better?

sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    if (length(pattern) == 1)
        pattern <- rep(pattern, len)
    if (length(replacement) == 1)
        replacement <- rep(replacement, len)
    FUN <- function(i, ...) {
        sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    idx <- 1:length(x)
    sapply(idx, FUN)
}

#Example
X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub2(patt, repl, X)
 
If you run that code, you'll see the correct answer is not "" "CD" "", it is

[1] "aB" "CD" "ef"

And the same answer is given by the shorter (but slower) code suggested later 
that day by Christos

mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)

  b   cd    a
"aB" "CD" "ef"

By talking instead about simple string matching, I'm afraid you've rather 
hijacked the thread.

-John

Adam wrote:

> I'm not sure I understand your question. Both functions return "" "CD" ""
> because they perform exact string matching. The first demonstrates how string
> or character replacements can be vectorized, while the second merely
> demonstrates how Rcpp can accelerate this type of operation.

On Jul 30, 2015, at 21:09, John Thaden wrote:

> Can you show what its solution is for the original sample data? Why that
> discrepancy for your original sub2() function?
>
> From: "Adam Erickson"
> Date: Thu, Jul 30, 2015 at 6:11 pm
> Subject: Re: [R] vectorized sub, gsub, grep, etc.

Here is an Rcpp version for exact character matching (for example) written in 
C++ that is substantially faster. Hence, I think this is the way to go where 
loops may be unavoidable. However, the input vector length has to match the 
length of the pattern and replacement vectors, as your original code did. That 
can be changed though.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector subCPP(CharacterVector pattern, CharacterVector replacement,
                       CharacterVector x) {
  int len = x.size();
  CharacterVector y(len);
  int patlen = pattern.size();
  int replen = replacement.size();
  if (patlen != replen)
    Rcout << "Error: Pattern and replacement length do not match";
  for(int i = 0; i < patlen; ++i) {
    if (*(char*)x[i] == *(char*)pattern[i])
      y[x[i] == pattern[i]] = replacement[i];
  }
  return y;
}

""   "CD" ""

system.time(for(i in 1:5) subCPP(patt, repl, X))
   user  system elapsed
   0.16    0.00    0.16
Cheers,
Adam
On Wednesday, July 29, 2015 at 2:42:23 PM UTC-7, Adam Erickson wrote:
Further refining the vectorized (within a loop) exact string match function, I 
get times below 0.9 seconds while maintaining error checking. This is 
accomplished by removing which() and replacing 1:length() with seq_along().
sub2 <- function(pattern, replacement, x) {
  len    <- length(x)
  y      <- character(length=len)
  patlen <- length(pattern)
  replen <- length(replacement)
  if(patlen != replen) stop('Error: Pattern and replacement length do not match')
  for(i in seq_along(pattern)) {
    y[x==pattern[i]] <- replacement[i]
  }
  return(y)
}

system.time(for(i in 1:5) sub2(patt, repl, X))
   user  system elapsed
   0.86    0.00    0.86
Since the ordered vectors are perfectly aligned, might as well do an exact 
string match. Hence, I think this is not off-topic.
Cheers,
Adam
On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote:
There is confusion here. apply() family functions are **NOT**
vectorization -- they ARE loops (at the interpreter level), just done
in "functionalized" form. Please read background material (John
Chambers's books, MASS, or numerous others) to improve your
understanding and avoid posting erroneous comments.
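
A quick sketch of the distinction (timings are machine-dependent): sapply()
walks the vector element by element at the R level, while the vectorized form
does the whole operation in compiled code.

 x <- rnorm(1e6)
 system.time(y1 <- sapply(x, function(v) v + 1))   # R-level loop over elements
 system.time(y2 <- x + 1)                          # vectorized, runs in C
 identical(y1, y2)                                 # TRUE, but very different cost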

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Jul 28, 2015 at 3:00 PM, John Thaden  wrote:
> Adam,    The method you propose gives a different result than the prior 
> methods for these example vectors
> X <- c("ab", "cd", "ef")
> patt <- c("b", "cd", "a")
> repl <- c("B", "CD", "A")
>
> Old method 1
>
> mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
> gives
>   b   cd    a
> "aB" "CD" "ef"
>
> Old method 2
>
> sub2 <- function(pattern, replacement, x) {
>     len <- length(x)
>     if (length(pattern) == 1)
>         pattern <- rep(pattern, len)
>     if (length(replacement) == 1)
>         replacement <- rep(replacement, len)
>     FUN <- function(i, ...) {

[R] Error when compiling R-2.5.1 / *** [d-p-q-r-tests.Rout] Fehler 1

2015-07-31 Thread Joerg Kirschner
Hi everyone,
I am new to Linux and R, but I managed to build R-2.5.1 from source to use
it in GenePattern. GenePattern only supports R-2.5.1, which I could not
find anywhere for installation via apt-get or in the Ubuntu Software Centre
(I am using Ubuntu 14.04 (Trusty Tahr), 32-bit).

But after doing

make check


I get

comparing 'method-dispatch.Rout' to './method-dispatch.Rout.save' ... OK
running code in 'd-p-q-r-tests.R' ...make[3]: *** [d-p-q-r-tests.Rout]
Error 1
make[3]: Leaving directory »/home/karin/Downloads/R-2.5.1/tests«
make[2]: *** [test-Specific] Error 2
make[2]: Leaving directory »/home/karin/Downloads/R-2.5.1/tests«
make[1]: *** [test-all-basics] Error 1
make[1]: Leaving directory »/home/karin/Downloads/R-2.5.1/tests«
make: *** [check] Error 2


but I can run make install and use R for simple plots etc. afterwards. Still, I
am worried something is wrong; can you give some advice?

A closer look at the error gives

> ## PR#7099 : pf() with large df1 or df2:
> nu <- 2^seq(25,34, 0.5)
> y <- 1e9*(pf(1,1,nu) - 0.68268949)
> stopifnot(All.eq(pf(1,1,Inf), 0.68268949213708596),
+   diff(y) > 0, # i.e. pf(1,1, *) is monotone increasing
+   All.eq(y [1], -5.07420372386491),
+   All.eq(y[19],  2.12300110824515))
Error: All.eq(y[1], -5.07420372386491) is not TRUE
Execution halted


As I understand it, some errors are critical and some are not - can you
please give some advice on the error above? Can I still use R installed
with that error? What do I need to do to resolve the error?

Thanks, Joerg


Re: [R] Clarification on Simulation and Iteration

2015-07-31 Thread David Winsemius

On Jul 31, 2015, at 8:41 PM, Christopher Kelvin wrote:

> Thanks Dave.
> 
> What I actually want is to obtain, say, 10 different sets of (n=50) data for 
> every 10,000 iterations I run. You will realise that the current code 
> produces one set of data (n=50). I want 10 different sets of 50 observations 
> in one run. I hope this makes sense.

I would think that either `replicate(50, ...)` or `for(i in 1:50) {...}` would 
suffice. Unless of course the phrase "want 10 different sets of 50 observations 
at one run" means something different than it appears to request.
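
A minimal sketch of the replicate() route; rnorm(50) just stands in for one run
of the simulation, the point being that each of the 10 list elements is an
independent set of 50 values:

 sets <- replicate(10, rnorm(50), simplify = FALSE)
 length(sets)    # 10 sets
 lengths(sets)   # each of length 50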


> 
> Chris Guure 
> 
> 
> 
> On Saturday, August 1, 2015 3:32 AM, David Winsemius  
> wrote:
> 
> 
> On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote:
> 
>> Dear All,
>> I am performing some simulations for a new model. I run about 10,000 
>> iterations with a sample of 50 datasets and this returns one set of 50 
>> simulated data. 
>> 
>> Now, what I need to obtain is 10 sets of the 50 simulated data out of the 
>> 10,000 iterations and not just only 1 set.  The model is the Copas selection 
>> model for publication bias in Mete-analysis. Any one who knows this model 
>> has any suggestion for the improvement of my code is most welcome.
>> 
>> Below is my code. 
>> 
>> 
>> Kind regards
>> 
>> 
>> Chris Guure
>> University of Ghana
>> 
>> 
>> 
>> 
>> install.packages("msm") 
>> library(msm) 
>> 
>> 
>> rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 
>> si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate 
>> standard errors for each study 
>> set.seed(2)   ## I have stored the data and the output in this seed 
>> 
>> for( i in 1:rr){ 
>> 
>> mu<-rnorm(n,d,tua^2)  # prob. of each effect estimate 
>> rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient 
>> mu0<- a + b/si   # mean of the truncated normal model (Copas selection 
>> model) 
>> y1<-rnorm(mu,si^2)# observed effects zise 
>> z<-rnorm(mu0,1)   # selection model 
>> rho2<-cor(y1, z) 
>> 
>> select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) 
>> probselect<-ifelse(select> 
>> probselect 
>> data<-data.frame(probselect,si)# this contains both include and exclude 
>> data 
>> data 
>> data1<-data[complete.cases(data),] # Contains only the included data for 
>> analysis 
>> data1 
>> 
>> 
>> }
>> 
> 
> OK. The code runs without error. So  what exactly is the problem? (I have 
> no experience with the Copas selection model if in fact that is what is being 
> exemplified.)
> 
> -- 
> 
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA



Re: [R] vectorized sub, gsub, grep, etc.

2015-07-31 Thread Adam Erickson
I'm not sure I understand your question. Both functions return "" "CD" "" 
because they perform exact string matching. The first demonstrates how string 
or character replacements can be vectorized, while the second merely 
demonstrates how Rcpp can accelerate this type of operation.

Cheers,

Adam

> On Jul 30, 2015, at 21:09, John Thaden  wrote:
> 
> 
> Can you show what its solution is for the original sample data? Why that 
> discrepancy for your original sub2() function?
> 
> From:"Adam Erickson" 
> Date:Thu, Jul 30, 2015 at 6:11 pm
> Subject:Re: [R] vectorized sub, gsub, grep, etc.
> 
> Here is an Rcpp version for exact character matching (for example) written in 
> C++ that is substantially faster. Hence, I think this is the way to go where 
> loops may be unavoidable. However, the input vector length has to match the 
> length of the pattern and replacement vectors, as your original code did. 
> That can be changed though.
> 
> #include <Rcpp.h>
> using namespace Rcpp;
> 
> // [[Rcpp::export]]
> CharacterVector subCPP(CharacterVector pattern, CharacterVector replacement, 
> CharacterVector x) {
>   int len = x.size();
>   CharacterVector y(len);
>   int patlen = pattern.size();
>   int replen = replacement.size();
>   if (patlen != replen)
> Rcout<<"Error: Pattern and replacement length do not match";
>   for(int i = 0; i < patlen; ++i) {
> if (*(char*)x[i] == *(char*)pattern[i])
>   y[x[i] == pattern[i]] = replacement[i];
>   }
>   return y;
> }
> 
> ""   "CD" ""  
> 
> system.time(for(i in 1:5) subCPP(patt, repl, X))
>user  system elapsed 
>0.160.000.16 
> 
> Cheers,
> 
> Adam
> 
>> On Wednesday, July 29, 2015 at 2:42:23 PM UTC-7, Adam Erickson wrote:
>> Further refining the vectorized (within a loop) exact string match function, 
>> I get times below 0.9 seconds while maintaining error checking. This is 
>> accomplished by removing which() and replacing 1:length() with seq_along().
>> 
>> sub2 <- function(pattern, replacement, x) {
>>len<- length(x)
>>y  <- character(length=len)
>>patlen <- length(pattern)
>>replen <- length(replacement)
>>if(patlen != replen) stop('Error: Pattern and replacement length do not 
>> match')
>>for(i in seq_along(pattern)) {
>>  y[x==pattern[i]] <- replacement[i]
>>}
>>return(y)
>>  }
>> 
>> system.time(for(i in 1:5) sub2(patt, repl, X))
>>user  system elapsed 
>>0.860.000.86 
>> 
>> Since the ordered vectors are perfectly aligned, might as well do an exact 
>> string match. Hence, I think this is not off-topic.
>> 
>> Cheers,
>> 
>> Adam
>> 
>>> On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote:
>>> There is confusion here. apply() family functions are **NOT** 
>>> vectorization -- they ARE loops (at the interpreter level), just done 
>>> in "functionalized" form. Please read background material (John 
>>> Chambers's books, MASS, or numerous others) to improve your 
>>> understanding and avoid posting erroneous comments. 
>>> 
>>> Cheers, 
>>> Bert 
>>> 
>>> 
>>> Bert Gunter 
>>> 
>>> "Data is not information. Information is not knowledge. And knowledge 
>>> is certainly not wisdom." 
>>>-- Clifford Stoll 
>>> 
>>> 
>>> On Tue, Jul 28, 2015 at 3:00 PM, John Thaden  wrote: 
>>> > Adam,The method you propose gives a different result than the prior 
>>> > methods for these example vectors 
>>> > X <- c("ab", "cd", "ef") 
>>> > patt <- c("b", "cd", "a") 
>>> > repl <- c("B", "CD", "A") 
>>> > 
>>> > Old method 1 
>>> > 
>>> > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) 
>>> > gives 
>>> >   b   cda 
>>> > "aB" "CD" "ef" 
>>> > 
>>> > Old method 2 
>>> > 
>>> > sub2 <- function(pattern, replacement, x) { 
>>> > len <- length(x) 
>>> > if (length(pattern) == 1) 
>>> > pattern <- rep(pattern, len) 
>>> > if (length(replacement) == 1) 
>>> > replacement <- rep(replacement, len) 
>>> > FUN <- function(i, ...) { 
>>> > sub(pattern[i], replacement[i], x[i], fixed = TRUE) 
>>> > } 
>>> > idx <- 1:length(x) 
>>> > sapply(idx, FUN) 
>>> > } 
>>> > sub2(patt, repl, X) 
>>> >  gives 
>>> > [1] "aB" "CD" "ef" 
>>> > 
>>> > Your method (I gave it the unique name "sub3")
>>> >
>>> > sub3 <- function(pattern, replacement, x) {
>>> >   len    <- length(x)
>>> >   y      <- character(length=len)
>>> >   patlen <- length(pattern)
>>> >   replen <- length(replacement)
>>> >   if(patlen != replen) stop('Error: Pattern and replacement length do not match')
>>> >   for(i in 1:replen) {
>>> >     y[which(x==pattern[i])] <- replacement[i]
>>> >   }
>>> >   return(y)
>>> > }
>>> > sub3(patt, repl, X)
>>> > gives
>>> > [1] ""   "CD" ""
>>> > 
>>> > Granted, whatever it does, it does it faster 
>>> > #Old method 1 
>>> > system.time(for(i in 1:5) 
>>> > mapply(function(p,r,x) sub(p,r,x, fixed = TRUE),p=patt,r=repl,x=X)) 
>>> >user  system elapsed 
>>> >2.530.002.52 
>>> > 
>>> > #Old method 2 
>>> > system.time(for(i in 1:5)sub2

Re: [R] Clarification on Simulation and Iteration

2015-07-31 Thread David Winsemius

On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote:

> Dear All,
> I am performing some simulations for a new model. I run about 10,000 
> iterations with a sample of 50 datasets and this returns one set of 50 
> simulated data. 
> 
> Now, what I need to obtain is 10 sets of the 50 simulated data out of the 
> 10,000 iterations and not just only 1 set.  The model is the Copas selection 
> model for publication bias in Mete-analysis. Any one who knows this model has 
> any suggestion for the improvement of my code is most welcome.
> 
> Below is my code. 
> 
> 
> Kind regards
> 
> 
> Chris Guure
> University of Ghana
> 
> 
> 
> 
> install.packages("msm") 
> library(msm) 
> 
> 
> rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 
> si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate 
> standard errors for each study 
> set.seed(2)   ## I have stored the data and the output in this seed 
> 
> for( i in 1:rr){ 
> 
> mu<-rnorm(n,d,tua^2)  # prob. of each effect estimate 
> rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient 
> mu0<- a + b/si   # mean of the truncated normal model (Copas selection 
> model) 
> y1<-rnorm(mu,si^2)# observed effects zise 
> z<-rnorm(mu0,1)   # selection model 
> rho2<-cor(y1, z) 
> 
> select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) 
> probselect<-ifelse(select 
> probselect 
> data<-data.frame(probselect,si)# this contains both include and exclude 
> data 
> data 
> data1<-data[complete.cases(data),] # Contains only the included data for 
> analysis 
> data1 
> 
> 
> }
> 

OK. The code runs without error. So  what exactly is the problem? (I have 
no experience with the Copas selection model if in fact that is what is being 
exemplified.)

-- 

David Winsemius
Alameda, CA, USA



Re: [R] Clarification on Simulation and Iteration

2015-07-31 Thread Christopher Kelvin via R-help
Thanks Dave.

What I actually want is to obtain, say, 10 different sets of (n=50) data for 
every 10,000 iterations I run. You will realise that the current code produces 
one set of data (n=50). I want 10 different sets of 50 observations in one run. 
I hope this makes sense.

Chris Guure 



On Saturday, August 1, 2015 3:32 AM, David Winsemius  
wrote:


On Jul 31, 2015, at 6:36 PM, Christopher Kelvin via R-help wrote:

> Dear All,
> I am performing some simulations for a new model. I run about 10,000 
> iterations with a sample of 50 datasets and this returns one set of 50 
> simulated data. 
> 
> Now, what I need to obtain is 10 sets of the 50 simulated data out of the 
> 10,000 iterations and not just only 1 set.  The model is the Copas selection 
> model for publication bias in Mete-analysis. Any one who knows this model has 
> any suggestion for the improvement of my code is most welcome.
> 
> Below is my code. 
> 
> 
> Kind regards
> 
> 
> Chris Guure
> University of Ghana
> 
> 
> 
> 
> install.packages("msm") 
> library(msm) 
> 
> 
> rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 
> si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate 
> standard errors for each study 
> set.seed(2)   ## I have stored the data and the output in this seed 
> 
> for( i in 1:rr){ 
> 
> mu<-rnorm(n,d,tua^2)  # prob. of each effect estimate 
> rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient 
> mu0<- a + b/si   # mean of the truncated normal model (Copas selection 
> model) 
> y1<-rnorm(mu,si^2)# observed effects zise 
> z<-rnorm(mu0,1)   # selection model 
> rho2<-cor(y1, z) 
> 
> select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) 
> probselect<-ifelse(select 
> probselect 
> data<-data.frame(probselect,si)# this contains both include and exclude 
> data 
> data 
> data1<-data[complete.cases(data),] # Contains only the included data for 
> analysis 
> data1 
> 
> 
> }
> 

OK. The code runs without error. So  what exactly is the problem? (I have 
no experience with the Copas selection model if in fact that is what is being 
exemplified.)

-- 

David Winsemius
Alameda, CA, USA



[R] Clarification on Simulation and Iteration

2015-07-31 Thread Christopher Kelvin via R-help
Dear All,
I am performing some simulations for a new model. I run about 10,000 iterations 
with a sample of 50 datasets and this returns one set of 50 simulated data. 

Now, what I need to obtain is 10 sets of the 50 simulated data out of the 
10,000 iterations and not just 1 set.  The model is the Copas selection 
model for publication bias in meta-analysis. Anyone who knows this model and has 
any suggestion for the improvement of my code is most welcome.

Below is my code. 


Kind regards


Chris Guure
University of Ghana




install.packages("msm") 
library(msm) 


rho1=-0.3; tua=0.020; n=50; d=-0.2; rr=1; a=-1.3; b=0.06 
si<-rtnorm(n, mean=0, sd=1, lower=0, upper=0.2)# I used this to generate 
standard errors for each study 
set.seed(2)   ## I have stored the data and the output in this seed 

for( i in 1:rr){ 

mu<-rnorm(n,d,tua^2)  # prob. of each effect estimate 
rho<-si*rho1/sqrt(tua^2 + si^2) # estimate of the correlation coefficient 
mu0<- a + b/si   # mean of the truncated normal model (Copas selection 
model) 
y1<-rnorm(mu,si^2)  # observed effect sizes 
z<-rnorm(mu0,1)   # selection model 
rho2<-cor(y1, z) 

select<-pnorm((mu0 + rho*(y1-mu)/sqrt(tua^2 + si^2))/sqrt(1-rho^2)) 
probselect<-ifelse(select
probselect 
data<-data.frame(probselect,si)# this contains both include and exclude data 
data 
data1<-data[complete.cases(data),] # Contains only the included data for analysis 
data1 


}


Re: [R] x11() hangs in 3.2.1

2015-07-31 Thread Steven Backus
marc_schwa...@me.com writes:

> First, just an FYI, that this would be better posted to R-SIG-Fedora:

>   https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Thanks, I'll give it a try.

> Can you run:
>   capabilities()
> in a terminal session and see what it shows for X11:

Yes, here is the output:

       jpeg         png        tiff       tcltk         X11        aqua 
       TRUE        TRUE        TRUE       FALSE        TRUE       FALSE 
   http/ftp     sockets      libxml        fifo      cledit       iconv 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
        NLS     profmem       cairo         ICU long.double     libcurl 
       TRUE       FALSE        TRUE       FALSE        TRUE       FALSE 

> Did you install R via local compilation?

Yes, local compilation.

> Presuming local compilation, I would check your configure and build
> logs for warnings/errors. It is possible that you are missing an
> X11 header or lib someplace.

The config.log file has lots of errors but nothing obviously
related to this problem.

Thanks for your help,
  Steve
-- 
Steven J. Backus                 Computer Systems Manager
University of Utah               E-Mail:  steven.bac...@utah.edu
Genetic Epidemiology             Alternate:  bac...@math.utah.edu
391 Chipeta Way -- Suite D       Office:  801.587.9308
Salt Lake City, UT 84108-1266    http://www.math.utah.edu/~backus



Re: [R] x11() hangs in 3.2.1

2015-07-31 Thread Rich Shepard

On Fri, 31 Jul 2015, Marc Schwartz wrote:


Can you run:
 capabilities()


  FWIW, on Slackware-14.1 I've had no issues with X11().


capabilities()

       jpeg         png        tiff       tcltk         X11        aqua
       TRUE        TRUE        TRUE        TRUE        TRUE       FALSE
   http/ftp     sockets      libxml        fifo      cledit       iconv
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
        NLS     profmem       cairo         ICU long.double     libcurl
       TRUE       FALSE        TRUE        TRUE        TRUE        TRUE

  Built from the SlackBuilds.org script I've been using.

Rich



Re: [R] Exclude 2014 data from mean

2015-07-31 Thread Bert Gunter
This is very basic. Have you made **any** effort to learn R -- e.g. by
going through an R tutorial? If not, please do this before posting
further. This will save you -- and folks on this list, probably -- a
lot of grief in the long (or even short) run.

Also, if/when you do post further, post in plain text, not HTML, as
requested by the posting guide below.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Fri, Jul 31, 2015 at 11:49 AM, Adam Jauregui  wrote:
> Hello R-help,
>
> I am trying to compute the mean of a quarterback's career fantasy football
> stats, but I wish to exclude his 2014 stats from the mean, as that will be
> the test data for the model I am trying to build for my academic undergrad
> research.
>
> The code for figuring out the mean of his Yds for every career Game 1 was
> simple:
>
>
> mean(brady.t$Yds[brady.t$G. == 1])
> How can I make an "if-then" statement though so that his 2014 stats are
> excluded? Or is there an easier way besides "if-then?"
>
> Thank you,
>
> AKJ
>


Re: [R] Exclude 2014 data from mean

2015-07-31 Thread Marc Schwartz

> On Jul 31, 2015, at 1:49 PM, Adam Jauregui  wrote:
> 
> Hello R-help,
> 
> I am trying to compute the mean of a quarterback's career fantasy football
> stats, but I wish to exclude his 2014 stats from the mean, as that will be
> the test data for the model I am trying to build for my academic undergrad
> research.
> 
> The code for figuring out the mean of his Yds for every career Game 1 was
> simple:
> 
> 
> mean(brady.t$Yds[brady.t$G. == 1])
> How can I make an "if-then" statement though so that his 2014 stats are
> excluded? Or is there an easier way besides "if-then?"
> 
> Thank you,
> 
> AKJ


It would be helpful to have a sample of data to know the structure, but take a 
look at:

  ?subset

for examples of more complicated logic for subsetting data frames. 

You might be able to use something along the lines of:

  mean(subset(brady.t, (G. == 1) & (Year != 2014), select = Yds)[[1]])

Basically, subset() is returning a data frame where Year does not equal 2014 
and G. is equal to 1. The select argument is only returning the Yds column, 
which would otherwise be a list, so the [[1]] only returns a vector, which is 
passed to mean().
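
The same selection can also be written with plain logical indexing, assuming
(as above) that the data frame has a Year column:

 mean(brady.t$Yds[brady.t$G. == 1 & brady.t$Year != 2014])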

Regards,

Marc Schwartz



Re: [R] Exclude 2014 data from mean

2015-07-31 Thread Sarah Goslee
Hi Adam,

Possibly subset() or & would be helpful. Or even aggregate(),
depending on your ultimate goal.

Without a reproducible example that includes some sample data provided
using dput() (fake is fine), the code you used, and some clear idea of
what output you expect, it's impossible to figure out how to help you.
Here are some suggestions for creating a good reproducible example:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Sarah


On Fri, Jul 31, 2015 at 2:49 PM, Adam Jauregui  wrote:
> Hello R-help,
>
> I am trying to compute the mean of a quarterback's career fantasy football
> stats, but I wish to exclude his 2014 stats from the mean, as that will be
> the test data for the model I am trying to build for my academic undergrad
> research.
>
> The code for figuring out the mean of his Yds for every career Game 1 was
> simple:
>
>
> mean(brady.t$Yds[brady.t$G. == 1])
> How can I make an "if-then" statement though so that his 2014 stats are
> excluded? Or is there an easier way besides "if-then?"
>
> Thank you,
>
> AKJ
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] x11() hangs in 3.2.1

2015-07-31 Thread Marc Schwartz

> On Jul 31, 2015, at 2:22 PM, Steven Backus  wrote:
> 
> I'm on RHEL 6.6, R version 3.2.1 Patched (2015-07-30 r68761) --
> "World-Famous Astronaut".  Issuing the x11() command hangs R and
> does not complete.  A window is partially drawn then freezes.
> Does anyone know of a solution?
> 
> Thanks,
>  Steve


First, just an FYI, that this would be better posted to R-SIG-Fedora:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora


Can you run:

  capabilities()

in a terminal session and see what it shows for X11:

> capabilities()
       jpeg         png        tiff       tcltk         X11        aqua 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
   http/ftp     sockets      libxml        fifo      cledit       iconv 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
        NLS     profmem       cairo         ICU long.double     libcurl 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 


The above is on a Mac, just for clarity.

Did you install R via local compilation or via RPMS from the EPEL? I have not 
looked to see if RPMS are available for the patched version.

Presuming local compilation, I would check your configure and build logs for 
warnings/errors. It is possible that you are missing an X11 header or lib 
someplace.


Regards,

Marc Schwartz



[R] x11() hangs in 3.2.1

2015-07-31 Thread Steven Backus
I'm on RHEL 6.6, R version 3.2.1 Patched (2015-07-30 r68761) --
"World-Famous Astronaut".  Issuing the x11() command hangs R and
does not complete.  A window is partially drawn then freezes.
Does anyone know of a solution?

Thanks,
  Steve
-- 
Steven J. Backus                 Computer Systems Manager
University of Utah               E-Mail:  steven.bac...@utah.edu
Genetic Epidemiology             Alternate:  bac...@math.utah.edu
391 Chipeta Way -- Suite D       Office:  801.587.9308
Salt Lake City, UT 84108-1266    http://www.math.utah.edu/~backus



[R] Exclude 2014 data from mean

2015-07-31 Thread Adam Jauregui
Hello R-help,

I am trying to compute the mean of a quarterback's career fantasy football
stats, but I wish to exclude his 2014 stats from the mean, as that will be
the test data for the model I am trying to build for my academic undergrad
research.

The code for figuring out the mean of his Yds for every career Game 1 was
simple:


mean(brady.t$Yds[brady.t$G. == 1])
How can I make an "if-then" statement though so that his 2014 stats are
excluded? Or is there an easier way besides "if-then?"

Thank you,

AKJ



Re: [R] Using latticeExtra as.layer function with different number of plot panels

2015-07-31 Thread sbihorel

Thanks David,

I was hoping for something a little bit more generic and less 
case-by-case.


Sebastien

On 7/30/2015 3:51 PM, David Winsemius wrote:

On Jul 30, 2015, at 8:37 AM, sbihorel wrote:


Hi,

When the as.layer function is used to overaly 2 lattice plots, there seems to be an assumption that 
the data used in both plots will generate the same number of panels (and, I believe, in the same 
order). In case the data used in the plot within the as.layer call is incomplete , data may be 
plotted on the "wrong" panel, and data seem to get re-used on the last panel(s). See what 
happens in the example code below when the records with state.region=="South" are 
dropped...

Is there a trick to overlay panel based upon the conditioning variable value 
rather than the panel order?

require(lattice)
require(latticeExtra)
state2 <- state <- data.frame(state.x77,state.region)
state2$Income <- sample(state2$Income)
state3 <- state2[which(state2$state.region!="South"),]
foo <- xyplot(Income~Population|state.region,data=state,main='foo')
foo

bar <- update(foo,main='bar') + 
as.layer(xyplot(Income~Population|state.region,data=state2,col='red'))
bar

bar2 <- update(foo,main='bar2') + 
as.layer(xyplot(Income~Population|state.region,data=state3,col='red'))
bar2

I don't know if this works using the `+.lattice` function but it is possible to 
selectively update panels using `trellis.focus`
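
A minimal sketch of that trellis.focus() route, updating a single panel after
printing the base plot; the column/row values and the choice of "Northeast"
are only illustrative, so check where the target packet actually sits in your
layout first:

 print(foo)
 trellis.focus("panel", column = 1, row = 1, highlight = FALSE)
 ne <- state3[state3$state.region == "Northeast", ]
 panel.points(ne$Population, ne$Income, col = "red")
 trellis.unfocus()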





Re: [R] R parallel / foreach - aggregation of results

2015-07-31 Thread jim holtman
Try this change to actually return values:


library(doParallel)
Simpar3 <- function(n1) {
   L2distance <- matrix(NA, ncol=n1, nrow=n1)
   data <- rnorm(n1)
   diag(L2distance)=0
   cl <- makeCluster(4)
   registerDoParallel(cl)
   x <- foreach(j=1:n1)  %dopar% {
 library(np)
 datj <- data[j]
 for(k in j:n1) {
   L2distance[j,k] <- k*datj
 }
 L2distance  # return the value
   }
   stopCluster(cl)
   return(x)
 }
 Res <- Simpar3(100)


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jul 31, 2015 at 8:39 AM, Martin Spindler 
wrote:

> Dear all,
>
> when I am running the code attached below, it seems that no results are
> returned, only the predefined NAs. What mistake do I make?
> Any comments and help is highly appreciated.
>
> Thanks and best,
>
> Martin
>
>
> Simpar3 <- function(n1) {
>   L2distance <- matrix(NA, ncol=n1, nrow=n1)
>   data <- rnorm(n1)
>   diag(L2distance)=0
>   cl <- makeCluster(4)
>   registerDoParallel(cl)
>   foreach(j=1:n1)  %dopar% {
> library(np)
> datj <- data[j]
> for(k in j:n1) {
>   L2distance[j,k] <- k*datj
> }
>   }
>   stopCluster(cl)
>   return(L2distance)
> }
>
> Res <- Simpar3(100)
>


Re: [R] How to simulate informative censoring in a Cox PH model?

2015-07-31 Thread Greg Snow
Daniel,

Basically just responding to your last paragraph (the others are
interesting, but I think that you are learning as much as anyone and I
don't currently have any other suggestions).

I am not an expert on copulas, so this is a basic understanding, you
should learn more about them if you choose to use them.  The main idea
of a copula is that it is a bivariate or multivariate distribution
where all the variables have uniform marginal distributions but the
variables are not independent from each other.  How I would suggest
using them is to choose a copula and generate random points from a
bivariate copula, then put those (uniform) values into the inverse CDF
(quantile) function for the Weibull (or other distribution), one of which is the
event time, the other the censoring time.  This will give you times
that (marginally) come from the distributions of interest, but are not
independent (so would be considered informative censoring).  Repeat
this with different levels of relationship in the copula to see how
much difference it makes in your simulations.
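
A minimal base-R sketch of that idea with a Gaussian copula (the rho and the
Weibull shapes/scales below are arbitrary illustration values): correlated
normals are mapped to uniform margins with pnorm() and then to event and
censoring times with qweibull(), so each margin is exactly Weibull but the two
times are dependent.

 sim_informative <- function(n, rho, shp_t = 1.5, scl_t = 1,
                             shp_c = 1.5, scl_c = 1.3) {
   z1 <- rnorm(n)
   z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)    # correlated standard normals
   t_event  <- qweibull(pnorm(z1), shp_t, scl_t)  # latent event time
   t_censor <- qweibull(pnorm(z2), shp_c, scl_c)  # latent censoring time
   data.frame(time   = pmin(t_event, t_censor),
              status = as.integer(t_event <= t_censor))
 }
 head(sim_informative(1000, rho = 0.6))

Repeating this with several values of rho (including 0) gives the comparison
described above.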

On Thu, Jul 30, 2015 at 2:02 PM, Daniel Meddings  wrote:
> Thanks Greg once more for taking the time to reply. I certainly agree that
> this is not a simple set-up, although it is realistic I think. In short you
> are correct about model mis-specification being the key to producing more
> biased estimates under informative than under non-informative censoring.
> After looking again at my code and trying various things I realize that the
> key factor that leads to the informative and non-informative censoring data
> giving rise to the same biased estimates is how I generate my Z_i variable,
> and also the magnitude of the Z_i coefficient in both of the event and
> informative censoring models.
>
> In the example I gave I generated Z_i (I think of this as a "poor prognosis"
> variable) from a beta distribution so that it ranged from 0-1. The biased
> estimates for "beta_t_1" (I think of this as the effect of a treatment on
> survival) were approximately 1.56 when the true value was -1. What I forgot
> to mention was that estimating a cox model with 1,000,000 subjects to the
> full data (i.e. no censoring at all) arguably gives the best treatment
> effect estimate possible given that the effects of Z_i and Z_i*Treat_i are
> not in the model. This "best possible" estimate turns out to be 1.55 - i.e.
> the example I gave just so happens to be such that even with 25-27%
> censoring, the estimates obtained are almost the best that can be attained.
>
> My guess is that the informative censoring does not bias the estimate more
> than non-informative censoring because the only variable not accounted for
> in the model is Z_i which does not have a large enough effect "beta_t_2",
> and/or "beta_c_2", or perhaps because Z_i only has a narrow range which does
> not permit the current "beta_t_2" value to do any damage?
>
> To investigate the "beta_t_2", and/or "beta_c_2" issue I changed "beta_c_2"
> from 2 to 7 and "beta_c_0" from 0.2 to -1.2, and "beta_d_0" from -0.7 to
> -0.4 to keep the censoring %'s equal at about 30%. This time the "best
> possible" estimate of "beta_t_1" was -1.53 which was similar to that
> obtained previously. The informative censoring gave an estimate for
> "beta_t_1" of -1.49 whereas the non-informative censoring gave -1.53 - this
> time the non-informative censoring attains the "best possible" but the
> informative censoring does not.
>
>
>
> I then instead changed "beta_t_2" from 1 to 7 and "beta_c_0" from 0.2 to 2,
> and "beta_d_0" from -0.7 to -1.9 again to keep the censoring %'s equal at
> about 30%. This time the "best possible" estimate of "beta_t_1" was -0.999
> which is pretty much equal to the true value of -1. The informative
> censoring gave an estimate for "beta_t_1" of -1.09 whereas the
> non-informative censoring gave -0.87 – surprisingly this time the
> informative censoring is slightly closer to the “best possible” than the
> non-informative censoring.
>
>
>
> To investigate the Z_i issue I generated it from a normal distribution with
> mean 1 and variance 1. I changed "beta_c_0 " from 0.2 to -0.5 to keep the
> censoring levels equal (this time about 30% for both). This time the "best
> possible" estimate was -1.98 which was further from -1 than previous
> examples. The informative censoring gave an estimate for "beta_t_1" of -1.81
> whereas the non-informative censoring gave -1.84. So again the informative
> censoring gives an estimate closer to the "best possible" when compared with
> the informative censoring, but this time it does not attain the "best
> possible".
>
> In conclusion it is clear to me that a stronger Z_i effect in the censoring
> model causes the informative censoring to be worse than the non-informative
> one (as expected), but a stronger Z_i effect in the event model does not
> have this effect and even causes the independent censoring to be worse –
> this in general may not hold but I nonetheless see it here.

Re: [R] trojan with R download

2015-07-31 Thread Marc Schwartz

> On Jul 31, 2015, at 5:55 AM, tom walk  wrote:
> 
> 
> 
> 
> I am working in China for a month and needed to download an earlier version 
> of R in order to use Deseq2 and its requirements. The download got to the 
> last few seconds and hung up. A trojan was found. It could be coincidence 
> that it happened when I was downloading R, or perhaps a man in the middle 
> added a little something. Anyway, I thought you might be interested. You 
> might want to check on this source and others from this server.
> 
> 
> https://mirrors.ustc.edu.cn/CRAN/bin/windows/base/old/3.0.3/R-3.0.3-win.exe


These things are typically false positives due to overly aggressive filtering.

I downloaded the above file from the same server:

$ md5 R-3.0.3-win.exe
MD5 (R-3.0.3-win.exe) = 446db51e5c188ed2dccbd44dfa5f4aa9

The official MD5 value from the main CRAN server at:

  https://cran.r-project.org/bin/windows/base/old/3.0.3/md5sum.txt

is:

  446db51e5c188ed2dccbd44dfa5f4aa9 *R-3.0.3-win.exe

So unless that hash value was compromised centrally... and if that were the 
case, it has been long enough that the mirrors would probably reflect it as well.

Presuming you can get to a different server, try it to see what happens.
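
For reference, the same check can be done from within R; the file name below is
simply whatever the installer was saved as locally:

 tools::md5sum("R-3.0.3-win.exe") == "446db51e5c188ed2dccbd44dfa5f4aa9"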

Regards,

Marc Schwartz



Re: [R] question about implementation of the R

2015-07-31 Thread Marc Schwartz

> On Jul 31, 2015, at 8:38 AM, Beka Biashvili via R-help  
> wrote:
> 
> Dear Sir/Madam,
> Please provide me with information on how to implement R: what are the steps
> of implementation, the obligations, difficulties, etc.
> Thank you in advance, Mr. Beka Biashvili
> Quality Management System


I would start with the R Installation and Administration Manual here:

  https://cran.r-project.org/manuals.html

and then other manuals as may be apropos for your needs.

If you need some documentation on R’s SDLC or perhaps any clinical trial 
related regulatory guidance:

  https://www.r-project.org/certification.html


Regards,

Marc Schwartz


[R] trojan with R download

2015-07-31 Thread tom walk



I am working in China for a month and needed to download an earlier version of 
R in order to use Deseq2 and its requirements. The download got to the last few 
seconds and hung up. A trojan was found. It could be coincidence that it 
happened when I was downloading R, or perhaps a man in the middle added a 
little something. Anyway, I thought you might be interested. You might want to 
check on this source and others from this server.


https://mirrors.ustc.edu.cn/CRAN/bin/windows/base/old/3.0.3/R-3.0.3-win.exe


[R] question about implementation of the R

2015-07-31 Thread Beka Biashvili via R-help
Dear Sir/Madam,
Please provide me with information on how to implement R: what are the steps
of implementation, the obligations, difficulties, etc.
Thank you in advance, Mr. Beka Biashvili
Quality Management System


ISO 9001:2008 Lead Auditor 





18 Lortkipanidze street, Tbilisi, Georgia.


mob: +995 599 414547   


Re: [R] R parallel / foreach - aggregation of results

2015-07-31 Thread Jon Skoien

Martin,

I think the main problem is that you are trying to assign your results 
to the result matrix inside the foreach loop. Parallel functions in R 
are generally not good at updating parts of matrices from the different 
workers in this way. Instead, using e.g. foreach, each loop of the 
foreach-call has to return a vector which can be cbind-ed to a result 
matrix. Something like:


L2distance <- foreach(j = 1:n1, .combine = cbind) %dopar% {
    res <- rep(NA, n1)   # one column of the result matrix
    for (k in j:n1) res[k] <- k * data[j]
    res
}
L2distance

I am not sure what the np-library is, but you should consider loading it on 
the workers (e.g. with clusterEvalQ() or foreach's .packages argument) after 
creating the cluster.
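
For example (a sketch; np appears only because Martin's loop calls library(np)):

 clusterEvalQ(cl, library(np))        # load the package on every worker
 # or, equivalently, pass .packages = "np" to foreach()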


Best wishes,
Jon


On 7/31/2015 2:39 PM, Martin Spindler wrote:

Dear all,

when I am running the code attached below, it seems that no results are 
returned, only the predefined NAs. What mistake do I make?
Any comments and help is highly appreciated.

Thanks and best,

Martin


Simpar3 <- function(n1) {
   L2distance <- matrix(NA, ncol=n1, nrow=n1)
   data <- rnorm(n1)
   diag(L2distance)=0
   cl <- makeCluster(4)
   registerDoParallel(cl)
   foreach(j=1:n1)  %dopar% {
 library(np)
 datj <- data[j]
 for(k in j:n1) {
   L2distance[j,k] <- k*datj
 }
   }
   stopCluster(cl)
   return(L2distance)
}

Res <- Simpar3(100)




--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit

Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY

jon.sko...@jrc.ec.europa.eu
Tel:  +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual 
and do not necessarily represent official views of the European Commission.




[R] R parallel / foreach - aggregation of results

2015-07-31 Thread Martin Spindler
Dear all,

when I am running the code attached below, it seems that no results are 
returned, only the predefined NAs. What mistake do I make?
Any comments and help is highly appreciated.

Thanks and best,

Martin


Simpar3 <- function(n1) {
  L2distance <- matrix(NA, ncol=n1, nrow=n1)
  data <- rnorm(n1)
  diag(L2distance)=0
  cl <- makeCluster(4)
  registerDoParallel(cl)
  foreach(j=1:n1)  %dopar% {
library(np)
datj <- data[j]
for(k in j:n1) {
  L2distance[j,k] <- k*datj
}
  }
  stopCluster(cl)
  return(L2distance)
}

Res <- Simpar3(100)



Re: [R] Looping help

2015-07-31 Thread Jim Lemon
Hi April,
You need nested loops for something like this

qs <- c(0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64)
nrows <- dim(Data)[1]
nqs <- length(qs)
D.mat <- SE.mat <- matrix(NA, nrow = nrows, ncol = nqs)
for (row in 1:nrows) {
 for (qval in 1:nqs) {
  # perform your calculation here and set D.mat[row, qval] and
  # SE.mat[row, qval] to the return values (see the sketch below)
 }
}
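
A sketch of what that inner calculation could look like; the element names
"D" and "StdErr" are assumptions about what d() returns, so check str() of a
single call and adjust:

 res <- d(Data[row, 2:399], q = qs[qval], boot = TRUE,
          boot.arg = list(num.iter = 1000))
 D.mat[row, qval]  <- res$D        # assumed name of the point estimate
 SE.mat[row, qval] <- res$StdErr   # assumed name of the bootstrap std. error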

Jim

> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of April
> Smith
> Sent: Friday, July 31, 2015 2:21 AM
> To: r-help@r-project.org
> Subject: [R] Looping help
>
> I have never looped before and know I need to.  I am unsure how to
> proceed:
>
>
>- Action I need done: d(Data[1,2:399], q=0, boot=TRUE,
>boot.arg=list(num.iter=1000))
>- I need this to happen to all rows, I need All[1,2:399] to increase
> to
>All[2:399], etc.
>- But I also need the results from q increasing from 0 to 0.25, 0.5,
> 1,
>2, 4,8,16,32,64 before the loop moves on to the next row.
>- For each iteration I will receive two values: D and st.err.  I
> need
>this put into a matrix
>
>
> I feel like this should be pretty simple to learn, but I have never
> looped before.
>
> I am hoping to get more of a tutorial on how to write loop code, then
> to just be given the loop code.
>
> Thanks,
> April
>


Re: [R] R parallel - slow speed

2015-07-31 Thread Martin Spindler
Thank you very much to you both for your help.

I knew that parallelizing has some additional "overhead" costs, but I was 
surprised by the order of magnitude (it was 10 times slower). Therefore I 
thought I had made some mistake or that there was a more clever way to do it.

Best,

Martin
 
 

Sent: Thursday, 30 July 2015 at 15:28
From: "jim holtman" 
To: "Jeff Newmiller" 
Cc: "Martin Spindler" , "r-help@r-project.org" 

Subject: Re: [R] R parallel - slow speed

I ran a test on my Windows box with 4 CPUs.  There were 4 RScript processes 
started in response to the request for a cluster of 4.  Each of these ran for 
an elapsed time of around 23 seconds, making the median time around 0.2 seconds 
for 100 iterations as reported by microbenchmark.  The 'apply' only takes about 
0.003 seconds for a single iteration - again what microbenchmark is reporting.
 
The 4 RScript processes each use about 3 CPU seconds in the 23 seconds of 
elapsed time, most of that is probably the communication and startup time for 
the processes and reporting results.
 
So as was pointed out previously, there is overhead in running in parallel.  You 
probably have to have at least several seconds of heavy computation per 
iteration to make parallelizing worthwhile.  You should also investigate 
exactly what is happening on your system so that you can account for the time 
being spent.
 

Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it. 
On Thu, Jul 30, 2015 at 8:56 AM, Jeff Newmiller wrote:

Parallelizing comes at a price... and there is no guarantee that you can 
afford it. Vectorizing your algorithms is often a better approach. 
Microbenchmarking is usually overkill for evaluating parallelizing.

You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make 
each core look like two.

The operating system can make a difference also... Windows processes are more 
expensive to start and communicate between than *nix processes are. In 
particular, Windows seems to require duplicated RAM pages while *nix can share 
process RAM (at least until they are written to) so you end up needing more 
memory and disk paging of virtual memory becomes more likely.
---
Jeff Newmiller                        The     .       .  Go Live...
DCN:        Basics: ##.#.   
    ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

On July 30, 2015 8:26:34 AM EDT, Martin Spindler 
 wrote:
>Dear all,
>
>I am trying to parallelize the function npnewpar given below. When I am
>comparing an application of "apply" with "parApply" the parallelized
>version seems to be much slower (cf output below). Therefore I would
>like to ask how the function could be parallelized more efficient.
>(With increasing sample size the difference becomes smaller, but I was
>wondering about this big differences and how it could be improved.)
>
>Thank you very much for help in advance!
>
>Best,
>
>Martin
>
>
>library(microbenchmark)
>library(doParallel)
>
>n <- 500
>y <- rnorm(n)
>Xc <- rnorm(n)
>Xd <- sample(c(0,1), replace=TRUE)
>Weights <- diag(n)
>n1 <- 50
>Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE))
>
>
>detectCores()
>cl <- makeCluster(4)
>registerDoParallel(cl)
>microbenchmark(apply(Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd,
>Weights=Weights, h=0.5),  parApply(cl, Xeval, 1, npnewpar, y=y, Xc=Xc,
>Xd = Xd, Weights=Weights, h=0.5), times=100)
>stopCluster(cl)
>
>
>Unit: milliseconds
>                           expr       min        lq      mean    median
>apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights,
>   h = 0.5)  4.674914  4.726463  5.455323  4.771016
>parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights =
>Weights,      h = 0.5) 34.168250 35.434829 56.553296 39.438899
>        uq       max neval
>  4.843324  57.01519   100
> 49.777265 347.77887   100
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>npnewpar <- function(y, Xc, Xd, Weights, h, xeval) {
>  xc <- xeval[1]
>  xd <- xeval[2]
>  l <- function(x,X) {
>    w <-  Weights[x,X]
>    return(w)
>  }
>  u <- (Xc-xc)/h
>  #K <- kernel(u)
>  K <- dnorm(u)
>  L <- l(xd,Xd)
>  nom <- sum(y*K*L)
>  denom <- sum(K*L)
>  ghat <- nom/denom
>  return(ghat)
>}
>

Re: [R] R parallel - slow speed

2015-07-31 Thread Martin Spindler
Thank you very much for your help.
 
I tried it under Unix and then the parallel version was faster than under 
Windows (but still slower than the non-parallel version). This is an important 
point to keep in mind. Thanks for this.
 
Best,
 
Martin

 
 

Sent: Thursday, 30 July 2015 at 14:56
From: "Jeff Newmiller" 
To: "Martin Spindler" , "r-help@r-project.org" 

Subject: Re: [R] R parallel - slow speed
Parallelizing comes at a price... and there is no guarantee that you can afford 
it. Vectorizing your algorithms is often a better approach. Microbenchmarking 
is usually overkill for evaluating parallelizing.

You assume 4 cores... but many CPUs have 2 cores and use hyperthreading to make 
each core look like two.

The operating system can make a difference also... Windows processes are more 
expensive to start and communicate between than *nix processes are. In 
particular, Windows seems to require duplicated RAM pages while *nix can share 
process RAM (at least until they are written to) so you end up needing more 
memory and disk paging of virtual memory becomes more likely.
---
Jeff Newmiller The . . Go Live...
DCN: Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---
Sent from my phone. Please excuse my brevity.

On July 30, 2015 8:26:34 AM EDT, Martin Spindler  wrote:
>Dear all,
>
>I am trying to parallelize the function npnewpar given below. When I am
>comparing an application of "apply" with "parApply" the parallelized
>version seems to be much slower (cf output below). Therefore I would
>like to ask how the function could be parallelized more efficient.
>(With increasing sample size the difference becomes smaller, but I was
>wondering about this big differences and how it could be improved.)
>
>Thank you very much for help in advance!
>
>Best,
>
>Martin
>
>
>library(microbenchmark)
>library(doParallel)
>
>n <- 500
>y <- rnorm(n)
>Xc <- rnorm(n)
>Xd <- sample(c(0,1), replace=TRUE)
>Weights <- diag(n)
>n1 <- 50
>Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE))
>
>
>detectCores()
>cl <- makeCluster(4)
>registerDoParallel(cl)
>microbenchmark(apply(Xeval, 1, npnewpar, y=y, Xc=Xc, Xd = Xd,
>Weights=Weights, h=0.5), parApply(cl, Xeval, 1, npnewpar, y=y, Xc=Xc,
>Xd = Xd, Weights=Weights, h=0.5), times=100)
>stopCluster(cl)
>
>
>Unit: milliseconds
>                           expr       min        lq      mean    median
>apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights = Weights,
>   h = 0.5)  4.674914  4.726463  5.455323  4.771016
>parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd, Weights =
>Weights,      h = 0.5) 34.168250 35.434829 56.553296 39.438899
>        uq       max neval
>  4.843324  57.01519   100
> 49.777265 347.77887   100
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>npnewpar <- function(y, Xc, Xd, Weights, h, xeval) {
> xc <- xeval[1]
> xd <- xeval[2]
> l <- function(x,X) {
> w <- Weights[x,X]
> return(w)
> }
> u <- (Xc-xc)/h
> #K <- kernel(u)
> K <- dnorm(u)
> L <- l(xd,Xd)
> nom <- sum(y*K*L)
> denom <- sum(K*L)
> ghat <- nom/denom
> return(ghat)
>}
>
 
