Re: [R] How many times occurs

2011-07-02 Thread Rainer Schuermann
On Saturday 02 July 2011 21:40:24 Trying To learn again wrote:

Clumsy but it works (replace the bingo stuff with what you want to do next):

x <- as.matrix( read.table( "input.txt" ) )

xdim <- dim( x )
ix <- xdim[ 1 ]        # number of rows
jx <- xdim[ 2 ] - 2    # last column where a length-3 window can start
bingo <- 0             # running count of 8-9-2 triples

for( i in 1:ix )
{
  for( j in 1:jx )
  {
    if( x[i,j] == 8 && x[i,j+1] == 9 && x[i,j+2] == 2 )
    {
      bingo <- bingo + 1
    }
  }
}

print( bingo )

I'm sure there are more elegant and efficient solutions!
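For instance, a vectorized sketch along the same lines (counting the 8-9-2 pattern per row of the matrix x above, then keeping rows as described in the question):

count892 <- function( r )
{
  n <- length( r )
  sum( r[ 1:(n-2) ] == 8 & r[ 2:(n-1) ] == 9 & r[ 3:n ] == 2 )
}

rowcounts <- apply( x, 1, count892 )
sum( rowcounts )                       # same total as 'bingo' above
x[ rowcounts < 2, , drop = FALSE ]     # e.g. keep rows with fewer than 2 occurrences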

Rgds,
Rainer


Here is the matrix as dput( x ):
structure(c(8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 2L, 
2L, 2L, 2L, 2L, 2L, 5L, 8L, 5L, 5L, 8L, 5L, 4L, 9L, 4L, 4L, 9L, 
4L, 5L, 2L, 5L, 5L, 2L, 5L, 8L, 8L, 8L, 8L, 8L, 8L, 5L, 9L, 5L, 
5L, 9L, 9L, 6L, 2L, 6L, 6L, 2L, 2L, 6L, 1L, 4L, 6L, 1L, 2L), .Dim = c(6L, 
10L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", 
"V6", "V7", "V8", "V9", "V10")))



Re: [R] Hint improve my code

2011-07-02 Thread Joshua Wiley
Hi Edward,

One hint to improve your code is simply stylistic.  Everyone tends to
have their own favorite way, but adding spaces and line breaks can
help make code much easier to read.  Also, you seemed to have used
brackets "[" instead of parentheses "(" for the first two calls to
sum().  As Berend suggested, some more clarification on what is
supposed to be happening will help you get more feedback.  Lastly,
although your data is readable as is, a very convenient way to provide
the list with data is using the function, dput().  Here is a little
example demonstrating its use:

> x <- data.frame(x1 = c("c", "b", "a"),
+   x2 = factor(c("c", "b", "a")), stringsAsFactors = FALSE)
> x ## print to console, two columns appear identical
  x1 x2
1  c  c
2  b  b
3  a  a
> ## but looking at the str()ucture, they are different
> str(x)
'data.frame':   3 obs. of  2 variables:
 $ x1: chr  "c" "b" "a"
 $ x2: Factor w/ 3 levels "a","b","c": 3 2 1
> ## use dput() to provide copy-and-pastable output
> ## that others can use to access the same data as you easy peasy
> dput(x)
structure(list(x1 = c("c", "b", "a"), x2 = structure(c(3L, 2L,
1L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("x1",
"x2"), row.names = c(NA, -3L), class = "data.frame")
>
> ## now someone else is ready to go with your data
> newx <- structure(list(x1 = c("c", "b", "a"), x2 = structure(c(3L, 2L,
+ 1L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("x1",
+ "x2"), row.names = c(NA, -3L), class = "data.frame")
>
> str(newx)
'data.frame':   3 obs. of  2 variables:
 $ x1: chr  "c" "b" "a"
 $ x2: Factor w/ 3 levels "a","b","c": 3 2 1
>


And here is an example of how I might have written your function
(stylistically speaking).  I am not advocating this as the "best",
"correct", or "ideal" way.  Just giving another example that (I at
least) find easier to read and make sense of.

llik <- function(R_j, R_m) {
  if (R_j < 0) {
sum(log(1 / (2 * pi * (sigma_j^2))) -
  (1 / (2 * (sigma_j^2))*(R_j + al_j - b_j * R_m))^2)
  } else if (R_j > 0) {
sum(log(1 / (2 * pi * (sigma_j^2))) -
  (1 / (2 * (sigma_j^2)) * (R_j + au_j - b_j * R_m))^2)
  } else if (R_j == 0) {
sum(log(pnorm(au_j, mean = b_j * R_m, sd = sigma_j) -
  pnorm(al_j, mean = b_j * R_m, sd = sigma_j)))
  }
}

Cheers,

Josh

On Sat, Jul 2, 2011 at 5:52 PM, EdBo  wrote:
> Hi
>
> I have developed the code below. I am worried that the parameters I want to
> be estimated are "not being found" when I run my code. Is there a way I can
> code them so that R recognizes that they should be estimated?
>
> This is the error I am getting.
>
>> out1=optim(llik,par=start.par)
> Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) :
>  object 'au_j' not found
>
> #Yet al_j,au_j,sigma_j and b_j are just estimates that balance the
> likelihood function?
>
> llik=function(R_j,R_m)
> if(R_j< 0)
> {
> sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
> }else if(R_j>0)
> {
> sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
> }else if(R_j==0)
> {
> sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
> }
> start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1)
> out1=optim(par=start.par,llik)
>
> My Data
>
>     R_j         R_m
>  2e-03   0.026567295
>  3e-03   0.009798475
>  5e-02   0.008497274
>  -1e-02   0.012464578
>  -9e-04   0.002896023
>  9e-02   0.000879473
>  1e-02   0.003194435
>  6e-04   0.010281122
>
> Thank you in advance.
>
> Edward
> UCT
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641354.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Hint improve my code

2011-07-02 Thread Berend Hasselman

EdBo wrote:
> 
> Hi
> 
> I have developed the code below. I am worried that the parameters I want
> to be estimated are "not being found" when I run my code. Is there a way I
> can code them so that R recognizes that they should be estimated?
> 
> This is the error I am getting.
> 
>> out1=optim(llik,par=start.par)
> Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) : 
>   object 'au_j' not found
> 
> #Yet al_j,au_j,sigma_j and b_j are just estimates that balance the
> likelihood function?
> 
> llik=function(R_j,R_m)
> if(R_j< 0)
> {
> sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
> }else if(R_j>0)
> {
> sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
> }else if(R_j==0)
> {
> sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
> }
> start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1)
> out1=optim(par=start.par,llik)
> 
> My Data
> 
>  R_j R_m
>   2e-03   0.026567295
>   3e-03   0.009798475
>   5e-02   0.008497274
>  -1e-02   0.012464578
>  -9e-04   0.002896023
>   9e-02   0.000879473
>   1e-02   0.003194435
>   6e-04   0.010281122
> 

You are only passing the R_j and R_m data as arguments to your function.
The parameters that are to be used for the maximization must also be passed
as arguments.
This is clearly explained in the help page for optim. So your function
should read

llik <- function(par, R_j, R_m) {
  al_j    <- par[1]
  au_j    <- par[2]
  sigma_j <- par[3]
  b_j     <- par[4]

  if(R_j < 0) {
    sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2)
  } else if(R_j > 0) {
    sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2)
  } else if(R_j == 0) {
    sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
  }
}

And then use optim as follows:

out1 <- optim(par=start.par,llik, R_j=R_j, R_m=R_m)


But this is not going to work as you want it to.
You'll get warnings:

1: In if (R_j < 0) { ... :
  the condition has length > 1 and only the first element will be used

R_j is a vector and if() takes a scalar (or vector of length 1).

So it is not at all clear what the purpose of the if statement is.
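If the intent is simply to sum the contributions of all observations (al_j where
R_j < 0, au_j where R_j > 0, and the pnorm term where R_j == 0), a vectorized
sketch -- for illustration only, with your expressions kept verbatim and a made-up
function name -- could look like:

llik.vec <- function(par, R_j, R_m) {
  al_j    <- par[1]
  au_j    <- par[2]
  sigma_j <- par[3]
  b_j     <- par[4]

  a  <- ifelse(R_j < 0, al_j, au_j)   # shift term chosen per observation
  ll <- ifelse(R_j == 0,
               log(pnorm(au_j, mean = b_j*R_m, sd = sigma_j) -
                   pnorm(al_j, mean = b_j*R_m, sd = sigma_j)),
               log(1/(2*pi*(sigma_j^2))) - (1/(2*(sigma_j^2))*(R_j + a - b_j*R_m))^2)
  # note: ifelse() evaluates both branches, so harmless NaN/Inf warnings can
  # appear for extreme parameter values even though they are discarded
  sum(ll)
}

out1 <- optim(par = start.par, fn = llik.vec, R_j = R_j, R_m = R_m,
              control = list(fnscale = -1))   # optim() minimizes by default; fnscale = -1 maximizes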

Berend



--
View this message in context: 
http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641520.html
Sent from the R help mailing list archive at Nabble.com.



[R] Error with package xlsx

2011-07-02 Thread Sergio Mira
Hi,

Could anyone help me with this error? I have no idea what that is about...

> file <- system.file("DADOSCASUMCCORMICK.xlsx", package = "xlsx")
> data <- read.xlsx(file, 1)
Error in .jnew("java/io/FileInputStream", file) :
  java.io.FileNotFoundException:  (No such file or directory)

Thanks!

-- 
Regards,
|| --
|| Sergio Henrique Bento de Mira
|| Computer Science | Class of 2008/2
|| Federal University Of Lavras | UFLA
|| Lavras, MG, Brasil
|| ---
|| sergiohbm...@computacao.ufla.br
|| Cell: (+55) (35) 9128-4240
|| --
"Be the change you want to see in the world"





[R] Simulating inhomogeneous Poisson process without loop

2011-07-02 Thread Tristan Linke
Dear all

I want to simulate a stochastic jump variance process where N is Bernoulli
with intensity lambda0 + lambda1*Vt. lambda0 is constant and lambda1 can be
interpreted as a regression coefficient on the current variance level Vt. J
is a scaling factor.

How can I rewrite this avoiding the loop structure which is very
time-consuming for long simulations?

for (i in 1:N){
...
N <- rbinom(n=1, size=1, prob=(lambda0+lambda1*Vt))
Vt <- ... + J*N
..
}

P.S. This is going towards the Duffie, Pan, Singleton 2000 Transform Pricing
paper, here stochastic volatility with state-dependent correlated jumps
(Eraker 2004).

Thanks a lot in advance.




Re: [R] For help in R coding

2011-07-02 Thread Bansal, Vikas
Dear all,
I tried this code and it is running perfectly...

df <- read.table("Case2.pileup", fill = T, sep = "\t", colClasses = "character")
txt <- df[, 9]
txtvec <- readLines(textConnection(txt))
dad <- data.frame(
  A = unlist(sapply(gregexpr("A|a", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  C = unlist(sapply(gregexpr("C|c", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  G = unlist(sapply(gregexpr("G|g", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  T = unlist(sapply(gregexpr("T|t", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  N = unlist(sapply(gregexpr("\\,|\\.", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)))





Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: David Winsemius [dwinsem...@comcast.net]
Sent: Saturday, July 02, 2011 9:04 PM
To: Dennis Murphy
Cc: r-help@r-project.org; Bansal, Vikas
Subject: Re: [R] For help in R coding

On reflection and a bit of testing I think the best approach would be
to use gregexpr. For counting the number of commas, this appears quite
straightforward.

 > sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1)
length(x) else 0 )
  [1] 3 3 3 4 3 3 2 6 4 6 6

It easily generalizes to period and the `|` (or) operation on letters.
(I did need to add the check since the length of gregexpr is always at
least one, but it has value -1 when there is no match.)

 > sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1)
length(x) else 0 )
  [1] 0 2 0 0 3 0 0 0 1 0 0


On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote:

> Hi:
>
> There seems to be a problem if the string ends in , or . , which makes
> it difficult for strsplit() to pick up if it is splitting on those
> characters. Here is an alternative, splitting on individual characters
> and using charmatch() instead:
>
> charsum <- function(s, char) {
>u <- strsplit(s, "")
>sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
>   }
>
> unname(sapply(txtvec, function(x) charsum(x, ',')))
> unname(sapply(txtvec, function(x) charsum(x, '.')))
>
> Putting this into a data frame,
>
> dfout <- data.frame(periods = unname(sapply(txtvec, function(x)
> charsum(x, '.'))),
>commas = unname(sapply(txtvec,
> function(x) charsum(x, '.'))) )
> txtvec
>
> HTH,
> Dennis
>
> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius  > wrote:
>>
>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:
>>
>>>
>>>
> Dear all,
>
> I am doing a project on variant calling using R.I am working on
> pileup file.There are 10 columns in my data frame and I want to
> count the number of A,C,G and T in each row for column 9.example
> of
> column 9 is given below-
>
> .a,g,,
> .t,t,,
> .,c,c,
> .,a,,,
> .,t,t,t
> .c,,g,^!.
> .g,ggg.^!,
> .$,.,
> a,g,,t,
> ,.,^!.
> ,$.,.
>
> This is a bit confusing for me as these characters are in one
> column
> and how can we scan them for each row to print number of A,C,G
> and T
> for each row.

 Seems a bit clunky but this does the job (first the data):
>
> txt <- " .a,g,,

 +.t,t,,
 +.,c,c,
 +.,a,,,
 +.,t,t,t
 +.c,,g,^!.
 +.g,ggg.^!,
 +.$,.,
 +a,g,,t,
 +,.,^!.
 +,$.,."

> txtvec <- readLines(textConnection(txt))

 Now the clunky solution, Basically subtracts 1 from the counts of
 "fragments" that result from splitting on each letter in turn.
 Could
 be made prettier with a function that did the job.

> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,

 split="a"), length) , "-", 1)),
 + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
 length) , "-", 1)),
 + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
 length) , "-", 1)),
 + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
 length) , "-", 1)) )
 A C G T
 .a,g,,   1 0 1 0
  .t,t,, 0 0 0 2
  .,c,c, 0 2 0 0
  .,a,,, 1 0 0 0
  .,t,t,t0 0 0 2
  .c,,g,^!.  0 1 1 0
  .g,ggg.^!, 0 0 4 0
  .$,.,  0 0 0 0
  a,g,,t,1 0 1 1
  ,.,^!. 0 0 0 0
  ,$.,.  0 0 0 0

 Has the advantage that the input data ends up as rownames, which
 was a
 surprise.

 If you wanted to count "A" and "a" as equivalent, then the split
 argument should be "a|A"


>>>
> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT
> LIKE
> THIS.
>>>
>>> BUT CAN I COUNT

[R] R for Windows - 5 stars award on Windows 7 Download

2011-07-02 Thread Windows 7 Download
Dear R Development Core Team

We are more than happy that Windows 7 was launched after a long, restless period 
of waiting. For this long-expected moment, we prepared and launched a new Windows 7 
download website that new Windows 7 customers will use to look for 
software compatible with Windows 7.

R for Windows has been reviewed by Windows 7 Download and got 5 stars award: 
http://www.windows7download.com/win7-r-for-windows/snvrckjh.html

Draw attention to your product by making it visible on a webpage that will be 
used by people who are devoted to Windows 7. The number of Windows 7 customers 
will grow to a huge quantity in a short period of time.

Please publish Windows 7 Download award on your website by adding the following 
HTML code:

160 x 80:
<a href="http://www.windows7download.com/" target="_blank"><img src="http://www.windows7download.com/templates/w7d/images/awards/award_5.png"
alt="Windows 7 Download" border="0"/></a>

120 x 60:
<a href="http://www.windows7download.com/" target="_blank"><img src="http://www.windows7download.com/templates/w7d/images/awards/award_120x60_5.png"
alt="Windows 7 Download" border="0"/></a>

Windows 7 compatible logo:
<a href="http://www.windows7download.com/" target="_blank"><img src="http://www.windows7download.com/templates/w7d/images/windows7compatible.png"
alt="R for Windows - Windows 7 compatible" border="0"/></a>

Text link:
<a href="http://www.windows7download.com/" target="_blank">5 Stars Awarded on
Windows 7 Download</a>

Here are more images and options how link to us: 
http://www.windows7download.com/linktous.html

The operating system Windows 7 is already enjoying rising popularity, 
and this is the reason why you should not forget this unique opportunity of 
rising client interest in software compatible with Windows 7.  Highlight 
the capabilities of your product by placing it on Windows 7 Download.com, which 
can help you bring in new valuable users.

We're looking forward to further co-operation.

Best regards,
Windows 7 Download

http://www.windows7download.com/


Re: [R] For help in R coding

2011-07-02 Thread Bansal, Vikas
Hi,

This seems a little bit confusing, but I am using this code as suggested by you:

df <- read.table("Case2.pileup", fill = T, sep = "\t", colClasses = "character")
txt <- df[, 9]
txtvec <- readLines(textConnection(txt))
vik <- data.frame(
  A = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "a|A"), length), "-", 1)),
  C = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "c|C"), length), "-", 1)),
  G = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "g|G"), length), "-", 1)),
  T = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "t|T"), length), "-", 1)))


The thing is, at some places it is calculating perfectly, but at some positions
it is not. I am trying to find the solution in books and on the net, but I don't
know why there is nothing related to this. I think this code seems to be good,
but I am missing something.
For your convenience I have attached my Case2.pileup file.

I am very thankful to you and appreciate that you are helping and taking your
precious time.


Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: Dennis Murphy [djmu...@gmail.com]
Sent: Saturday, July 02, 2011 8:22 PM
To: r-help@r-project.org
Cc: Bansal, Vikas; David Winsemius
Subject: Re: [R] For help in R coding

Hi:

There seems to be a problem if the string ends in , or . , which makes
it difficult for strsplit() to pick up if it is splitting on those
characters. Here is an alternative, splitting on individual characters
and using charmatch() instead:

charsum <- function(s, char) {
u <- strsplit(s, "")
sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
   }

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x)
charsum(x, '.'))),
commas = unname(sapply(txtvec,
function(x) charsum(x, '.'))) )
txtvec

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius  wrote:
>
> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:
>
>>
>>
 Dear all,

 I am doing a project on variant calling using R.I am working on
 pileup file.There are 10 columns in my data frame and I want to
 count the number of A,C,G and T in each row for column 9.example of
 column 9 is given below-

 .a,g,,
 .t,t,,
 .,c,c,
 .,a,,,
 .,t,t,t
 .c,,g,^!.
 .g,ggg.^!,
 .$,.,
 a,g,,t,
 ,.,^!.
 ,$.,.

 This is a bit confusing for me as these characters are in one column
 and how can we scan them for each row to print number of A,C,G and T
 for each row.
>>>
>>> Seems a bit clunky but this does the job (first the data):

 txt <- " .a,g,,
>>>
>>> +.t,t,,
>>> +.,c,c,
>>> +.,a,,,
>>> +.,t,t,t
>>> +.c,,g,^!.
>>> +.g,ggg.^!,
>>> +.$,.,
>>> +a,g,,t,
>>> +,.,^!.
>>> +,$.,."
>>>
 txtvec <- readLines(textConnection(txt))
>>>
>>> Now the clunky solution, Basically subtracts 1 from the counts of
>>> "fragments" that result from splitting on each letter in turn. Could
>>> be made prettier with a function that did the job.
>>>
 data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>>>
>>> split="a"), length) , "-", 1)),
>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
>>> length) , "-", 1)),
>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
>>> length) , "-", 1)),
>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
>>> length) , "-", 1)) )
>>> A C G T
>>> .a,g,,   1 0 1 0
>>>  .t,t,, 0 0 0 2
>>>  .,c,c, 0 2 0 0
>>>  .,a,,, 1 0 0 0
>>>  .,t,t,t0 0 0 2
>>>  .c,,g,^!.  0 1 1 0
>>>  .g,ggg.^!, 0 0 4 0
>>>  .$,.,  0 0 0 0
>>>  a,g,,t,1 0 1 1
>>>  ,.,^!. 0 0 0 0
>>>  ,$.,.  0 0 0 0
>>>
>>> Has the advantage that the input data ends up as rownames, which was a
>>> surprise.
>>>
>>> If you wanted to count "A" and "a" as equivalent, then the split
>>> argument should be "a|A"
>>>
>>>
>>
 AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE
 THIS.
>>
>> BUT CAN I COUNT . AND , ALSO USING-
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>> split=".|,"), length) , "-", 1)),
>>
>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES
>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN
>> CALCULATING AND JUST SHOWING 0.
>
> You need to use valid regex expressions for 'split'. Since "." and "," are
> special characters, they need to be escaped when you want the literals to be
> recognized as such.
>
> I 

Re: [R] For help in R coding

2011-07-02 Thread Bansal, Vikas

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: David Winsemius [dwinsem...@comcast.net]
Sent: Saturday, July 02, 2011 6:19 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] For help in R coding

On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:

>
>
>>> Dear all,
>>>
>>> I am doing a project on variant calling using R.I am working on
>>> pileup file.There are 10 columns in my data frame and I want to
>>> count the number of A,C,G and T in each row for column 9.example of
>>> column 9 is given below-
>>>
>>>  .a,g,,
>>>  .t,t,,
>>>  .,c,c,
>>>  .,a,,,
>>>  .,t,t,t
>>>  .c,,g,^!.
>>>  .g,ggg.^!,
>>>  .$,.,
>>>  a,g,,t,
>>>  ,.,^!.
>>>  ,$.,.
>>>
>>> This is a bit confusing for me as these characters are in one column
>>> and how can we scan them for each row to print number of A,C,G and T
>>> for each row.
>>
>> Seems a bit clunky but this does the job (first the data):
>>> txt <- " .a,g,,
>> +.t,t,,
>> +.,c,c,
>> +.,a,,,
>> +.,t,t,t
>> +.c,,g,^!.
>> +.g,ggg.^!,
>> +.$,.,
>> +a,g,,t,
>> +,.,^!.
>> +,$.,."
>>
>>> txtvec <- readLines(textConnection(txt))
>>
>> Now the clunky solution, Basically subtracts 1 from the counts of
>> "fragments" that result from splitting on each letter in turn. Could
>> be made prettier with a function that did the job.
>>
>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>> split="a"), length) , "-", 1)),
>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
>> length) , "-", 1)),
>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
>> length) , "-", 1)),
>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
>> length) , "-", 1)) )
>>  A C G T
>> .a,g,,   1 0 1 0
>>   .t,t,, 0 0 0 2
>>   .,c,c, 0 2 0 0
>>   .,a,,, 1 0 0 0
>>   .,t,t,t0 0 0 2
>>   .c,,g,^!.  0 1 1 0
>>   .g,ggg.^!, 0 0 4 0
>>   .$,.,  0 0 0 0
>>   a,g,,t,1 0 1 1
>>   ,.,^!. 0 0 0 0
>>   ,$.,.  0 0 0 0
>>
>> Has the advantage that the input data ends up as rownames, which
>> was a
>> surprise.
>>
>> If you wanted to count "A" and "a" as equivalent, then the split
>> argument should be "a|A"
>>
>>
>
>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT
>>> LIKE THIS.
> BUT CAN I COUNT . AND , ALSO USING-
> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
> split=".|,"), length) , "-", 1)),
>
> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME
> PLACES IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT
> EVEN CALCULATING AND JUST SHOWING 0.

You need to use valid regex expressions for 'split'. Since "." and ","
are special characters, they need to be escaped when you want the
literals to be recognized as such.

I haven't figured out why but you need to drop the final operation of
subtracting 1 from the values when counting commas:

data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
  split="\\."), length) , "-", 1))
  ,commas = unlist( lapply( sapply(txtvec, strsplit,
  split="\\,"), length) ) )
periods commas
  .a,g,,  1  3
 .t,t,,   1  3
 .,c,c,   1  3
 .,a,,,   1  4
 .,t,t,t  1  4
 .c,,g,^!.1  4
 .g,ggg.^!,   2  2
 .$,.,2  6
 a,g,,t,  0  4
 ,.,^!.   1  7
 ,$.,.1  7

--

David Winsemius, MD
West Hartford, CT

Some of the values are coming out incorrect. I do not know why, but if you look at 
your output, some of the comma counts are 7 when there are actually only 6. The same 
problem occurs for the letters when I use this:

data.frame(A = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "a|A"), length), "-", 1)),
           C = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "c|C"), length), "-", 1)),
           G = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "g|G"), length), "-", 1)),
           T = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "t|T"), length), "-", 1)))

I don't know why this code is not calculating the exact number. Can you please 
check it?



[R] Hint improve my code

2011-07-02 Thread EdBo
Hi

I have developed the code below. I am worried that the parameters I want to
be estimated are "not being found" when I run my code. Is there a way I can
code them so that R recognizes that they should be estimated?

This is the error I am getting.

> out1=optim(llik,par=start.par)
Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) : 
  object 'au_j' not found

#Yet al_j,au_j,sigma_j and b_j are just estimates that balance the
likelihood function?

llik=function(R_j,R_m)
if(R_j< 0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
}else if(R_j>0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
}else if(R_j==0)
{
sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
}
start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1)
out1=optim(par=start.par,llik)

My Data

 R_j R_m
  2e-03   0.026567295
  3e-03   0.009798475
  5e-02   0.008497274
 -1e-02   0.012464578
 -9e-04   0.002896023
  9e-02   0.000879473
  1e-02   0.003194435
  6e-04   0.010281122

Thank you in advance.

Edward
UCT

--
View this message in context: 
http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641354.html
Sent from the R help mailing list archive at Nabble.com.



[R] How many times occurs

2011-07-02 Thread Trying To learn again
Hi all,

I have a data matrix like in "input.txt":

8 9 2 5 4 5 8 5 6 6
8 9 2 8 9 2 8 9 2 1
8 9 2 5 4 5 8 5 6 4
8 9 2 5 4 5 8 5 6 6
8 9 2 8 9 2 8 9 2 1
8 9 2 5 4 5 8 9 2 2


In this example it will be a 6x10 matrix (or data frame).

I want to detect how many times the combination 8 followed
by 9 followed by 2 appears in each row, and create a new matrix with only this
number of occurrences; then if this number is less than "n" I keep the row. For
example, in the last row the number will be 2 because the series 8 9 2 appears
2 times in the same row.

I tried this, but it doesn't work... I also tried other things but got the same
results:

dat<-read.table('input1.txt')

dat1 <- dat[ dat[,1]=8 & dat[,2]=9  & dat[,3]=2 ,]=1
dat2<-dat[(dat[,2]= 8  & dat[,3]=9  & dat[,4]=2),]=1
dat3<-dat[(dat[,5]=8   & dat[,4]=9  & dat[,5]=2),]=1
dat4<-dat[(dat[,4]=8 &   dat[,5]=9  & dat[,6]=2),]=1
dat5<-dat[(dat[,5]=8 &   dat[,6]=9  & dat[,7]=2),]=1
dat6<-dat[(dat[,6]=8 &   dat[,7]=9  & dat[,8]=2),6]=1
dat7<-dat[(dat[,7]=8 &dat[,8]=9 & dat[,9]=2),7]=1
dat8<-dat[(dat[,8]=8 &dat[,9]=9 & dat[,10]=2),8]=1

datfinal<-dat1+da2+dat3+dat4+dat5+dat6+dat7+dat8

final2 <- dat[ rowSums(datfinal) < 2 , ]

So my last matrix "final2" will be "dat" without the rows that don't pass
the conditions.




Re: [R] For help in R coding

2011-07-02 Thread David Winsemius


On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote:


Dear all,
I tried this code and it is running perfectly...

df <- read.table("Case2.pileup", fill = T, sep = "\t", colClasses = "character")
txt <- df[, 9]
txtvec <- readLines(textConnection(txt))
dad <- data.frame(
  A = unlist(sapply(gregexpr("A|a", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  C = unlist(sapply(gregexpr("C|c", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  G = unlist(sapply(gregexpr("G|g", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  T = unlist(sapply(gregexpr("T|t", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)),
  N = unlist(sapply(gregexpr("\\,|\\.", txtvec),
                    function(x) if (x[[1]] != -1) length(x) else 0)))



The unlist operation is unnecessary since the sapply operation returns  
a vector.  (It doesn't hurt, but it is unnecessary.)
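For instance (a quick sketch on the same txtvec):

a1 <- unlist(sapply(gregexpr("A|a", txtvec), function(x) if (x[[1]] != -1) length(x) else 0))
a2 <- sapply(gregexpr("A|a", txtvec), function(x) if (x[[1]] != -1) length(x) else 0)
identical(a1, a2)   # both are already plain numeric vectors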





Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: David Winsemius [dwinsem...@comcast.net]
Sent: Saturday, July 02, 2011 9:04 PM
To: Dennis Murphy
Cc: r-help@r-project.org; Bansal, Vikas
Subject: Re: [R] For help in R coding

On reflection and a bit of testing I think the best approach would be
to use gregexpr. For counting the number of commas, this appears quite
straightforward.


sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1)

length(x) else 0 )
 [1] 3 3 3 4 3 3 2 6 4 6 6

It easily generalizes to period and the `|` (or) operation on letters.
(I did need to add the check since the length of gregexpr is always at
least one, but it has value -1 when there is no match.)


sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1)

length(x) else 0 )
 [1] 0 2 0 0 3 0 0 0 1 0 0


On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote:


Hi:

There seems to be a problem if the string ends in , or . , which  
makes

it difficult for strsplit() to pick up if it is splitting on those
characters. Here is an alternative, splitting on individual  
characters

and using charmatch() instead:

charsum <- function(s, char) {
  u <- strsplit(s, "")
  sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
 }

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x)
charsum(x, '.'))),
  commas = unname(sapply(txtvec,
function(x) charsum(x, '.'))) )
txtvec

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius 
wrote:

On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:





Dear all,

I am doing a project on variant calling using R.I am working on
pileup file.There are 10 columns in my data frame and I want to
count the number of A,C,G and T in each row for column 9.example
of
column 9 is given below-

   .a,g,,
   .t,t,,
   .,c,c,
   .,a,,,
   .,t,t,t
   .c,,g,^!.
   .g,ggg.^!,
   .$,.,
   a,g,,t,
   ,.,^!.
   ,$.,.

This is a bit confusing for me as these characters are in one
column
and how can we scan them for each row to print number of A,C,G
and T
for each row.


Seems a bit clunky but this does the job (first the data):


txt <- " .a,g,,


+.t,t,,
+.,c,c,
+.,a,,,
+.,t,t,t
+.c,,g,^!.
+.g,ggg.^!,
+.$,.,
+a,g,,t,
+,.,^!.
+,$.,."


txtvec <- readLines(textConnection(txt))


Now the clunky solution, Basically subtracts 1 from the counts of
"fragments" that result from splitting on each letter in turn.
Could
be made prettier with a function that did the job.


data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,


split="a"), length) , "-", 1)),
+ C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
length) , "-", 1)),
+ G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
length) , "-", 1)),
+ T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
length) , "-", 1)) )
   A C G T
.a,g,,   1 0 1 0
.t,t,, 0 0 0 2
.,c,c, 0 2 0 0
.,a,,, 1 0 0 0
.,t,t,t0 0 0 2
.c,,g,^!.  0 1 1 0
.g,ggg.^!, 0 0 4 0
.$,.,  0 0 0 0
a,g,,t,1 0 1 1
,.,^!. 0 0 0 0
,$.,.  0 0 0 0

Has the advantage that the input data ends up as rownames, which
was a
surprise.

If you wanted to count "A" and "a" as equivalent, then the split
argument should be "a|A"





AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT
LIKE
THIS.


BUT CAN I COUNT . AND , ALSO USING-
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
split=".|,"), length) , "-", 1)),

I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME
PLACES
IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN
CALCULATING AND JUST SHOWING 0.


You need to use valid regex 

Re: [R] 2D Random walk

2011-07-02 Thread jim holtman
This should work.  You have to return an object from a function; you
cannot try to reference an object that exists only within a function.  So the value is
returned and saved in an object called 'rw', since that is how you are
referencing it.


walk.2d<-function(n)
{
rw <- matrix(0, ncol = 2, nrow = n)

# generate the indices to set the deltas
indx <- cbind(seq(n), sample(c(1, 2), n, TRUE))

# now set the values
rw[indx] <- sample(c(-1, 1), n, TRUE)

# cumsum the columns
rw[,1] <- cumsum(rw[, 1])
rw[,2] <- cumsum(rw[, 2])

rw  # return value
}
n<-1000

rw <- walk.2d(n)
plot(0, type="n",xlab="x",ylab="y",main="Random Walk Simulation In
Two Dimensions",xlim=range(rw[,1]),ylim=range(rw[,2]))

# use 'segments' to color each path
segments(head(rw[, 1], -1), head(rw[, 2], -1), tail(rw[, 1], -1), tail(rw[,
2], -1), col ="blue")


On Wed, Jun 29, 2011 at 11:30 AM, Komal  wrote:
> HI Jholtman,
>
> walk.2d<-function(n)
> {
> rw <- matrix(0, ncol = 2, nrow = n)
>
> # generate the indices to set the deltas
> indx <- cbind(seq(n), sample(c(1, 2), n, TRUE))
>
> # now set the values
> rw[indx] <- sample(c(-1, 1), n, TRUE)
>
> # cumsum the columns
> rw[,1] <- cumsum(rw[, 1])
> rw[,2] <- cumsum(rw[, 2])
>
> return(rw[,1],rw[,2])
> }
> n<-1000
>
> plot(walk.2d(n), type="n",xlab="x",ylab="y",main="Random Walk Simulation In
> Two Dimensions",xlim=range(rw[,1]),ylim=range(rw[,2]))
>
>        # use 'segments' to color each path
> segments(head(rw[, 1], -1), head(rw[, 2], -1), tail(rw[, 1], -1), tail(rw[,
> 2], -1), col ="blue")
>
> I tried to make it in a function.. its not working I dont know why... please
> help me correcting this code.
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/2D-Random-walk-tp3069557p3633205.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread Steve Lianoglou
Here's another datapoint using the multicore package -- which is what
the foreach/doMC combo uses internally:

I halved your A value to 50,000 because I was getting impatient :-)

A=50000
randvalues <- abs(rnorm(A))
minfn <- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }
system.time(a1 <- lapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i)))
   user  system elapsed
 40.826   0.108  41.273

library(multicore)
system.time(a2 <- mclapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i)))
   user  system elapsed
 21.218   0.980  23.473

This was on my 2-core laptop, so -- almost perfect (e.g. 2x) speedup.
The advantage over snow is that there's less "ceremony" to get easy
||-ization on a single machine; the con is that it only ||-izes over
1 multicore machine (and I don't think it 100% works on windows, if at
all), whereas snow and foreach are more flexible in that regard.

-steve


On Sat, Jul 2, 2011 at 2:42 PM, ivo welch  wrote:
> hi uwe--I did not know what snow was.  from my 1 minute reading, it
> seems like a much more involved setup that is much more flexible after
> the setup cost has been incurred (specifically, allowing use of many
> machines).
>
> the attractiveness of the doMC/foreach framework is its simplicity of
> installation and use.
>
> but if I understand what you are telling me, you are using a different
> parallelization framework, and it shows that my example is completed a
> lot faster using this different parallelization framework.  correct?
> if so, the problem is my use of the doMC framework, not the inherent
> cost of dealing with multiple processes.  is this interpretation
> correct?
>
> regards,
>
> /iaw
>
> 
> Ivo Welch (ivo.we...@gmail.com)
> http://www.ivo-welch.info/
>
>
> 2011/7/2 Uwe Ligges :
>>
>>
>> On 02.07.2011 20:04, ivo welch wrote:
>>>
>>> thank you, uwe.  this is a little disappointing.  parallel processing
>>> for embarrassingly simple parallel operations--those needing no
>>> communication---should be feasible if the thread is not always created
>>> and released, but held.  is there light-weight parallel processing
>>> that could facilitate this?
>>
>> Hmmm, now that you asked I checked it myself using snow:
>>
>> On a some years old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK
>> clsuters, i.e. slow communication) I get:
>>
>>
>>
>>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20),
>>> i)))
>>   user  system elapsed
>>   3.10    0.19   51.43
>>
>> while on a single core without parallelization framework:
>>
>>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>>   user  system elapsed
>>  93.74    0.09   94.24
>>
>> Hence (although my prior assumption was that the overhead would be big also
>> for other frameworks than foreach) it scales perfectly well with snow,
>> perhaps you have to use foreach in a different way?
>>
>> Best,
>> Uwe Ligges
>>
>>
>>
>>
>>
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>>
>>> 2011/7/2 Uwe Ligges:


 On 02.07.2011 19:32, ivo welch wrote:
>
> dear R experts---
>
> I am experimenting with multicore processing, so far with pretty
> disappointing results.  Here is my simple example:
>
> A<- 100000
> randvalues<- abs(rnorm(A))
> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
> arbitrary function
>
> ARGV<- commandArgs(trailingOnly=TRUE)
>
> if (ARGV[1] == "do-onecore") {
>    library(foreach)
>    discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) }
> else
> if (ARGV[1] == "do-multicore") {
>    library(doMC)
>    registerDoMC()
>    cat("You have", getDoParWorkers(), "cores\n")
>    discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i )
> }
> else
> if (ARGV[1] == "plain")
>    for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
> cat("sorry, but argument", ARGV[1], "is not
> plain|do-onecore|do-multicore\n")
>
>
> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,
>
>   "plain" takes about 68 seconds (real and user, using the unix timing
> function).
>   "do-onecore" takes about 300 seconds.
>   "do-multicore" takes about 210 seconds real, (300 seconds user).
>
> this seems pretty disappointing.  the cores are not used for the most
> part, either.  feedback appreciated.


 Feedback is that a single computation within your foreach loop is so
 quick
 that the overhead of communicating data and results between processes
 costs
 more time than the actual evaluation, hence you are faster with a single
 process.

 What you should do is:

 write code that does, e.g., 10,000 iterations within 10 other iterations and
 just do a foreach loop around the outer 10. Then you will probably be
 much
 faster (without testing). But this is essentially the example I am using
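
A minimal sketch of that chunking idea (chunk count and object names are
illustrative; minfn and A as in the quoted code above, with doMC installed):

library(foreach)
library(doMC)
registerDoMC()

chunks <- split(1:A, rep(1:10, length.out = A))   # 10 outer chunks

roots <- foreach(idx = chunks, .combine = c) %dopar% {
  # each worker processes a whole chunk, so results cross process
  # boundaries only 10 times instead of A times
  sapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i)$root)
}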

Re: [R] For help in R coding

2011-07-02 Thread David Winsemius
On reflection and a bit of testing I think the best approach would be  
to use gregexpr. For counting the number of commas, this appears quite  
straightforward.


> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1)  
length(x) else 0 )

 [1] 3 3 3 4 3 3 2 6 4 6 6

It easily generalizes to period and the `|` (or) operation on letters.
(I did need to add the check since the length of gregexpr is always at
least one, but it has value -1 when there is no match.)


> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1)  
length(x) else 0 )

 [1] 0 2 0 0 3 0 0 0 1 0 0
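
Wrapped up as a small helper (a sketch only; the helper name is made up, and
txtvec is the same vector used below):

count_pat <- function(pat, txt) {
  sapply(gregexpr(pat, txt), function(x) if (x[[1]] != -1) length(x) else 0)
}

data.frame(A = count_pat("A|a", txtvec),
           C = count_pat("C|c", txtvec),
           G = count_pat("G|g", txtvec),
           T = count_pat("T|t", txtvec),
           dots.and.commas = count_pat("\\.|,", txtvec))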


On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote:


Hi:

There seems to be a problem if the string ends in , or . , which makes
it difficult for strsplit() to pick up if it is splitting on those
characters. Here is an alternative, splitting on individual characters
and using charmatch() instead:

charsum <- function(s, char) {
   u <- strsplit(s, "")
   sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
  }

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x)
charsum(x, '.'))),
   commas = unname(sapply(txtvec,
function(x) charsum(x, '.'))) )
txtvec

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius > wrote:


On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:





Dear all,

I am doing a project on variant calling using R.I am working on
pileup file.There are 10 columns in my data frame and I want to
count the number of A,C,G and T in each row for column 9.example  
of

column 9 is given below-

.a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,.,
a,g,,t,
,.,^!.
,$.,.

This is a bit confusing for me as these characters are in one  
column
and how can we scan them for each row to print number of A,C,G  
and T

for each row.


Seems a bit clunky but this does the job (first the data):


txt <- " .a,g,,


+.t,t,,
+.,c,c,
+.,a,,,
+.,t,t,t
+.c,,g,^!.
+.g,ggg.^!,
+.$,.,
+a,g,,t,
+,.,^!.
+,$.,."


txtvec <- readLines(textConnection(txt))


Now the clunky solution, Basically subtracts 1 from the counts of
"fragments" that result from splitting on each letter in turn.  
Could

be made prettier with a function that did the job.


data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,


split="a"), length) , "-", 1)),
+ C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
length) , "-", 1)),
+ G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
length) , "-", 1)),
+ T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
length) , "-", 1)) )
A C G T
.a,g,,   1 0 1 0
 .t,t,, 0 0 0 2
 .,c,c, 0 2 0 0
 .,a,,, 1 0 0 0
 .,t,t,t0 0 0 2
 .c,,g,^!.  0 1 1 0
 .g,ggg.^!, 0 0 4 0
 .$,.,  0 0 0 0
 a,g,,t,1 0 1 1
 ,.,^!. 0 0 0 0
 ,$.,.  0 0 0 0

Has the advantage that the input data ends up as rownames, which  
was a

surprise.

If you wanted to count "A" and "a" as equivalent, then the split
argument should be "a|A"




AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT  
LIKE

THIS.


BUT CAN I COUNT . AND , ALSO USING-
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
split=".|,"), length) , "-", 1)),

I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME  
PLACES

IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN
CALCULATING AND JUST SHOWING 0.


You need to use valid regex expressions for 'split'. Since "." and  
"," are
special characters they need to be escaped when you wnat the  
literals to be

recognized as such.

I haven't figured out why but you need to drop the final operation of
subtracting 1 from the values when counting commas:

data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
split="\\."), length) , "-", 1))
 ,commas = unlist( lapply( sapply(txtvec, strsplit,
split="\\,"), length) ) )
  periods commas
 .a,g,,  1  3
   .t,t,,   1  3
   .,c,c,   1  3
   .,a,,,   1  4
   .,t,t,t  1  4
   .c,,g,^!.1  4
   .g,ggg.^!,   2  2
   .$,.,2  6
   a,g,,t,  0  4
   ,.,^!.   1  7
   ,$.,.1  7

--

David Winsemius, MD
West Hartford, CT


Re: [R] Vector of functions

2011-07-02 Thread Dennis Murphy
Hi,

To amplify on Josh's cogent remarks, the reason you can't create a
vector of functions is because vectors are composed of atomic objects
and functions are not atomic objects:

> is.atomic(f.3)
[1] FALSE
> is.atomic(1)
[1] TRUE
> is.atomic(TRUE)
[1] TRUE
> is.atomic('a')
[1] TRUE

'Vectors' of functions are actually (and appropriately) assigned to
lists. You can check the structure of any R object with str():

> fns <- c(f.1, f.2, f.3)
> str(fns)   ## Oops, R coerced your vector into a list
List of 3
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...

> funs <- list(f.1, f.2, f.3)# clearer code
> str(funs)
List of 3
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...
 $ :function (a)
  ..- attr(*, "source")= chr [1:3] "function(a) {" ...
> identical(fns, funs)   # Are the two objects identical?
[1] TRUE

You can have all kinds of fun with (lists of) functions. I wanted to
plot your second function:

> curve(funs[[2]], -10, 10)
Error in curve(funs[[2]], -10, 10) :
  'expr' must be a function, call or an expression containing 'x'

## Oh yeah, the argument of the function that curve() needs is x.
## We don't need to rewrite the function, though:

> curve(funs[[2]](x), -10, 10)

Now one can verify that the function calls make sense:

> funs[[2]](0)
[1] 1
> funs[[2]](pi/2)
[1] 0.1588316
> funs[[2]](2 * pi)
[1] 1.394927e-05
> funs[[2]](Inf)
[1] 0
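
And to evaluate every function in the list at the same point, a small sketch:

funs[[2]](42)                     # pick one out by position, as in the original question
sapply(funs, function(f) f(42))   # or apply all of them to the same argument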

If you're going to be doing a lot of work with functions, I'd suggest
picking up a book on R programming. Fortunately, there's a good one
auf Deutsch by Uwe Ligges: Programmieren mit R.
http://www.statistik.tu-dortmund.de/~ligges/PmitR/
and several good ones in English. See the book list page at CRAN:
http://www.r-project.org/doc/bib/R-books.html

HTH,
Dennis

On Sat, Jul 2, 2011 at 9:36 AM, Thomas Stibor  wrote:
> Hi there,
>
> I have a vector of some functions e.g.
> #-#
> f.1 <- function(a) {
>  return( a );
> }
> f.2 <- function(a) {
>  return( 1 - (tanh(a))^2 );
> }
> f.3 <- function(a) {
>  return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) );
> }
>
> func.l <- c(f.1, f.2, f.3);
> #-#
>
> and would like to calculate the output value of a function in the
> vector, that is, pick e.g. function at position 2 in func.l and calculate the
> output value for a=42.
>
> func.l[2](42); # gives error
>
> f <- func.l[2];
> f(42); # gives error
>
> Is there an easy way to solve this problem?
>
> Cheers,
>  Thomas
>
>



Re: [R] Speed Advice for R --- avoid data frames

2011-07-02 Thread ivo welch
hi uwe---thanks for the clarification.  of course, my example should always
be done in vectorized form.  I only used it to show how iterative access
compares in the simplest possible fashion.  <100 accesses per second is
REALLY slow, though.
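
for concreteness, a sketch of the copy-out/copy-back workaround mentioned in my
original post quoted below (D as in uwe's data-frame example; the columns touched
here are just illustrative):

m <- as.matrix(D[, 1:2])                    # copy the needed columns into a matrix
for (r in seq_len(nrow(m))) {
  m[r, 1] <- sqrt(abs(m[r, 1])) + rnorm(1)  # fast: plain matrix indexing
}
D[, 1:2] <- m                               # write the results back once
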

I don't know R internals and the learning curve would be steep.  moreover,
there is no guarantee that changes I would make would be accepted.  so, I
cannot do this.

however, for an R expert, this should not be too difficult.  conceptually,
if data frame element access primitives are create/write/read/destroy in the
code, then it's truly trivial.  just add a matrix (dim the same as the data
frame) of byte pointers to point at the storage upon creation/change time.
 this would be quick-and-dirty.  for curiosity, do you know which source
file has the data frame internals?  maybe I will get tempted anyway if it is
simple enough.

(a more efficient but more involved way to do this would be to store a data
frame internally always as a matrix of data pointers, but this would
probably require more surgery.)

It is also not as important for me as it is for others... to give a good
impression to those that are not aware of the tradeoffs---which is most
people considering adopting R.

/iaw



Ivo Welch (ivo.we...@gmail.com)




2011/7/2 Uwe Ligges 

> Some comments:
>
> the comparison matrix rows vs. matrix columns is incorrect: Note that R has
> lazy evaluation, hence you construct your matrix in the timing for the rows
> and it is already constructed in the timing for the columns, hence you want
> to use:
>
>  M <- matrix( rnorm(C*R), nrow=R )
>  D <- as.data.frame(matrix( rnorm(C*R), nrow=R ) )
>  example(M)
>  example(D)
>
> Further on, you are correct with your statement that data.frame indexing is
> much slower, but if you can store your data in matrix form, just go on as it
> is.
>
> I doubt anybody is really going to make the index operation you cited
> within a loop. Then, with a data.frame, I can live with many vectorized
> replacements again:
>
> > system.time(D[,20] <- sqrt(abs(D[,20])) + rnorm(1000))
>   user  system elapsed
>   0.010.000.01
>
> > system.time(D[20,] <- sqrt(abs(D[20,])) + rnorm(1000))
>   user  system elapsed
>   0.510.000.52
>
> OK, it would be nice to do that faster, but this is not easy. I think R
> Core is happy to see contributions to make it faster without breaking
> existing features.
>
>
>
> Best wishes,
> Uwe
>
>
>
>
> On 02.07.2011 20:35, ivo welch wrote:
>
>> This email is intended for R users that are not that familiar with R
>> internals and are searching google about how to speed up R.
>>
>> Despite common misperception, R is not slow when it comes to iterative
>> access.  R is fast when it comes to matrices.  R is very slow when it
>> comes to iterative access into data frames.  Such access occurs when a
>> user uses "data$varname[index]", which is a very common operation.  To
>> illustrate, run the following program:
>>
>> R<- 1000; C<- 1000
>>
>> example<- function(m) {
>>   cat("rows: "); cat(system.time( for (r in 1:R) m[r,20]<-
>> sqrt(abs(m[r,20])) + rnorm(1) ), "\n")
>>   cat("columns: "); cat(system.time(for (c in 1:C) m[20,c]<-
>> sqrt(abs(m[20,c])) + rnorm(1)), "\n")
>>   if (is.data.frame(m)) { cat("df: columns as names: ");
>> cat(system.time(for (c in 1:C) m[[c]][20]<- sqrt(abs(m[[c]][20])) +
>> rnorm(1)), "\n") }
>> }
>>
>> cat("\n Now as matrix\n")
>> example( matrix( rnorm(C*R), nrow=R ) )
>>
>> cat("\n Now as data frame\n")
>> example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) )
>>
>>
>> The following are the reported timing under R 2.12.0 on a Mac Pro 3,1
>> with ample RAM:
>>
>> matrix, columns: 0.01s
>> matrix, rows: 0.175s
>> data frame, columns: 53s
>> data frame, rows: 56s
>> data frame, names: 58s
>>
>> Data frame access is about 5,000 times slower than matrix column
>> access, and 300 times slower than matrix row access.  R's data frame
>> operational speed is an amazing 40 data accesses per second.  I have
>> not seen access numbers this low for decades.
>>
>>
>> How to avoid it?  Not easy.  One way is to create multiple matrices,
>> and group them as an object.  of course, this loses a lot of features
>> of R.  Another way is to copy all data used in calculations out of the
>> data frame into a matrix, do the operations, and then copy them back.
>> not ideal, either.
>>
>> In my opinion, this is an R design flaw.  Data frames are the
>> fundamental unit of much statistical analysis, and should be fast.  I
>> think R lacks any indexing into data frames.  Turning on indexing of
>> data frames should at least be an optional feature.
>>
>>
>> I hope this message post helps others.
>>
>> /iaw
>>
>> 
>> Ivo Welch (ivo.we...@gmail.com)
>> http://www.ivo-welch.info/
>>

Re: [R] For help in R coding

2011-07-02 Thread Dennis Murphy
Hi:

There seems to be a problem if the string ends in , or . , which makes
it difficult for strsplit() to pick up if it is splitting on those
characters. Here is an alternative, splitting on individual characters
and using charmatch() instead:

charsum <- function(s, char) {
  u <- strsplit(s, "")
  sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
}

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x) charsum(x, '.'))),
                    commas  = unname(sapply(txtvec, function(x) charsum(x, ','))))
txtvec

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius  wrote:
>
> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:
>
>>
>>
 Dear all,

 I am doing a project on variant calling using R.I am working on
 pileup file.There are 10 columns in my data frame and I want to
 count the number of A,C,G and T in each row for column 9.example of
 column 9 is given below-

         .a,g,,
         .t,t,,
         .,c,c,
         .,a,,,
         .,t,t,t
         .c,,g,^!.
         .g,ggg.^!,
         .$,.,
         a,g,,t,
         ,.,^!.
         ,$.,.

 This is a bit confusing for me as these characters are in one column
 and how can we scan them for each row to print number of A,C,G and T
 for each row.
>>>
>>> Seems a bit clunky but this does the job (first the data):

 txt <- " .a,g,,
>>>
>>> +            .t,t,,
>>> +            .,c,c,
>>> +            .,a,,,
>>> +            .,t,t,t
>>> +            .c,,g,^!.
>>> +            .g,ggg.^!,
>>> +            .$,.,
>>> +            a,g,,t,
>>> +            ,.,^!.
>>> +            ,$.,."
>>>
 txtvec <- readLines(textConnection(txt))
>>>
>>> Now the clunky solution, Basically subtracts 1 from the counts of
>>> "fragments" that result from splitting on each letter in turn. Could
>>> be made prettier with a function that did the job.
>>>
 data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>>>
>>> split="a"), length) , "-", 1)),
>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
>>> length) , "-", 1)),
>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
>>> length) , "-", 1)),
>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
>>> length) , "-", 1)) )
>>>                     A C G T
>>> .a,g,,               1 0 1 0
>>>          .t,t,,     0 0 0 2
>>>          .,c,c,     0 2 0 0
>>>          .,a,,,     1 0 0 0
>>>          .,t,t,t    0 0 0 2
>>>          .c,,g,^!.  0 1 1 0
>>>          .g,ggg.^!, 0 0 4 0
>>>          .$,.,  0 0 0 0
>>>          a,g,,t,    1 0 1 1
>>>          ,.,^!. 0 0 0 0
>>>          ,$.,.  0 0 0 0
>>>
>>> Has the advantage that the input data ends up as rownames, which was a
>>> surprise.
>>>
>>> If you wanted to count "A" and "a" as equivalent, then the split
>>> argument should be "a|A"
>>>
>>>
>>
 AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE
 THIS.
>>
>> BUT CAN I COUNT . AND , ALSO USING-
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>> split=".|,"), length) , "-", 1)),
>>
>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES
>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN
>> CALCULATING AND JUST SHOWING 0.
>
> You need to use valid regex expressions for 'split'. Since "." and "," are
> special characters, they need to be escaped when you want the literals to be
> recognized as such.
>
> I haven't figured out why but you need to drop the final operation of
> subtracting 1 from the values when counting commas:
>
> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
>                             split="\\."), length) , "-", 1))
>  ,commas = unlist( lapply( sapply(txtvec, strsplit,
>                             split="\\,"), length) ) )
>                       periods commas
>  .a,g,,                      1      3
>            .t,t,,           1      3
>            .,c,c,           1      3
>            .,a,,,           1      4
>            .,t,t,t          1      4
>            .c,,g,^!.        1      4
>            .g,ggg.^!,       2      2
>            .$,.,        2      6
>            a,g,,t,          0      4
>            ,.,^!.       1      7
>            ,$.,.        1      7
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] Repeating a function in R

2011-07-02 Thread Uwe Ligges



On 02.07.2011 20:51, Daniel Malter wrote:

You can just tell the function to create 1000 random numbers. See ?runif for
the specifics. The arguments are n, min, and max. 'n' is the one you are
looking for.


Thanks for providing help on R-help, but for the future please

- respond to the OP rather than to the list only

- cite the OP's message so that anybody else on this mailing list 
understand the context of your message.


Best,
Uwe Ligges





Da.

--
View this message in context: 
http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed Advice for R --- avoid data frames

2011-07-02 Thread Uwe Ligges

Some comments:

the comparison matrix rows vs. matrix columns is incorrect: note that R 
has lazy evaluation, so the matrix is constructed during the row timing 
and is already constructed when the columns are timed. Hence you want 
to use:


 M <- matrix( rnorm(C*R), nrow=R )
 D <- as.data.frame(matrix( rnorm(C*R), nrow=R ) )
 example(M)
 example(D)

Further on, you are correct with your statement that data.frame indexing 
is much slower, but if you can store your data in matrix form, just go 
on as it is.


I doubt anybody is really going to make the index operation you cited 
within a loop. Then, with a data.frame, I can live with many vectorized 
replacements again:


> system.time(D[,20] <- sqrt(abs(D[,20])) + rnorm(1000))
   user  system elapsed
   0.01    0.00    0.01

> system.time(D[20,] <- sqrt(abs(D[20,])) + rnorm(1000))
   user  system elapsed
   0.51    0.00    0.52

OK, it would be nice to do that faster, but this is not easy. I think R 
Core is happy to see contributions to make it faster without breaking 
existing features.




Best wishes,
Uwe



On 02.07.2011 20:35, ivo welch wrote:

This email is intended for R users that are not that familiar with R
internals and are searching google about how to speed up R.

Despite common misperception, R is not slow when it comes to iterative
access.  R is fast when it comes to matrices.  R is very slow when it
comes to iterative access into data frames.  Such access occurs when a
user uses "data$varname[index]", which is a very common operation.  To
illustrate, run the following program:

R<- 1000; C<- 1000

example<- function(m) {
   cat("rows: "); cat(system.time( for (r in 1:R) m[r,20]<-
sqrt(abs(m[r,20])) + rnorm(1) ), "\n")
   cat("columns: "); cat(system.time(for (c in 1:C) m[20,c]<-
sqrt(abs(m[20,c])) + rnorm(1)), "\n")
   if (is.data.frame(m)) { cat("df: columns as names: ");
cat(system.time(for (c in 1:C) m[[c]][20]<- sqrt(abs(m[[c]][20])) +
rnorm(1)), "\n") }
}

cat("\n Now as matrix\n")
example( matrix( rnorm(C*R), nrow=R ) )

cat("\n Now as data frame\n")
example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) )


The following are the reported timing under R 2.12.0 on a Mac Pro 3,1
with ample RAM:

matrix, columns: 0.01s
matrix, rows: 0.175s
data frame, columns: 53s
data frame, rows: 56s
data frame, names: 58s

Data frame access is about 5,000 times slower than matrix column
access, and 300 times slower than matrix row access.  R's data frame
operational speed is an amazing 40 data accesses per second.  I have
not seen access numbers this low for decades.


How to avoid it?  Not easy.  One way is to create multiple matrices,
and group them as an object.  of course, this loses a lot of features
of R.  Another way is to copy all data used in calculations out of the
data frame into a matrix, do the operations, and then copy them back.
not ideal, either.

In my opinion, this is an R design flaw.  Data frames are the
fundamental unit of much statistical analysis, and should be fast.  I
think R lacks any indexing into data frames.  Turning on indexing of
data frames should at least be an optional feature.


I hope this message post helps others.

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeating a function in R

2011-07-02 Thread Daniel Malter
If you want to repeat an entire function, use replicate as in

replicate(15,sapply(1,function(x) runif(x)))

Here, sapply(1,function(x) runif(x)) draws on uniformly distributed
variable. replicate(15,...) is the wrapper function that tells to do this 15
times. The benefit here is that you can replicate a more complex function.
However, if you just need 1000 random draws, I would use the easier approach
I suggested above.
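
For instance (a made-up illustration, since the original function was never
shown), replicate() will happily repeat a multi-step computation and collect
the results:

replicate(15, {                       # 15 repetitions of the braced block
  x <- runif(100)
  c(mean = mean(x), sd = sd(x))       # each repetition returns a named vector
})                                    # result: a 2 x 15 matrix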

Best,
Daniel

--
View this message in context: 
http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640982.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeating a function in R

2011-07-02 Thread Daniel Malter
You can just tell the function to create 1000 random numbers. See ?runif for
the specifics. The arguments are n, min, and max. 'n' is the one you are
looking for.

Da.

--
View this message in context: 
http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread Uwe Ligges



On 02.07.2011 20:42, ivo welch wrote:

hi uwe--I did not know what snow was.  from my 1 minute reading, it
seems like a much more involved setup that is much more flexible after
the setup cost has been incurred (specifically, allowing use of many
machines).

the attractiveness of the doMC/foreach framework is its simplicity of
installation and use.

but if I understand what you are telling me, you are using a different
parallelization framework, and it shows that my example is completed a
lot faster using this different parallelization framework.  correct?
if so, the problem is my use of the doMC framework, not the inherent
cost of dealing with multiple processes.  is this interpretation
correct?



Indeed.

Uwe




regards,

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/


2011/7/2 Uwe Ligges:



On 02.07.2011 20:04, ivo welch wrote:


thank you, uwe.  this is a little disappointing.  parallel processing
for embarrassingly simple parallel operations--those needing no
communication---should be feasible if the thread is not always created
and released, but held.  is there light-weight parallel processing
that could facilitate this?


Hmmm, now that you asked I checked it myself using snow:

On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK
clusters, i.e. slow communication) I get:




system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20),
i)))

   user  system elapsed
   3.10    0.19   51.43

while on a single core without parallelization framework:


system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))

   user  system elapsed
  93.74    0.09   94.24

Hence (although my prior assumption was that the overhead would be big also
for other frameworks than foreach) it scales perfectly well with snow,
perhaps you have to use foreach in a different way?

Best,
Uwe Ligges







regards,

/iaw


2011/7/2 Uwe Ligges:



On 02.07.2011 19:32, ivo welch wrote:


dear R experts---

I am experimenting with multicore processing, so far with pretty
disappointing results.  Here is my simple example:

A<- 10
randvalues<- abs(rnorm(A))
minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
arbitrary function

ARGV<- commandArgs(trailingOnly=TRUE)

if (ARGV[1] == "do-onecore") {
library(foreach)
discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) }
else
if (ARGV[1] == "do-multicore") {
library(doMC)
registerDoMC()
cat("You have", getDoParWorkers(), "cores\n")
discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i )
}
else
if (ARGV[1] == "plain")
for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
cat("sorry, but argument", ARGV[1], "is not
plain|do-onecore|do-multicore\n")


on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

   "plain" takes about 68 seconds (real and user, using the unix timing
function).
   "do-onecore" takes about 300 seconds.
   "do-multicore" takes about 210 seconds real, (300 seconds user).

this seems pretty disappointing.  the cores are not used for the most
part, either.  feedback appreciated.



Feedback is that a single computation within your foreach loop is so
quick
that the overhead of communicating data and results between processes
costs
more time than the actual evaluation, hence you are faster with a single
process.

What you should do is:

write code that does, e.g., 1 iterations within 10 other iterations
and
just do a foreach loop around the outer 10. Then you will probably be
much
faster (without testing). But this is essentially the example I am using
for
teaching to show when not to do parallel processing.

Best,
Uwe Ligges







/iaw



Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread ivo welch
hi uwe--I did not know what snow was.  from my 1 minute reading, it
seems like a much more involved setup that is much more flexible after
the setup cost has been incurred (specifically, allowing use of many
machines).

the attractiveness of the doMC/foreach framework is its simplicity of
installation and use.

but if I understand what you are telling me, you are using a different
parallelization framework, and it shows that my example is completed a
lot faster using this different parallelization framework.  correct?
if so, the problem is my use of the doMC framework, not the inherent
cost of dealing with multiple processes.  is this interpretation
correct?

regards,

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/


2011/7/2 Uwe Ligges :
>
>
> On 02.07.2011 20:04, ivo welch wrote:
>>
>> thank you, uwe.  this is a little disappointing.  parallel processing
>> for embarrassingly simple parallel operations--those needing no
>> communication---should be feasible if the thread is not always created
>> and released, but held.  is there light-weight parallel processing
>> that could facilitate this?
>
> Hmmm, now that you asked I checked it myself using snow:
>
> On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK
> clusters, i.e. slow communication) I get:
>
>
>
>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20),
>> i)))
>   user  system elapsed
>   3.10    0.19   51.43
>
> while on a single core without parallelization framework:
>
>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>   user  system elapsed
>  93.74    0.09   94.24
>
> Hence (although my prior assumption was that the overhead would be big also
> for other frameworks than foreach) it scales perfectly well with snow,
> perhaps you have to use foreach in a different way?
>
> Best,
> Uwe Ligges
>
>
>
>
>
>>
>> regards,
>>
>> /iaw
>>
>>
>> 2011/7/2 Uwe Ligges:
>>>
>>>
>>> On 02.07.2011 19:32, ivo welch wrote:

 dear R experts---

 I am experimenting with multicore processing, so far with pretty
 disappointing results.  Here is my simple example:

 A<- 10
 randvalues<- abs(rnorm(A))
 minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
 arbitrary function

 ARGV<- commandArgs(trailingOnly=TRUE)

 if (ARGV[1] == "do-onecore") {
    library(foreach)
    discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) }
 else
 if (ARGV[1] == "do-multicore") {
    library(doMC)
    registerDoMC()
    cat("You have", getDoParWorkers(), "cores\n")
    discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i )
 }
 else
 if (ARGV[1] == "plain")
    for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
 cat("sorry, but argument", ARGV[1], "is not
 plain|do-onecore|do-multicore\n")


 on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

   "plain" takes about 68 seconds (real and user, using the unix timing
 function).
   "do-onecore" takes about 300 seconds.
   "do-multicore" takes about 210 seconds real, (300 seconds user).

 this seems pretty disappointing.  the cores are not used for the most
 part, either.  feedback appreciated.
>>>
>>>
>>> Feedback is that a single computation within your foreach loop is so
>>> quick
>>> that the overhead of communicating data and results between processes
>>> costs
>>> more time than the actual evaluation, hence you are faster with a single
>>> process.
>>>
>>> What you should do is:
>>>
>>> write code that does, e.g., 1 iterations within 10 other iterations
>>> and
>>> just do a foreach loop around the outer 10. Then you will probably be
>>> much
>>> faster (without testing). But this is essentially the example I am using
>>> for
>>> teaching to show when not to do parallel processing.
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>
>>>
 /iaw


 
 Ivo Welch (ivo.we...@gmail.com)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
>>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Speed Advice for R --- avoid data frames

2011-07-02 Thread ivo welch
This email is intended for R users that are not that familiar with R
internals and are searching google about how to speed up R.

Despite common misperception, R is not slow when it comes to iterative
access.  R is fast when it comes to matrices.  R is very slow when it
comes to iterative access into data frames.  Such access occurs when a
user uses "data$varname[index]", which is a very common operation.  To
illustrate, run the following program:

R <- 1000; C <- 1000

example <- function(m) {
  cat("rows: "); cat(system.time( for (r in 1:R) m[r,20] <-
sqrt(abs(m[r,20])) + rnorm(1) ), "\n")
  cat("columns: "); cat(system.time(for (c in 1:C) m[20,c] <-
sqrt(abs(m[20,c])) + rnorm(1)), "\n")
  if (is.data.frame(m)) { cat("df: columns as names: ");
cat(system.time(for (c in 1:C) m[[c]][20] <- sqrt(abs(m[[c]][20])) +
rnorm(1)), "\n") }
}

cat("\n Now as matrix\n")
example( matrix( rnorm(C*R), nrow=R ) )

cat("\n Now as data frame\n")
example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) )


The following are the reported timing under R 2.12.0 on a Mac Pro 3,1
with ample RAM:

matrix, columns: 0.01s
matrix, rows: 0.175s
data frame, columns: 53s
data frame, rows: 56s
data frame, names: 58s

Data frame access is about 5,000 times slower than matrix column
access, and 300 times slower than matrix row access.  R's data frame
operational speed is an amazing 40 data accesses per second.  I have
not seen access numbers this low for decades.


How to avoid it?  Not easy.  One way is to create multiple matrices,
and group them as an object.  Of course, this loses a lot of features
of R.  Another way is to copy all data used in calculations out of the
data frame into a matrix, do the operations, and then copy them back.
Not ideal, either.
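
A rough sketch of that copy-out / copy-back idea (object and column choices
here are only for illustration):

D <- as.data.frame(matrix(rnorm(1000 * 1000), nrow = 1000))
v <- D[[20]]                                 # pull column 20 out as an atomic vector
for (r in 1:1000) v[r] <- sqrt(abs(v[r])) + rnorm(1)  # loop over the vector: fast
D[[20]] <- v                                 # write it back in a single assignment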

In my opinion, this is an R design flaw.  Data frames are the
fundamental unit of much statistical analysis, and should be fast.  I
think R lacks any indexing into data frames.  Turning on indexing of
data frames should at least be an optional feature.


I hope this message post helps others.

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread Uwe Ligges



On 02.07.2011 20:04, ivo welch wrote:

thank you, uwe.  this is a little disappointing.  parallel processing
for embarrassingly simple parallel operations--those needing no
communication---should be feasible if the thread is not always created
and released, but held.  is there light-weight parallel processing
that could facilitate this?


Hmmm, now that you asked I checked it myself using snow:

On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow (using 
SOCK clusters, i.e. slow communication) I get:




> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, 
c(1e-20,9e20), i)))

   user  system elapsed
   3.10    0.19   51.43

while on a single core without parallelization framework:

> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
   user  system elapsed
  93.74    0.09   94.24

Hence (although my prior assumption was that the overhead would be big 
also for other frameworks than foreach) it scales perfectly well with 
snow, perhaps you have to use foreach in a different way?
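
For completeness, a sketch of the setup behind those timings (the cluster 
object cl is not shown above; the worker count and the export step are 
assumptions, reusing the objects from the original example):

library(snow)
cl <- makeCluster(2, type = "SOCK")               # assumed: 2 local SOCK workers
clusterExport(cl, c("minfn", "randvalues", "A"))  # ship the needed objects to the workers
system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i)))
stopCluster(cl)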


Best,
Uwe Ligges







regards,

/iaw


2011/7/2 Uwe Ligges:



On 02.07.2011 19:32, ivo welch wrote:


dear R experts---

I am experimenting with multicore processing, so far with pretty
disappointing results.  Here is my simple example:

A<- 10
randvalues<- abs(rnorm(A))
minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
arbitrary function

ARGV<- commandArgs(trailingOnly=TRUE)

if (ARGV[1] == "do-onecore") {
library(foreach)
discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) }
else
if (ARGV[1] == "do-multicore") {
library(doMC)
registerDoMC()
cat("You have", getDoParWorkers(), "cores\n")
discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) }
else
if (ARGV[1] == "plain")
for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
cat("sorry, but argument", ARGV[1], "is not
plain|do-onecore|do-multicore\n")


on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

   "plain" takes about 68 seconds (real and user, using the unix timing
function).
   "do-onecore" takes about 300 seconds.
   "do-multicore" takes about 210 seconds real, (300 seconds user).

this seems pretty disappointing.  the cores are not used for the most
part, either.  feedback appreciated.



Feedback is that a single computation within your foreach loop is so quick
that the overhead of communicating data and results between processes costs
more time than the actual evaluation, hence you are faster with a single
process.

What you should do is:

write code that does, e.g., 1 iterations within 10 other iterations and
just do a foreach loop around the outer 10. Then you will probably be much
faster (without testing). But this is essentially the example I am using for
teaching to show when not to do parallel processing.

Best,
Uwe Ligges







/iaw



Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with corrgram function

2011-07-02 Thread Kevin Wright
This bug has been fixed.

Kevin


On Tue, Jun 28, 2011 at 6:11 AM, Niels Janssen  wrote:

> Dear list,
>
> I have a problem with the "corrgram" function. It does not seem to "color"
> large negative correlations, while the same correlation, if positive,
> provides no problems. Is this a bug?
>
>require(corrgram)
>a = seq(1,100)
>b = -jitter(seq(1,100), 80)
>cor(a,b) # r about -.96
>c=as.data.frame(cbind(a,b))
>corrgram(c, order=NULL, lower.panel=panel.pie, upper.panel=NULL,
> text.panel=panel.txt) # no color
>
>c$b = -1*c$b # flip direction of correlation
>cor(c$a, c$b) # r now about +.96
>corrgram(c, order=NULL, lower.panel=panel.pie, upper.panel=NULL,
> text.panel=panel.txt) #no problem with color.
>
> Thanks!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread ivo welch
thank you, uwe.  this is a little disappointing.  parallel processing
for embarrassingly simple parallel operations--those needing no
communication---should be feasible if the thread is not always created
and released, but held.  is there light-weight parallel processing
that could facilitate this?

regards,

/iaw


2011/7/2 Uwe Ligges :
>
>
> On 02.07.2011 19:32, ivo welch wrote:
>>
>> dear R experts---
>>
>> I am experimenting with multicore processing, so far with pretty
>> disappointing results.  Here is my simple example:
>>
>> A<- 10
>> randvalues<- abs(rnorm(A))
>> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
>> arbitrary function
>>
>> ARGV<- commandArgs(trailingOnly=TRUE)
>>
>> if (ARGV[1] == "do-onecore") {
>>    library(foreach)
>>    discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) }
>> else
>> if (ARGV[1] == "do-multicore") {
>>    library(doMC)
>>    registerDoMC()
>>    cat("You have", getDoParWorkers(), "cores\n")
>>    discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) }
>> else
>> if (ARGV[1] == "plain")
>>    for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
>> cat("sorry, but argument", ARGV[1], "is not
>> plain|do-onecore|do-multicore\n")
>>
>>
>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,
>>
>>   "plain" takes about 68 seconds (real and user, using the unix timing
>> function).
>>   "do-onecore" takes about 300 seconds.
>>   "do-multicore" takes about 210 seconds real, (300 seconds user).
>>
>> this seems pretty disappointing.  the cores are not used for the most
>> part, either.  feedback appreciated.
>
>
> Feedback is that a single computation within your foreach loop is so quick
> that the overhead of communicating data and results between processes costs
> more time than the actual evaluation, hence you are faster with a single
> process.
>
> What you should do is:
>
> write code that does, e.g., 1 iterations within 10 other iterations and
> just do a foreach loop around the outer 10. Then you will probably be much
> faster (without testing). But this is essentially the example I am using for
> teaching to show when not to do parallel processing.
>
> Best,
> Uwe Ligges
>
>
>
>
>
>
>> /iaw
>>
>>
>> 
>> Ivo Welch (ivo.we...@gmail.com)
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vector of functions

2011-07-02 Thread Joshua Wiley
Hi Thomas,

On Sat, Jul 2, 2011 at 9:36 AM, Thomas Stibor  wrote:
> Hi there,
>
> I have a vector of some functions e.g.
> #-#
> f.1 <- function(a) {
>  return( a );
> }
> f.2 <- function(a) {
>  return( 1 - (tanh(a))^2 );
> }
> f.3 <- function(a) {
>  return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) );
> }
>
> func.l <- c(f.1, f.2, f.3);
> #-#
>
> and would like to calculate the output value of a function in the
> vector, that is, pick e.g. function at position 2 in func.l and calculate the
> output value for a=42.
>
> func.l[2](42); # gives error

Almost there, you just need to use a different extraction operator...

func.l[[2]](42)

compare the output of

func.l[2]
func.l[[2]]

the first one retains the list structure, but just returns a list with
the second element of func.l, the second one actually returns a
function.  You can check this by looking at the class

class(func.l[2])
class(func.l[[2]])
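
And a natural follow-on, assuming the func.l from your message: apply every
function in the list to the same input in one call.

sapply(func.l, function(f) f(42))   # evaluates f.1, f.2 and f.3 at a = 42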

Cheers,

Josh

>
> f <- func.l[2];
> f(42); # gives error
>
> Is there an easy way to solve this problem?
>
> Cheers,
>  Thomas
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] %dopar% parallel processing experiment

2011-07-02 Thread Uwe Ligges



On 02.07.2011 19:32, ivo welch wrote:

dear R experts---

I am experimenting with multicore processing, so far with pretty
disappointing results.  Here is my simple example:

A<- 10
randvalues<- abs(rnorm(A))
minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
arbitrary function

ARGV<- commandArgs(trailingOnly=TRUE)

if (ARGV[1] == "do-onecore") {
library(foreach)
discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "do-multicore") {
library(doMC)
registerDoMC()
cat("You have", getDoParWorkers(), "cores\n")
discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "plain")
for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else
cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")


on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

   "plain" takes about 68 seconds (real and user, using the unix timing
function).
   "do-onecore" takes about 300 seconds.
   "do-multicore" takes about 210 seconds real, (300 seconds user).

this seems pretty disappointing.  the cores are not used for the most
part, either.  feedback appreciated.



Feedback is that a single computation within your foreach loop is so 
quick that the overhead of communicating data and results between 
processes costs more time than the actual evaluation, hence you are 
faster with a single process.


What you should do is:

write code that does, e.g., 1 iterations within 10 other iterations 
and just do a foreach loop around the outer 10. Then you will probably 
be much faster (without testing). But this is essentially the example I 
am using for teaching to show when not to do parallel processing.
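
In code, that chunking idea might look roughly like this (a sketch built on 
the example quoted above; the number of chunks is arbitrary):

library(foreach)
library(doMC)
registerDoMC()
## one task list per chunk, so communication happens once per chunk, not once per task
chunks <- split(1:A, rep(1:4, length.out = A))   # four chunks of roughly equal size
discard <- foreach(idx = chunks) %dopar% {
  lapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i))
}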


Best,
Uwe Ligges







/iaw



Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Vector of functions

2011-07-02 Thread Thomas Stibor
Hi there,

I have a vector of some functions e.g.
#-#
f.1 <- function(a) {
  return( a );
}
f.2 <- function(a) {
  return( 1 - (tanh(a))^2 );
}
f.3 <- function(a) {
  return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) );
}

func.l <- c(f.1, f.2, f.3); 
#-#

and would like to calculate the output value of a function in the
vector, that is, pick e.g. function at position 2 in func.l and calculate the
output value for a=42.

func.l[2](42); # gives error

f <- func.l[2];
f(42); # gives error

Is there an easy way to solve this problem?

Cheers,
 Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] comparing hazard ratios

2011-07-02 Thread Brian Tsai
hi,

I'm looking for a package to compare two hazard ratios (and assign
statistical significance) obtained from two different predictive models.  I
know of the hr.comp2 function from the survcomp package, but was wondering
if there's any other packages out there.

thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2011-07-02 Thread James Tripp

On 02/07/2011 14:03, Shantal Bonnick wrote:

Hi,

If i want to repeat a function, say 100 times, how do i do that? i used the for
loop but there's an error somewhere.

Thanks
shan
[[alternative HTML version deleted]]



Hi Shan,

Could you post the code you're using? Then we can figure out where the 
error is coming from.


Best,

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error when using plot in diag.panel argument of pairs

2011-07-02 Thread Patrick Großmann
Dear Mr. Ligges,
 
 my apologies! As I can see in your e-mail, my quote really is unreadable. This
 was not produced by me writing messy messages; it is due to the Java application
 behind the client not being able to cope with external tab characters in my
 browser when parsing the message while sending (I typed the message in an
 external editor because of the code lines and copy-pasted it). Of course I would
 also never attach any HTML files!
 
 Anyhow, I am sending you my original message again (new = TRUE did not help me;
 assuming I used it the way you suggested, it produced the same error). I sent it
 to myself first, so I am sure this mess won't happen again!
 
 My previous Message:
 
 #
 Dear Madame or Sir,
 
 I am having a problem in combining density-smoothed scatterplot matrices with
 a plot of kernel density estimates of each dimension plotted on the respective
 diagonals.
 
 I have tried following approach using the package "sm" for the kernel density 
estimation, as well as "MASS" respectively:
 
 pairs
 (myTable[, 1:4],
 panel=function(x,y, ...)
 {  
 xy=kde2d(x,y) 
 image(xy, add=TRUE)
 }
 ,diag.panel = function(x,...)
 {
 plot(density(x,...), add = TRUE)
 #boxplot(x, add = TRUE)
 }
 )
 
 where myTable is an ordinary table I am using the first four columns of.  
Unfortunately this produces an error stating that plot has created a  new plot. 
I suppose this is the same situation when forgetting "add  =  TRUE" with 
plotting functions such as boxplot (as in my code used with  "boxplot(x, add = 
TRUE)" ) or any other (e.g. smoothScatter). Analyzing  the partial plot,  which 
can be seen before R rejects my code, I come to  the conclusion that add might 
not be a valid argument for "plot".
  My questions therefore is, whether there is an option to plot the  result of 
the 1d kernel density estimation to the diagonal of the  scatterplot produced 
by pairs or which plot function is valid in this  case. So basically I would 
like to generate the same output I am able to  produce with boxplots on the 
diagonal with a density estimation.
 
 Additional question: I bet there is an opportunity to include both the  
boxplot and the kernel density estimation on each field of the diagonal  (as I 
saw before in a published paper), but I just could not figure out  how to 
"overdraw". Maybe you could give me a slight idea (reference  would be fine, 
too) on how to achieve this. I would be highly thankful!
 
 
 Kind regards and many thanks in advance,
 -Patrick Grossman
 
 ##
 
 Please excuse the inconvenience and note my thanks.
 
 Warmest regards,
 -Patrick Grossman

- Original Message -
From: Uwe Ligges 
Date: Saturday, 2 July 2011, 16:36
Subject: Re: [R] Error when using plot in diag.panel argument of pairs
To: Patrick Großmann 
Cc: r-help@r-project.org

> If you send a better readable message with really reproducible 
> code, we 
> may provide a solution, this way it is really hard to read your 
> code. 
> And yes, please do not send html mail. My guess is that 
> par(new=TRUE) 
> will help, but I do not want to construct the examples at first.
> 
> 
> 
> 
> On 02.07.2011 14:37, "Patrick Großmann" wrote:
> >   Dear Madame or Sir,I am having a problem in 
> combining density-smoothed scatterplot matrices with a plot of 
> kernel destiny estimations of each dimension plotted on the 
> respective field of the diagonal.I have tried following approach 
> using the package "sm" for the kernel density estimation, as 
> well as "MASS" respectively:pairs(myTable[, 
> 1:4],panel=function(x,y, ...){ 
> xy=kde2d(x,y) image(xy, 
> add=TRUE)},diag.panel = function(x,...){plot(density(x), add = 
> TRUE)#boxplot(x, add = TRUE)})where myTable is an ordinary table 
> I am using the first four columns of. Unfortunately this 
> produces an error stating that plot has created a new plot. I 
> suppose this is the same situation when forgetting "add  = 
> TRUE" with plotting functions such as boxplot (as in my code 
> used with "boxplot(x, add = TRUE)" ) or any other (e.g. 
> smoothScatter). Analyzing the partial plot,  which can be 
> seen before R rejects my code, I come to the conclusion that add 
> might not be a valid argument for "plot
> ". My questions therefore is, whether there is an option to plot 
> the result of the 1d kernel density estimation to the diagonal 
> of the scatterplot produced by pairs or which plot function is 
> valid in this case. So basically I would like to generate the 
> same output I am able to produce with boxplots on the diagonal 
> with a density estimation.Additional question: I bet there is an 
> opportunity to include both the boxplot and the kernel density 
> estimation on each field of the diagonal (as I saw before in a 
> published paper), but I just could not figure out how to 
> "overdraw". Maybe you could give me a slight idea (reference 
> would be fine, t

Re: [R] For help in R coding

2011-07-02 Thread Bansal, Vikas


>> Dear all,
>>
>> I am doing a project on variant calling using R.I am working on
>> pileup file.There are 10 columns in my data frame and I want to
>> count the number of A,C,G and T in each row for column 9.example of
>> column 9 is given below-
>>
>>   .a,g,,
>>   .t,t,,
>>   .,c,c,
>>   .,a,,,
>>   .,t,t,t
>>   .c,,g,^!.
>>   .g,ggg.^!,
>>   .$,.,
>>   a,g,,t,
>>   ,.,^!.
>>   ,$.,.
>>
>> This is a bit confusing for me as these characters are in one column
>> and how can we scan them for each row to print number of A,C,G and T
>> for each row.
>
> Seems a bit clunky but this does the job (first the data):
>> txt <- " .a,g,,
> +.t,t,,
> +.,c,c,
> +.,a,,,
> +.,t,t,t
> +.c,,g,^!.
> +.g,ggg.^!,
> +.$,.,
> +a,g,,t,
> +,.,^!.
> +,$.,."
>
>> txtvec <- readLines(textConnection(txt))
>
> Now the clunky solution, Basically subtracts 1 from the counts of
> "fragments" that result from splitting on each letter in turn. Could
> be made prettier with a function that did the job.
>
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
> split="a"), length) , "-", 1)),
> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
> length) , "-", 1)),
> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
> length) , "-", 1)),
> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
> length) , "-", 1)) )
>   A C G T
>  .a,g,,   1 0 1 0
>.t,t,, 0 0 0 2
>.,c,c, 0 2 0 0
>.,a,,, 1 0 0 0
>.,t,t,t0 0 0 2
>.c,,g,^!.  0 1 1 0
>.g,ggg.^!, 0 0 4 0
>.$,.,  0 0 0 0
>a,g,,t,1 0 1 1
>,.,^!. 0 0 0 0
>,$.,.  0 0 0 0
>
> Has the advantage that the input data ends up as rownames, which was a
> surprise.
>
> If you wanted to count "A" and "a" as equivalent, then the split
> argument should be "a|A"
>
>

>>AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE THIS.
BUT CAN I COUNT . AND , ALSO USING-
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
split=".|,"), length) , "-", 1)),

I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES IT IS 
SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN CALCULATING AND 
JUST SHOWING 0.
>>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>
>
>

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] %dopar% parallel processing experiment

2011-07-02 Thread ivo welch
dear R experts---

I am experimenting with multicore processing, so far with pretty
disappointing results.  Here is my simple example:

A <- 10
randvalues <- abs(rnorm(A))
minfn <- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an
arbitrary function

ARGV <- commandArgs(trailingOnly=TRUE)

if (ARGV[1] == "do-onecore") {
   library(foreach)
   discard <- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "do-multicore") {
   library(doMC)
   registerDoMC()
   cat("You have", getDoParWorkers(), "cores\n")
   discard <- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "plain")
   for (i in 1:A) discard <- uniroot( minfn, c(1e-20,9e20), i ) else
cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")


on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

  "plain" takes about 68 seconds (real and user, using the unix timing
function).
  "do-onecore" takes about 300 seconds.
  "do-multicore" takes about 210 seconds real, (300 seconds user).

this seems pretty disappointing.  the cores are not used for the most
part, either.  feedback appreciated.

/iaw



Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2011-07-02 Thread David Winsemius


On Jul 2, 2011, at 10:15 AM, Ista Zahn wrote:


Please read the posting guide. "there's an error somewhere" is simply
not enough to go on.



True enough, but under the hypothesis that this might be some sort of  
simulation task, I would also suggest looking at:


?replicate
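
For example (a hypothetical sketch, since the actual function was never shown):

sim_once <- function() mean(rnorm(25))   # stand-in for whatever the real function does
results  <- replicate(100, sim_once())   # repeat it 100 times, collect into a vector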



Best,
Ista

On Sat, Jul 2, 2011 at 9:03 AM, Shantal Bonnick  
 wrote:

Hi,

If i want to repeat a function, say 100 times, how do i do that? i  
used the for

loop but there's an error somewhere.

Thanks
shan
   [[alternative HTML version deleted]]


Learning to post plain text with the yahoo interface should be added  
to shan's TO-DO list.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] For help in R coding

2011-07-02 Thread David Winsemius


On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:





Dear all,

I am doing a project on variant calling using R.I am working on
pileup file.There are 10 columns in my data frame and I want to
count the number of A,C,G and T in each row for column 9.example of
column 9 is given below-

 .a,g,,
 .t,t,,
 .,c,c,
 .,a,,,
 .,t,t,t
 .c,,g,^!.
 .g,ggg.^!,
 .$,.,
 a,g,,t,
 ,.,^!.
 ,$.,.

This is a bit confusing for me as these characters are in one column
and how can we scan them for each row to print number of A,C,G and T
for each row.


Seems a bit clunky but this does the job (first the data):

txt <- " .a,g,,

+.t,t,,
+.,c,c,
+.,a,,,
+.,t,t,t
+.c,,g,^!.
+.g,ggg.^!,
+.$,.,
+a,g,,t,
+,.,^!.
+,$.,."


txtvec <- readLines(textConnection(txt))


Now the clunky solution. Basically it subtracts 1 from the counts of
"fragments" that result from splitting on each letter in turn. Could
be made prettier with a function that did the job.


data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,

split="a"), length) , "-", 1)),
+ C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
length) , "-", 1)),
+ G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
length) , "-", 1)),
+ T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
length) , "-", 1)) )
 A C G T
.a,g,,   1 0 1 0
  .t,t,, 0 0 0 2
  .,c,c, 0 2 0 0
  .,a,,, 1 0 0 0
  .,t,t,t0 0 0 2
  .c,,g,^!.  0 1 1 0
  .g,ggg.^!, 0 0 4 0
  .$,.,  0 0 0 0
  a,g,,t,1 0 1 1
  ,.,^!. 0 0 0 0
  ,$.,.  0 0 0 0

Has the advantage that the input data ends up as rownames, which  
was a

surprise.

If you wanted to count "A" and "a" as equivalent, then the split
argument should be "a|A"




AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT  
LIKE THIS.

BUT CAN I COUNT . AND , ALSO USING-
data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
split=".|,"), length) , "-", 1)),

I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME  
PLACES IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT  
EVEN CALCULATING AND JUST SHOWING 0.


You need to use valid regex expressions for 'split'. Since "." and ","  
are special characters, they need to be escaped when you want the  
literals to be recognized as such.


I haven't figured out why but you need to drop the final operation of  
subtracting 1 from the values when counting commas:


data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
 split="\\."), length) , "-", 1))
 ,commas = unlist( lapply( sapply(txtvec, strsplit,
 split="\\,"), length) ) )
   periods commas
 .a,g,,  1  3
.t,t,,   1  3
.,c,c,   1  3
.,a,,,   1  4
.,t,t,t  1  4
.c,,g,^!.1  4
.g,ggg.^!,   2  2
.$,.,2  6
a,g,,t,  0  4
,.,^!.   1  7
,$.,.1  7
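
A plausible reason (a guess, not verified against the full data): strsplit()  
drops a trailing empty string, so when a line ends with the split character  
the number of fragments already equals the number of delimiters, and the  
usual "subtract 1" would over-correct. A quick check:

length(strsplit("a,b,", ",")[[1]])    # 2 fragments for 2 commas (trailing empty dropped)
length(strsplit("a.b", "\\.")[[1]])   # 2 fragments for 1 period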

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Access only part of last dimension of table/matrix

2011-07-02 Thread Marc Schwartz
On Jul 2, 2011, at 5:39 AM, peter dalgaard wrote:

> 
> On Jul 2, 2011, at 03:29 , Marc Schwartz wrote:
> 
>> It actually scared me that I had any recollection of an isolated post from 
>> 10 years ago. Not sure what to make of that...
> 
> If it is any consolation, the author had forgotten it entirely

LOL Peter! Thanks!

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error when using plot in diag.panel argument of pairs

2011-07-02 Thread Uwe Ligges

OK, so let me make it reproducible and start with

myTable <- iris
require("MASS")


On 02.07.2011 17:09, "Patrick Großmann" wrote:

Dear Mr. Ligges,

  my apologies! As I can see in your E-Mail the quote of mine is really  just 
unreadable! Anyhow, this was not produced by me writing messy  messages, this 
is due to the Java application behind the client not  being able to cope with 
external tab-characters on my browser when  parsing the message while sending(I 
tipped the message on an external  editor because of the code lines and copy 
pasted it). Of course I also  never would add any html files!

  Anyhow, I send you my original message again (new = True did not help  me. 
Assuming I used it the way you suggested it produced the same  error). I send 
it to myself first, so I am sure, this mess won't happen  again!

  My previous Message:

  #
  Dear Madame or Sir,

  I am having a problem in combining density-smoothed scatterplot matrices  
with a plot of kernel density estimates of each dimension plotted on the 
respective diagonals.

  I have tried following approach using the package "sm" for the kernel density 
estimation, as well as "MASS" respectively:

  pairs
  (myTable[, 1:4],
  panel=function(x,y, ...)
  {
  xy=kde2d(x,y)
  image(xy, add=TRUE)
  }
  ,diag.panel = function(x,...)
  {
  plot(density(x,...), add = TRUE)
  #boxplot(x, add = TRUE)
  }
  )



Now we could go with:

pairs(myTable[, 1:4],
panel=function(x,y, ...)
{
xy <- kde2d(x,y)
image(xy, add=TRUE)
},
diag.panel = function(x,...)
{
pu <- par("usr")
d <- density(x,...)
par("usr" = c(pu[1:2], 0, max(d$y)*1.5))
lines(d)
par("usr" = c(pu[1:2], 0, 1))
boxplot(x, at=0.5, boxwex=0.3, horizontal=TRUE, add=TRUE)
 }
)
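
The par("usr") calls are the key: a diag.panel function may not open a new 
plot, so the y-scale of the existing panel is temporarily rescaled to the 
density's range, the curve is added with lines(), and the scale is restored 
before the boxplot is drawn. The rescaling step in isolation (a toy sketch, 
not part of the pairs() call above):

x <- rnorm(100)
plot(range(x), 0:1, type = "n")           # stand-in for an existing pairs() panel
d <- density(x)
pu <- par("usr")                          # current user coordinates: c(x1, x2, y1, y2)
par(usr = c(pu[1:2], 0, max(d$y) * 1.5))  # keep the x-range, rescale y to fit the curve
lines(d)                                  # draw into the existing panel, no new plot
par(usr = pu)                             # restore the original coordinates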


Uwe Ligges




  where myTable is an ordinary table I am using the first four columns of.  Unfortunately this produces an 
error stating that plot has created a  new plot. I suppose this is the same situation when forgetting 
"add  =  TRUE" with plotting functions such as boxplot (as in my code used with  "boxplot(x, 
add = TRUE)" ) or any other (e.g. smoothScatter). Analyzing  the partial plot,  which can be seen before 
R rejects my code, I come to  the conclusion that add might not be a valid argument for "plot".
   My questions therefore is, whether there is an option to plot the  result of 
the 1d kernel density estimation to the diagonal of the  scatterplot produced 
by pairs or which plot function is valid in this  case. So basically I would 
like to generate the same output I am able to  produce with boxplots on the 
diagonal with a density estimation.

  Additional question: I bet there is an opportunity to include both the  boxplot and the 
kernel density estimation on each field of the diagonal  (as I saw before in a published 
paper), but I just could not figure out  how to "overdraw". Maybe you could 
give me a slight idea (reference  would be fine, too) on how to achieve this. I would be 
highly thankful!


  Kind regards and many thanks in advance,
  -Patrick Grossman

  ##

  Please excuse the inconvenience and note my thanks.

  Warmest regards,
  -Patrick Grossman

- Original Message -
From: Uwe Ligges
Date: Saturday, 2 July 2011, 16:36
Subject: Re: [R] Error when using plot in diag.panel argument of pairs
To: Patrick Großmann
Cc: r-help@r-project.org


If you send a better readable message with really reproducible
code, we
may provide a solution, this way it is really hard to read your
code.
And yes, please do not send html mail. My guess is that
par(new=TRUE)
will help, but I do not want to construct the examples at first.




On 02.07.2011 14:37, "Patrick Großmann" wrote:

   Dear Madame or Sir,I am having a problem in

combining density-smoothed scatterplot matrices with a plot of
kernel destiny estimations of each dimension plotted on the
respective field of the diagonal.I have tried following approach
using the package "sm" for the kernel density estimation, as
well as "MASS" respectively:pairs(myTable[,
1:4],panel=function(x,y, ...){
xy=kde2d(x,y) image(xy,
add=TRUE)},diag.panel = function(x,...){plot(density(x), add =
TRUE)#boxplot(x, add = TRUE)})where myTable is an ordinary table
I am using the first four columns of. Unfortunately this
produces an error stating that plot has created a new plot. I
suppose this is the same situation when forgetting "add  =
TRUE" with plotting functions such as boxplot (as in my code
used with "boxplot(x, add = TRUE)" ) or any other (e.g.
smoothScatter). Analyzing the partial plot,  which can be
seen before R rejects my code, I come to the conclusion that add
might not be a valid argument for "plot
". My questions therefore is, whether there is an option to plot
the result of the 1d kernel density estimation to the diagonal
of the scatterplot produced by pairs or which plot functio

Re: [R] Error when using plot in diag.panel argument of pairs

2011-07-02 Thread Uwe Ligges
If you send a better readable message with really reproducible code, we 
may provide a solution, this way it is really hard to read your code. 
And yes, please do not send html mail. My guess is that par(new=TRUE) 
will help, but I do not want to construct the examples at first.





On 02.07.2011 14:37, "Patrick Großmann" wrote:

  Dear Madame or Sir,I am having a problem in combining density-smoothed scatterplot matrices with a plot of kernel destiny 
estimations of each dimension plotted on the respective field of the diagonal.I have tried following approach using the 
package "sm" for the kernel density estimation, as well as "MASS" respectively:pairs(myTable[, 
1:4],panel=function(x,y, ...){ xy=kde2d(x,y) image(xy, add=TRUE)},diag.panel = function(x,...){plot(density(x), add 
= TRUE)#boxplot(x, add = TRUE)})where myTable is an ordinary table I am using the first four columns of. Unfortunately this 
produces an error stating that plot has created a new plot. I suppose this is the same situation when forgetting "add  
= TRUE" with plotting functions such as boxplot (as in my code used with "boxplot(x, add = TRUE)" ) or any 
other (e.g. smoothScatter). Analyzing the partial plot,  which can be seen before R rejects my code, I come to the 
conclusion that add might not be a valid argument for "plot

". My questions therefore is, whether there is an option to plot the result of the 1d 
kernel density estimation to the diagonal of the scatterplot produced by pairs or which plot 
function is valid in this case. So basically I would like to generate the same output I am 
able to produce with boxplots on the diagonal with a density estimation.Additional question: I 
bet there is an opportunity to include both the boxplot and the kernel density estimation on 
each field of the diagonal (as I saw before in a published paper), but I just could not figure 
out how to "overdraw". Maybe you could give me a slight idea (reference would be 
fine, too) on how to achieve this. I would be highly thankful!Kind regards and many thanks in 
advance,-Patrick Grossman --

Patrick Großmann - student assistant, network documentation
Network Operation Center (NOC)
Hochschulrechenzentrum (HRZ), Universität Bielefeld

Office: UHG V0-251
Phone: +49 521 106-12618
Fax: +49 521 106-2969
Secretariat phone: +49 521 106-4951

http://www.uni-bielefeld.de/hrz/



[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help with my R- Project

2011-07-02 Thread Uwe Ligges



On 02.07.2011 13:59, akaebi wrote:

What happens to the correlation when i dichotomize the variables? ( i need a
function for R)

What happens if the correlation is always larger or smaller? What happens to
the median? (also a function is needed)

What happens if the distribution is not normal?

I need point clouds!

and last but not least...

what happens if you change the scattering and the normal distribution


I am looking forward to answers! many thanks! I'm just really desperate


In principle, we do not answer homework questions on this list. If you 
need help and cannot solve the problems yourself, the first to ask is 
your supervisor.


Uwe Ligges





--
View this message in context: 
http://r.789695.n4.nabble.com/Need-help-with-my-R-Project-tp3640419p3640419.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2011-07-02 Thread Jeff Newmiller
Your question needs help. Seriously.

A) No subject on the Subject line. This makes it difficult to keep track of the 
replies as the discussion proceeds.
B) no example code. This makes it difficult to respond directly, since there is 
more than one way to "use the for loop".
C) No mention of the specific error. Errors can be related to syntax or 
semantics, or may be due to a misunderstanding of the correct answer.
D) No explanation of your intent. Sometimes the best way to compute multiple 
results is to avoid a for loop entirely, but we need to know what you are 
trying to accomplish.
E) No mention of your computing environment or its configuration. Sometimes the 
error is related to a particular version of R, and there is a specific easy way 
to ask R for that information.

Please follow the link at the bottom of any R-help list message and learn how 
to ask a question that can be answered online.
---
Jeff Newmiller The . . Go Live...
DCN: Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Shantal Bonnick  wrote:

Hi,

If I want to repeat a function, say 100 times, how do I do that? I used the for 
loop but there's an error somewhere.

Thanks
shan
[[alternative HTML version deleted]]

_

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2011-07-02 Thread Ista Zahn
Please read the posting guide. "there's an error somewhere" is simply
not enough to go on.

Best,
Ista

On Sat, Jul 2, 2011 at 9:03 AM, Shantal Bonnick  wrote:
> Hi,
>
> If I want to repeat a function, say 100 times, how do I do that? I used the
> for loop but there's an error somewhere.
>
> Thanks
> shan
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2011-07-02 Thread Shantal Bonnick
Hi,

If I want to repeat a function, say 100 times, how do I do that? I used the for 
loop but there's an error somewhere.

Thanks
shan
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SNOW libraries/functions, rGenoud

2011-07-02 Thread Lui ##
Hello Uwe,

thanks a lot!

That solved the "problem".

Have a nice weekend!
Kind regards

Lui

2011/7/1 Uwe Ligges :
> See ?clusterEvalQ
>
> Uwe Ligges
>
> On 01.07.2011 16:11, Lui ## wrote:
>>
>> Dear group,
>>
>> Does anybody know how to export libraries/functions to all nodes when
>> launching snow? I want to use a function from fBasics (dstable) for a
>> rGenoud optimization routine, but I fail at "making the function
>> accessible" to the nodes created. I know how it works for variables, I
>> know how it works in snowfall (which can't be used in that case), but I
>> don't know how it could work in snow.
>>
>> Help appreciated!
>>
>> Lui
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
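
For later readers of this thread, a minimal sketch of the ?clusterEvalQ approach. The cluster size, the SOCK type and the variable myPars are assumptions made up for illustration, not part of the original exchange:

library(snow)

cl <- makeCluster(4, type = "SOCK")

## load fBasics (and hence dstable) on every worker node
clusterEvalQ(cl, library(fBasics))

## plain variables still have to be exported by name
myPars <- c(alpha = 1.5, beta = 0)
clusterExport(cl, "myPars")

## the workers can now call dstable()
parSapply(cl, 1:4, function(x) dstable(x, alpha = myPars[1], beta = myPars[2]))

stopCluster(cl)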

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error when using plot in diag.panel argument of pairs

2011-07-02 Thread Patrick Großmann
Dear Madame or Sir,

I am having a problem combining density-smoothed scatterplot matrices with a plot of kernel density estimates of each dimension, plotted on the respective field of the diagonal. I have tried the following approach, using the package "sm" for the kernel density estimation as well as "MASS" respectively:

pairs(myTable[, 1:4],
  panel = function(x, y, ...) {
    xy = kde2d(x, y)
    image(xy, add = TRUE)
  },
  diag.panel = function(x, ...) {
    plot(density(x), add = TRUE)
    # boxplot(x, add = TRUE)
  })

where myTable is an ordinary table of which I am using the first four columns. Unfortunately this produces an error stating that plot has created a new plot. I suppose this is the same situation as when forgetting "add = TRUE" with plotting functions such as boxplot (as in my code used with "boxplot(x, add = TRUE)") or any other (e.g. smoothScatter). Analyzing the partial plot, which can be seen before R rejects my code, I come to the conclusion that add might not be a valid argument for "plot". My question therefore is whether there is an option to plot the result of the 1d kernel density estimation on the diagonal of the scatterplot produced by pairs, or which plot function is valid in this case. So basically I would like to generate the same output I am able to produce with boxplots on the diagonal, but with a density estimate.

Additional question: I bet there is an opportunity to include both the boxplot and the kernel density estimation on each field of the diagonal (as I saw before in a published paper), but I just could not figure out how to "overdraw". Maybe you could give me a slight idea (a reference would be fine, too) on how to achieve this. I would be highly thankful!

Kind regards and many thanks in advance,
-Patrick Grossman

--
Patrick Großmann - student assistant (HiWi) for network documentation
Network Operation Center (NOC)
Hochschulrechenzentrum (HRZ), Universität Bielefeld

Office:
UHG V0-251
Phone:
+49 521 106-12618
Fax:
+49 521 106-2969
Office phone (secretary):
+49 521 106-4951

http://www.uni-bielefeld.de/hrz/
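
In case it helps later readers, a minimal sketch of one workaround, modelled on the panel.hist example in ?pairs: plot() cannot be called inside a panel function, but lines() drawn into rescaled user coordinates can. iris is used only as stand-in data; myTable and the "sm" approach from the post are not touched here.

library(MASS)   # kde2d()

panel.dens <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))           # restore the panel's coordinate system
  par(usr = c(usr[1:2], 0, 1.5))    # give the density curve its own y range
  d <- density(x, na.rm = TRUE)
  lines(d$x, d$y / max(d$y))
}

pairs(iris[, 1:4],
      panel = function(x, y, ...) {
        xy <- kde2d(x, y)
        image(xy, add = TRUE)
      },
      diag.panel = panel.dens)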



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help with my R- Project

2011-07-02 Thread akaebi
What happens to the correlation when I dichotomize the variables? (I need a
function for R)

What happens if the correlation is always larger or smaller? What happens to
the median? (also a function is needed)

What happens if the distribution is not normal?

I need point clouds!

and last but not least...

what happens if you change the scattering and the normal distribution


I am looking forward to answers! Many thanks! I'm just really desperate.


--
View this message in context: 
http://r.789695.n4.nabble.com/Need-help-with-my-R-Project-tp3640419p3640419.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] The test of randomized slopes(intercepts)

2011-07-02 Thread 孟欣
Hi all:
I fit a linear mixed model for 300 persons; y is the CD4 count and x is time.
 
I gave each person a random slope and a random intercept, so I get 300 slopes and 300 intercepts. Now I want to test whether the variance of the 300 slopes and of the 300 intercepts differs from zero. If the variance of the 300 slopes (or intercepts) differs from zero at the 0.05 significance level, I should keep the slope (or intercept) random, and if not, I should treat it as fixed.
But I can't find out which statistical test should be used to test whether the variance of the 300 slopes (or intercepts) differs from zero, and that is where I need your help.
 
Thanks a lot!
My best.
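
A minimal sketch of one common answer, a likelihood-ratio test comparing the fit with and without the random slope, assuming a data frame dat with columns CD4, time and id (these names are placeholders, not from the original post). Keep in mind that this test is conservative, because the null value of a variance lies on the boundary of the parameter space.

library(lme4)

m.full    <- lmer(CD4 ~ time + (1 + time | id), data = dat, REML = FALSE)
m.noslope <- lmer(CD4 ~ time + (1 | id), data = dat, REML = FALSE)

## LRT for the random-slope variance (and the slope-intercept covariance)
anova(m.noslope, m.full)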
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot error in package lme4

2011-07-02 Thread Ben Bolker
Sara Krause  gmail.com> writes:

> 
> Hi,
> 
> I am new to R and not fantastic at statistics so it may well be that I am
> doing something silly but I can't figure out what it is and hoping that
> somebody can help.
> 
> I am running package lme4, and trying to get a Residuals vs. Fitted graph.
>  When I try to plot, I receive an error.
> 
> Error in as.double(y) :
>   cannot coerce type 'S4' to vector of type 'double'
> 
> Here is the code I am using
> 
> library("lme4")
> 
> m1<-lmer(y~trt*time*loc_block*Season+(1|ID), data=d)
> 
> plot(m1,add.smooth=F,which=1)
> 
> Error in as.double(y) :
>   cannot coerce type 'S4' to vector of type 'double'
> 
> I searched the forums for some answers but haven't found anything
> useful.  Any insight into what I am doing wrong and how to fix it.
> 
> Sara

   The standard diagnostics which you have seen in 'lm' are not
implemented for 'lme4'.   You have to do it yourself, but it's
not *too* hard:

library(lme4)

## from ?lmer
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
f <- fitted(fm1)        # fitted values
r <- residuals(fm1)     # residuals
plot(f, r)              # residuals vs. fitted
sm <- loess(r ~ f)      # loess smooth of the residuals
v <- seq(min(f), max(f), length = 101)
lines(v, predict(sm, data.frame(f = v)), col = 2)   # overlay the smooth in red

Further questions about lme4 should probably go to the
r-sig-mixed-models mailing list.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multilevel Survival Analysis - Cox PH Model

2011-07-02 Thread dunner
There is indeed right censoring, but I obviously didn't explain it very well. 

Patients are either fully oriented or not (1 or 2) after an hour. If they're
not, then the data is right censored.

However, I don't feel that "coxme" is overkill at all, as I may also have to
account for repeated COURSES of treatments on the same individual, so the
data would be structured as follows: TREATMENTS repeated within COURSES
repeated within PATIENTS.

However, I may need a little help later specifying the model technically in
R.

What would the random effects bit of the formula look like? At the moment
I have

mycoxme<-coxme(mysurv~SEX + (1|MRN))

for example.

Would a correct specification for the nested data be:

mycoxme<-coxme(mysurv~SEX + (1|MRN|COURSE))

Then of course we have the added issue that treatments are ordered, so is
there a "frailty" model that can account for that, I wonder?


Thanks for all your help.

Ross
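
For what it is worth, a hedged sketch of how nested grouping is usually written in coxme, reusing the mysurv, SEX, MRN and COURSE names from the post and assuming the lme4-style / nesting operator; this only illustrates the syntax and is not a claim that it is the right model here.

library(coxme)

## COURSE nested within patient MRN: a random intercept at each level
mycoxme <- coxme(mysurv ~ SEX + (1 | MRN/COURSE))
summary(mycoxme)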

--
View this message in context: 
http://r.789695.n4.nabble.com/Multilevel-Survival-Analysis-Cox-PH-Model-tp3638278p3640352.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Access only part of last dimension of table/matrix

2011-07-02 Thread peter dalgaard

On Jul 2, 2011, at 03:29 , Marc Schwartz wrote:

> It actually scared me that I had any recollection of an isolated post from 10 
> years ago. Not sure what to make of that...

If it is any consolation, the author had forgotten it entirely.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot error in package lme4

2011-07-02 Thread Sara Krause
Hi,

I am new to R and not fantastic at statistics so it may well be that I am
doing something silly but I can't figure out what it is and hoping that
somebody can help.

I am running package lme4, and trying to get a Residuals vs. Fitted graph.
 When I try to plot, I receive an error.

Error in as.double(y) :
  cannot coerce type 'S4' to vector of type 'double'


Here is the code I am using

library("lme4")

m1<-lmer(y~trt*time*loc_block*Season+(1|ID), data=d)

plot(m1,add.smooth=F,which=1)


Error in as.double(y) :
  cannot coerce type 'S4' to vector of type 'double'


I searched the forums for some answers but haven't found anything
useful. Any insight into what I am doing wrong and how to fix it would be
appreciated.


Sara

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Access only part of last dimension of table/matrix

2011-07-02 Thread Jim Lemon

On 07/02/2011 07:50 AM, David Winsemius wrote:


I would like to do some operations inside a function using only one
value for the last dimension of a table/matrix:

tabfn <- function (dfrm, facvec, YN = "event"){
return( Etbl <- do.call(table, dfrm[ , c(facvec, YN) ]) )
# just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or Etbl[,"TRUE"]
}
tbl <- tabfn(testdf, c("x", "y") )
tbl # all value of event returned

At the console it is easy for me to count the number of factors and use
the right number of commas

tbl[ , , "TRUE"] if I only want the slice with that value. How can I do
this programmatically?



Hi David,
I used to do something like this before completely reprogramming the 
barNest function.


# set the first dimension (can be the last also) to a number "stat"
sliceargs[[1]] <- x[[stat]]
# for the rest of the dimensions of this array, tack on a comma
for (arg in 2:ndim) sliceargs[[arg]] <- TRUE
 sliceargs[[ndim + 1]] <- slice
# "slice" the array
xslice[[stat]] <- do.call("[", sliceargs)

To get the last dimension, just tack commas together for 1:(ndim-1) and 
then add the number at the end.

I think Bill Dunlap gave me this solution when I was struggling with it.

Jim
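
A self-contained variant of the same do.call("[", ...) idea; the function name lastSlice and the toy table below are made up for illustration and are not part of barNest.

## take the slice named `value' from the last dimension of a table/array,
## whatever the number of dimensions
lastSlice <- function(a, value = "TRUE") {
  ndim <- length(dim(a))
  args <- c(list(a), rep(list(TRUE), ndim - 1), list(value))
  do.call("[", args)
}

tab <- table(x = c("a", "b", "a"),
             y = c("p", "p", "q"),
             event = c(TRUE, TRUE, FALSE))
lastSlice(tab, "TRUE")   # the x-by-y counts where event == TRUE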

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.