[R] Performance of graph and igraph package

2011-02-04 Thread William Tu
Dear R users,

I'm using the graph package to create a mesh-like network topology and
implement a load-balancing routing algorithm. The current implementation
uses the graph, RBGL, and Rgraphviz packages. I have a few attributes on
every edge to represent the network load and capacity, and I frequently
update these values. However, it runs so slowly that I'm now considering
rewriting it in C or Python.

Then I found another package, "igraph", and made the performance
comparison between "graph" and "igraph" shown below. For each package, I
create two attributes on every edge, set/get the values of all edges, and
measure the total elapsed time.
A. using graph package:
108 nodes, 832 edges: 17 sec
140 nodes, 4800 edges: 576 sec

B. using igraph package:
100 nodes, 4950 edges: 4 sec
200 nodes, 19900 edges: 111 sec
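
For reference, here is a minimal sketch of the kind of set/get timing
described above (the attribute name, the random-graph constructors, and the
sizes are illustrative assumptions, not the original code):

library(graph)     # Bioconductor package: graphNEL objects, edgeData()
library(igraph)

## graph package: one edgeData() set and one get per edge, roughly what a
## per-edge update loop in a routing algorithm ends up doing
g1 <- randomEGraph(as.character(1:100), edges = 800)
edgeDataDefaults(g1, "load") <- 0
em <- edgeMatrix(g1); nn <- nodes(g1)
system.time(
  for (k in seq_len(ncol(em))) {
    edgeData(g1, from = nn[em[1, k]], to = nn[em[2, k]], attr = "load") <- runif(1)
    edgeData(g1, from = nn[em[1, k]], to = nn[em[2, k]], attr = "load")
  }
)

## igraph: vectorised set/get of the same attribute over all edges at once
g2 <- erdos.renyi.game(100, 800, type = "gnm")
system.time({
  E(g2)$load <- runif(ecount(g2))
  invisible(E(g2)$load)
})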

igraph is much, much faster than graph! Is this reasonable, or did I do
something wrong?
PS: I'm using a 2-core 3 GHz CPU with 2 GB of RAM.

Thanks in advance,
William



Re: [R] uniroot

2011-02-04 Thread Petr Savicky
On Fri, Feb 04, 2011 at 04:35:00AM -0800, dpender wrote:
> 
> Hi,
> 
> I am using the uniroot function in order to carry out a bivariate Monte
> Carlo simulation using the logistics model.
> 
> I have defined the function as:
> 
> BV.FV <- function(x,y,a,A)
> (((x^(-a^-1)+y^(-a^-1))^(a-1))*(y^(a-1/a))*(exp(-((1^(-a^-1)+y^(-a^-1))^a)+y^-1)))-A
> 
> and the procedure is as follows:
> 
> Randomly generate values of A~(0,1), y0 = -(lnA)^-1
> 
> Where: A=Pr{X 
> Use y0 to determine x where x = x(i) and y = y0(i-1) in the above equation.
> 
> Unfortunately when the randomly defined A gets to approximately 0.46 the
> intervals I provide no longer have the opposite signs (both -ve).
> 
> I have tried various different upper limits but I still can't seem to find a
> root.
> 
> Does anyone have any suggestions?

Hi.

The expression for BV.FV() contains only one occurrence of x and it may be
separated from the equation. The solution for x may be expressed as

  get.x <- function(y,a,A)
    ((A/(y^(a-1/a))/(exp(-((1^(-a^-1)+y^(-a^-1))^a)+y^-1)))^(1/(a-1))-y^(-a^-1))^(-a)

  a <- 0.703
  A <- 0.45
  y <-  - 1/log(A)

  uniroot(BV.FV, c(1, 20), y=y, a=a, A=A)$root
  [1] 3.477799

  get.x(y, a, A)
  [1] 3.477794

As pointed out in another reply, the equation does not always have
a solution. The solution seems to tend to infinity in the following loop:

  for (A in seq(0.45, 0.4677, length=20)) {
y <-  - 1/log(A)
cat(get.x(y, a, A), uniroot(BV.FV, c(1, 1000), y=y, a=a, A=A)$root, 
"\n")
  }

  3.477794 3.477791 
  3.637566 3.637565 
  3.812851 3.812851 
  4.006213 4.006214 
  4.220838 4.220834 
  4.460738 4.460739 
  4.731048 4.731052 
  5.038453 5.038473 
  5.391845 5.391843 
  5.803329 5.803329 
  6.289873 6.28988 
  6.876084 6.876084 
  7.5992 7.599201 
  8.518665 8.518683 
  9.736212 9.736212 
  11.44329 11.44328 
  14.05345 14.05345 
  18.68155 18.68155 
  29.97659 29.9766 
  219.3156 219.3155 
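
For the original difficulty (values of A around 0.46 for which the supplied
interval no longer brackets a sign change), one possibility is to test the
endpoints before calling uniroot() and to return NA when no root is
bracketed. A minimal sketch, with an illustrative upper bound:

  safe.root <- function(f, lower, upper, ...) {
    # call uniroot() only if the endpoints actually bracket a root
    fl <- f(lower, ...)
    fu <- f(upper, ...)
    if (is.na(fl) || is.na(fu) || fl * fu > 0) return(NA_real_)
    uniroot(f, c(lower, upper), ...)$root
  }

  a <- 0.703
  A <- 0.47   # near the region where the equation may stop having a root
  y <- - 1/log(A)
  safe.root(BV.FV, 1, 1000, y=y, a=a, A=A)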

Hope this helps.

Petr Savicky.



Re: [R] Help!!! from R beginner

2011-02-04 Thread jim holtman
You should be able to use 'ifelse'

Os.chr4.gene.new$color <-
ifelse(Os.chr4.gene.new$if_TE_related == "TE_related", "black", "orange")
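
If rows where the fifth column is neither "TE_related" nor a single space
should stay unassigned, a slightly more defensive variant (a sketch, not part
of the reply above) nests two ifelse() calls:

Os.chr4.gene.new$color <-
  ifelse(Os.chr4.gene.new$if_TE_related == "TE_related", "black",
         ifelse(Os.chr4.gene.new$if_TE_related == " ", "orange", NA))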



On Fri, Feb 4, 2011 at 7:09 PM, Tae-Jin Lee  wrote:
> Hello,
>
> I'm trying to add a column to the following data frame. The new column
> will contain "black" when the 5th column(if_TE_related) is
> "TE_related", or "orange" when the 4th column is " " (space).
>
> "chromo"        "MSU_locus"     "end5"  "end3"  "if_TE_related"
> "chr04" "LOC_Os04g01006"        1032    2679    "TE_related"
> "chr04" "LOC_Os04g01008"        7636    3951    "TE_related"
> "chr04" "LOC_Os04g01010"        9521    10296   "TE_related"
> "chr04" "LOC_Os04g01020"        17165   17437   " "
> "chr04" "LOC_Os04g01030"        29372   18440   "TE_related"
> "chr04" "LOC_Os04g01040"        30637   37300   "TE_related"
> ...
>
> So, after a data manipulation, it should look like the following...
>
> "chromo"        "MSU_locus"     "end5"  "end3"  "if_TE_related" "color"
> "chr04" "LOC_Os04g01006"        1032    2679    "TE_related"    "black"
> "chr04" "LOC_Os04g01008"        7636    3951    "TE_related"    "black"
> "chr04" "LOC_Os04g01010"        9521    10296   "TE_related"    "black"
> "chr04" "LOC_Os04g01020"        17165   17437   " "     "orange"
> "chr04" "LOC_Os04g01030"        29372   18440   "TE_related"    "black"
> "chr04" "LOC_Os04g01040"        30637   37300   "TE_related"    "black"
> ...
>
> I have worked on the following code to do this job using function and
> loop, but it is not working. If someone help me, I would really
> appreciate!!!
> The original data frame is Os.chr4.gene.new.
>
> Gene <- Os.chr4.gene.new[, c("if_TE_related")]
> Genecolor <- function(Gene) {
>        lg <-length(Gene)
>        for(i in 1:lg) {
>        if (Gene == "TE_related") {D1 <- (Gene == "black")}
>        if (Gene == " ") {D1 <- (Gene == "orange")}
>        }
>        Gene.color <- cbind(Gene, D1)
>        write.table(Gene.color, file="Gene_color1.txt", sep="\t", row.names=F)
>        }
> Genecolor(Gene)
>
>
> Tae-Jin
> Researcher in NC State University
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



[R] Fwd: Help!!! from R beginner

2011-02-04 Thread Tae-Jin Lee
Sorry. I made a typo. It is the 5th column (not 4th). Thank you.

Tae-Jin

Begin forwarded message:

> From: Tae-Jin Lee 
> Date: February 4, 2011 7:09:38 PM EST
> To: r-help@R-project.org
> Subject: Help!!! from R beginner
>
> Hello,
>
> I'm trying to add a column to the following data frame. The new  
> column will contain "black" when the 5th column(if_TE_related) is  
> "TE_related", or "orange" when the 4th column is " " (space).
>
> "chromo"  "MSU_locus" "end5"  "end3"  "if_TE_related"
> "chr04"   "LOC_Os04g01006"10322679"TE_related"
> "chr04"   "LOC_Os04g01008"76363951"TE_related"
> "chr04"   "LOC_Os04g01010"952110296   "TE_related"
> "chr04"   "LOC_Os04g01020"17165   17437   " "
> "chr04"   "LOC_Os04g01030"29372   18440   "TE_related"
> "chr04"   "LOC_Os04g01040"30637   37300   "TE_related"
> ...
>
> So, after a data manipulation, it should look like the following...
>
> "chromo"  "MSU_locus" "end5"  "end3"  "if_TE_related" "color"
> "chr04"   "LOC_Os04g01006"10322679"TE_related""black"
> "chr04"   "LOC_Os04g01008"76363951"TE_related""black"
> "chr04"   "LOC_Os04g01010"952110296   "TE_related""black"
> "chr04"   "LOC_Os04g01020"17165   17437   " " "orange"
> "chr04"   "LOC_Os04g01030"29372   18440   "TE_related""black"
> "chr04"   "LOC_Os04g01040"30637   37300   "TE_related""black"
> ...
>
> I have worked on the following code to do this job using function  
> and loop, but it is not working. If someone help me, I would really  
> appreciate!!!
> The original data frame is Os.chr4.gene.new.
>
> Gene <- Os.chr4.gene.new[, c("if_TE_related")]
> Genecolor <- function(Gene) {
>   lg <-length(Gene)
>   for(i in 1:lg) {
>   if (Gene == "TE_related") {D1 <- (Gene == "black")}
>   if (Gene == " ") {D1 <- (Gene == "orange")}
>   }
>   Gene.color <- cbind(Gene, D1)
>   write.table(Gene.color, file="Gene_color1.txt", sep="\t",  
> row.names=F)
>   }
> Genecolor(Gene)
>
>
> Tae-Jin
> Researcher in NC State University
>
>
>
>










Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Hugo Mildenberger
Danny,

it sounds like you already have a certain idea of what a 'nugget' distribution
could look like. Maybe you could also intentionally produce some experimental
data having such distributions, harvest the related patterns from the
microarray, and then apply a method like the one described in
http://www.cs.uwaterloo.ca/~shai/TALKS/NIPS07_prob_wkshp.pdf
But this is only an uneducated guess.

Best

Hugo


On Saturday 05 February 2011 00:21:01 DB1984 wrote:
> 
> Greg, Dennis - thanks for your input, I really appreciate the feedback, as it
> is not easy to source.
> 
> In terms of the data; I've described it as 20 columns, which is the smallest
> dataset, but this can run to 320 columns, so in some cases there is likely
> to be enough power to detect non-normality. That said, a better solution
> would be useful.
> 
> As a first approximation, I looked at the mean/median ratio to indicate
> simple skew in the data - which suggested that most of the data was normally
> distributed. I took the 'nuggets' to be those with a mean/median ratio in
> the top or bottom 1% of the data. This was a small group - overall the data
> appears relatively normally distributed within rows. 
> 
> The aim is really to find those nuggets with significantly non-normal
> distributions. My hope was to be able to take the tails of the p-values for
> Shapiro-Wilk, or some similar test, and find these enriched with nuggets.
> This may not be an appropriately robust approach - but is there a better
> option?
> 
> One idea was to sort the data in each row, and perform a linear regression.
> For normal distributions I am expecting the intercept to be close to the
> mean. Using the (intercept-mean) and p-values for the fit of the regression
> was again another way to filter out the nuggets in the dataset.
> 
> If it helps, the nuggets I am expecting are either grouped 80% grouped
> around the mean with 20% forming a uni-directional tail, or an approximate
> bimodal distribution. 
> 
> As I'd imagine is obvious - I don't have an ideal solution to finding these
> nuggets, and so coming up with the R code to do so is harder still. If
> anybody has insight into this sort of problem, and can point me in the
> direction of further reading, that would be helpful. If there is a
> ready-made solution, even better!
> 
> As I said, thanks for your time with this...
> 
> 
>
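
As a rough sketch of the per-row Shapiro-Wilk screen described above (the
simulated matrix and the 1% cutoff are illustrative assumptions only):

## most rows normal, the first few rows bimodal 'nuggets'
set.seed(1)
expr <- matrix(rnorm(1000 * 20), nrow = 1000)
expr[1:10, ] <- cbind(matrix(rnorm(10 * 16), 10), matrix(rnorm(10 * 4, mean = 4), 10))

## Shapiro-Wilk p-value for every row, then flag the smallest 1% as candidates
pvals   <- apply(expr, 1, function(z) shapiro.test(z)$p.value)
nuggets <- which(pvals <= quantile(pvals, 0.01))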



[R] Help!!! from R beginner

2011-02-04 Thread Tae-Jin Lee
Hello,

I'm trying to add a column to the following data frame. The new column
will contain "black" when the 5th column (if_TE_related) is
"TE_related", or "orange" when the 4th column is " " (a space).

"chromo""MSU_locus" "end5"  "end3"  "if_TE_related"
"chr04" "LOC_Os04g01006"10322679"TE_related"
"chr04" "LOC_Os04g01008"76363951"TE_related"
"chr04" "LOC_Os04g01010"952110296   "TE_related"
"chr04" "LOC_Os04g01020"17165   17437   " "
"chr04" "LOC_Os04g01030"29372   18440   "TE_related"
"chr04" "LOC_Os04g01040"30637   37300   "TE_related"
...

So, after a data manipulation, it should look like the following...

"chromo""MSU_locus" "end5"  "end3"  "if_TE_related" "color"
"chr04" "LOC_Os04g01006"10322679"TE_related""black"
"chr04" "LOC_Os04g01008"76363951"TE_related""black"
"chr04" "LOC_Os04g01010"952110296   "TE_related""black"
"chr04" "LOC_Os04g01020"17165   17437   " " "orange"
"chr04" "LOC_Os04g01030"29372   18440   "TE_related""black"
"chr04" "LOC_Os04g01040"30637   37300   "TE_related""black"
...

I have worked on the following code to do this job using a function and a
loop, but it is not working. If someone could help me, I would really
appreciate it!
The original data frame is Os.chr4.gene.new.

Gene <- Os.chr4.gene.new[, c("if_TE_related")]
Genecolor <- function(Gene) {
lg <-length(Gene)
for(i in 1:lg) {
if (Gene == "TE_related") {D1 <- (Gene == "black")}
if (Gene == " ") {D1 <- (Gene == "orange")}
}
Gene.color <- cbind(Gene, D1)
write.table(Gene.color, file="Gene_color1.txt", sep="\t", row.names=F)
}
Genecolor(Gene)


Tae-Jin
Researcher in NC State University







[R] different results in MASS's mca and SAS's corresp

2011-02-04 Thread Gong-Yi Liao
Dear list:

   I have tried MASS's mca function and SAS's PROC CORRESP on the
   farms data (included in MASS and used as mca's example); the
   results differ:

   R: mca(farms)$rs:
  1 2
1   0.059296637  0.0455871427
2   0.043077902 -0.0354728795
3   0.059834286  0.0730485572
4   0.059834286  0.0730485572
5   0.012900181 -0.0503121890
6   0.038846577 -0.0340961617
7   0.005886752 -0.0438516465
8  -0.015108789 -0.0247221783
9   0.007505626 -0.0646608108
10  0.006631230 -0.0362117073
11  0.013309217 -0.0680733730
12  0.056549933  0.0010773359
13  0.015681958  0.0334320046
14 -0.065598990  0.0151619769
15 -0.046868229  0.0357782553
16 -0.003048803  0.0128157261
17 -0.051281437  0.0278941743
18 -0.051819085  0.0004327598
19 -0.072814626  0.0195622280
20 -0.072814626  0.0195622280

  And in SAS's corresp output:

Row Coordinates

Dim1   Dim2

  1   1.0607-0.8155
  2   0.7706 0.6346
  3   1.0703-1.3067
  4   1.0703-1.3067
  5   0.2308 0.9000
  6   0.6949 0.6099
  7   0.1053 0.7844
  8  -0.2703 0.4422
  9   0.1343 1.1567
 10   0.1186 0.6478
 11   0.2381 1.2177
 12   1.0116-0.0193
 13   0.2805-0.5980
 14  -1.1735-0.2712
 15  -0.8384-0.6400
 16  -0.0545-0.2293
 17  -0.9174-0.4990
 18  -0.9270-0.0077
 19  -1.3025-0.3499
 20  -1.3025-0.3499


Is MASS's mca based on a different definition from the one SAS's
PROC CORRESP uses?
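
For anyone wanting to reproduce the R side of the comparison, a minimal
sketch (default mca() settings assumed):

  library(MASS)
  farms.mca <- mca(farms, nf = 2)   # multiple correspondence analysis, 2 dimensions
  farms.mca$rs                      # row scores, as listed above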

Thank you for any comments!




-- 
Gong-Yi Liao

Department of Statistics
University of Connecticut
215 Glenbrook Road  U4120
Storrs, CT 06269-4120

860-486-9478



Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Ista,

Thank you again.

I had figured that out... and was crafting another message when you replied.

The NAs do come through on the variable that is being aggregated;
however, they do not come through on the categorical variable(s).

The aggregate function must be converting the data frame variables to
factors, with factor()'s default exclude=NA behavior.

The help on "aggregate" says:
na.action A function which indicates what should happen when the data
contain NA values.
  The default is to ignore missing values in the given
variables.
By "data" it must only refer to the aggregated variable, and not the
categorical variables.  I thought it referred to both, because I thought it
referred to the "data" argument, which is the underlying data frame.

I think the proper way to accomplish this would be to recast my x
(categorical) variables as factors, but that is not feasible for me due to
other complications.
Also, (imho) the help should be clearer about what na.action actually
modifies.

So, unless someone has a better idea, I guess I'm out of luck?


On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn  wrote:

> Hi,
>
> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes 
> >
> wrote:
> > Thank you both for the thoughtful (and funny) replies.
> >
> > I agree with both of you that sum is the one picking up aggregate.
> Although
> > I didn't mention it, I did realize that in the first place.
> > Also, thank you Phil for pointing out that aggregate only accepts a
> formula
> > value in more recent versions!  I actually thought that was an older
> > feature, but I must be thinking of other functions.
> >
> > I still don't see why these two values are not the same!
> >
> > It seems like a bug to me
>
> No, not a bug (see below).
>
> >
> >> set.seed(100)
> >> dat=data.frame(
> > + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> > + x2=sample(c(NA, 1:10), 100, replace=TRUE),
> > + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> > + x4=sample(c(NA,T,F), 100, replace=TRUE),
> > + y=sample(c(rep(NA,5), rnorm(95
> >> sum(dat$y, na.rm=T)
> > [1] 0.0815244116598
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass,
> na.rm=T)$y)
> > [1] -4.45087666247
> >>
>
> Because in the first one you are only removing missing data in dat$y.
> In the second one you are removeing all rows that contain missing data
> in any of the columns.
>
> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
> sum, na.action=na.pass, na.rm=T)$y))
> [1] TRUE
>
> Best,
> Ista
>
> >
> >
> >
> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn 
> wrote:
> >>
> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
> >>
> >> -Ista
> >>
> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
> >> wrote:
> >> > Hi,
> >> >
> >> > Please see ?na.action
> >> >
> >> > (just kidding!)
> >> >
> >> > So it seems to me the problem is that you are passing na.rm to the sum
> >> > function. So there is no missing data for the na.action argument to
> >> > operate on!
> >> >
> >> > Compare
> >> >
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
> >> >
> >> >
> >> > Best,
> >> > Ista
> >> >
> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes 
> >> > >
> wrote:
> >> >> Can someone please tell me what is up with na.action in aggregate?
> >> >>
> >> >> My (somewhat) reproducible example:
> >> >> (I say somewhat because some lines wouldn't run in a separate
> session,
> >> >> more
> >> >> below)
> >> >>
> >> >> set.seed(100)
> >> >> dat=data.frame(
> >> >>x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >> >>x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >> >>x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >> >>x4=sample(c(NA,T,F), 100, replace=TRUE),
> >> >>y=sample(c(rep(NA,5), rnorm(95
> >> >> dat
> >> >> ## The total from dat:
> >> >> sum(dat$y, na.rm=T)
> >> >> ## The total from aggregate:
> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This
> >> >> line
> >> >> gave an error in a separate R instance
> >> >> ## The aggregate formula is excluding NA
> >> >>
> >> >> ## So, let's try to include NAs
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> >> >> na.action='na.pass')$y)
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> >> >> na.action=na.pass)$y)
> >> >> ## The aggregate formula is STILL excluding NA
> >> >> ## In fact, the formula doesn't seem to notice the na.action
> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo
> man
> >> >> chew')$y)
> >> >> ## H... that error surprised me (since the previous two things
> ran)
> >> >>
> >> >> ## So, let's try to change the global options
> >> >> ## (not mentioned in the help, but after reading the help
> >> >> #

Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Just to be clear:

This works:
> set.seed(100)
> dat=data.frame(
+ x1=sample(c(NA,'m','f'), 100, replace=TRUE),
+ x2=sample(c(NA, 1:10), 100, replace=TRUE),
+ x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
+ x4=sample(c(NA,T,F), 100, replace=TRUE),
+ y=sample(c(rep(NA,5), rnorm(95))))
> for(j in 1:4)
+ dat[,j] = factor(dat[,j], exclude=NULL)
> sum(dat$y, na.rm=T)
[1] 0.0815244116598
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
[1] 0.0815244116598
>

It's just that I don't want to do that conversion on my real data, because
of other complications...

I wish you could tell aggregate to use NAs in the categorical data.
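
One middle ground (a sketch, not a built-in aggregate() option) is to do the
factor conversion on a throwaway copy inside the call, so that the real data
frame keeps its NAs:

tmp <- dat
tmp[1:4] <- lapply(tmp[1:4], factor, exclude = NULL)   # NA kept as a level, copy only
sum(aggregate(y~x1+x2+x3+x4, data=tmp, sum, na.rm=T)$y)
## matches sum(dat$y, na.rm=T); dat itself is left unchanged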


On Fri, Feb 4, 2011 at 6:18 PM, Gene Leynes

> wrote:

> Ista,
>
> Thank you again.
>
> I had figured that out... and was crafting another message when you
> replied.
>
> The NAs do come though on the variable that is being aggregated,
> However, they do not come through on the categorical variable(s).
>
> The aggregate function must be converting the data frame variables to
> factors, with the default "omit=NA" parameter.
>
> The help on "aggregate" says:
> na.action A function which indicates what should happen when the data
> contain NA values.
>   The default is to ignore missing values in the given
> variables.
> By "data" it must only refer to the aggregated variable, and not the
> categorical variables.  I thought it referred to both, because I thought it
> referred to the "data" argument, which is the underlying data frame.
>
> I think the proper way to accomplish this would be to recast my x
> (categorical) variables as factors.   This is not feasible for me due to
> other complications.
> Also, (imho) the help should be more clear about what the na.action
> modifies.
>
> So, unless someone has a better idea, I guess I'm out of luck?
>
>
>
> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn wrote:
>
>> Hi,
>>
>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes 
>> >
>> wrote:
>> > Thank you both for the thoughtful (and funny) replies.
>> >
>> > I agree with both of you that sum is the one picking up aggregate.
>> Although
>> > I didn't mention it, I did realize that in the first place.
>> > Also, thank you Phil for pointing out that aggregate only accepts a
>> formula
>> > value in more recent versions!  I actually thought that was an older
>> > feature, but I must be thinking of other functions.
>> >
>> > I still don't see why these two values are not the same!
>> >
>> > It seems like a bug to me
>>
>> No, not a bug (see below).
>>
>> >
>> >> set.seed(100)
>> >> dat=data.frame(
>> > + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> > + x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> > + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> > + x4=sample(c(NA,T,F), 100, replace=TRUE),
>> > + y=sample(c(rep(NA,5), rnorm(95
>> >> sum(dat$y, na.rm=T)
>> > [1] 0.0815244116598
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass,
>> na.rm=T)$y)
>> > [1] -4.45087666247
>> >>
>>
>> Because in the first one you are only removing missing data in dat$y.
>> In the second one you are removeing all rows that contain missing data
>> in any of the columns.
>>
>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
>> sum, na.action=na.pass, na.rm=T)$y))
>> [1] TRUE
>>
>> Best,
>> Ista
>>
>> >
>> >
>> >
>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn 
>> wrote:
>> >>
>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>> >>
>> >> -Ista
>> >>
>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Please see ?na.action
>> >> >
>> >> > (just kidding!)
>> >> >
>> >> > So it seems to me the problem is that you are passing na.rm to the
>> sum
>> >> > function. So there is no missing data for the na.action argument to
>> >> > operate on!
>> >> >
>> >> > Compare
>> >> >
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>> >> >
>> >> >
>> >> > Best,
>> >> > Ista
>> >> >
>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes 
>> >> > >
>> wrote:
>> >> >> Can someone please tell me what is up with na.action in aggregate?
>> >> >>
>> >> >> My (somewhat) reproducible example:
>> >> >> (I say somewhat because some lines wouldn't run in a separate
>> session,
>> >> >> more
>> >> >> below)
>> >> >>
>> >> >> set.seed(100)
>> >> >> dat=data.frame(
>> >> >>x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> >> >>x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> >> >>x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> >> >>x4=sample(c(NA,T,F), 100, replace=TRUE),
>> >> >>y=sample(c(rep(NA,5), rnorm(95
>> >> >> dat
>> >> >> ## The total from dat:
>> >> >> sum(dat$y, na.rm=T)
>> >> >> ## The total from aggregate:
>> >> >> sum(aggr

[R] [OT] BLUE vs. BLUP

2011-02-04 Thread Laura Smith
Hi!
Does anyone have a numeric example for calculating BLUE and BLUP, please?
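
A small numeric illustration (assuming the lme4 package and its sleepstudy
data; this is not from the original post) of where the two quantities come
from in a mixed model:

library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fixef(fit)   # BLUEs: estimated fixed effects
ranef(fit)   # BLUPs: predicted random effects for each Subject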

Thanks,
Laura



Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
oops. For clarity, that should have been

sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
na.rm=TRUE))})$y.sum)

-Ista

On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn  wrote:
> Hi again,
>
> On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes  wrote:
>> Ista,
>>
>> Thank you again.
>>
>> I had figured that out... and was crafting another message when you replied.
>>
>> The NAs do come though on the variable that is being aggregated,
>> However, they do not come through on the categorical variable(s).
>>
>> The aggregate function must be converting the data frame variables to
>> factors, with the default "omit=NA" parameter.
>>
>> The help on "aggregate" says:
>> na.action A function which indicates what should happen when the data
>> contain NA values.
>>   The default is to ignore missing values in the given
>> variables.
>> By "data" it must only refer to the aggregated variable, and not the
>> categorical variables.  I thought it referred to both, because I thought it
>> referred to the "data" argument, which is the underlying data frame.
>>
>> I think the proper way to accomplish this would be to recast my x
>> (categorical) variables as factors.
>
> Yes, that would work.
>
> This is not feasible for me due to
>> other complications.
>> Also, (imho) the help should be more clear about what the na.action
>> modifies.
>>
>> So, unless someone has a better idea, I guess I'm out of luck?
>
> Well, you can use ddply from the plyr package:
>
> library(plyr) # may need to install first.
> sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
> na.rm=TRUE))})$y)
>
> However, I don't think you've told us what you're actually trying to
> accomplish...
>
> Best,
> Ista
>
>>
>>
>> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn  wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes  wrote:
>>> > Thank you both for the thoughtful (and funny) replies.
>>> >
>>> > I agree with both of you that sum is the one picking up aggregate.
>>> > Although
>>> > I didn't mention it, I did realize that in the first place.
>>> > Also, thank you Phil for pointing out that aggregate only accepts a
>>> > formula
>>> > value in more recent versions!  I actually thought that was an older
>>> > feature, but I must be thinking of other functions.
>>> >
>>> > I still don't see why these two values are not the same!
>>> >
>>> > It seems like a bug to me
>>>
>>> No, not a bug (see below).
>>>
>>> >
>>> >> set.seed(100)
>>> >> dat=data.frame(
>>> > + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>> > + x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>> > + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>> > + x4=sample(c(NA,T,F), 100, replace=TRUE),
>>> > + y=sample(c(rep(NA,5), rnorm(95
>>> >> sum(dat$y, na.rm=T)
>>> > [1] 0.0815244116598
>>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass,
>>> >> na.rm=T)$y)
>>> > [1] -4.45087666247
>>> >>
>>>
>>> Because in the first one you are only removing missing data in dat$y.
>>> In the second one you are removeing all rows that contain missing data
>>> in any of the columns.
>>>
>>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
>>> sum, na.action=na.pass, na.rm=T)$y))
>>> [1] TRUE
>>>
>>> Best,
>>> Ista
>>>
>>> >
>>> >
>>> >
>>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn 
>>> > wrote:
>>> >>
>>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>>> >>
>>> >> -Ista
>>> >>
>>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > Please see ?na.action
>>> >> >
>>> >> > (just kidding!)
>>> >> >
>>> >> > So it seems to me the problem is that you are passing na.rm to the
>>> >> > sum
>>> >> > function. So there is no missing data for the na.action argument to
>>> >> > operate on!
>>> >> >
>>> >> > Compare
>>> >> >
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>>> >> >
>>> >> >
>>> >> > Best,
>>> >> > Ista
>>> >> >
>>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes 
>>> >> > wrote:
>>> >> >> Can someone please tell me what is up with na.action in aggregate?
>>> >> >>
>>> >> >> My (somewhat) reproducible example:
>>> >> >> (I say somewhat because some lines wouldn't run in a separate
>>> >> >> session,
>>> >> >> more
>>> >> >> below)
>>> >> >>
>>> >> >> set.seed(100)
>>> >> >> dat=data.frame(
>>> >> >>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>> >> >>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>> >> >>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>> >> >>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>>> >> >>        y=sample(c(rep(NA,5), rnorm(95
>>> >> >> dat
>>> >> >> ## The total from dat:
>>> >> >> sum(dat$y, na.rm=T)
>>> >> >> ## The total from aggregate:
>>> >> >> sum(aggregate(dat$y, dat[,1:

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi again,

On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes  wrote:
> Ista,
>
> Thank you again.
>
> I had figured that out... and was crafting another message when you replied.
>
> The NAs do come though on the variable that is being aggregated,
> However, they do not come through on the categorical variable(s).
>
> The aggregate function must be converting the data frame variables to
> factors, with the default "omit=NA" parameter.
>
> The help on "aggregate" says:
> na.action A function which indicates what should happen when the data
> contain NA values.
>   The default is to ignore missing values in the given
> variables.
> By "data" it must only refer to the aggregated variable, and not the
> categorical variables.  I thought it referred to both, because I thought it
> referred to the "data" argument, which is the underlying data frame.
>
> I think the proper way to accomplish this would be to recast my x
> (categorical) variables as factors.

Yes, that would work.

This is not feasible for me due to
> other complications.
> Also, (imho) the help should be more clear about what the na.action
> modifies.
>
> So, unless someone has a better idea, I guess I'm out of luck?

Well, you can use ddply from the plyr package:

library(plyr) # may need to install first.
sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y,
na.rm=TRUE))})$y)

However, I don't think you've told us what you're actually trying to
accomplish...

Best,
Ista

>
>
> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn  wrote:
>>
>> Hi,
>>
>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes  wrote:
>> > Thank you both for the thoughtful (and funny) replies.
>> >
>> > I agree with both of you that sum is the one picking up aggregate.
>> > Although
>> > I didn't mention it, I did realize that in the first place.
>> > Also, thank you Phil for pointing out that aggregate only accepts a
>> > formula
>> > value in more recent versions!  I actually thought that was an older
>> > feature, but I must be thinking of other functions.
>> >
>> > I still don't see why these two values are not the same!
>> >
>> > It seems like a bug to me
>>
>> No, not a bug (see below).
>>
>> >
>> >> set.seed(100)
>> >> dat=data.frame(
>> > + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> > + x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> > + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> > + x4=sample(c(NA,T,F), 100, replace=TRUE),
>> > + y=sample(c(rep(NA,5), rnorm(95
>> >> sum(dat$y, na.rm=T)
>> > [1] 0.0815244116598
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass,
>> >> na.rm=T)$y)
>> > [1] -4.45087666247
>> >>
>>
>> Because in the first one you are only removing missing data in dat$y.
>> In the second one you are removeing all rows that contain missing data
>> in any of the columns.
>>
>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
>> sum, na.action=na.pass, na.rm=T)$y))
>> [1] TRUE
>>
>> Best,
>> Ista
>>
>> >
>> >
>> >
>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn 
>> > wrote:
>> >>
>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>> >>
>> >> -Ista
>> >>
>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Please see ?na.action
>> >> >
>> >> > (just kidding!)
>> >> >
>> >> > So it seems to me the problem is that you are passing na.rm to the
>> >> > sum
>> >> > function. So there is no missing data for the na.action argument to
>> >> > operate on!
>> >> >
>> >> > Compare
>> >> >
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>> >> >
>> >> >
>> >> > Best,
>> >> > Ista
>> >> >
>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes 
>> >> > wrote:
>> >> >> Can someone please tell me what is up with na.action in aggregate?
>> >> >>
>> >> >> My (somewhat) reproducible example:
>> >> >> (I say somewhat because some lines wouldn't run in a separate
>> >> >> session,
>> >> >> more
>> >> >> below)
>> >> >>
>> >> >> set.seed(100)
>> >> >> dat=data.frame(
>> >> >>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> >> >>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> >> >>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> >> >>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>> >> >>        y=sample(c(rep(NA,5), rnorm(95
>> >> >> dat
>> >> >> ## The total from dat:
>> >> >> sum(dat$y, na.rm=T)
>> >> >> ## The total from aggregate:
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <---
>> >> >> This
>> >> >> line
>> >> >> gave an error in a separate R instance
>> >> >> ## The aggregate formula is excluding NA
>> >> >>
>> >> >> ## So, let's try to include NAs
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
>> >> >> na.action='na

Re: [R] Quadratic regression: estimating the maximizing value

2011-02-04 Thread Greg Snow
No, your approach is not correct.  For one thing, you have not taken the
covariance between B and C into account; for another, the ratio of two normal
random variables is not necessarily normal, and in some cases it can even
follow a Cauchy distribution.  Also note that with only 1 degree of freedom
the Central Limit Theorem does not justify using normal theory for non-normal
distributions, so the normal-based confidence intervals on the coefficients
are only reasonable if you are certain that the true error distribution is
normal or nearly normal.

Some possibilities that may work for you:

Use the nls function with the curve parameterized so that the vertex point is
one of the parameters (this still needs normality); a rough sketch follows at
the end of this list.

Do bootstrapping (with only 4 points you are not going to get much with 
non-parametric bootstrap, but parametric with some assumptions may work).

Use a Bayesian approach (this may be best if you can find some meaningful prior 
information).
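
As a rough sketch of the first suggestion (the vertex parameterization; the
start values are guesses taken from the quadratic fit quoted below, and
normal errors are still assumed):

Ldat <- data.frame(X = c(3, 7, 14, 24), Y = c(1, 5, 8, 0))
fit <- nls(Y ~ ymax + curv * (X - xmax)^2, data = Ldat,
           start = list(ymax = 8, curv = -0.07, xmax = 13))
summary(fit)$coefficients["xmax", ]   # estimate and normal-theory SE of the maximizing X
## confint(fit, level = 0.8) would give a profile interval (needs MASS installed,
## and may be fragile with only one residual degree of freedom)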

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of ghe...@blm.gov
> Sent: Friday, February 04, 2011 2:34 PM
> To: r-help@r-project.org
> Subject: [R] Quadratic regression: estimating the maximizing value
> 
> 
> A biologist colleague sent me the following data.
> 
> x  Y
> 3  1
> 7  5
> 14 8
> 24 0
> 
> (Yes, only four data points.)  I don't know much about the
> application, but apparently there are good empirical
> reasons to use a quadratic model.
> 
> The goal is to find the X value which maximizes the
> response Y, and to find a confidence interval for this X
> value.
> 
> Finding the maximizing X value is pretty straightforward:
> 
> >Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0))
> >(LM<-lm(formula = Y ~ X + I(X^2), data = Ldat))
> Call:
> lm(formula = Y ~ X + I(X^2), data = Ldat)
> 
> Coefficients:
> (Intercept)X   I(X^2)
>-3.86978  1.77762 -0.06729
> 
> > DZ<-function(B,C) { (-B)/(2*C) } # Solve  d/dx(A + Bx + Cx^2) = 0
> > DZ(LM$coefficients[2],LM$coefficients[3])
>X
> 13.20961
> 
> 
> To find a confidence interval, I used "confint()".
> Default confidence level of 95% was not useful; used 80% instead,
> and then computed DZ with the extreme X and I(X^2) coefficients:
> 
> >(CI80<-confint(LM,level=0.8))
> 
>   10 %90 %
> (Intercept) -5.6147948 -2.12476138
> X1.4476460  2.10759306
> I(X^2)  -0.0790312 -0.05553898
> 
> > DZ(CI80[2,1],CI80[3,1])
> [1] 9.1587
> > DZ(CI80[2,2],CI80[3,2])
> [1] 18.97400
> 
> Conclusion: the 80% confidence level for the maximizing X value is
> included in the range (9.158, 18.974)
> 
> #
> Questions:
> 
> 1) Is this the right procedure, or totally off base?
> 
> 2) The coefficient of the "Intercept" is irrelevant to calculating
> the maximizing value of X.  Is there a way to reduce the size of
> the confidence interval by doing a computation that leaves out this
> parameter?
> 
> 3) I believe that confint() indicates the axes of an ellipsoid,
> rather than the corners of a box, in parameter space;
> so that the above procedure is (slightly) too conservative.
> 
> 4) Where are the calculations for confint() documented ?
> 
> Thanks,
> George Heine
> 
> 
> 
> 
>  <>=<>=<>=<>=<>=<>=<>=<>=<>=<>=<>
>  George Heine, PhD
>  Mathematical Analyst
>  National Operations Center
>  Bureau of Land Management
>  voice (303) 236-0099
>  fax   (303) 236-1974
>  cell  (303) 905-5296
>  <>=<>=<>=<>=<>=<>=<>=<>=<>=<>=<>
> 



Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Greg Snow
Have you looked into Bioconductor?  There is a separate mailing list and many
packages designed for genetic analysis within the Bioconductor project.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of DB1984
> Sent: Friday, February 04, 2011 4:21 PM
> To: r-help@r-project.org
> Subject: Re: [R] Finding non-normal distributions per row of data
> frame?
> 
> 
> Greg, Dennis - thanks for your input, I really appreciate the feedback,
> as it
> is not easy to source.
> 
> In terms of the data; I've described it as 20 columns, which is the
> smallest
> dataset, but this can run to 320 columns, so in some cases there is
> likely
> to be enough power to detect non-normality. That said, a better
> solution
> would be useful.
> 
> As a first approximation, I looked at the mean/median ratio to indicate
> simple skew in the data - which suggested that most of the data was
> normally
> distributed. I took the 'nuggets' to be those with a mean/median ratio
> in
> the top or bottom 1% of the data. This was a small group - overall the
> data
> appears relatively normally distributed within rows.
> 
> The aim is really to find those nuggets with significantly non-normal
> distributions. My hope was to be able to take the tails of the p-values
> for
> Shapiro-Wilk, or some similar test, and find these enriched with
> nuggets.
> This may not be an appropriately robust approach - but is there a
> better
> option?
> 
> One idea was to sort the data in each row, and perform a linear
> regression.
> For normal distributions I am expecting the intercept to be close to
> the
> mean. Using the (intercept-mean) and p-values for the fit of the
> regression
> was again another way to filter out the nuggets in the dataset.
> 
> If it helps, the nuggets I am expecting are either grouped 80% grouped
> around the mean with 20% forming a uni-directional tail, or an
> approximate
> bimodal distribution.
> 
> As I'd imagine is obvious - I don't have an ideal solution to finding
> these
> nuggets, and so coming up with the R code to do so is harder still. If
> anybody has insight into this sort of problem, and can point me in the
> direction of further reading, that would be helpful. If there is a
> ready-made solution, even better!
> 
> As I said, thanks for your time with this...
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Finding-
> non-normal-distributions-per-row-of-data-frame-tp3259439p3261203.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi,

On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes  wrote:
> Thank you both for the thoughtful (and funny) replies.
>
> I agree with both of you that sum is the one picking up aggregate.  Although
> I didn't mention it, I did realize that in the first place.
> Also, thank you Phil for pointing out that aggregate only accepts a formula
> value in more recent versions!  I actually thought that was an older
> feature, but I must be thinking of other functions.
>
> I still don't see why these two values are not the same!
>
> It seems like a bug to me

No, not a bug (see below).

>
>> set.seed(100)
>> dat=data.frame(
> + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> + x2=sample(c(NA, 1:10), 100, replace=TRUE),
> + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> + x4=sample(c(NA,T,F), 100, replace=TRUE),
> + y=sample(c(rep(NA,5), rnorm(95
>> sum(dat$y, na.rm=T)
> [1] 0.0815244116598
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
> [1] -4.45087666247
>>

Because in the first one you are only removing missing data in dat$y.
In the second one you are removing all rows that contain missing data
in any of the columns.

all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
sum, na.action=na.pass, na.rm=T)$y))
[1] TRUE

Best,
Ista

>
>
>
> On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn  wrote:
>>
>> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>>
>> -Ista
>>
>> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
>> wrote:
>> > Hi,
>> >
>> > Please see ?na.action
>> >
>> > (just kidding!)
>> >
>> > So it seems to me the problem is that you are passing na.rm to the sum
>> > function. So there is no missing data for the na.action argument to
>> > operate on!
>> >
>> > Compare
>> >
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>> >
>> >
>> > Best,
>> > Ista
>> >
>> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes  wrote:
>> >> Can someone please tell me what is up with na.action in aggregate?
>> >>
>> >> My (somewhat) reproducible example:
>> >> (I say somewhat because some lines wouldn't run in a separate session,
>> >> more
>> >> below)
>> >>
>> >> set.seed(100)
>> >> dat=data.frame(
>> >>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> >>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> >>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> >>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>> >>        y=sample(c(rep(NA,5), rnorm(95
>> >> dat
>> >> ## The total from dat:
>> >> sum(dat$y, na.rm=T)
>> >> ## The total from aggregate:
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This
>> >> line
>> >> gave an error in a separate R instance
>> >> ## The aggregate formula is excluding NA
>> >>
>> >> ## So, let's try to include NAs
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
>> >> na.action='na.pass')$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
>> >> na.action=na.pass)$y)
>> >> ## The aggregate formula is STILL excluding NA
>> >> ## In fact, the formula doesn't seem to notice the na.action
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
>> >> chew')$y)
>> >> ## H... that error surprised me (since the previous two things ran)
>> >>
>> >> ## So, let's try to change the global options
>> >> ## (not mentioned in the help, but after reading the help
>> >> ##  100 times, I thought I would go above and beyond to avoid
>> >> ##  any r list flames from people complaining
>> >> ##  that I didn't read the help... but that's a separate topic)
>> >> options(na.action ="na.pass")
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
>> >> na.action='na.pass')$y)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
>> >> na.action=na.pass)$y)
>> >> ## (NAs are still omitted)
>> >>
>> >> ## Even more frustrating...
>> >> ## Why don't any of these work???
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>> >>
>> >>
>> >> ## This does work, but in my real data set, I want NA to really be NA
>> >> for(j in 1:4)
>> >>    dat[is.na(dat[,j]),j] = 'NA'
>> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >>
>> >>
>> >> ## My first session info
>> >> #
>> >> #> sessionInfo()
>> >> #R version 2.12.0 (2010-10-15)
>> >> #Platform: i386-pc-mingw32/i386 (32-bit)
>> >> #
>> >> #locale:
>>

Re: [R] rbinom and probability

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 4:24 PM, Mcmahon, Kwyatt wrote:


Hello compadRes,

I'm developing a script that selects "cells" over a certain  
metabolic rate to kill them.  A rate between 9 and 12 means that the  
cells are candidates for death.


I'll show you what I mean:

# a would be a vector of cell metabolic rates.
a<-c(8, 7, 9, 6, 10, 11, 4, 5, 6)

#now identify which cells will be candidates for death, namely those  
cells with metabolic rates>9

b<-a[a>6]
#This would select all cells with rates >6
rbinom(length(b), size=1, prob=1)


No, you have an overly complex way of generating a vector of all 1's
with length(b). And it's not clear what you mean by "select"; I see no
code that would do any selection or indexing.




#This will select cells using an exponential probability distribution

rbinom(length(b), size=1, prob=(seq(0, 1, by=0.1)^2))


Again no selection appears to be taking place.



I have two questions:

1)  Am I correct in my interpretations?

2)  I'd actually like to have this probability scaled so that  
cells with rates of 9 are unlikely to be selected and those near 12  
are highly likely.  How can I code this?


a[ sample(1:length(a), prob=0.1+(a>9) ,replace=TRUE) ]
[1] 10 10 11 10 10 11 10 11 10
> a[sample(1:length(a), prob=0.1+(a>9) ,replace=TRUE) ]
[1] 10 10 11 11 11 11  5 10 10

The "rates" over 9 are 11 times  (= 1.1/0.1) as likely to be selected.  
You can adjust the ratio by altering the 0.1 value."+" with the first  
argument numeric act to coerce the logical vector to 1's and 0's.
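
If the goal is a smooth ramp rather than a hard cutoff, a sketch (not from
the reply above; the linear ramp from 9 to 12 is an assumption) using one
Bernoulli draw per cell:

a <- c(8, 7, 9, 6, 10, 11, 4, 5, 6)
p <- pmin(pmax((a - 9) / (12 - 9), 0), 1)    # 0 at or below rate 9, 1 at or above 12
killed <- rbinom(length(a), size = 1, prob = p) == 1
a[killed]                                    # the cells selected for death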


--
David.




Thanks in advance,

Wyatt


sessionInfo()

R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.11.1



K. Wyatt McMahon, Ph.D.
Postdoctoral Research Associate
Department of Internal Medicine
3601 4th St. - Lubbock, TX - 79430
P: 806-743-4072
F: 806-743-3148




David Winsemius, MD
West Hartford, CT



Re: [R] problem barplot width

2011-02-04 Thread Ista Zahn
Hi Mario,
From the help file for barplot:

"Specifying a single value will have no visible effect unless 'xlim'
is specified."

OK, so:

barplot(d,col=barcol,ylim=c(min(d-s*1.25),max(d+s*1.25)), xlim=c(0,1),
width=.1, space = 2)

Best,
Ista
On Fri, Feb 4, 2011 at 4:15 PM, Mario Beolco  wrote:
> Dear R-users,
>
> apologies for the total beginner's question, but I have been trying to
> solve this problem for ages and I seem to be getting nowhere. I also
> have tried to search through the archives of the R mailing list, but I
> am still left with my problem. How do I change the width of the bars
> for this simple barplot? I understand that the the "width" argument
> might do what I want and I have tried to use it to change the width of
> the bars e.g. width=c(0.01,0.01,0.01), but it does not do what I want.
> Any help/suggestions would be greatly appreciated.
>
> # some data
> d<-c(-4.227684e-04,2.012307e-04,4.085164e-05)
> s<-c(3.588785e-05,9.133071e-05,2.072433e-05)
>
> # barplot colour
> barcol<-c("#6E8B3D","#FFB90F","#8B","#6E8B3D","#FFB90F","#8B","#6E8B3D","#FFB90F","#8B")
>
> # create barplot
> barplot(d,col=barcol,ylim=c(min(d-s*1.25),max(d+s*1.25)),space=0.1)
>
>
> thanks in advance for your help
>
> Best wishes,
>
> Mario
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Thank you both for the thoughtful (and funny) replies.

I agree with both of you that it is sum, not aggregate, that is picking up
the na.rm argument. Although I didn't mention it, I did realize that in the
first place.
Also, thank you Phil for pointing out that aggregate only accepts a formula
in more recent versions! I actually thought that was an older feature, but
I must be thinking of other functions.

I still don't see why these two values are not the same!

It seems like a bug to me

> set.seed(100)
> dat=data.frame(
+ x1=sample(c(NA,'m','f'), 100, replace=TRUE),
+ x2=sample(c(NA, 1:10), 100, replace=TRUE),
+ x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
+ x4=sample(c(NA,T,F), 100, replace=TRUE),
+ y=sample(c(rep(NA,5), rnorm(95))))
> sum(dat$y, na.rm=T)
[1] 0.0815244116598
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
[1] -4.45087666247
>



On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn  wrote:

> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>
> -Ista
>
> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn 
> wrote:
> > Hi,
> >
> > Please see ?na.action
> >
> > (just kidding!)
> >
> > So it seems to me the problem is that you are passing na.rm to the sum
> > function. So there is no missing data for the na.action argument to
> > operate on!
> >
> > Compare
> >
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
> >
> >
> > Best,
> > Ista
> >
> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes 
> > >
> wrote:
> >> Can someone please tell me what is up with na.action in aggregate?
> >>
> >> My (somewhat) reproducible example:
> >> (I say somewhat because some lines wouldn't run in a separate session,
> more
> >> below)
> >>
> >> set.seed(100)
> >> dat=data.frame(
> >>x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> >>x2=sample(c(NA, 1:10), 100, replace=TRUE),
> >>x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> >>x4=sample(c(NA,T,F), 100, replace=TRUE),
> >>y=sample(c(rep(NA,5), rnorm(95
> >> dat
> >> ## The total from dat:
> >> sum(dat$y, na.rm=T)
> >> ## The total from aggregate:
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This
> line
> >> gave an error in a separate R instance
> >> ## The aggregate formula is excluding NA
> >>
> >> ## So, let's try to include NAs
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> na.action='na.pass')$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> na.action=na.pass)$y)
> >> ## The aggregate formula is STILL excluding NA
> >> ## In fact, the formula doesn't seem to notice the na.action
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
> >> chew')$y)
> >> ## H... that error surprised me (since the previous two things ran)
> >>
> >> ## So, let's try to change the global options
> >> ## (not mentioned in the help, but after reading the help
> >> ##  100 times, I thought I would go above and beyond to avoid
> >> ##  any r list flames from people complaining
> >> ##  that I didn't read the help... but that's a separate topic)
> >> options(na.action ="na.pass")
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> na.action='na.pass')$y)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T,
> na.action=na.pass)$y)
> >> ## (NAs are still omitted)
> >>
> >> ## Even more frustrating...
> >> ## Why don't any of these work???
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
> >>
> >>
> >> ## This does work, but in my real data set, I want NA to really be NA
> >> for(j in 1:4)
> >>dat[is.na(dat[,j]),j] = 'NA'
> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> >>
> >>
> >> ## My first session info
> >> #
> >> #> sessionInfo()
> >> #R version 2.12.0 (2010-10-15)
> >> #Platform: i386-pc-mingw32/i386 (32-bit)
> >> #
> >> #locale:
> >> #[1] LC_COLLATE=English_United States.1252
> >> #[2] LC_CTYPE=English_United States.1252
> >> #[3] LC_MONETARY=English_United States.1252
> >> #[4] LC_NUMERIC=C
> >> #[5] LC_TIME=English_United States.1252
> >> #
> >> #attached base packages:
> >> #[1] stats graphics  grDevices utils datasets  methods
> >> base
> >> #
> >> #other attached packages:
> >> #[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
> >> #
> >> #loaded via a namespace (and not attached):
> >> #[1] grid_2.12.0   

[R] Quadratic regression: estimating the maximizing value

2011-02-04 Thread gheine

A biologist colleague sent me the following data.

 x  Y
 3  1
 7  5
14  8
24  0

(Yes, only four data points.)  I don't know much about the
application, but apparently there are good empirical
reasons to use a quadratic model.

The goal is to find the X value which maximizes the
response Y, and to find a confidence interval for this X
value.

Finding the maximizing X value is pretty straightforward:

>Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0))
>(LM<-lm(formula = Y ~ X + I(X^2), data = Ldat))
Call:
lm(formula = Y ~ X + I(X^2), data = Ldat)

Coefficients:
(Intercept)            X       I(X^2)
   -3.86978      1.77762     -0.06729

> DZ<-function(B,C) { (-B)/(2*C) } # Solve  d/dx(A + Bx + Cx^2) = 0
> DZ(LM$coefficients[2],LM$coefficients[3])
   X
13.20961


To find a confidence interval, I used "confint()".
Default confidence level of 95% was not useful; used 80% instead,
and then computed DZ with the extreme X and I(X^2) coefficients:

>(CI80<-confint(LM,level=0.8))

                  10 %        90 %
(Intercept) -5.6147948 -2.12476138
X            1.4476460  2.10759306
I(X^2)      -0.0790312 -0.05553898

> DZ(CI80[2,1],CI80[3,1])
[1] 9.1587
> DZ(CI80[2,2],CI80[3,2])
[1] 18.97400

Conclusion: the 80% confidence interval for the maximizing X value is
contained in the range (9.158, 18.974)
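
For comparison, a delta-method approximation for -B/(2C) can be sketched
directly from the fitted model; this is only an illustration of an
alternative, not a validated recipe (and with 1 residual df the interval
will be very wide):

b <- coef(LM); V <- vcov(LM)
xmax <- -b[2] / (2 * b[3])
g <- c(0, -1 / (2 * b[3]), b[2] / (2 * b[3]^2))  # gradient of -B/(2C) wrt (A, B, C)
se <- sqrt(drop(t(g) %*% V %*% g))               # delta-method standard error
xmax + c(-1, 1) * qt(0.90, df = df.residual(LM)) * se   # rough 80% interval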

#
Questions:

1) Is this the right procedure, or totally off base?

2) The coefficient of the "Intercept" is irrelevant to calculating
the maximizing value of X.  Is there a way to reduce the size of
the confidence interval by doing a computation that leaves out this
parameter?

3) I believe that confint() indicates the axes of an ellipsoid,
rather than the corners of a box, in parameter space;
so that the above procedure is (slightly) too conservative.

4) Where are the calculations for confint() documented ?

Thanks,
George Heine



   
George Heine, PhD
Mathematical Analyst
National Operations Center
Bureau of Land Management
voice (303) 236-0099
fax   (303) 236-1974
cell  (303) 905-5296
   

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem barplot width

2011-02-04 Thread Mario Beolco
Dear R-users,

apologies for the total beginner's question, but I have been trying to
solve this problem for ages and I seem to be getting nowhere. I also
have tried to search through the archives of the R mailing list, but I
am still left with my problem. How do I change the width of the bars
for this simple barplot? I understand that the "width" argument
might do what I want, and I have tried using it to change the width of
the bars, e.g. width=c(0.01,0.01,0.01), but it does not do what I want.
Any help/suggestions would be greatly appreciated.

# some data
d<-c(-4.227684e-04,2.012307e-04,4.085164e-05)
s<-c(3.588785e-05,9.133071e-05,2.072433e-05)

# barplot colour
barcol<-c("#6E8B3D","#FFB90F","#8B","#6E8B3D","#FFB90F","#8B","#6E8B3D","#FFB90F","#8B")

# create barplot
barplot(d,col=barcol,ylim=c(min(d-s*1.25),max(d+s*1.25)),space=0.1)
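
One thing worth trying (a sketch only, untested against this exact data):
the help page for barplot notes, if I recall correctly, that a single
'width' value has no visible effect unless 'xlim' is also specified, so
supplying both may be what is missing here:

barplot(d, col = barcol[1:3], space = 0.1,
        width = 0.2, xlim = c(0, 1),
        ylim = c(min(d - s*1.25), max(d + s*1.25)))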


thanks in advance for your help

Best wishes,

Mario

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rbinom and probability

2011-02-04 Thread Mcmahon, Kwyatt
Hello compadRes,

I'm developing a script that selects "cells" over a certain metabolic rate to 
kill them.  A rate between 9 and 12 means that the cells are candidates for 
death.

I'll show you what I mean:

# a would be a vector of cell metabolic rates.
a<-c(8, 7, 9, 6, 10, 11, 4, 5, 6)

#now identify which cells will be candidates for death, namely those cells with 
metabolic rates>9
b<-a[a>6]
#This would select all cells with rates >6
rbinom(length(b), size=1, prob=1)

#This will select cells using an exponential probability distribution
rbinom(length(b), size=1, prob=(seq(0, 1, by=0.1)^2))

I have two questions:

1)  Am I correct in my interpretations?

2)  I'd actually like to have this probability scaled so that cells with 
rates of 9 are unlikely to be selected and those near 12 are highly likely.  
How can I code this?
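
A minimal sketch of one way to do that scaling (the linear ramp from 9 to
12 and the squaring are illustrative choices only, not a recommendation):

b <- a[a > 9]                                # candidates for death (or a > 6, as above)
p <- pmin(pmax((b - 9) / (12 - 9), 0), 1)^2  # near 0 at rate 9, near 1 at rate 12
killed <- rbinom(length(b), size = 1, prob = p)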

Thanks in advance,

Wyatt

> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.11.1



K. Wyatt McMahon, Ph.D.
Postdoctoral Research Associate
Department of Internal Medicine
3601 4th St. - Lubbock, TX - 79430
P: 806-743-4072
F: 806-743-3148


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Where does R look for library packages - there is no package called 'BRugs'

2011-02-04 Thread conor1725

I created a file called .Renviron and set R_LIBS_USER to the same path I had
set R_LIBS to. I put this file in
C:/Users/myusername/Documents/mysubdirectory. I also commented out the line
I had put in the Renviron.site file. Now I get the same error as I
originally got. When I put .Renviron in C:/Users/myusername/Documents
instead, it does work, but the original reason I posted my question was my
desire not to have anything R (or BRugs) related directly in my
C:/Users/myusername/Documents folder. If there is a correct way to do this
without putting any files or folders in C:/Users/myusername/Documents
itself (as opposed to C:/Users/myusername/Documents/mysubdirectory or
C:/Program Files/R/R-2.12.0), I'd still be interested in knowing how.
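
For the record, the kind of ~/.Renviron content meant above is a single
line such as the following (the path is the one from the earlier post and
is only illustrative):

R_LIBS_USER=C:/Users/myusername/Documents/.../R/win-library/2.12

After restarting R, .libPaths() should list that directory if the file was
picked up.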


Prof Brian Ripley wrote:
> 
> For the record, this is not a good answer to the question (and the 
> real answer is 'explained fully' on the help page for .libPaths, and 
> also in the rw-FAQ, Q4.2).
> 
> If you move your personal library directory, you need to set the 
> environment variable R_LIBS_USER .
> 
> And personal (not site) environment variables should be set in 
> ~/.Renviron, not in Renviron.site.  If you want a site library, that 
> is explained fully in the references in my first paragraph.
> 
> On Thu, 3 Feb 2011, conor1725 wrote:
> 
>>
>>
>> David Winsemius wrote:
>>>
>>>
>>> On Jan 30, 2011, at 5:42 PM, conor1725 wrote:
>>>

 I am have installed R on Windows 7 machine. R got installed in the
 directory
 C:\Program Files\R\R-2.12.0. Then I installed the package BRugs
 using the
 install.packages command. I did not get an option during the
 installation as
 to what directory I wanted BRugs installed in. It ended up in
 C:\myusername\Documents as C:\myusername\Documents\R\win-library
 \2.12\BRugs.
 When I open and R and run the command library(BRugs) it works fine.

 However, I would like to move the BRugs stuff out of my Documents
 folder,
 either to a subdirectory under Documents of my choosing or to
 somewhere in
 C:\Program Files\R\R-2.12.0. I tried just moving the whole folder and
 subfolders R\win-library\2.12\... from Documents to a subfolder. I
 don't
 think this matters, but just to complete, there is also a folder
 called Coda
 with the BRugs folder in the 2.12 folder and of course there are
 subfolders
 and files under each of the folders Coda and BRugs. Then when I try
 library(BRugs) I get the error message "Error in library(BRugs) :
 there is
 no package called 'BRugs'". So it seems R only knows to look for the
 BRugs
 package in it's original installation location.

 How can I get it to work having BRugs in one of my desired locations?
>>>
>>> Then you need to understand where R will be able to find your
>>> packages, and how one might change that:
>>>
>>> ?libPaths
>>>
>>> --
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>>
>>
>> I'm posting a specific answer to my own question for the record since The
>> help for libPaths (actually should be ?.libPaths) doesn't explain this
>> fully. I created the file Renviron.site in the folder C:\Program
>> Files\R\R-2.12.0\etc and put the text R_LIBS =
>> C:/Users/myusername/Documents/.../R/win-library/2.12 in that file.  The
>> ...
>> represents the path for the subdirectory that I moved
>> R\win-library\2.12\...
>> to.
>> -- 
>> View this message in context:
>> http://r.789695.n4.nabble.com/Where-does-R-look-for-library-packages-there-is-no-package-called-BRugs-tp3247748p3259520.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> -- 
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Where-does-R-look-for-library-packages-there-is-no-package-called-BRugs-tp3247748p3261227.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and p

Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Greg, Dennis - thanks for your input, I really appreciate the feedback, as it
is not easy to source.

In terms of the data; I've described it as 20 columns, which is the smallest
dataset, but this can run to 320 columns, so in some cases there is likely
to be enough power to detect non-normality. That said, a better solution
would be useful.

As a first approximation, I looked at the mean/median ratio to indicate
simple skew in the data - which suggested that most of the data was normally
distributed. I took the 'nuggets' to be those with a mean/median ratio in
the top or bottom 1% of the data. This was a small group - overall the data
appears relatively normally distributed within rows. 
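
For concreteness, that first pass can be sketched on a numeric matrix
(here called 'mat', one row per gene; the name is just a placeholder):

ratio   <- apply(mat, 1, function(z) mean(z, na.rm = TRUE) / median(z, na.rm = TRUE))
cutoffs <- quantile(ratio, c(0.01, 0.99), na.rm = TRUE)
nuggets <- which(ratio < cutoffs[1] | ratio > cutoffs[2])  # extreme 1% tails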

The aim is really to find those nuggets with significantly non-normal
distributions. My hope was to be able to take the tails of the p-values for
Shapiro-Wilk, or some similar test, and find these enriched with nuggets.
This may not be an appropriately robust approach - but is there a better
option?

One idea was to sort the data in each row, and perform a linear regression.
For normal distributions I am expecting the intercept to be close to the
mean. Using the (intercept-mean) and p-values for the fit of the regression
was again another way to filter out the nuggets in the dataset.

If it helps, the nuggets I am expecting are either 80% grouped around the
mean with 20% forming a uni-directional tail, or an approximately bimodal
distribution.

As I'd imagine is obvious - I don't have an ideal solution to finding these
nuggets, and so coming up with the R code to do so is harder still. If
anybody has insight into this sort of problem, and can point me in the
direction of further reading, that would be helpful. If there is a
ready-made solution, even better!

As I said, thanks for your time with this...


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3261203.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Sorry, I didn't see Phil's reply, which is better than mine anyway.

-Ista

On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn  wrote:
> Hi,
>
> Please see ?na.action
>
> (just kidding!)
>
> So it seems to me the problem is that you are passing na.rm to the sum
> function. So there is no missing data for the na.action argument to
> operate on!
>
> Compare
>
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>
>
> Best,
> Ista
>
> On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes  wrote:
>> Can someone please tell me what is up with na.action in aggregate?
>>
>> My (somewhat) reproducible example:
>> (I say somewhat because some lines wouldn't run in a separate session, more
>> below)
>>
>> set.seed(100)
>> dat=data.frame(
>>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>>        y=sample(c(rep(NA,5), rnorm(95))))
>> dat
>> ## The total from dat:
>> sum(dat$y, na.rm=T)
>> ## The total from aggregate:
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line
>> gave an error in a separate R instance
>> ## The aggregate formula is excluding NA
>>
>> ## So, let's try to include NAs
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> ## The aggregate formula is STILL excluding NA
>> ## In fact, the formula doesn't seem to notice the na.action
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
>> chew')$y)
>> ## H... that error surprised me (since the previous two things ran)
>>
>> ## So, let's try to change the global options
>> ## (not mentioned in the help, but after reading the help
>> ##  100 times, I thought I would go above and beyond to avoid
>> ##  any r list flames from people complaining
>> ##  that I didn't read the help... but that's a separate topic)
>> options(na.action ="na.pass")
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> ## (NAs are still omitted)
>>
>> ## Even more frustrating...
>> ## Why don't any of these work???
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>>
>>
>> ## This does work, but in my real data set, I want NA to really be NA
>> for(j in 1:4)
>>    dat[is.na(dat[,j]),j] = 'NA'
>> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>>
>>
>> ## My first session info
>> #
>> #> sessionInfo()
>> #R version 2.12.0 (2010-10-15)
>> #Platform: i386-pc-mingw32/i386 (32-bit)
>> #
>> #locale:
>> #        [1] LC_COLLATE=English_United States.1252
>> #[2] LC_CTYPE=English_United States.1252
>> #[3] LC_MONETARY=English_United States.1252
>> #[4] LC_NUMERIC=C
>> #[5] LC_TIME=English_United States.1252
>> #
>> #attached base packages:
>> #        [1] stats     graphics  grDevices utils     datasets  methods
>> base
>> #
>> #other attached packages:
>> #        [1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
>> #
>> #loaded via a namespace (and not attached):
>> #        [1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
>> #[5] tools_2.12.0
>>
>>
>>
>> I tried running that example in a different version of R, with and I got
>> completely different results
>>
>> The other version of R wouldn't recognize the formula at all..
>>
>> My other version of R:
>>
>> #  My second session info
>> #> sessionInfo()
>> #R version 2.10.1 (2009-12-14)
>> #i386-pc-mingw32
>> #
>> #locale:
>> #        [1] LC_COLLATE=English_United States.1252
>> #[2] LC_CTYPE=English_United States.1252
>> #[3] LC_MONETARY=English_United States.1252
>> #[4] LC_NUMERIC=C
>> #[5] LC_TIME=English_United States.1252
>> #
>> #attached base packages:
>> #        [1] stats     graphics  grDevices utils     datasets  methods
>> base
>> #>
>> #
>>
>> PS: Also, I have read the help on aggregate, factor, as.factor, and several
>> other topics.  If I missed something, please let me know.
>> Some people like to reply to questions by telling the sender that R has
>> documentation.  Please don't.  The R help archives are littered with
>> reminders, friendly and otherwise, of R's documentation.
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi,

Please see ?na.action

(just kidding!)

So it seems to me the problem is that you are passing na.rm to the sum
function. So there is no missing data for the na.action argument to
operate on!

Compare

sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)


Best,
Ista

On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes  wrote:
> Can someone please tell me what is up with na.action in aggregate?
>
> My (somewhat) reproducible example:
> (I say somewhat because some lines wouldn't run in a separate session, more
> below)
>
> set.seed(100)
> dat=data.frame(
>        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>        x2=sample(c(NA, 1:10), 100, replace=TRUE),
>        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>        x4=sample(c(NA,T,F), 100, replace=TRUE),
>        y=sample(c(rep(NA,5), rnorm(95))))
> dat
> ## The total from dat:
> sum(dat$y, na.rm=T)
> ## The total from aggregate:
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line
> gave an error in a separate R instance
> ## The aggregate formula is excluding NA
>
> ## So, let's try to include NAs
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## The aggregate formula is STILL excluding NA
> ## In fact, the formula doesn't seem to notice the na.action
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
> chew')$y)
> ## H... that error surprised me (since the previous two things ran)
>
> ## So, let's try to change the global options
> ## (not mentioned in the help, but after reading the help
> ##  100 times, I thought I would go above and beyond to avoid
> ##  any r list flames from people complaining
> ##  that I didn't read the help... but that's a separate topic)
> options(na.action ="na.pass")
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
> ## (NAs are still omitted)
>
> ## Even more frustrating...
> ## Why don't any of these work???
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>
>
> ## This does work, but in my real data set, I want NA to really be NA
> for(j in 1:4)
>    dat[is.na(dat[,j]),j] = 'NA'
> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>
>
> ## My first session info
> #
> #> sessionInfo()
> #R version 2.12.0 (2010-10-15)
> #Platform: i386-pc-mingw32/i386 (32-bit)
> #
> #locale:
> #        [1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> #        [1] stats     graphics  grDevices utils     datasets  methods
> base
> #
> #other attached packages:
> #        [1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
> #
> #loaded via a namespace (and not attached):
> #        [1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
> #[5] tools_2.12.0
>
>
>
> I tried running that example in a different version of R, with and I got
> completely different results
>
> The other version of R wouldn't recognize the formula at all..
>
> My other version of R:
>
> #  My second session info
> #> sessionInfo()
> #R version 2.10.1 (2009-12-14)
> #i386-pc-mingw32
> #
> #locale:
> #        [1] LC_COLLATE=English_United States.1252
> #[2] LC_CTYPE=English_United States.1252
> #[3] LC_MONETARY=English_United States.1252
> #[4] LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #
> #attached base packages:
> #        [1] stats     graphics  grDevices utils     datasets  methods
> base
> #>
> #
>
> PS: Also, I have read the help on aggregate, factor, as.factor, and several
> other topics.  If I missed something, please let me know.
> Some people like to reply to questions by telling the sender that R has
> documentation.  Please don't.  The R help archives are littered with
> reminders, friendly and otherwise, of R's documentation.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester

Re: [R] aggregate function - na.action

2011-02-04 Thread Phil Spector

Gene -
   Let me try to address your concerns one at a time:

Since the formula interface to aggregate was introduced
pretty recently (I think R-2.11.1, but I might be wrong),
it won't work when you try to use it in R-2.10.1.

Now let's take a close look at the help page for aggregate.

The default method, which will be called if you pass a vector
to aggregate, or the data frame method are described like this:

 aggregate(x, ...)

 ## S3 method for class 'data.frame'
 aggregate(x, by, FUN, ..., simplify = TRUE)

So if you pass an na.action= argument to aggregate when the first argument
is a vector or data frame, it gets picked up by the ... argument and gets
passed to your function, so you might see messages like this:


sum(1:10,na.action=na.omit)

Error in sum(1:10, na.action = na.omit) :
  invalid 'type' (closure) of argument

sum(1:10,na.action='na.omit')

Error in sum(1:10, na.action = "na.omit") :
  invalid 'type' (character) of argument

(It's sum complaining, not aggregate.)

As far as na.action goes, when you're using the aggregate formula method,
it will remove all rows from the specified data frame that have any missing
values.  If you pass that to a function with the na.rm=TRUE argument, that
function will remove the missing values as it should.  So the only time you'll
see the effect of na.action=na.pass is when you call a function that won't
remove the missing values.   (The subtle distinction between na.action=na.omit
and na.rm=TRUE is the function you're calling is that na.omit will remove
the entire row of data when it encounters a missing value, while the na.rm=TRUE
argument will remove missing values separately from each variable.)
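
A tiny made-up example of that distinction, using FUN = length just to
count how many rows reach the function in each group:

d <- data.frame(g = c("a", "a", "b", "b"), y = c(1, NA, 3, 4))
aggregate(y ~ g, data = d, FUN = length)
# na.omit (the default) drops the whole NA row first: a -> 1, b -> 2
aggregate(y ~ g, data = d, FUN = length, na.action = na.pass)
# the NA row now reaches length(): a -> 2, b -> 2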

Hope this helps.
- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Fri, 4 Feb 2011, Gene Leynes wrote:


Can someone please tell me what is up with na.action in aggregate?

My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate session, more
below)

set.seed(100)
dat=data.frame(
   x1=sample(c(NA,'m','f'), 100, replace=TRUE),
   x2=sample(c(NA, 1:10), 100, replace=TRUE),
   x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
   x4=sample(c(NA,T,F), 100, replace=TRUE),
    y=sample(c(rep(NA,5), rnorm(95))))
dat
## The total from dat:
sum(dat$y, na.rm=T)
## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line
gave an error in a separate R instance
## The aggregate formula is excluding NA

## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
chew')$y)
## H... that error surprised me (since the previous two things ran)

## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
##  100 times, I thought I would go above and beyond to avoid
##  any r list flames from people complaining
##  that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)


## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
   dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)


## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics  grDevices utils datasets  methods
base
#
#other attached packages:
#[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
#[1] grid_

Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Dennis Murphy
Hi:

The problem you have, IMO, is the multiplicity of tests and the p-value
adjustments that need to be made for them. Here's a little simulation using
normal and exponential distributions of the size you're contemplating.

# Normal data
d <- matrix(rnorm(400000), nrow = 20000)
# Function to compute the Shapiro-Wilk test and return its statistic and p-value
f <- function(x) {
   w <- shapiro.test(x)
   c(teststat = w$stat, pval = w$p.value)
 }

# Apply f to the rows of d, return result as a data frame
u <- as.data.frame(t(apply(d, 1, f)))
# Use the p.adjust() function to correct p-values for multiplicity of testing
u$padj <- p.adjust(u$pval)
> length(u$pval[u$pval < 0.05])
[1] 1040   # in the ballpark of expectations on comparisonwide basis
> length(u$pval[u$pval < 0.0001])
[1] 0      # ditto
> length(u$padj[u$padj < 0.05])
[1] 0      # how many adjusted p-values are below the familywise rate
>
# Repeat with standard exponential samples:
dd <- matrix(rexp(400000), nrow = 20000)
u2 <- as.data.frame(t(apply(dd, 1, f)))
u2$padj <- p.adjust(u2$pval)
> length(u2$pval[u2$pval < 0.05])
[1] 16670  # about 83.5% rejected on comparisonwise basis
> length(u2$pval[u2$pval < 0.0001])
[1] 2493   # about 12.5% with comparisonwise p-values < 0.0001
> length(u2$pval[u2$padj < 0.05])
[1] 262    # about 1.3% significant at 5% familywise rate after adjustment for multiplicity

You can probably amp this up a little with a different method of p-value
adjustment, but the central point is that if you're expecting to find
'nuggets' out of 20,000 tests, those samples will have to be extremely
different from a normal distribution.

To take up Greg Snow's comment about sampling p-values, observe that if the
normality assumption holds (i.e., the null hypothesis in the Shapiro-Wilk
test is true), then the p-values follow a Uniform(0, 1) distribution.

w1 <- runif(20000)
> length(w1[w1 < 0.05])
[1] 1047   # comparable to the simulation with normal samples above
> length(w1[w1 < 0.0001])
[1] 1      # ditto
w2 <- p.adjust(w1)   # adjust for multiplicity
> length(w2[w2 < 0.05])
[1] 0      # same as for the normal sample simulation


Like Greg Snow, I haven't seen any justification for why you think these
tests are useful or meaningful scientifically. If you're using something
like a t-test to make a decision of some sort, for example, then the
independence assumption has far more serious implications on the statistical
properties of the t-test than does the normality assumption.

This simulation, and your series of tests, assumes that the individual
samples are independent. I know little or nothing about microarrays, but is
it plausible to believe that samples from different locations on a
microarray are independent?

HTH,
Dennis


On Fri, Feb 4, 2011 at 11:38 AM, DB1984  wrote:

>
> Thanks Peter.
>
> I understand your point, and that there is potentially a high false
> discovery rate - but I'd expect the interesting data points (genes on a
> microarray) to be within that list too. The next step would be to filter
> based on some greater understanding of the biology...
>
>
> Alternative approaches that come to mind are to look at the magnitude of
> the
> deviation - through Q-Q plot residuals, or to perform a linear regression
> on
> each row, and select those rows for which the coefficients fit predefined
> criteria. I'm still feeling my way into how to do this, though.
>
> Is there a better approach to identifying non-normal or skewed
> distributions
> that I am missing? Thanks for your advice...
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260881.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Greg Snow
It is fine, you just overthought the solution and used both the applys and for 
loops (see another thread today where I made the same mistake of overthinking 
and combining 2 different methods).  I was just pointing out the errors so you 
could improve for next time.

But here is some things to think about.  If every row were from a normal 
distribution and you use an alpha of 0.05, then with 20,000 rows you would 
expect to have 1,000 significant tests just by chance alone.  Also, what if all 
your samples are from non-normal distributions, how much power do you have with 
only 20 points to detect the non-normality?
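
As a rough illustration of the power side (a sketch only, with an
exponential alternative standing in for "non-normal"):

set.seed(1)
mean(replicate(2000, shapiro.test(rexp(20))$p.value < 0.05))
# roughly 0.8, i.e. even a strongly skewed alternative is missed about
# 1 time in 5 with n = 20; milder departures will be missed far more often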

So, with the large number of type I and type II errors, what meaning can you 
get from all of this? 

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of DB1984
> Sent: Friday, February 04, 2011 12:00 PM
> To: r-help@r-project.org
> Subject: Re: [R] Finding non-normal distributions per row of data
> frame?
> 
> 
> Hi Greg,
> 
> In addition to the reply above, to address your questions - I fully
> appreciate that my understanding of the code is basic - this is my
> first
> attempt at putting this together...
> 
> My starting point is a data frame with numeric and text columns, but I
> can
> cut columns to make a fully numeric matrix if that is easier to handle.
> 
> "apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess
> that
> I chose a bad example dataset for 'nt'!
> 
> 
> The overall aim is to test the normality of the distribution of the
> values
> in each row. I would then subset out the non-normal distributions to
> interrogate further. The shapiro.test seems a simple first pass at
> this. I'd
> like to move on to plotting residuals of a QQplot next, to see if that
> is
> more or less sensitive at detecting non-normal distributions in the
> dataset.
> 
> If you would recommend an alternative approach, I'd appreciate the
> input,
> thanks..
> --
> View this message in context: http://r.789695.n4.nabble.com/Finding-
> non-normal-distributions-per-row-of-data-frame-tp3259439p3260812.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact logistic regression

2011-02-04 Thread Łukasz Ręcławowicz
> data2elrm<-cbind(mydata,n=rep(1,dim(mydata)[1]))

More logical would be:

data2elrm2<-cbind(mydata,n=rep(1,nrow(mydata)))

Sorry for obfuscation.

-- 
Have a nice day

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact logistic regression

2011-02-04 Thread Łukasz Ręcławowicz
2011/2/4 Den 

> To use elrm() I have to aggregate my data,which is really time consuming
> when I look for the way out through many variables.

You don't have to do that. One exception is that the binomial response
should be specified as success/trials, where success gives the number of
successes and trials gives the number of binomial trials for each row of
the dataset.
So... when one row = one trial:

data2elrm<-cbind(mydata,n=rep(1,dim(mydata)[1]))

> But the worst thing is that I am not sure if I can trust the p-values
> in the output.
>
It depends, but it's always a "guess".

-- 
Have a nice day

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to learn more from R

2011-02-04 Thread Gene Leynes
Search the help archive and you'll find dozens of suggestions about beginner
manuals

You can search the archive at nabble.com or markmail.com
http://r-project.markmail.org/
http://r.789695.n4.nabble.com/R-help-f789696.html

(I don't know why the nabble URL is so complicated)

For seeing R examples and syntax you may like the examples at quick R:
http://www.statmethods.net/index.html


On Fri, Feb 4, 2011 at 4:09 AM, Karl Ove Hufthammer  wrote:

> Jie TANG wrote:
>
> > Now I want to know how much R can do .where can i get some examples or
> > demo resource to study advanced function of R ?
>
> Not really what you were thinking of, I guess, but don’t overlook the
> ‘demo’
> function. For example, try ‘demo(graphics)’.
>
> Type ‘demo()’ for a list of demos, but note that more become available when
> you load new packages (try the ‘rgl’, ‘lattice’ and ‘animation’ packages
> for
> some nice graphical demos).
>
> --
> Karl Ove Hufthammer
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Can someone please tell me what is up with na.action in aggregate?

My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate session, more
below)

set.seed(100)
dat=data.frame(
x1=sample(c(NA,'m','f'), 100, replace=TRUE),
x2=sample(c(NA, 1:10), 100, replace=TRUE),
x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
x4=sample(c(NA,T,F), 100, replace=TRUE),
y=sample(c(rep(NA,5), rnorm(95))))
dat
## The total from dat:
sum(dat$y, na.rm=T)
## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line
gave an error in a separate R instance
## The aggregate formula is excluding NA

## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man
chew')$y)
## H... that error surprised me (since the previous two things ran)

## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
##  100 times, I thought I would go above and beyond to avoid
##  any r list flames from people complaining
##  that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)


## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)


## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics  grDevices utils datasets  methods
base
#
#other attached packages:
#[1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
#[1] grid_2.12.0 gtools_2.6.2lattice_0.19-13 rJava_0.8-8
#[5] tools_2.12.0



I tried running that example in a different version of R, with and I got
completely different results

The other version of R wouldn't recognize the formula at all..

My other version of R:

#  My second session info
#> sessionInfo()
#R version 2.10.1 (2009-12-14)
#i386-pc-mingw32
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics  grDevices utils datasets  methods
base
#>
#

PS: Also, I have read the help on aggregate, factor, as.factor, and several
other topics.  If I missed something, please let me know.
Some people like to reply to questions by telling the sender that R has
documentation.  Please don't.  The R help archives are littered with
reminders, friendly and otherwise, of R's documentation.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply, strsplit, and list elements

2011-02-04 Thread Greg Snow
Darn,  Good catch, I fell victim to overthinking the problem.

I think I was more thinking of:
'[0-9]+(?=/)'

Which uses the whole match (then I switched thinking and captured the number, 
but did not simplify the other part).  Yours is the best.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Friday, February 04, 2011 12:22 PM
> To: Greg Snow
> Cc: Dick Harray; r-help@r-project.org
> Subject: Re: [R] lapply, strsplit, and list elements
> 
> On Fri, Feb 4, 2011 at 1:27 PM, Greg Snow  wrote:
> > Try this:
> >
> >> x <- c("349/077,349/074,349/100,349/117",
> > +          "340/384.2,340/513,367/139,455/128,D13/168",
> > +          "600/437,128/903,128/904")
> >>
> >> library(gsubfn)
> >> out <- strapply(x, '([0-9]+)(?=/)')
> >> out
> > [[1]]
> > [1] "349" "349" "349" "349"
> >
> > [[2]]
> > [1] "340" "340" "367" "455" "13"
> >
> > [[3]]
> > [1] "600" "128" "128"
> >
> >
> > The strapply looks for the pattern then returns every time it finds
> the pattern.  The pattern in this case is 1 or more digits that are
> followed by a /, but the slash is not included in the matched portion
> (a positive look ahead).
> >
> > If you need more than digits you can modify the pattern to whatever
> matches before the /.
> 
> Also this similar approach with a slight simplification of the regular
> expression:
> 
>strapply(x, '([0-9]+)/')
> 
> or to convert the numbers to numeric at the same time:
> 
>strapply(x, '([0-9]+)/', as.numeric)
> 
> 
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing the direction of axis labels on a plot

2011-02-04 Thread Uwe Ligges



On 04.02.2011 01:42, Longe wrote:

On 02/03/2011 01:57 PM, Peter Alspach wrote:

Tena koe

?par

and check the las argument.

HTH 

Peter Alspach


Thank you, Peter, and also David and William. The las argument indeed
helps in setting the tick labels horizontally. Beautiful! But how do I
rotate also the axis label, which is not altered by the las argument?


Don't label the axis at first, and later on use mtext() to add the label
to the axis margin yourself.
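
A minimal sketch of that idea (made-up data; the 'line' and 'las' values
are only starting points to adjust):

plot(rnorm(10), ylab = "", las = 1)               # leave the y label empty
mtext("my y label", side = 2, line = 3, las = 1)  # add it horizontally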


Best,
Uwe Ligges





Regards
L.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding two loops

2011-02-04 Thread Petr Savicky
On Fri, Feb 04, 2011 at 02:03:22PM -0500, sudhir cr wrote:
> Hello,
> 
> I have a R code for doing convolution of two functions:
> 
> convolveSlow <- function(x, y) {
>     nx <- length(x); ny <- length(y)
>     xy <- numeric(nx + ny - 1)
>     for(i in seq(length = nx)) {
>         xi <- x[[i]]
>         for(j in seq(length = ny)) {
>             ij <- i+j-1
>             xy[[ij]] <- xy[[ij]] + xi * y[[j]]
>         }
>     }
>     xy
> }
> 
> How do I reduce the 2 loops so that I can run the  code faster?

Hello:

Convolution of two vectors may be computed also using matrix reshaping
without a loop. For example 

  convolution <- function(x, y)
  {
  # more efficient if length(x) >= length(y)
  m <- length(x)
  n <- length(y)
  zero <- matrix(0, nrow=n, ncol=n)
  a <- rbind(x %o% y, zero)
  k <- m + n - 1
  b <- matrix(c(a)[1:(n*k)], nrow=k, ncol=n)
  rowSums(b)
  }

Testing this on computing the product of the polynomials (1+t)^4 (1+t)^3
yields

  x <- choose(4, 0:4)
  y <- choose(3, 0:3)
  convolution(x, y)

  [1]  1  7 21 35 35 21  7  1

which is the same as choose(7, 0:7).

See also ?convolve.
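
If base R is enough, the same numbers should (if I recall the conventions
in ?convolve correctly) also come out of

  convolve(x, rev(y), type = "open")

which is the usual idiom for this kind of polynomial-style convolution.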

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Thanks Peter.

I understand your point, and that there is potentially a high false
discovery rate - but I'd expect the interesting data points (genes on a
microarray) to be within that list too. The next step would be to filter
based on some greater understanding of the biology...


Alternative approaches that come to mind are to look at the magnitude of the
deviation - through Q-Q plot residuals, or to perform a linear regression on
each row, and select those rows for which the coefficients fit predefined
criteria. I'm still feeling my way into how to do this, though.

Is there a better approach to identifying non-normal or skewed distributions
that I am missing? Thanks for your advice...
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260881.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Yes, that was dumb - I got that...
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260843.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Peter Ehlers

On 2011-02-04 11:00, DB1984 wrote:


Hi Greg,

In addition to the reply above, to address your questions - I fully
appreciate that my understanding of the code is basic - this is my first
attempt at putting this together...

My starting point is a data frame with numeric and text columns, but I can
cut columns to make a fully numeric matrix if that is easier to handle.

"apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess that
I chose a bad example dataset for 'nt'!


The overall aim is to test the normality of the distribution of the values
in each row. I would then subset out the non-normal distributions to
interrogate further. The shapiro.test seems a simple first pass at this. I'd
like to move on to plotting residuals of a QQplot next, to see if that is
more or less sensitive at detecting non-normal distributions in the dataset.

If you would recommend an alternative approach, I'd appreciate the input,
thanks..


I don't know what your overall scientific aim is, but here's
something to ponder:

Suppose that you randomly sample 400,000 observations from a
NORMAL distribution and put these into a matrix of 20,000
rows by 20 columns and then perform your row-wise Normality
tests, storing the p-values.

If you now select those rows with p-value < 0.05, you will
get about _____ rows. (Fill in your best guess.)

Question: what does that imply for your scientific analysis?

Answer: Normality testing may not be your best line of attack.

Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply, strsplit, and list elements

2011-02-04 Thread Gabor Grothendieck
On Fri, Feb 4, 2011 at 1:27 PM, Greg Snow  wrote:
> Try this:
>
>> x <- c("349/077,349/074,349/100,349/117",
> +          "340/384.2,340/513,367/139,455/128,D13/168",
> +          "600/437,128/903,128/904")
>>
>> library(gsubfn)
>> out <- strapply(x, '([0-9]+)(?=/)')
>> out
> [[1]]
> [1] "349" "349" "349" "349"
>
> [[2]]
> [1] "340" "340" "367" "455" "13"
>
> [[3]]
> [1] "600" "128" "128"
>
>
> The strapply looks for the pattern then returns every time it finds the 
> pattern.  The pattern in this case is 1 or more digits that are followed by a 
> /, but the slash is not included in the matched portion (a positive look 
> ahead).
>
> If you need more than digits you can modify the pattern to whatever matches 
> before the /.

Also this similar approach with a slight simplification of the regular
expression:

   strapply(x, '([0-9]+)/')

or to convert the numbers to numeric at the same time:

   strapply(x, '([0-9]+)/', as.numeric)


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GWAF package: lme.batch.imputed(): object 'kmat' not found

2011-02-04 Thread Jim Moon
Hello, All,

GWAF 1.2
R.Version() is below.

system(lme.batch.imputed(
phenfile = 'phenfile.csv',
genfile = 'CARe_imputed_release.0.fhsR.gz',
pedfile='pedfile.csv',
phen='phen1',
covar=c('covar1','covar2'),
kinmat='imputed_fhs.kinship.RData',
outfile='imputed.FHS.IBC.GWAF.LME.output.0.txt'
))

Gives the error messages:

Error in coxme.varcheck(ncluster, varlist, n, gvars, groups, sparse, rescale,  :
  object 'kmat' not found
Error in lme.cov.out$theta : $ operator is invalid for atomic vectors

Might anyone know why?

Thank you for your time!

Jim

---
> R.Version()
$platform
[1] "x86_64-apple-darwin9.6.0"

$arch
[1] "x86_64"

$os
[1] "darwin9.6.0"

$system
[1] "x86_64, darwin9.6.0"

$status
[1] ""

$major
[1] "2"

$minor
[1] "12.1"

$year
[1] "2010"

$month
[1] "12"

$day
[1] "16"

$`svn rev`
[1] "53855"

$language
[1] "R"

$version.string
[1] "R version 2.12.1 (2010-12-16)"

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding two loops

2011-02-04 Thread Dirk Eddelbuettel

On 4 February 2011 at 14:03, sudhir cr wrote:
| Hello,
| 
| I have a R code for doing convolution of two functions:
| 
| convolveSlow <- function(x, y) {
|     nx <- length(x); ny <- length(y)
|     xy <- numeric(nx + ny - 1)
|     for(i in seq(length = nx)) {
|         xi <- x[[i]]
|         for(j in seq(length = ny)) {
|             ij <- i+j-1
|             xy[[ij]] <- xy[[ij]] + xi * y[[j]]
|         }
|     }
|     xy
| }
| 
| How do I reduce the 2 loops so that I can run the  code faster?

Maybe by reading the answer to the _exact same question_ you appear to have
asked on SO yesterday?

http://stackoverflow.com/questions/4894506/avoid-two-for-loops-in-r

Dirk

| 
| Thank you
| San
| 
|   [[alternative HTML version deleted]]
| 
| __
| R-help@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-help
| PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
| and provide commented, minimal, self-contained, reproducible code.

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Avoiding two loops

2011-02-04 Thread sudhir cr
Hello,

I have a R code for doing convolution of two functions:

convolveSlow <- function(x, y) {
    nx <- length(x); ny <- length(y)
    xy <- numeric(nx + ny - 1)
    for(i in seq(length = nx)) {
        xi <- x[[i]]
        for(j in seq(length = ny)) {
            ij <- i+j-1
            xy[[ij]] <- xy[[ij]] + xi * y[[j]]
        }
    }
    xy
}

How do I reduce the 2 loops so that I can run the  code faster?

Thank you
San

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Hi Greg,

In addition to the reply above, to address your questions - I fully
appreciate that my understanding of the code is basic - this is my first
attempt at putting this together...

My starting point is a data frame with numeric and text columns, but I can
cut columns to make a fully numeric matrix if that is easier to handle.

"apply(y, 1, shapiro.test)" works for a second dataframe, yes. I guess that
I chose a bad example dataset for 'nt'!


The overall aim is to test the normality of the distribution of the values
in each row. I would then subset out the non-normal distributions to
interrogate further. The shapiro.test seems a simple first pass at this. I'd
like to move on to plotting residuals of a QQplot next, to see if that is
more or less sensitive at detecting non-normal distributions in the dataset. 

If you would recommend an alternative approach, I'd appreciate the input,
thanks..
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260812.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 1:41 PM, DB1984 wrote:



Thanks David - but '1' (if I understood correctly) returns the same  
value for

each row, which I took to be an error.


And exactly what were you expecting with that data?



nt
     V1    V2    V3    V4    V5    V6
1 24.71 23.56 24.71 23.56 24.71 23.56
2 25.64 25.06 25.64 25.06 25.64 25.06
3 21.29 20.87 21.29 20.87 21.29 20.87
4 25.92 26.92 25.92 26.92 25.92 26.92
5 24.36 23.60 24.36 23.60 24.36 23.60
6 24.91 24.32 24.91 24.32 24.91 24.32
7 23.56 23.14 23.56 23.14 23.56 23.14
8 23.94 23.99 23.94 23.99 23.94 23.99
9 26.65 26.07 26.65 26.07 26.65 26.07

test <- apply(nt, 1, shapiro.test)


Run this and ponder the results a bit:

 apply(nt, 1, diff )

--

David.


fred<-data.frame(sapply(test,function(x)c(x$statistic, x$p.value)))
fred
             X1          X2          X3          X4          X5          X6          X7          X8          X9
W   0.682676792 0.682676792 0.682676792 0.682676792 0.682676882 0.682676792 0.682676792 0.682676792 0.682676792
    0.004039347 0.004039347 0.004039347 0.004039347 0.004039358 0.004039347 0.004039347 0.004039347 0.004039347


Version details are below:
R.Version()
$platform
[1] "x86_64-apple-darwin9.8.0"

$arch
[1] "x86_64"

$os
[1] "darwin9.8.0"

$system
[1] "x86_64, darwin9.8.0"

$status
[1] ""

$major
[1] "2"

$minor
[1] "12.0"

$year
[1] "2010"

$month
[1] "10"

$day
[1] "15"

$`svn rev`
[1] "53317"

$language
[1] "R"

$version.string
[1] "R version 2.12.0 (2010-10-15)"
--
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260769.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply, strsplit, and list elements

2011-02-04 Thread Henrique Dallazuanna
Try this:

strsplit(x, "/\\d+\\.\\d+,|/\\d+,|/\\d+")

On Fri, Feb 4, 2011 at 1:37 PM, Dick Harray  wrote:

> Hi there,
>
> I have a problem about lapply, strsplit, and accessing list elements,
> which I don't understand or cannot solve:
>
> I have e.g. a character vector with three elements:
>
> x = c("349/077,349/074,349/100,349/117",
> "340/384.2,340/513,367/139,455/128,D13/168",
> "600/437,128/903,128/904")
>
>
> The task I want to perform, is to generate a list, comprising the
> portion in front of the "/" of each element of x:
>
> neededResult = list(c("349","349", "349", "349"),
> c("340", "340", "367", "455", "D13"),
> c("600", "128", "128") )
>
>
> I figured out that for a single element of x the following works
>
> unlist( lapply( strsplit( unlist( strsplit(x[1], "\\,") ), "/"), "[", 1) )
>
> but due to "unlist" it doesn't provide the required result if extended
> to all elements of x
>
> unlist(lapply(strsplit( unlist( lapply(x, strsplit, "\\,")), "/"), "[",))
>
>
> Someone can help me to get the needed result?
>
> Thanks and regards,
>
> Dirk
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Thanks David - but '1' (if I understood correctly) returns the same value for
each row, which I took to be an error.

nt
 V1V2V3V4V5V6
1 24.71 23.56 24.71 23.56 24.71 23.56
2 25.64 25.06 25.64 25.06 25.64 25.06
3 21.29 20.87 21.29 20.87 21.29 20.87
4 25.92 26.92 25.92 26.92 25.92 26.92
5 24.36 23.60 24.36 23.60 24.36 23.60
6 24.91 24.32 24.91 24.32 24.91 24.32
7 23.56 23.14 23.56 23.14 23.56 23.14
8 23.94 23.99 23.94 23.99 23.94 23.99
9 26.65 26.07 26.65 26.07 26.65 26.07

test <- apply(nt, 1, shapiro.test)
> fred<-data.frame(sapply(test,function(x)c(x$statistic, x$p.value)))
> fred
   X1  X2  X3  X4  X5  X6   
  
X7  X8  X9
W 0.682676792 0.682676792 0.682676792 0.682676792 0.682676882 0.682676792
0.682676792 0.682676792 0.682676792
  0.004039347 0.004039347 0.004039347 0.004039347 0.004039358 0.004039347
0.004039347 0.004039347 0.004039347


Version details are below:
R.Version()
$platform
[1] "x86_64-apple-darwin9.8.0"

$arch
[1] "x86_64"

$os
[1] "darwin9.8.0"

$system
[1] "x86_64, darwin9.8.0"

$status
[1] ""

$major
[1] "2"

$minor
[1] "12.0"

$year
[1] "2010"

$month
[1] "10"

$day
[1] "15"

$`svn rev`
[1] "53317"

$language
[1] "R"

$version.string
[1] "R version 2.12.0 (2010-10-15)"
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260769.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] switching y-axis to x-axis for qqmath

2011-02-04 Thread Anderson, Chris

This is the qqmath example from the lattice package. I added the scales to the 
example. I would like to switch the axes but am not sure how; that is, I would like 
"height" on the x-axis and the probability on the y-axis. Will you show me 
the correct syntax for this switch? Thanks.


qqmath(~ height | voice.part, aspect = "xy", data = singer,
   prepanel = prepanel.qqmathline,
scales = list(x = list(at = qnorm(c(0.001,0.01, 0.05, 0.25, 0.5, 0.75, 
0.95,0.99, 0.999)),
labels = c(0.001, 0.01,0.05, 0.25, 0.5, 0.75, 0.95, 0.99,0.999),alternating 
= 3), y = list(alternating = 3)))


Chris Anderson
Data Analyst
Medical Affairs
wk: 925-677-4870
cell: 707-315-8486
Fax:925-677-4670


This electronic message transmission, including any attachments, 
contains information which may be confidential, privileged and/or otherwise 
exempt from disclosure under applicable law. The information is intended to 
be for the use of the individual(s) or entity named above. If you are not 
the intended recipient or the employee or agent responsible for delivering 
the message to the intended recipient, you are hereby notified that any 
disclosure, copying, distribution or use of the contents of this 
information is strictly prohibited.  If you have received this electronic 
transmission in error, please notify the sender immediately by telephone 
(800-676-6777) or by a "reply to sender only" message and destroy all 
electronic and hard copies of the communication, including attachments.  
Thank you.For more information on Paradigm Management Services, LLC, 
please visit http://www.paradigmcorp.com 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply parameters to a function from a list

2011-02-04 Thread Florian Burkart

Of course. Thanks.

On 04/02/2011 18:32, Greg Snow wrote:

Try:


do.call(test,point)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply parameters to a function from a list

2011-02-04 Thread Greg Snow
Try:

> do.call(test,point)
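
Spelled out (a sketch): do.call() builds and evaluates the call test(a = 3, d = 2),
so b and c keep their defaults.

test  <- function(a = 1, b = 2, c = 3, d = 4) a + b + c + d
point <- list(a = 3, d = 2)
do.call(test, point)
# [1] 10    (3 + 2 + 3 + 2)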

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Florian Burkart
> Sent: Friday, February 04, 2011 7:36 AM
> To: r-help@r-project.org
> Subject: [R] Apply parameters to a function from a list
> 
> Hey,
> 
> this may be a simple question, but I am struggling to apply a list of
> parameters to a function.
> 
> 
> Say I have the following function:
> 
> test<-function(a=1,b=2,c=3,d=4){a+b+c+d}
> 
> And the following list:
> 
> point<-list(a=3,d=2)
> 
> Is there a way I can evaluate function test at point?
> 
> 
> 
> (Apart from changing function test to take in a list, and take values
> out of the list, but I'd rather not touch test).
> 
> Thanks vm
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply, strsplit, and list elements

2011-02-04 Thread Greg Snow
Try this:

> x <- c("349/077,349/074,349/100,349/117",
+  "340/384.2,340/513,367/139,455/128,D13/168",
+  "600/437,128/903,128/904")
> 
> library(gsubfn)
> out <- strapply(x, '([0-9]+)(?=/)')
> out
[[1]]
[1] "349" "349" "349" "349"

[[2]]
[1] "340" "340" "367" "455" "13" 

[[3]]
[1] "600" "128" "128"


The strapply looks for the pattern then returns every time it finds the 
pattern.  The pattern in this case is 1 or more digits that are followed by a 
/, but the slash is not included in the matched portion (a positive look ahead).

If you need more than digits you can modify the pattern to whatever matches 
before the /.
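
For example, an untested variation on the pattern above that would also keep a leading
letter such as the "D" in "D13":

out2 <- strapply(x, '([A-Za-z]*[0-9]+)(?=/)')
out2[[2]]   # expected: "340" "340" "367" "455" "D13"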

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Dick Harray
> Sent: Friday, February 04, 2011 8:37 AM
> To: r-help@r-project.org
> Subject: [R] lapply, strsplit, and list elements
> 
> Hi there,
> 
> I have a problem about lapply, strsplit, and accessing list elements,
> which I don't understand or cannot solve:
> 
> I have e.g. a character vector with three elements:
> 
> x = c("349/077,349/074,349/100,349/117",
>  "340/384.2,340/513,367/139,455/128,D13/168",
>  "600/437,128/903,128/904")
> 
> 
> The task I want to perform, is to generate a list, comprising the
> portion in front of the "/" of each element of x:
> 
> neededResult = list(c("349","349", "349", "349"),
>  c("340", "340", "367", "455", "D13"),
>  c("600", "128", "128") )
> 
> 
> I figured out that for a single element of x the following works
> 
> unlist( lapply( strsplit( unlist( strsplit(x[1], "\\,") ), "/"), "[",
> 1) )
> 
> but due to "unlist" it doesn't provide the required result if extended
> to all elements of x
> 
> unlist(lapply(strsplit( unlist( lapply(x, strsplit, "\\,")), "/"),
> "[",))
> 
> 
> Someone can help me to get the needed result?
> 
> Thanks and regards,
> 
> Dirk
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dataframes from a function

2011-02-04 Thread Phil Spector

Chris -
   You can solve your problem by removing the print 
statements and replacing them with


answer = list(org.plot[org.plot$sub %in% ex.plot,],
  new.pl[new.pl$yar %in% "1991",],
  new.pl[new.pl$yar %in% "1993",])
prefix = sub('-','.',ex.plot)
names(answer) = paste(prefix,c('.new','new1991','new1993'),sep='')
return(answer)

That will return a list with three dataframes having the names that
you specified.

If you actually want to create separate objects in your workspace, you
could look at the help page for the assign function.
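
Put together, the list-returning version might look like this (a sketch reusing the
column names from Chris's example):

plot.restrut <- function(org.plot, ex.plot) {
    new.pl     <- org.plot[org.plot$sub %in% ex.plot, ]
    new.pl.st  <- new.pl[new.pl$yar %in% "1991", ]
    new.pl.fin <- new.pl[new.pl$yar %in% "1993", ]
    answer <- list(new.pl, new.pl.st, new.pl.fin)
    prefix <- sub('-', '.', ex.plot)
    names(answer) <- paste(prefix, c('.new', 'new1991', 'new1993'), sep = '')
    answer
}

res <- plot.restrut(prac, "9-2b")
names(res)          # "9.2b.new"  "9.2bnew1991"  "9.2bnew1993"
res[["9.2b.new"]]   # the full subset for "9-2b"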

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu




On Fri, 4 Feb 2011, chris20 wrote:



Hi,
I'm trying to create a function to return three dataframes for later use in
a graphic, so I want the function to return dataframes with unique names
relating to the variable they are based on.
For example.

sub<-c("6-1a","6-1a","6-1a","9-2b","9-2b","9-2b","7c","7c","7c")
yar<-rep(1991:1993,3)
x1n<-c(4,6,3,6,4,7,9,4,2)
y1m<-c(3,7,5,8,4,3,6,7,8)
prac<-data.frame(sub,yar,x1n,y1m)

plot.restrut<-function(org.plot,ex.plot) {
new.pl<-org.plot[org.plot$sub %in% ex.plot,]
new.pl.st<-new.pl[new.pl$yar %in% "1991",]
new.pl.fin<-new.pl[new.pl$yar %in% "1993",]
print(new.pl)
print(new.pl.st)
print(new.pl.fin)
}

plot.restrut(prac,"9-2b")

##
I would like the output to be one dataframe with the name "9.2b.new" which
is new.pl in the example
Another data frame to be "9.2bnew1991" and another dataframe "9.2bnew1993"

I want the dataframe names to relate to the variables they have come from
because I want to use the same function to extract lots of subsets of my
data.

Thanks
Chris


--
View this message in context: 
http://r.789695.n4.nabble.com/dataframes-from-a-function-tp3260581p3260581.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply parameters to a function from a list

2011-02-04 Thread Henrique Dallazuanna
Try this:

do.call(test, point)


On Fri, Feb 4, 2011 at 12:35 PM, Florian Burkart wrote:

> Hey,
>
> this may be a simple question, but I am struggling to apply a list of
> parameters to a function.
>
>
> Say I have the following function:
>
> test<-function(a=1,b=2,c=3,d=4){a+b+c+d}
>
> And the following list:
>
> point<-list(a=3,d=2)
>
> Is there a way I can evaluate function test at point?
>
>
>
> (Apart from changing function test to take in a list, and take values out
> of the list, but I'd rather not touch test).
>
> Thanks vm
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] spatial autocorrelation for data that are temporally pseudoreplicated

2011-02-04 Thread Gagic, Vasna


Dear all,

I collected my data from the different agricultural fields every week over a
period of a month.
How can I test for spatial autocorrelation in R with data that are temporally
pseudoreplicated?

I used lme with correlation=corCompSymm(form=~Date) to model temporal
pseudoreplication.

Regards,
VG

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Suppress only Z axis tick marks and numbers in wireframe statement

2011-02-04 Thread Matthew A. H. Walters
I actually have several questions revolving around the generation of
wireframe plots.

 

The most pressing is that which is described by the subject line.

 

I am trying to produce a figure with x, y, and z labels, but with tick marks
only on the x and y axes.  I have supplied sample code below, substituting the
volcano data set for my own so that you may run the code and see the
results.  As a result, the main title and labels obviously don't make sense
given the data.  To be clear, my problem is that I don't want the z-axis
tick marks to appear at all, not that they intersect with the z label.

 

Thanks in advance for any help.

 

Matt

 

CODE:

require(lattice)

trellis.par.set("axis.line", list(col = "transparent"))

wireframe(volcano,
          main = "Correlated Beta Distributions", type = 'n',
          xlab = list("Egg Survival", rot = -18),
          ylab = list("Nestling Survival", rot = 50),
          zlab = list("Density", rot = 95),  # zaxt="n",
          screen = list(z = -30, x = -60, y = 0), light.source = c(90, -15, 90),
          scales = list(arrows = F, distance = c(.75, .75, 0.25),
                        cex = 1, col = "black"),
          aspect = c(1, .4), shade = T, colorkey = F,
          shade.colors = function(irr, ref, height, w = .4)
              grey(w * irr + (1 - w) * (1 - (1 - ref)^0.4)))


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] always about positive definite matrix

2011-02-04 Thread Mike Miller
(Apologies to the cc-list: I'm resending from a different address because 
I didn't realize it was going to r-help.)


I'm also not an expert on this topic.  I just wanted to list a couple of 
ways that non-PD matrices might arise.  I'll just add now a couple of 
pointers:


First, I believe the term "semipositive definite" is considered ambiguous 
because in some literature it means that the matrix's smallest 
eigenvalue is zero and in other literature it means that no eigenvalue is 
negative (but there might be no zero eigenvalues).  I think I might have 
read about this ambiguity in Searle's excellent "Matrix Algebra Useful for 
Statistics":


http://www.amazon.com/Matrix-Algebra-Useful-Statistics-Probability/dp/0471866814

Second, people in certain stat areas often recommend this chapter by 
Werner Wothke, formerly of SAS Institute, Inc.:


Wothke, Werner (1995), "Nonpositive Definite Matrices in Structural 
Equation Modeling", in "Testing Structural Equation Models", by Kenneth A. 
Bollen and J. Scott Long (eds.), Sage Publications, Newbury Park pp. 
256-293.


http://www.amazon.com/Testing-Structural-Equation-Models-Editions/dp/0803945078

I think you'll find basic definitions and explanations there along with a 
lot of information about how non-positive definite matrices may arise in 
real-world applications.


Best,

Mike


On Fri, 4 Feb 2011, Spencer Graves wrote:

 1.  Martin Maechler's comments should be taken as replacements for 
anything I wrote where appropriate.  Any apparent conflict is a result of his 
superior knowledge.



 2.  'eigen' returns the eigenvalue decomposition assuming the matrix is 
symmetric, ignoring anything in m[upper.tri(m)].



 3.  The basic idea behind both posdefify and nearPD is to compute the 
eigenvalues and eigenvectors, then replace any eigenvalues that are small or 
negative with some suitable small positive number and reconstruct the matrix 
from this modified eigenvalue decomposition.  posdefify and nearPD implement 
modifications of this basic idea.



 4.  I recommend in the summary you mention nearPD but not posdefify, 
because nearPD was written more recently using the results of research not 
available to the authors when posdefify was written.



MARTIN:  There is a typo in the first line of the documentation for 
"symmpart".  It currently reads, "symmpart(x) computes the symmetric part (x 
+ t(x))/2 and the skew symmetric part (x - t(x))/2 of a square matrix x.". 
It should read, "symmpart(x) computes the symmetric part (x + t(x))/2 and 
skewpart the skew symmetric part (x - t(x))/2 of a square matrix x."



 Hope this helps.
 Spencer


On 2/4/2011 6:26 AM, Stefano Sofia wrote:

Dear R-users,
I followed with high interest the thread about positive definite matrix.
I tracked all the messages of the discussion and I am trying to make a 
summary of all the correlated problems that arose from the discussion and 
the best solutions to overcome them.
As far as I understood, the main problems are two: assessing the symmetry 
of the given matrix and dealing with eigenvalues very close to zero.

Do I miss some important points?

The functions that have been mentioned are eigen (I think in particular the 
isSymmetric.matrix function), the function posdefify of the sfsmisc package 
and the function nearPD of the Matrix package. I believe that some 
conversations have not been shared with the mailing list and therefore I 
find difficult to trace everything.


I understood very well the summary in four points given by Dr.Spencer 
Graves (message 53 of ISSUE 30, VOL 95), and parts of the comments added by 
Dr.Martin Maechler (message 71 of the same issue).
I am not able to understand the improvement given by posdefify with respect 
to eigen and why nearPD is even better.


Any final help?
thank you for your attention

Stefano Sofia PhD
Weather Department of Civil Protection Marche Region

AVVISO IMPORTANTE: Questo messaggio di posta elettronica può contenere 
informazioni confidenziali, pertanto è destinato solo a persone autorizzate 
alla ricezione. I messaggi di posta elettronica per i client di Regione 
Marche possono contenere informazioni confidenziali e con privilegi legali. 
Se non si è il destinatario specificato, non leggere, copiare, inoltrare o 
archiviare questo messaggio. Se si è ricevuto questo messaggio per errore, 
inoltrarlo al mittente ed eliminarlo completamente dal sistema del proprio 
computer. Ai sensi dell’art. 6 della  DGR n. 1394/2008 si segnala che, in 
caso di necessità ed urgenza, la risposta al presente messaggio di posta 
elettronica può essere visionata da persone estranee al destinatario.
IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. 
E-mail messages to clients of Regione Marche may contain information that 
is confidential and legally privileged. Please do not read, copy, forward, 
or store this message unless 

Re: [R] always about positive definite matrix

2011-02-04 Thread Spencer Graves
  1.  Martin Maechler's comments should be taken as replacements 
for anything I wrote where appropriate.  Any apparent conflict is a 
result of his superior knowledge.



  2.  'eigen' returns the eigenvalue decomposition assuming the 
matrix is symmetric, ignoring anything in m[upper.tri(m)].



  3.  The basic idea behind both posdefify and nearPD is to compute 
the eigenvalues and eigenvectors, then replace any eigenvalues that are 
small or negative with some suitable small positive number and 
reconstruct the matrix from this modified eigenvalue decomposition.  
posdefify and nearPD implement modifications of this basic idea.



  4.  I recommend in the summary you mention nearPD but not 
posdefify, because nearPD was written more recently using the results of 
research not available to the authors when posdefify was written.
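
A small numeric illustration of points 2 and 3 (a sketch; the matrix is made up):

library(Matrix)
m <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.9,
              0.9, 0.9, -0.3), 3, 3)
m <- (m + t(m)) / 2                    # symmetrize first (point 2)
eigen(m)$values                        # at least one negative eigenvalue
mPD <- nearPD(m)$mat                   # point 3: repair the small/negative eigenvalues
min(eigen(as.matrix(mPD))$values)      # now non-negative (up to tolerance)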



MARTIN:  There is a typo in the first line of the documentation for 
"symmpart".  It currently reads, "symmpart(x) computes the symmetric 
part (x + t(x))/2 and the skew symmetric part (x - t(x))/2 of a square 
matrix x.".  It should read, "symmpart(x) computes the symmetric part (x 
+ t(x))/2 and skewpart the skew symmetric part (x - t(x))/2 of a square 
matrix x."



  Hope this helps.
  Spencer


On 2/4/2011 6:26 AM, Stefano Sofia wrote:

Dear R-users,
I followed with high interest the thread about positive definite matrix.
I tracked all the messages of the discussion and I am trying to make a summary 
of all the correlated problems that arose from the discussion and the best 
solutions to overcome them.
As far as I understood, the main problems are two: assessing the symmetry of 
the given matrix and dealing with eigenvalues very close to zero.
Do I miss some important points?

The functions that have been mentioned are eigen (I think in particular the 
isSymmetric.matrix function), the function posdefify of the sfsmisc package and 
the function nearPD of the Matrix package. I believe that some conversations 
have not been shared with the mailing list and therefore I find difficult to 
trace everything.

I understood very well the summary in four points given by Dr.Spencer Graves 
(message 53 of ISSUE 30, VOL 95), and parts of the comments added by Dr.Martin 
Maechler (message 71 of the same issue).
I am not able to understand the improvement given by posdefify with respect to 
eigen and why nearPD is even better.

Any final help?
thank you for your attention

Stefano Sofia PhD
Weather Department of Civil Protection Marche Region

AVVISO IMPORTANTE: Questo messaggio di posta elettronica può contenere 
informazioni confidenziali, pertanto è destinato solo a persone autorizzate 
alla ricezione. I messaggi di posta elettronica per i client di Regione Marche 
possono contenere informazioni confidenziali e con privilegi legali. Se non si 
è il destinatario specificato, non leggere, copiare, inoltrare o archiviare 
questo messaggio. Se si è ricevuto questo messaggio per errore, inoltrarlo al 
mittente ed eliminarlo completamente dal sistema del proprio computer. Ai sensi 
dell’art. 6 della  DGR n. 1394/2008 si segnala che, in caso di necessità ed 
urgenza, la risposta al presente messaggio di posta elettronica può essere 
visionata da persone estranee al destinatario.
IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages to clients of Regione Marche may contain information that is 
confidential and legally privileged. Please do not read, copy, forward, or 
store this message unless you are an intended recipient of it. If you have 
received this message in error, please forward it to the sender and delete it 
completely from your computer system.



--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 10:48 AM, DB1984 wrote:



Thanks for the feedback Patrizio - but your function is performing the
shapiro.test on columns instead of rows...

I tried:
nt<-data.frame(#a dataframe with 6 columns and 9 rows)

nr <- nrow(nt)


test <- apply(nt, nt[1:nr,], shapiro.test)

Error in ds[-MARGIN] : invalid subscript type 'list'


The second argument to apply needs to be a (short) numeric vector.  
Your request suggests you simply want the number 1, since that would  
tell apply to use the function on rows.





fred<-data.frame(sapply(test,function(x)c(x$statistic, x$p.value)))


But picked up the error above. Do I still require a loop to apply  
this over

rows?


--
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260412.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pass nrow(x) to dots in function(x){plot(x,...)}

2011-02-04 Thread rex.dwyer
Hi Marianne,
The quick-and-dirty solution is to add one character and make ns global:
ns <<- nrow(x)
Poor practice, but OK temporarily if you're just debugging.

This is an issue of "scope".  You are assuming dynamic scope, whereas R uses 
static scope.  'ns' was not defined when you said paste("n=",ns); it doesn't 
matter what its value is later.  Even though R delays evaluation of the 
argument until it is first needed, if it ends up being evaluated, the result is 
the same as if you evaluated it in the environment where it appeared in the 
text.  You can do something like this:

myfun <- function(x, title.fun, ...) {
  ns <- nrow(x)
  title = title.fun(ns)
  barplot(x , main=title, ... ) }

myfun(m1, title.fun=function(n) paste("n = ",n) )

Then the paste isn't evaluated until title.fun is *called*.  If you don't want 
to always supply title.fun, you give a default:
myfun <- function(x, title.fun=paste, ...) { ...
or
myfun <- function(x, title.fun=function(...){""}, ...) { ...
or
myfun <- function(x, title.fun=function(...){main}, main="", ...) { ... # 
(I think)
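
Putting the title.fun idea together (a sketch):

myfun <- function(x, title.fun = function(n) paste("n =", n), ...) {
    ns <- nrow(x)
    barplot(x, main = title.fun(ns), ...)
}

m1 <- matrix(1:20, 4)
myfun(m1)                                                   # default title "n = 4"
myfun(m1, title.fun = function(n) paste(n, "rows of data"))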


Rex

---
Message: 4
Date: Wed, 2 Feb 2011 11:51:50 +
From: Marianne Promberger 
To: "r-help@r-project.org" 
Subject: [R] pass nrow(x) to dots in function(x){plot(x,...)}
Message-ID: <20110202115150.GD8598@lauren>
Content-Type: text/plain; charset=us-ascii

Dear Rers,

I have a function to barplot() a matrix, eg

myfun <- function(x, ...) { barplot(x , ... )}

(The real function is more complicated, it does things to the matrix first.)

So I can do:

m1 <-  matrix(1:20,4)
myfun(m1)
myfun(m1, main="My title")

I'd like to be able to add the number of rows of the matrix passed to
the function to the "..." argument, eg

myfun(m1, main=paste("n=",ns))

where 'ns' would be nrow(m1)

I've tried this but it doesn't work:

myfun <- function(x, ...) {
  ns <- nrow(x)
  barplot(x , ... ) }

myfun(m1, main=paste("n = ",ns) )

ns is not found

So, basically, how do I assign an object inside a function that I can
then access in the dots when executing the function?

Many thanks

Marianne


--
Marianne Promberger PhD, King's College London
http://promberger.info
R version 2.12.0 (2010-10-15)
Ubuntu 9.04



--



message may contain confidential information. If you are not the designated 
recipient, please notify the sender immediately, and delete the original and 
any copies. Any use of the message by you is prohibited. 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get caller's name

2011-02-04 Thread Luke Tierney

The options showWarnCalls and showErrorCalls may also help --
they can be use do enable automatic printing of a call stack
summary. From ?options:

 ‘showWarnCalls’, ‘showErrorCalls’: a logical.  Should warning and
  error messages show a summary of the call stack?  By default
  error calls are shown in non-interactive sessions.
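
For example, to turn both on for the current session (a sketch):

options(showWarnCalls = TRUE, showErrorCalls = TRUE)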

Best,

luke


On 02/04/2011 10:09 AM, William Dunlap wrote:

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Duncan Murdoch
Sent: Friday, February 04, 2011 6:03 AM
To: Ernest Adrogué
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] get caller's name

On 03/02/2011 10:27 AM, Ernest Adrogué wrote:

Hi,
Suppose a function that checks an object:

stop.if.dims<- function(x) {
if (! is.null(dim(x))) {
  stop("cannot handle dimensional data")
}
}

This would be used by other functions that can only work with
dimensionless objects. The problem is the error message

would need to

include the name of the function that called stop.if.dims,

so that the

user knows which function got an argument that was incorrect.

How do I do this? Or maybe there is another way...


I see you have the answer you wanted, but I'd suggest you don't need
this:  the user should just use traceback() after the error
to see the
full call stack.  Perhaps it's not the caller that's the problem, but
the caller of the caller...

Duncan Murdoch


stopifnot() deals with the problem by arranging for the
error reporting mechanism not to say what function the
error comes from.  If you are in the know, not seeing
"Error in someFunction(arg):" in the error might motivate
you to call traceback().  If you are not in the know, it
just frustrates you.

R>  myFunc<- function(x) stopifnot(all(x>0))
R>  myFunc(-pi)
Error: all(x>  0) is not TRUE
R>  traceback()
3: stop(paste(ch, " is not ", if (length(r)>  1L) "all ", "TRUE",
   sep = ""), call. = FALSE)
2: stopifnot(all(x>  0))
1: myFunc(-pi)

In S+ stopifnot() uses eval(call("stop", message), sys.parent())
to make the error message appear to come from the caller of
stopifnot().  The initial message is clearer but the traceback
more confusing:

>  myFunc<- function(x) stopifnot(all(x>0))
>  myFunc(-pi)
Problem in myFunc( - pi): all(x>  0) is not TRUE
Use traceback() to see the call stack
>  traceback()
8: eval(action, sys.parent())
7: doErrorAction("Problem in myFunc( - pi): all(x>  0) is not TRUE", 1000)
6: stop("all(x>  0) is not TRUE")
5: eval(call, sys.parent())
4: stopifnot(all(x>  0))
3: myFunc( - pi)
2: eval(expression(myFunc( - pi)))
1:
Message: Problem in myFunc( - pi): all(x>  0) is not TRUE

If I try the eval(call("stop",message), sys.parent()) trick
in R the error is reported as being from eval():
>  myFunc(-pi)
Error in eval(expr, envir, enclos) : all(x>  0) is not TRUE

S+'s error handler has some logic so that a stop() called from
eval() is reported as being from the frame that eval is evaluating
in.

It might be nice to be able to tell stop(), warning(), and message()
to pretend they were called from somewhere other than where they
were actually called from.  Falling back on traceback() doesn't
help with warnings and messages.  Being able to put standard messages
like 'x not a matrix' into utility functions is handy, but it makes
it hard to track down the problem.
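
For the original helper-function case, one workaround (a sketch, not from the thread)
is to deparse the parent call yourself and suppress the default prefix:

stop.if.dims <- function(x) {
    if (!is.null(dim(x))) {
        caller <- deparse(sys.call(-1)[[1]])   # name of the calling function
        stop("cannot handle dimensional data in ", caller, "()", call. = FALSE)
    }
}
f <- function(m) stop.if.dims(m)
f(matrix(1:4, 2))
# Error: cannot handle dimensional data in f()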

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Luke Tierney
Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:  l...@stat.uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] always about positive definite matrix

2011-02-04 Thread Mike Miller
I'm also not an expert on this topic.  I just wanted to list a couple of 
ways that non-PD matrices might arise.  I'll just add now a couple of 
pointers:


First, I believe the term "semipositive definite" is considered ambiguous 
because in some literature it means that the matrix's smallest 
eigenvalue is zero and in other literature it means that no eigenvalue is 
negative (but there might be no zero eigenvalues).  I think I might have 
read about this ambiguity in Searle's excellent "Matrix Algebra Useful for 
Statistics":


http://www.amazon.com/Matrix-Algebra-Useful-Statistics-Probability/dp/0471866814

Second, people in certain stat areas often recommend this chapter by 
Werner Wothke, formerly of SAS Institute, Inc.:


Wothke, Werner (1995), "Nonpositive Definite Matrices in Structural 
Equation Modeling", in "Testing Structural Equation Models", by Kenneth A. 
Bollen and J. Scott Long (eds.), Sage Publications, Newbury Park pp. 
256-293.


http://www.amazon.com/Testing-Structural-Equation-Models-Editions/dp/0803945078

I think you'll find basic definitions and explanations there along with a 
lot of information about how non-positive definite matrices may arise in 
real-world applications.


Best,

Mike


On Fri, 4 Feb 2011, Spencer Graves wrote:

 1.  Martin Maechler's comments should be taken as replacements for 
anything I wrote where appropriate.  Any apparent conflict is a result of his 
superior knowledge.



 2.  'eigen' returns the eigenvalue decomposition assuming the matrix is 
symmetric, ignoring anything in m[upper.tri(m)].



 3.  The basic idea behind both posdefify and nearPD is to compute the 
eigenvalues and eigenvectors, then replace any eigenvalues that are small or 
negative with some suitable small positive number and reconstruct the matrix 
from this modified eigenvalue decomposition.  posdefify and nearPD implement 
modifications of this basic idea.



 4.  I recommend in the summary you mention nearPD but not posdefify, 
because nearPD was written more recently using the results of research not 
available to the authors when posdefify was written.



MARTIN:  There is a typo in the first line of the documentation for 
"symmpart".  It currently reads, "symmpart(x) computes the symmetric part (x 
+ t(x))/2 and the skew symmetric part (x - t(x))/2 of a square matrix x.". 
It should read, "symmpart(x) computes the symmetric part (x + t(x))/2 and 
skewpart the skew symmetric part (x - t(x))/2 of a square matrix x."



 Hope this helps.
 Spencer


On 2/4/2011 6:26 AM, Stefano Sofia wrote:

Dear R-users,
I followed with high interest the thread about positive definite matrix.
I tracked all the messages of the discussion and I am trying to make a 
summary of all the correlated problems that arose from the discussion and 
the best solutions to overcome them.
As far as I understood, the main problems are two: assessing the symmetry 
of the given matrix and dealing with eigenvalues very close to zero.

Do I miss some important points?

The functions that have been mentioned are eigen (I think in particular the 
isSymmetric.matrix function), the function posdefify of the sfsmisc package 
and the function nearPD of the Matrix package. I believe that some 
conversations have not been shared with the mailing list and therefore I 
find difficult to trace everything.


I understood very well the summary in four points given by Dr.Spencer 
Graves (message 53 of ISSUE 30, VOL 95), and parts of the comments added by 
Dr.Martin Maechler (message 71 of the same issue).
I am not able to understand the improvement given by posdefify with respect 
to eigen and why nearPD is even better.


Any final help?
thank you for your attention

Stefano Sofia PhD
Weather Department of Civil Protection Marche Region

AVVISO IMPORTANTE: Questo messaggio di posta elettronica può contenere 
informazioni confidenziali, pertanto è destinato solo a persone autorizzate 
alla ricezione. I messaggi di posta elettronica per i client di Regione 
Marche possono contenere informazioni confidenziali e con privilegi legali. 
Se non si è il destinatario specificato, non leggere, copiare, inoltrare o 
archiviare questo messaggio. Se si è ricevuto questo messaggio per errore, 
inoltrarlo al mittente ed eliminarlo completamente dal sistema del proprio 
computer. Ai sensi dell’art. 6 della  DGR n. 1394/2008 si segnala che, in 
caso di necessità ed urgenza, la risposta al presente messaggio di posta 
elettronica può essere visionata da persone estranee al destinatario.
IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. 
E-mail messages to clients of Regione Marche may contain information that 
is confidential and legally privileged. Please do not read, copy, forward, 
or store this message unless you are an intended recipient of it. If you 
have received this message in error, please forward it to the sender and 

[R] exact logistic regression

2011-02-04 Thread Den
Hate to say it, but it looks like Stata is way ahead of R when it comes to
exact logistic regression.
To use elrm() I have to aggregate my data, which is really time consuming
when I search through many variables. But the worst thing
is that I am not sure whether I can trust the p-values in the output.

I would be happy, however, if someone could contradict me on this issue.
With best regards
Denis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MSBVAR and hc.forecast

2011-02-04 Thread david ostrowski
I am attempting to do multivariate modelling in R with known future
conditions (in this case variable 'b') using MSBVAR and hc.forecast.
The sample code (a pared-down representation) does not give anywhere
near the expected results - I am assuming that a forecast 8 steps out
would approximate 'a' as the sequence 1.1,2.1,3.1,100.1 corresponding
to the input set.

I have made the input longer, and I have also used longer
simulation times.

If anyone has a very small example such as this, or knows how to
obtain my expected behavior, it would be much appreciated!



include("MSBVAR")
a = 
c(1.1,2.1,3.1,100.1,1.1,2.1,3.1,100.1,1.1,2.1,3.1,100.1,1.1,2.1,3.1,100.1,1.1,2.1,3.1,100.1,1.1,2.1,3.1,100.1)
b = 
c(1.1,1.1,1.1,100.1,1.1,1.1,1.1,100.1,1.1,1.1,1.,100.1,1.1,1.1,1.1,100.1,1.1,1.1,1.1,100.1,1.1,1.1,1.,100.1)
K <-ts(cbind(a,b),start=c(1.1,1.1), names=c("a","b"))


fit.bvar <- szbvar(K, p = 1, lambda0=0.6, lambda=0.1, lambda3=2,
lambda4=0.25,lambda5=0,mu5=0, mu6=0, prior=0)
y <- matrix(c(rep(0,8),c(1.10,1.10,1.10,100.1,1.10,1.10,1.0,100.10)),ncol=2)
h <- hc.forecast(fit.bvar,y,nsteps=8,burnin=300, gibbs=500,exog=NULL)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread DB1984

Thanks for the feedback Patrizio - but your function is performing the
shapiro.test on columns instead of rows...

I tried:
nt<-data.frame(#a dataframe with 6 columns and 9 rows)

nr <- nrow(nt)

 
test <- apply(nt, nt[1:nr,], shapiro.test)

Error in ds[-MARGIN] : invalid subscript type 'list'

fred<-data.frame(sapply(test,function(x)c(x$statistic, x$p.value)))


But picked up the error above. Do I still require a loop to apply this over
rows?


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3260412.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Applying multiple functions to one object

2011-02-04 Thread Den

'Aggregate multiple functions into a single function. Combine multiple
functions to a single function returning a named vector of outputs'

This is a short description of the each() function from the plyr package.

Here is an example from its help page:

each(min, max)(1:10)
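
Applied to the kind of summary asked about in the original post (a sketch):

library(plyr)
x <- rnorm(100)
each(mean, median, sd, mad)(x)   # named vector with all four statistics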

Hope this helps 
Regards
Denis




On Fri, 04/02/2011 at 08:37 -0600, jctoll wrote:
> On Wed, Feb 2, 2011 at 7:59 AM, Karl Ove Hufthammer  wrote:
> > Dear list members,
> >
> > I recall seeing a convenience function for applying multiple functions to
> > one object (i.e., almost the opposite of 'mapply’) somewhere.
> > Example: If the function was named ’fun’ the output of
> >
> >  fun(3.14, mode, typeof, class)
> >
> > would be identical to the output of
> >
> >  c(mode(3.14), typeof(3.14), class(3.14))
> >
> > Is my memory failing me, or does such a function already exists in a
> > package? Of course, it’s not difficult to define a summary function and
> > apply this to the object, but writing, for example,
> >
> > fun(x, mean, median, sd, mad)
> >
> > to quickly show the relevant information is much more *convient*.
> >
> >
> > It would be even nicer with a function that could also handle vectors and
> > lists of values, and output the result as data frames or matrices. Example:
> >
> > x = c("foo", "bar", "foobar")
> > fun(x, nchar, function(st) substr(st, 1 ,2) )
> >
> > y = list(3, 3L, 3.14, factor(3))
> > fun(x, mode, typeof, class)
> >
> > --
> > Karl Ove Hufthammer
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> Karl,
> 
> Perhaps you're thinking of the Reduce function? There's an example
> from the help page that you might be able to adapt to your purpose.
> 
> ## Iterative function application:
> Funcall <- function(f, ...) f(...)
> ## Compute log(exp(acos(cos(0))
> Reduce(Funcall, list(log, exp, acos, cos), 0, right = TRUE)
> 
> HTH,
> 
> James
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RODBC : how to avoid truncatig big integer when using sqlsave

2011-02-04 Thread PtitBleu

Hello,

I'm still trying to modify my script in order to use RODBC instead of RMySQL
(no more ready-to-use package for windows).

My new problem is the following one (not seen with RMySQL):

I'd like to copy a data.frame to a MySQL table. One column is a numeric one
with big integers like:
2005000206110 (13 digits)
If I type this number in R, I get:
2.005e+12
and when I use sqlSave to save the data, in the MySQL table I get (the type
of the column is bigint(13)): 
20050 instead of 2005000206110

I tried with varType = bigint unsigned for this column but it doesn't work.

Is there a way to save the number with all its digits?
Thanks in advance.
Have a nice week-end,
Ptit Bleu.
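
One workaround that is sometimes suggested (an untested sketch; 'channel', 'df' and the
'id' column are placeholders): send the big integers as character so that no
scientific-notation formatting can occur, and declare the MySQL column type explicitly.

library(RODBC)
df$id <- format(df$id, scientific = FALSE, trim = TRUE)   # e.g. "2005000206110"
sqlSave(channel, df, tablename = "mytable", rownames = FALSE,
        varTypes = c(id = "bigint(13)"))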
-- 
View this message in context: 
http://r.789695.n4.nabble.com/RODBC-how-to-avoid-truncatig-big-integer-when-using-sqlsave-tp3260251p3260251.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Apply parameters to a function from a list

2011-02-04 Thread Florian Burkart

Hey,

this may be a simple question, but I am struggling to apply a list of 
parameters to a function.



Say I have the following function:

test<-function(a=1,b=2,c=3,d=4){a+b+c+d}

And the following list:

point<-list(a=3,d=2)

Is there a way I can evaluate function test at point?



(Apart from changing function test to take in a list, and take values 
out of the list, but I'd rather not touch test).


Thanks vm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exact logistic regression

2011-02-04 Thread Denis Kazakiewicz
Dear Łukasz
Thank you very much for the information.
Dear R people, could you please help with the following questions?
Sorry for my silly questions; I am not a mathematician.
1. Does elrm() work similarly to exact regression in SAS or Stata?  After
double-checking in Stata and R, some results from my data vary greatly. 
2. How to obtain p-value from MCMClogit() from MCMCpack?
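
On question 2 (a sketch; the model, data and coefficient name are made up): MCMClogit()
returns posterior draws rather than p-values, so the usual substitute is a posterior
tail probability for each coefficient.

library(MCMCpack)
# fit <- MCMClogit(y ~ x1 + x2, data = mydata)             # hypothetical model
post <- as.matrix(fit)                                     # draws as a matrix
2 * min(mean(post[, "x1"] > 0), mean(post[, "x1"] < 0))    # two-sided tail probability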

Thank you
Denis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] dataframes from a function

2011-02-04 Thread chris20

Hi,
I'm trying to create a function to return three dataframes for later use in
a graphic, so I want the function to return dataframes with unique names
relating to the variable they are based on.
For example.

sub<-c("6-1a","6-1a","6-1a","9-2b","9-2b","9-2b","7c","7c","7c")
yar<-rep(1991:1993,3)
x1n<-c(4,6,3,6,4,7,9,4,2)
y1m<-c(3,7,5,8,4,3,6,7,8)
prac<-data.frame(sub,yar,x1n,y1m)

plot.restrut<-function(org.plot,ex.plot) {
new.pl<-org.plot[org.plot$sub %in% ex.plot,]
new.pl.st<-new.pl[new.pl$yar %in% "1991",]
new.pl.fin<-new.pl[new.pl$yar %in% "1993",]
print(new.pl)
print(new.pl.st)
print(new.pl.fin)
}

plot.restrut(prac,"9-2b")

##
I would like the output to be one dataframe with the name "9.2b.new" which
is new.pl in the example
Another data frame to be "9.2bnew1991" and another dataframe "9.2bnew1993"

I want the dataframe names to relate to the variables they have come from
because I want to use the same function to extract lots of subsets of my
data.

Thanks
Chris 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/dataframes-from-a-function-tp3260581p3260581.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Greg Snow
You can do this with regular expressions.  Since you want to extract specific 
values from the string, I would suggest learning about the gsubfn package; it is 
a bit easier with gsubfn than with the other matching tools. 


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of D. Alain
> Sent: Friday, February 04, 2011 5:33 AM
> To: r-help@r-project.org
> Subject: [R] recode according to specific sequence of characters within
> a string variable
> 
> Dear R-List,
> 
> I have a dataframe with one column "name.of.report" containing
> character values, e.g.
> 
> 
> >df$name.of.report
> 
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at same position)
> 
> Now I want to recode the column "name.of.report" into the variables
> "person", "year","team", like this
> 
> >new.df
> 
> "person"  "year"  "team"
> jeff   2001  x
> jeff   2002  y
> robert   2002  z
> mary    2002  z
> 
> I tried with grep()
> 
> df$person<-grep("jeff",df$name.of.report)
> 
> but of course it didn't exactly result in what I wanted to do. Could
> not find any solution via RSeek. Excuse me if it is a very silly
> question, but can anyone help me find a way out of this?
> 
> Thanks a lot
> 
> Alain
> 
> 
> 
> 
>   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Greg Snow
So you want to combine multiple columns back into a single column with the 
strings pasted together?  If that is correct then look at the paste and sprintf 
functions (use one or the other, not both).
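
For example, using the DF.new from Marc's message quoted below (a sketch; note the
original strings had the pieces in varying order, so this rebuilds them in a fixed
person_year_teamX order):

DF.new$name.of.report <- with(DF.new,
    paste(person, year, paste("team", team, sep = ""), sep = "_"))
DF.new$name.of.report
# e.g. "jeff_2001_teamx" "jeff_2002_teamy" ...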

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Denis Kazakiewicz
> Sent: Friday, February 04, 2011 6:26 AM
> To: Marc Schwartz
> Cc: R-help
> Subject: Re: [R] recode according to specific sequence of characters
> within a string variable
> 
> Dear R people
> Could you please help
> I have similar but opposite question
> How to reshape data from DF.new  to  DF from example, Mark kindly
> provided?
> 
> Thank you
> Denis
> 
> On Fri, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
> > On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
> >
> > > Dear R-List,
> > >
> > > I have a dataframe with one column "name.of.report" containing
> character values, e.g.
> > >
> > >
> > >> df$name.of.report
> > >
> > > "jeff_2001_teamx"
> > > "teamy_jeff_2002"
> > > "robert_2002_teamz"
> > > "mary_2002_teamz"
> > > "2003_mary_teamy"
> > > ...
> > > (i.e. the bit of interest is not always at same position)
> > >
> > > Now I want to recode the column "name.of.report" into the variables
> "person", "year","team", like this
> > >
> > >> new.df
> > >
> > > "person"  "year"  "team"
> > > jeff   2001  x
> > > jeff   2002  y
> > > robert   2002  z
> > > mary2002  z
> > >
> > > I tried with grep()
> > >
> > > df$person<-grep("jeff",df$name.of.report)
> > >
> > > but of course it didn't exactly result in what I wanted to do.
> Could not find any solution via RSeek. Excuse me if it is a very silly
> question, but can anyone help me find a way out of this?
> > >
> > > Thanks a lot
> > >
> > > Alain
> >
> >
> > There will be several approaches, all largely involving the use of
> ?regex. Here is one:
> >
> >
> > DF <- data.frame(name.of.report = c("jeff_2001_teamx",
> "teamy_jeff_2002",
> > "robert_2002_teamz",
> "mary_2002_teamz",
> > "2003_mary_teamy"))
> >
> > > DF
> >  name.of.report
> > 1   jeff_2001_teamx
> > 2   teamy_jeff_2002
> > 3 robert_2002_teamz
> > 4   mary_2002_teamz
> > 5   2003_mary_teamy
> >
> >
> > DF.new <- data.frame(person = gsub("[_0-9]|team.", "",
> DF$name.of.report),
> >  year = gsub(".*([0-9]{4}).*","\\1",
> DF$name.of.report),
> >  team = gsub(".*team(.).*","\\1",
> DF$name.of.report))
> >
> >
> > > DF.new
> >   person year team
> > 1   jeff 2001x
> > 2   jeff 2002y
> > 3 robert 2002z
> > 4   mary 2002z
> > 5   mary 2003y
> >
> >
> >
> > HTH,
> >
> > Marc Schwartz
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get the variables used in stepwise logistic regression?

2011-02-04 Thread Soyeon Kim
Thank you. This is what I want.
I need to study about attr :).

Soyeon

On Fri, Feb 4, 2011 at 11:07 AM, David Winsemius  wrote:
>
> On Feb 4, 2011, at 11:56 AM, Soyeon Kim wrote:
>
>> Dear All,
>>
>> I used glm and then used step function for stepwise regression.
>> Now, I want to store the variables used in the stepwise regression.
>
> I am not sure that what you ask here is actually what you eventually ask and
> will limit my answer to only what you ask below.
>
>>
>> The code is the following.
>>
>> m_logistic <- glm(y ~ . + M1:T + M2:T + M3:T+ M4:T +M5:T,
>> family=binomial("logit"), data = data)
>>  step_glm <- step(m_logistic)
>>
>>> step_glm$terms
>>
>> y ~ M1 + T + M1:T
>> attr(,"variables")
>> list(y, M1, T)
>> attr(,"factors")
>>  M1 T M1:T
>> y   0 0    0
>> M1  1 0    1
>> T   0 1    1
>> attr(,"term.labels")
>> [1] "M1"   "T"    "M1:T"
>> attr(,"order")
>> [1] 1 1 2
>> attr(,"intercept")
>> [1] 1
>> attr(,"response")
>> [1] 1
>> attr(,".Environment")
>> <environment: 0x...>
>> attr(,"predvars")
>> list(y, M1, T)
>> attr(,"dataClasses")
>>       y        M1         T
>> "numeric" "numeric" "numeric"
>>
>> How can I extract only "term.labels" ?
>>
>
> Have you tried what would seem to be the obvious answer (assuming that is a
> faithful console output):
>
> attr(step_glm$terms,  "term.labels")
>
> ?attr
> ?attributes
>
> (And the usual caveats: stepwise methods are often invalid.)
>
> --
> David Winsemius, MD
> West Hartford, CT
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get the variables used in stepwise logistic regression?

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 11:56 AM, Soyeon Kim wrote:


Dear All,

I used glm and then used step function for stepwise regression.
Now, I want to store the variables used in the stepwise regression.


I am not sure that what you ask here is actually what you eventually  
ask and will limit my answer to only what you ask below.




The code is the following.

m_logistic <- glm(y ~ . + M1:T + M2:T + M3:T+ M4:T +M5:T,
family=binomial("logit"), data = data)
 step_glm <- step(m_logistic)


step_glm$terms

y ~ M1 + T + M1:T
attr(,"variables")
list(y, M1, T)
attr(,"factors")
   M1 T M1:T
y   0 0    0
M1  1 0    1
T   0 1    1
attr(,"term.labels")
[1] "M1"   "T"    "M1:T"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x...>
attr(,"predvars")
list(y, M1, T)
attr(,"dataClasses")
        y        M1         T
"numeric" "numeric" "numeric"

How can I extract only "term.labels" ?



Have you tried what would seem to be the obvious answer (assuming that  
is a faithful console output):


attr(step_glm$terms,  "term.labels")

?attr
?attributes

(And the usual caveats: stepwise methods are often invalid.)

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to get the variables used in stepwise logistic regression?

2011-02-04 Thread Soyeon Kim
Dear All,

I used glm and then used step function for stepwise regression.
Now, I want to store the variables used in the stepwise regression.

The code is the following.

 m_logistic <- glm(y ~ . + M1:T + M2:T + M3:T+ M4:T +M5:T,
family=binomial("logit"), data = data)
  step_glm <- step(m_logistic)

>  step_glm$terms
 y ~ M1 + T + M1:T
attr(,"variables")
list(y, M1, T)
attr(,"factors")
    M1 T M1:T
y   0 0    0
M1  1 0    1
T   0 1    1
attr(,"term.labels")
[1] "M1"   "T"    "M1:T"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x...>
attr(,"predvars")
list(y, M1, T)
attr(,"dataClasses")
        y        M1         T
"numeric" "numeric" "numeric"

How can I extract only "term.labels" ?

Thank you,
Soyeon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding non-normal distributions per row of data frame?

2011-02-04 Thread Greg Snow
I get a different set of errors than you do (what version of R are you using?).

Patrizio showed one way to do what you want.  But, what is it that you are 
really trying to accomplish?  What do you think the result of 20,000 normality 
tests (each of which may not be answering the real question on its own) will 
tell you?

Your code below seems to be mixing concepts that it would benefit you to learn 
more about and when to use each one.  If y is fully numeric, then it is more 
efficient to use a matrix than a data frame.  

In your loop you assign y.temp to be a list containing 1 row from y, this 
results in a list with 1 element which is a 1 row data frame.  Why make it a 
list?  Do you really want it to stay a data frame or be a vector?

You then run lapply on a list with only one element, that works, but is a bit 
wasteful and does not accomplish anything more than running the function on the 
single element.

Then the shapiro.test function is passed a data frame when it is expecting a 
vector (this gives an error on my install, you may have something different 
going on).

Then testtable is being overwritten each time through the loop, so you are 
throwing away most of your work without ever doing anything with it.

Why the loop and the apply's?

Why not just try something like apply(y, 1, shapiro.test) ?
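For instance, something along these lines (a sketch, assuming y is a numeric
matrix with one sample per row) collects the W statistics and p-values without
the explicit loop:

  res <- t(apply(y, 1, function(row) {
    sw <- shapiro.test(row)
    c(W = unname(sw$statistic), p.value = sw$p.value)
  }))
  head(res)   # one row per original row of y, columns W and p.value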

And overall what are you trying to accomplish? Because what this is likely to 
accomplish is probably less useful than just generating random numbers.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of DB1984
> Sent: Thursday, February 03, 2011 8:52 PM
> To: r-help@r-project.org
> Subject: [R] Finding non-normal distributions per row of data frame?
> 
> 
> This is my first attempt at this, so hopefully a few kind pointers can
> get me
> going in the right direction...
> 
> I have a large data frame of 20+ columns and 20,000 rows. I'd like to
> evaluate the distribution of values in each row, to determine whether
> they
> meet the criteria of a normal distribution. I'd loop this over all the
> rows
> in the data frame, and output the summary results to a new data frame.
> 
> I have a loop that should run a Shapiro-Wilk test over each row,
> 
> y= data frame
> 
> for (j in 1:nr) {
> y.temp<-list(y[j,])
> testsw <- lapply(y.temp, shapiro.test)
> testtable <- t(sapply(testsw, function(x) c(x$statistic, x$p.value)))
>  colnames(testtable) <- c("W", "p.value")
> }
> 
> 
> but it is currently throwing out an error:
>  "Error in `rownames<-`(`*tmp*`, value = "1") :
>   attempt to set rownames on object with no dimensions"
> 
> ...which I guess is unrelated to the evaluation of normality, and more
> likely a faulty loop?
> 
> Any suggestions either for this test, or a better way to evaluate the
> normal
> distribution (e.g. qq-plot residuals for each row) would be greatly
> received. Thanks.
> --
> View this message in context: http://r.789695.n4.nabble.com/Finding-
> non-normal-distributions-per-row-of-data-frame-tp3259439p3259439.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2011-02-04 Thread Yan Jiao
Dear R users,

 

I want to know how to use strata in survival analysis, I want to combine
multiple dataset:

 

coxph(Surv(merged.time.v, merged.event.v) ~ merged.score + strata(dataset.v))

 

So merged.time.v and merged.event.v are from multiple datasets.

How should I define dataset.v here?
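A minimal sketch of what I have in mind (d1 and d2 are stand-ins for the
individual data frames, each with time, event and score columns):

  library(survival)

  d1$dataset <- "study1"            # record where each row came from
  d2$dataset <- "study2"
  merged <- rbind(d1, d2)
  merged$dataset <- factor(merged$dataset)

  fit <- coxph(Surv(time, event) ~ score + strata(dataset), data = merged)

That is, dataset.v would just be a factor recording which dataset each row
came from -- is that the right idea?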

 

Many thanks

 

yan

 

 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] in object of class formula

2011-02-04 Thread Prof Brian Ripley
Did you look at the help?  ?formula has a whole section about why 
formulas have environments and what they do.


On Fri, 4 Feb 2011, Aviad Klein wrote:


# Hi all,

# I've made a function to make a formula out of a data.frame without columns
which contain a constant value.

# The function "which.constant" returns the indices of colums with constant
values:

which.constant <- function(data.frame) {
    # counts the number of columns in a data.frame which are constant
    h <- sapply(sapply(data.frame, unique), length)  # count unique values
    return(as.numeric(which(h == 1)))
}

# The function "make.formula" returns the desired formula but with an
unwanted addition :

make.formula <- function(data.frame) {
   h <- which.constant(data.frame)
   hh <- names(data.frame)[-h]
   hh <- paste("~",paste(hh,collapse="+"),sep="")
   return(as.formula(hh))}

# The following structure should give an example of how it works:

Data <- structure(list(cs_jail_2 = c(2L, 1L, 1L, 0L, 0L, 1L), cs_jail_3 =
c(0L,
0L, 0L, 0L, 0L, 0L), cs_jail_4 = c(1L, 2L, 2L, 2L, 0L, 0L)), .Names =
c("cs_jail_2",
"cs_jail_3", "cs_jail_4"), row.names = c(NA, 6L), class = "data.frame")

make.formula(Data)

# ~cs_jail_2 + cs_jail_4
# <environment: 0x...>

# what does this <environment> mean?

# does it affect computation in any way? how to get rid of it?


See also ?environment for how to remove (sic) it.
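For instance, a small sketch with the same Data as above: as.formula() has an
'env' argument, so the function can attach the global environment instead of
its own evaluation frame, in which case no <environment> line is printed:

  make.formula2 <- function(data.frame) {
      h  <- which.constant(data.frame)
      hh <- names(data.frame)[-h]
      as.formula(paste("~", paste(hh, collapse = "+")), env = globalenv())
  }
  make.formula2(Data)                # ~cs_jail_2 + cs_jail_4
  environment(make.formula2(Data))   # <environment: R_GlobalEnv>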



# thank you,

# Aviad
# aviadklein.wordpress.com/

[[alternative HTML version deleted]]


Please don't send HTML when expressly asked not to in the posting 
guide.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get caller's name

2011-02-04 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Friday, February 04, 2011 6:03 AM
> To: Ernest Adrogué
> Cc: r-h...@stat.math.ethz.ch
> Subject: Re: [R] get caller's name
> 
> On 03/02/2011 10:27 AM, Ernest Adrogué wrote:
> > Hi,
> > Suppose a function that checks an object:
> >
> > stop.if.dims<- function(x) {
> >if (! is.null(dim(x))) {
> >  stop("cannot handle dimensional data")
> >}
> > }
> >
> > This would be used by other functions that can only work with
> > dimensionless objects. The problem is the error message 
> would need to
> > include the name of the function that called stop.if.dims, 
> so that the
> > user knows which function got an argument that was incorrect.
> >
> > How do I do this? Or maybe there is another way...
> 
> I see you have the answer you wanted, but I'd suggest you don't need 
> this:  the user should just use traceback() after the error 
> to see the 
> full call stack.  Perhaps it's not the caller that's the problem, but 
> the caller of the caller...
> 
> Duncan Murdoch

stopifnot() deals with the problem by arranging for the
error reporting mechanism not to say what function the
error comes from.  If you are in the know, not seeing
"Error in someFunction(arg):" in the error might motivate
you to call traceback().  If you are not in the know, it
just frustrates you.

   R> myFunc <- function(x) stopifnot(all(x>0))
   R> myFunc(-pi)
   Error: all(x > 0) is not TRUE
   R> traceback()
   3: stop(paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", 
  sep = ""), call. = FALSE)
   2: stopifnot(all(x > 0))
   1: myFunc(-pi)

In S+ stopifnot() uses eval(call("stop", message), sys.parent())
to make the error message appear to come from the caller of
stopifnot().  The initial message is clearer but the traceback
more confusing:

   > myFunc <- function(x) stopifnot(all(x>0))
   > myFunc(-pi)
   Problem in myFunc( - pi): all(x > 0) is not TRUE 
   Use traceback() to see the call stack
   > traceback()
   8: eval(action, sys.parent())
   7: doErrorAction("Problem in myFunc( - pi): all(x > 0) is not TRUE", 1000)
   6: stop("all(x > 0) is not TRUE")
   5: eval(call, sys.parent())
   4: stopifnot(all(x > 0))
   3: myFunc( - pi)
   2: eval(expression(myFunc( - pi)))
   1: 
   Message: Problem in myFunc( - pi): all(x > 0) is not TRUE 

If I try the eval(call("stop",message), sys.parent()) trick
in R the error is reported as being from eval():
   > myFunc(-pi)
   Error in eval(expr, envir, enclos) : all(x > 0) is not TRUE

S+'s error handler has some logic so that a stop() called from
eval() is reported as being from the frame that eval is evaluating
in.

It might be nice to be able to tell stop(), warning(), and message()
to pretend they were called from somewhere other than where they
were actually called from.  Falling back on traceback() doesn't
help with warnings and messages.  Being able to put standard messages
like 'x not a matrix' into utility functions is handy, but it makes
it hard to track down the problem.
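For completeness, one common idiom for the original question (a sketch, not
necessarily the answer Ernest settled on) is to pull the caller's call out of
the stack with sys.call(-1):

  stop.if.dims <- function(x) {
    if (!is.null(dim(x))) {
      caller <- deparse(sys.call(-1))   # the call one frame up
      stop(sprintf("%s: cannot handle dimensional data", caller), call. = FALSE)
    }
  }

  f <- function(m) { stop.if.dims(m); sum(m) }
  f(matrix(1:4, 2))
  # Error: f(matrix(1:4, 2)): cannot handle dimensional data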

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 


> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] uniroot

2011-02-04 Thread Berend Hasselman


dpender wrote:
> 
> Hi,
> 
> I am using the uniroot function in order to carry out a bivariate Monte
> Carlo simulation using the logistics model.
> 
> I have defined the function as:
> 
> BV.FV <- function(x,y,a,A)
> (((x^(-a^-1)+y^(-a^-1))^(a-1))*(y^(a-1/a))*(exp(-((1^(-a^-1)+y^(-a^-1))^a)+y^-1)))-A
> 
> and the procedure is as follows:
> 
> Randomly generate values of A~(0,1), y0 = -(lnA)^-1
> 
> Where: A=Pr{X 
> Use y0 to determine x where x = x(i) and y = y0(i-1) in the above
> equation.
> 
> Unfortunately when the randomly defined A gets to approximately 0.46 the
> intervals I provide no longer have the opposite signs (both -ve).
> 

Try

curve(BV.FV(x,y=y0,a=.703,A=.7),from=1,to=10)

for different values of a, A and y0.
I have a feeling that your function or something else is not quite correct.

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/uniroot-tp3260090p3260438.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply, strsplit, and list elements

2011-02-04 Thread William Dunlap

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Dick Harray
> Sent: Friday, February 04, 2011 7:37 AM
> To: r-help@r-project.org
> Subject: [R] lapply, strsplit, and list elements
> 
> Hi there,
> 
> I have a problem about lapply, strsplit, and accessing list elements,
> which I don't understand or cannot solve:
> 
> I have e.g. a character vector with three elements:
> 
> x = c("349/077,349/074,349/100,349/117",
>  "340/384.2,340/513,367/139,455/128,D13/168",
>  "600/437,128/903,128/904")
> 
>   
> The task I want to perform, is to generate a list, comprising the
> portion in front of the "/" of each element of x:
> 
> neededResult = list(c("349","349", "349", "349"),
>  c("340", "340", "367", "455", "D13"),
>  c("600", "128", "128") )

Try the following, which first splits each string by commas
(returning a list), then removes the first slash and everything
after it (using lapply to maintain the list structure).

   > gotResult <- lapply(strsplit(x, ","), function(xi) gsub("/.*", "", xi))
   > identical(gotResult, neededResult)
   [1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> 
> 
> I figured out that for a single element of x the following works
> 
> unlist( lapply( strsplit( unlist( strsplit(x[1], "\\,") ), 
> "/"), "[", 1) )
> 
> but due to "unlist" it doesn't provide the required result if extended
> to all elements of x
> 
> unlist(lapply(strsplit( unlist( lapply(x, strsplit, "\\,")), 
> "/"), "[",))
> 
> 
> Someone can help me to get the needed result?
> 
> Thanks and regards,
> 
> Dirk
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] in object of class formula

2011-02-04 Thread Aviad Klein
# Hi all,

# I've made a function to make a formula out of a data.frame without columns
which contain a constant value.

# The function "which.constant" returns the indices of colums with constant
values:

which.constant <- function(data.frame) {
    # counts the number of columns in a data.frame which are constant
    h <- sapply(sapply(data.frame, unique), length)  # count unique values
    return(as.numeric(which(h == 1)))
}

# The function "make.formula" returns the desired formula but with an
unwanted addition :

make.formula <- function(data.frame) {
h <- which.constant(data.frame)
hh <- names(data.frame)[-h]
hh <- paste("~",paste(hh,collapse="+"),sep="")
return(as.formula(hh))}

# The following structure should give an example of how it works:

Data <- structure(list(cs_jail_2 = c(2L, 1L, 1L, 0L, 0L, 1L), cs_jail_3 =
c(0L,
0L, 0L, 0L, 0L, 0L), cs_jail_4 = c(1L, 2L, 2L, 2L, 0L, 0L)), .Names =
c("cs_jail_2",
"cs_jail_3", "cs_jail_4"), row.names = c(NA, 6L), class = "data.frame")

make.formula(Data)

# ~cs_jail_2 + cs_jail_4
# <environment: 0x...>

# what does this <environment> mean?

# does it affect computation in any way? how to get rid of it?

# thank you,

# Aviad
# aviadklein.wordpress.com/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 8:26 AM, Denis Kazakiewicz wrote:


Dear R people
Could you please help
I have similar but opposite question
How to reshape data from DF.new  to  DF from example, Mark kindly
provided?


Well, I don't think you want a random order, right? If what you are  
asking is for a single character element per line of dataframe then  
try this:


apply(df.new, 1, paste, collapse="_")

--
David.


Thank you
Denis

On Пят, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:

On Feb 4, 2011, at 6:32 AM, D. Alain wrote:


Dear R-List,

I have a dataframe with one column "name.of.report" containing  
character values, e.g.




df$name.of.report


"jeff_2001_teamx"
"teamy_jeff_2002"
"robert_2002_teamz"
"mary_2002_teamz"
"2003_mary_teamy"
...
(i.e. the bit of interest is not always at same position)

Now I want to recode the column "name.of.report" into the  
variables "person", "year","team", like this



new.df


"person"  "year"  "team"
jeff   2001  x
jeff   2002  y
robert   2002  z
mary2002  z

I tried with grep()

df$person<-grep("jeff",df$name.of.report)

but of course it didn't exactly result in what I wanted to do.  
Could not find any solution via RSeek. Excuse me if it is a very  
silly question, but can anyone help me find a way out of this?


Thanks a lot

Alain



There will be several approaches, all largely involving the use of ? 
regex. Here is one:



DF <- data.frame(name.of.report = c("jeff_2001_teamx",  
"teamy_jeff_2002",
   "robert_2002_teamz",  
"mary_2002_teamz",

   "2003_mary_teamy"))


DF

name.of.report
1   jeff_2001_teamx
2   teamy_jeff_2002
3 robert_2002_teamz
4   mary_2002_teamz
5   2003_mary_teamy


DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF 
$name.of.report),
year = gsub(".*([0-9]{4}).*","\\1", DF 
$name.of.report),
team = gsub(".*team(.).*","\\1", DF 
$name.of.report))




DF.new

 person year team
1   jeff 2001x
2   jeff 2002y
3 robert 2002z
4   mary 2002z
5   mary 2003y



HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lapply, strsplit, and list elements

2011-02-04 Thread Dick Harray
Hi there,

I have a problem about lapply, strsplit, and accessing list elements,
which I don't understand or cannot solve:

I have e.g. a character vector with three elements:

x = c("349/077,349/074,349/100,349/117",
 "340/384.2,340/513,367/139,455/128,D13/168",
 "600/437,128/903,128/904")


The task I want to perform, is to generate a list, comprising the
portion in front of the "/" of each element of x:

neededResult = list(c("349","349", "349", "349"),
 c("340", "340", "367", "455", "D13"),
 c("600", "128", "128") )


I figured out that for a single element of x the following works

unlist( lapply( strsplit( unlist( strsplit(x[1], "\\,") ), "/"), "[", 1) )

but due to "unlist" it doesn't provide the required result if extended
to all elements of x

unlist(lapply(strsplit( unlist( lapply(x, strsplit, "\\,")), "/"), "[",))


Someone can help me to get the needed result?

Thanks and regards,

Dirk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] My own distribution in survreg function

2011-02-04 Thread Terry Therneau
---  begin included message 

I'm trying to do some analysis using the survreg function. I need to
implement my own distribution there, with density:

 lambda*exp(-lambda*y),  where y = a1/(1+exp(-a2*x)). 
a1, a2 are unknown parameters and x >0. 
--- end inclusion --

 The survreg function can fit location/scale families where the location
parameter is Xb, X a matrix of predictors and b the parameters.  I don't
think that your distribution can be written in that form.
  An alternative is to take the approach of the t-distribution for
survreg, which has the right form for fixed df.  Then wrap the survreg
function inside optim, using the latter to get an estimate of df.
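For what it's worth, a minimal sketch of that wrapping idea, using the
built-in t distribution (fixed df) as the inner survreg fit; the function name
and the use of the lung data are purely illustrative:

  library(survival)

  negloglik.df <- function(logdf, formula, data) {
    ## survreg's dist = "t" takes the (fixed) degrees of freedom via 'parms'
    fit <- survreg(formula, data = data, dist = "t", parms = exp(logdf))
    -fit$loglik[2]
  }

  opt <- optimize(negloglik.df, interval = log(c(1, 100)),
                  formula = Surv(time, status) ~ age + sex, data = lung)
  exp(opt$minimum)   # the outer estimate of df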

No, I don't have time to give deeper programming help, due to other
projects due.

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimation and Forecast of Seasonal Component

2011-02-04 Thread Giovanni Petris
I would probably use a Dynamic Linear Model, combining seasonal
component and regression. See package dlm and its vignette for
examples. 
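A bare-bones sketch of that combination (my own illustration, not taken from
the vignette; y is the daily series and X the matrix of other regressors, both
assumed to exist):

  library(dlm)

  ## local level + weekly seasonal + regression, variances on the log scale
  build <- function(parm) {
    dlmModPoly(order = 1, dV = exp(parm[1]), dW = exp(parm[2])) +
      dlmModSeas(frequency = 7, dV = 0, dW = c(exp(parm[3]), rep(0, 5))) +
      dlmModReg(X, addInt = FALSE, dV = 0)
  }

  fit <- dlmMLE(y, parm = rep(0, 3), build = build)
  mod <- build(fit$par)
  sm  <- dlmSmooth(y, mod)   # smoothed level, seasonal and regression states

  ## To forecast, append NAs to y (and the known future rows to X) and run
  ## dlmFilter over the extended series; the one-step forecasts over the NA
  ## stretch are the out-of-sample predictions, seasonal component included.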

HTH,
Giovanni

On Tue, 2011-02-01 at 11:01 +, Paolo Rossi wrote:
> Hi list,
> 
> 
> I would like to estimate and forecast the seasonal component of a series. My
> model, which uses daily data, would be something like
> y_t = alpha + beta * SeasComp_t + gamma * OtherRegressors_t.
> 
> One approach to this would be use quarterly dummies, another to use a sine
> function. The first would cause a step change when we move from a season to
> another;  the latter impose too much regularity in my data, i.e. symmetry
> around y-axis and x-axis.
> 
> I have looked at decompose and stl but could not find a way to forecast
> the seasonal component. They don't allow for the presence of other
> regressors either. Can someone let me know the best way for me to proceed
> with this?
> Thanks in advance
> Paolo
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Layout control from RCytoscape

2011-02-04 Thread Fahim M
Hi
I have a list of nodes and edges.
I want to draw a graph from two different directions, left and right. I
am drawing the required graph, which grows from the left. The same applies to
the one that grows from the right, except that it may have a few nodes that
are already present in the left subtree.


        a1 ---
       /
  a --- a2
       \
        a3

        a4
       /
  b --- a5
       \
        a6

..
...
Can I draw this type of graph with RCytoscape?
Thanks
Fahim



-- 
Fahim Mohammad
Bioinformatics Lab
University of Louisville
Louisville, KY, USA
Ph:  +1-502-409-1167

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GAM quasipoisson in MuMIn - SOLVED

2011-02-04 Thread Karen Moore
Hi,

Got my issues sorted.

Error message solved:

I heard from the guy who developed MuMIn and his suggestion worked.

"As for the error you get, it seems you are running an old version of MuMIn.
Please update the package first."

I did (I was only 1 version behind in both R and in MuMIn) and error
disappeared!

Running quasipoisson GAM in MuMIn:
As for my questions on GAM and " to include only models with smooth OR
linear term (but not both) for each variable", as Gavin said, this refers
only to global models with both smooth and linear terms for the
*same* variables, which I didn't have.

Regards,

Karen

On 4 February 2011 12:31, Karen Moore  wrote:

>
> Hi,
>
>  I have a GAM quasipoisson that I'd like to run through MuMIn package
>
>
>- dredge
>- gettop.models
>- model.avg
>
>
> However, I'm having no luck with script from an example in MuMIn help file.
> In MuMIn help they advise "include only models with smooth OR linear term
> (but not both) for each variable". Their example is:
>
> # Example with gam models (based on "example(gam)")
>
> require(mgcv)
>
>
>
> dat <- gamSim(1, n = 500, dist="poisson", scale=0.1)
>
>
>
> gam1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3) + (x1+x2+x3)^2, family =
> poisson, data = dat, method = "REML")
>
>
>
> cat(dQuote(getAllTerms(gam1)), "\n")
>
>
>
>
>
> # include only models with smooth OR linear term (but not both) for each
> variable:
>
>
>
>
>
> dd <- dredge(gam1, subset=(!`s(x1)` | !x1) & (!`s(x2)` | !x2) & (!`s(x3)` |
> !x3))
>
>
> But I'm not sure at all how to apply these instructions to my data
>
>
> My formula is
>
> GAM<-gam(Species richness (count) ~ Categorical +  Continous + Continous +
> *s*(Continous ) + Continous : Continous + Continous : Continous,
> family=quasipoisson, data =)
>
>
> Thanks for any advice on script
>
>
> Karen
>
>
>
>
>


-- 
Karen Moore
PhD Researcher,
FORESTBIO,
Department of Botany,
Trinity College Dublin
Ireland

Ph: 00 353 (0)87 9422 502

http://www.ucc.ie/en/planforbio

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vegan and sweave using xtable

2011-02-04 Thread Olga Lyashevska

Thanks Ista,

Very instructive and works like a charm!

Cheers!
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vegan and sweave using xtable

2011-02-04 Thread Ista Zahn
Hi Olga,
Here is how I would approach this problem. I hope it is instructive.

library(vegan)
data(BCI)
mod <- radfit(BCI[1,])

class(mod) #find out what mod is
methods(class="radfit") #find out what methods are available for
objects of this class. Nothing looks promising, so define a summary
method for it, based on the print method

print.radfit #look to see how the print method is defined. Copy this
function and modify it, as shown below.

summary.radfit <- function(x, digits = max(3, getOption("digits") - 2), ...)
{
    # modify the function to eliminate the header info
    # cat("\nRAD models, family", x$family$family, "\n")
    # cat("No. of species ", length(x$y), ", total abundance ",
    #     sum(x$y), "\n\n", sep = "")
    p <- coef(x)
    if (any(!is.na(p)))
        p <- formatC(p, format = "g", flag = " ", digits = digits)
    p <- apply(p, 2, function(x) gsub("NA", " ", x))
    aic <- sapply(x$models, AIC)
    bic <- sapply(x$models, AIC, k = log(length(x$y)))
    dev <- sapply(x$models, deviance)
    stats <- format(cbind(Deviance = dev, AIC = aic, BIC = bic),
                    digits = digits, ...)
    out <- cbind(p, stats)
    # modify the function so it doesn't print the result
    # print(out, quote = FALSE)
    return(out)  # modify the function to return the summary matrix
}

library(xtable)
xtable(summary(mod))# enjoy the fruits of our labor!


On Fri, Feb 4, 2011 at 9:03 AM, Olga Lyashevska  wrote:
> Dear all,
>
> Using:
>
> library(vegan)
> data(BCI)
> mod <- radfit(BCI[1,])
> mod
>
> RAD models, family poisson
> No. of species 93, total abundance 448
>
>           par1      par2     par3    Deviance AIC      BIC
> Null                                   39.5261 315.4362 315.4362
> Preemption  0.042797                   21.8939 299.8041 302.3367
> Lognormal   1.0687    1.0186           25.1528 305.0629 310.1281
> Zipf        0.11033  -0.74705          61.0465 340.9567 346.0219
> Mandelbrot  100.52   -2.312    24.084   4.2271 286.1372 293.7350
>
>
> I want to include the table shown above into the latex document  (using
> xtable).
>
> library(xtable)
> xtable(mod)
> Error in UseMethod("xtable") :
>  no applicable method for 'xtable' applied to an object of class "radfit"
>
> Also is there is a way to customize the table - I want to exclude first two
> rows  of the output and make a table of:
>
>           par1      par2     par3    Deviance AIC      BIC
> Null                                   39.5261 315.4362 315.4362
> Preemption  0.042797                   21.8939 299.8041 302.3367
> Lognormal   1.0687    1.0186           25.1528 305.0629 310.1281
> Zipf        0.11033  -0.74705          61.0465 340.9567 346.0219
> Mandelbrot  100.52   -2.312    24.084   4.2271 286.1372 293.7350
>
>
> Many thanks,
> Olga
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing dates from SPSS file

2011-02-04 Thread dunner

as.POSIXlt(book$DATE, origin="1582/10/14") works.

The Gregorian calendar was the kicker, thanks to ggrothendieck at gmail.com 

Ross
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Importing-dates-from-SPSS-file-tp3260293p3260329.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing dates from SPSS file

2011-02-04 Thread Gabor Grothendieck
On Fri, Feb 4, 2011 at 9:47 AM, dunner  wrote:
>
> Hello all, kind regards,
>
> I have imported a data.frame from SPSS using "foreign":read.spss but
> unfortunately it is reading dates in a way neither R nor myself can
> understand.
>
>> book$DATE
>  [1] 13502246400 13443321600 13477795200 13472956800 13501728000 13445395200
> 13501382400 13502851200 13444185600 13461465600 13457232000
> [12] 13458096000 13432435200 13431484800 13495334400 13491964800 13451184000
> 13490755200 13505616000 13505616000 13502851200 13436928000
> [23] 13497408000 13465094400 13484707200 13447555200 13438483200 13464489600
> 13494816000 13432435200 13455676800 13463884800 13455417600
> [34] 1347840 13504665600 13490150400 13504060800 13471401600 13492915200
> 13465958400 13445136000
>
> I have looked around the documentation but can find nothing similar there or
> in Spector's chapter on Dates
>
> How do I coerce these to %d%m%Y form?
>
> Thank you for your time and patience

SPSS uses seconds since the beginning of the Gregorian calendar.  See
the help desk article in R News 4/1 for details:
http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf
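For example (book$DATE being the numeric vector shown above), something along
these lines does the conversion and formats it as day-month-year:

  book$date <- as.POSIXct(book$DATE, origin = "1582-10-14", tz = "GMT")
  format(book$date, "%d-%m-%Y")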


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coxph fails to survfit

2011-02-04 Thread Bond, Stephen
Responding to T. Therneau:

> sessionInfo()
R version 2.10.1 (2009-12-14) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C   LC_TIME=English_United States.1252   
 

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods   base  
   

other attached packages:
[1] survival_2.35-8

Also:
> s1 <- predict(mod1,newdata=inc[50050:50100,],
+   type="expected")
Error in predict.coxph(mod1, newdata = inc[50050:50100, ], type = "expected") : 
  Method not yet finished


Stephen 

-Original Message-
From: Terry Therneau [mailto:thern...@mayo.edu] 
Sent: Friday, February 04, 2011 8:54 AM
To: Bond, Stephen
Cc: r-help@r-project.org
Subject: Re: coxph fails to survfit

>  I have a model with quant vars only and the error message does not
make sense:

Could you tell us what version of S and of the survival package you are
using?  You can get this with sessionInfo(), see the posting guide for
details.  

This would help me identify the issue.  I was planning to post an update
to CRAN this weekend that has changes to survfit.coxph, addressing a
different issue than yours  -- models with a strata by covariate
interaction --- but this would be a very good time to address your issue
too if it is a code bug.

Terry Therneau (author of survival).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Importing dates from SPSS file

2011-02-04 Thread jim holtman
What do the dates look like in the original file?

On Fri, Feb 4, 2011 at 9:47 AM, dunner  wrote:
>
> Hello all, kind regards,
>
> I have imported a data.frame from SPSS using "foreign":read.spss but
> unfortunately it is reading dates in a way neither R nor myself can
> understand.
>
>> book$DATE
>  [1] 13502246400 13443321600 13477795200 13472956800 13501728000 13445395200
> 13501382400 13502851200 13444185600 13461465600 13457232000
> [12] 13458096000 13432435200 13431484800 13495334400 13491964800 13451184000
> 13490755200 13505616000 13505616000 13502851200 13436928000
> [23] 13497408000 13465094400 13484707200 13447555200 13438483200 13464489600
> 13494816000 13432435200 13455676800 13463884800 13455417600
> [34] 1347840 13504665600 13490150400 13504060800 13471401600 13492915200
> 13465958400 13445136000
>
> I have looked around the documentation but can find nothing similar there or
> in Spector's chapter on Dates
>
> How do I coerce these to %d%m%Y form?
>
> Thank you for your time and patience
>
> Ross Dunne
> ross.du...@tcd.ie
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Importing-dates-from-SPSS-file-tp3260293p3260293.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Importing dates from SPSS file

2011-02-04 Thread dunner

Hello all, kind regards,

I have imported a data.frame from SPSS using "foreign":read.spss but
unfortunately it is reading dates in a way neither R nor myself can
understand.

> book$DATE
 [1] 13502246400 13443321600 13477795200 13472956800 13501728000 13445395200
13501382400 13502851200 13444185600 13461465600 13457232000
[12] 13458096000 13432435200 13431484800 13495334400 13491964800 13451184000
13490755200 13505616000 13505616000 13502851200 13436928000
[23] 13497408000 13465094400 13484707200 13447555200 13438483200 13464489600
13494816000 13432435200 13455676800 13463884800 13455417600
[34] 1347840 13504665600 13490150400 13504060800 13471401600 13492915200
13465958400 13445136000

I have looked around the documentation but can find nothing similar there or
in Spector's chapter on Dates

How do I coerce these to %d%m%Y form?

Thank you for your time and patience

Ross Dunne
ross.du...@tcd.ie


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Importing-dates-from-SPSS-file-tp3260293p3260293.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Applying multiple functions to one object

2011-02-04 Thread jctoll
On Wed, Feb 2, 2011 at 7:59 AM, Karl Ove Hufthammer  wrote:
> Dear list members,
>
> I recall seeing a convenience function for applying multiple functions to
> one object (i.e., almost the opposite of 'mapply’) somewhere.
> Example: If the function was named ’fun’ the output of
>
>  fun(3.14, mode, typeof, class)
>
> would be identical to the output of
>
>  c(mode(3.14), typeof(3.14), class(3.14))
>
> Is my memory failing me, or does such a function already exists in a
> package? Of course, it’s not difficult to define a summary function and
> apply this to the object, but writing, for example,
>
> fun(x, mean, median, sd, mad)
>
> to quickly show the relevant information is much more *convenient*.
>
>
> It would be even nicer with a function that could also handle vectors and
> lists of values, and output the result as data frames or matrices. Example:
>
> x = c("foo", "bar", "foobar")
> fun(x, nchar, function(st) substr(st, 1 ,2) )
>
> y = list(3, 3L, 3.14, factor(3))
> fun(x, mode, typeof, class)
>
> --
> Karl Ove Hufthammer
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Karl,

Perhaps you're thinking of the Reduce function? There's an example
from the help page that you might be able to adapt to your purpose.

## Iterative function application:
Funcall <- function(f, ...) f(...)
## Compute log(exp(acos(cos(0))
Reduce(Funcall, list(log, exp, acos, cos), 0, right = TRUE)
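And if the goal really is "one object, many functions", a minimal home-grown
version might look like this (the name 'fun' is just the placeholder from the
question):

  fun <- function(x, ...) {
    fns <- list(...)                 # the functions to apply
    sapply(fns, function(f) f(x))    # apply each one to the single object
  }

  fun(3.14, mode, typeof, class)
  # [1] "numeric" "double"  "numeric"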

HTH,

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adding axes label to image:Fixed

2011-02-04 Thread ogbos okike
Dear Jim,
One touch!! Thank you very much
Best regards
Ogbos

On 4 February 2011 13:47, Jim Lemon  wrote:

> On 02/04/2011 09:14 PM, ogbos okike wrote:
>
>> Dear all,
>> Using the code I got from the link (
>> http://www.phaget4.org/R/image_matrix.html), I obtained a nice plot that
>> suits me. However, adding axes labels proved difficult for me . I have
>> succeeded in adding a few things to the plot function so as to get what a
>> better plot. Other things work but ylab="days" did not reflect.
>> It will be great if somebody can advice on how to add axes labels to the
>> plot.
>>
>
> Hi Ogbos,
> Try this after the plot has been displayed:
>
> title(ylab="days",line=1)
>
> Jim
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sensitivity analysis with external simulation model - decoupling in package "sensitivity" or alternative?

2011-02-04 Thread Rainer M Krug

Hi

I have a simulation model for which I want to do a sensitivity analysis.
I have identified a number of input variables and my response variable.
What I have done so far:

1) I created a Latin Hypercube with the lhs package (10.000 simulations)
2) simulated these

and now I have a data frame with the input parameters and the response
variable. I looked around and found the package "sensitivity", which
seems to be the most suitable for my case. As the model is not in R, I
have to use the decoupling approach. But now I am stuck:

How can I do this? An example highlighting how to do the decoupling to
do a sensitivity analysis would be very much appreciated.

Alternatively, is there a different approach I could take?
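One possibility I am considering (column names below are just placeholders;
'sims' stands for the data frame of LHS inputs plus response) is to compute a
regression-based measure directly from the precomputed design:

  library(sensitivity)

  X <- sims[, c("par1", "par2", "par3")]   # the input columns
  y <- sims$response                       # the response column

  ## standardized rank regression coefficients, no call back into the model
  out <- src(X, y, rank = TRUE, nboot = 100)
  print(out)
  plot(out)

  ## (Methods that need their own design, e.g. morris() or fast99(), are
  ## decoupled differently: create them with model = NULL, run the design
  ## externally, then feed the responses back with tell().)

Would that be a sensible way to use the package here?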

Thanks,

Rainer

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:+33 - (0)9 53 10 27 44
Cell:   +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:  rai...@krugs.de

Skype:  RMkrug

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Denis Kazakiewicz
Dear R people
Could you please help
I have similar but opposite question
How to reshape data from DF.new  to  DF from example, Mark kindly
provided?

Thank you
Denis

On Пят, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
> On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
> 
> > Dear R-List, 
> > 
> > I have a dataframe with one column "name.of.report" containing character 
> > values, e.g.
> > 
> > 
> >> df$name.of.report
> > 
> > "jeff_2001_teamx"
> > "teamy_jeff_2002"
> > "robert_2002_teamz"
> > "mary_2002_teamz"
> > "2003_mary_teamy"
> > ...
> > (i.e. the bit of interest is not always at same position)
> > 
> > Now I want to recode the column "name.of.report" into the variables 
> > "person", "year","team", like this
> > 
> >> new.df
> > 
> > "person"  "year"  "team"
> > jeff   2001  x
> > jeff   2002  y
> > robert   2002  z
> > mary2002  z
> > 
> > I tried with grep()
> > 
> > df$person<-grep("jeff",df$name.of.report)
> > 
> > but of course it didn't exactly result in what I wanted to do. Could not 
> > find any solution via RSeek. Excuse me if it is a very silly question, but 
> > can anyone help me find a way out of this?
> > 
> > Thanks a lot
> > 
> > Alain
> 
> 
> There will be several approaches, all largely involving the use of ?regex. 
> Here is one:
> 
> 
> DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002", 
> "robert_2002_teamz", "mary_2002_teamz", 
> "2003_mary_teamy"))
> 
> > DF
>  name.of.report
> 1   jeff_2001_teamx
> 2   teamy_jeff_2002
> 3 robert_2002_teamz
> 4   mary_2002_teamz
> 5   2003_mary_teamy
> 
> 
> DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
>  year = gsub(".*([0-9]{4}).*","\\1", DF$name.of.report),
>  team = gsub(".*team(.).*","\\1", DF$name.of.report))
> 
> 
> > DF.new
>   person year team
> 1   jeff 2001x
> 2   jeff 2002y
> 3 robert 2002z
> 4   mary 2002z
> 5   mary 2003y
> 
> 
> 
> HTH,
> 
> Marc Schwartz
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] vegan and sweave using xtable

2011-02-04 Thread Olga Lyashevska

Dear all,

Using:

library(vegan)
data(BCI)
mod <- radfit(BCI[1,])
mod

RAD models, family poisson
No. of species 93, total abundance 448

           par1      par2     par3    Deviance AIC      BIC
Null                                   39.5261 315.4362 315.4362
Preemption  0.042797                   21.8939 299.8041 302.3367
Lognormal   1.0687    1.0186           25.1528 305.0629 310.1281
Zipf        0.11033  -0.74705          61.0465 340.9567 346.0219
Mandelbrot  100.52   -2.312    24.084   4.2271 286.1372 293.7350


I want to include the table shown above into the latex document   
(using xtable).


library(xtable)
xtable(mod)
Error in UseMethod("xtable") :
  no applicable method for 'xtable' applied to an object of class  
"radfit"


Also is there is a way to customize the table - I want to exclude  
first two rows  of the output and make a table of:


           par1      par2     par3    Deviance AIC      BIC
Null                                   39.5261 315.4362 315.4362
Preemption  0.042797                   21.8939 299.8041 302.3367
Lognormal   1.0687    1.0186           25.1528 305.0629 310.1281
Zipf        0.11033  -0.74705          61.0465 340.9567 346.0219
Mandelbrot  100.52   -2.312    24.084   4.2271 286.1372 293.7350


Many thanks,
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get caller's name

2011-02-04 Thread Duncan Murdoch

On 03/02/2011 10:27 AM, Ernest Adrogué wrote:

Hi,
Suppose a function that checks an object:

stop.if.dims<- function(x) {
   if (! is.null(dim(x))) {
 stop("cannot handle dimensional data")
   }
}

This would be used by other functions that can only work with
dimensionless objects. The problem is the error message would need to
include the name of the function that called stop.if.dims, so that the
user knows which function got an argument that was incorrect.

How do I do this? Or maybe there is another way...


I see you have the answer you wanted, but I'd suggest you don't need 
this:  the user should just use traceback() after the error to see the 
full call stack.  Perhaps it's not the caller that's the problem, but 
the caller of the caller...


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

