date:20080613


Let me pick up on

Enabling SSE instructions in addition while building R (yes, you have to 
enable them explicitly, see man gcc) is possible but does not help much 
since all maths is mostly done in BLAS.


The final part is not true for my 'maths', only for those doing linear 
algebra.  Enabling use of SSE registers can help with CPU scheduling, and 
so can have a surprisingly large effect, so if you only run R on a single 
CPU type it is worth tuning the code to that CPU (e.g. -mtune=core2) 
alongside turning up optimization levels.



On Fri, 13 Jun 2008, Ivan Adzhubey wrote:


Hi Ivo,

On Friday 13 June 2008 12:23:06 am ivo welch wrote:

Dear Statisticians--- This is not even an R question, so please
forgive me.  I have so much ignorance in this matter that I do not
know where to begin.  I hope someone can point me to documentation
and/or a sample.


You will surely find some answers to your questions if you look in the
R-admin.html file under the "Building from source" section. Search for BLAS
and you will be presented with some options. Searching the R web site for
the same keyword will give you even more food for thought.


I want to compute a covariance as quickly as non-humanly possible on
an Intel core processor (up to SSE4) under linux.  Alas, I have no
idea how to engage CPU vectorization.  Do I need to use special data
types, or is "double" correct?  Does SSE* understand NaN?  Should I
rely on gcc autodetection of the vectorized meaning of my code, or are
there specific libraries that I should call?


I use Goto BLAS library and it works great. Usually runs 3 to 30 times faster
than the stock R BLAS library, depending on your code. Enabling SSE
instructions in addition while building R (yes, you have to enable them
explicitly, see man gcc) is possible but does not help much since all maths
is mostly done in BLAS.

That said, optimized BLAS libraries give the greatest speed increase with older
processors. The newer crop of multi-core CPUs with large shared caches is much
more difficult to hand-tune code for. You may want to subscribe to the Goto BLAS
mailing list for an in-depth discussion. The ATLAS community is also very helpful
(I use their code with our AMD CPUs).
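
As an illustration of why the BLAS matters for this question: in R a covariance
matrix can be written as a cross-product of the centered data, and that matrix
product is handed to whichever BLAS R is linked against. This is only a minimal
sketch, assuming a numeric matrix without missing values; the helper name
cov_crossprod is made up for this example, and whether it beats cov() depends
on the BLAS in use.

```r
# Hypothetical helper: covariance via a centered cross-product.
# crossprod() is a matrix product, so it is carried out by the BLAS.
cov_crossprod <- function(X) {
  X  <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X))     # subtract each column mean
  crossprod(Xc) / (nrow(X) - 1)      # t(Xc) %*% Xc / (n - 1)
}

set.seed(1)
X <- matrix(rnorm(1000 * 5), ncol = 5)
all.equal(cov_crossprod(X), cov(X))  # the two should agree up to rounding
```

Timing both versions with system.time() on realistic data is the only reliable
way to tell whether the cross-product route helps on a given machine.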


What I want to learn about is as simple as it gets:
  typedef double Double;  // or whatever SSE* needs as close equivalent
  Double vector1[N], vector2[N];
  // then fill them with stuff.


R does not have declared types: everything that does not look like a character
string or an integer is treated as double. All floating-point arithmetic is
done in double precision.
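
A quick way to check how R stores a value is typeof(). The lines below are a
small illustration and not part of the original exchange:

```r
typeof(1)        # "double": a plain numeric literal
typeof(1L)       # "integer": needs the L suffix
typeof(1:3)      # "integer"
typeof("a")      # "character"
typeof(7L / 2L)  # "double": dividing two integers gives a double
```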


  vector3= vector_mult(vector1,vector2, N);
  vector4= sum(vector1, N);

I just need a pointer and/or primer.  PS: If someone knows of a
superfast vectorized implementation of Gentleman's WLS algorithm,
please point me to it, too.  I am still using my old non-vectorized C
routines.


HTH,
Ivan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [R] Problem with rowMeans()

2008-06-13 Thread Wacek Kusnierczyk

Erik Iverson wrote:
>
>
> ss wrote:
>> It is:
>>
>>  > data <-
>> read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>> row.names = NULL ,header=TRUE, fill=TRUE)
>>  > class(data[3])
>> [1] "data.frame"
>>  >
>>
>
> Oops, should have said  class(data[[3]]) and
> is.numeric(data[[3]])
>
oops, my typo.  of course, data[3] is a *data frame* (if data is one),
so is.numeric(data[3]) must be FALSE.  but clearly if column 3 was
excluded, is.numeric(data[[3]]) must have been FALSE.
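
The difference between the two forms can be seen on a small made-up data frame
(the column names are illustrative and not taken from the original file):

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

class(df[3])         # "data.frame": single brackets keep the data frame
is.numeric(df[3])    # FALSE
class(df[[3]])       # "numeric": double brackets extract the column itself
is.numeric(df[[3]])  # TRUE
```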

vQ

[R] parsing - input buffer overflow

2008-06-13 Thread Daniel Malter

Hi,

I am trying to parse a large amount of text using gregexpr(). Unfortunately,
I get an "input buffer overflow" message when I attempt that with too large
an amount of text. The error message occurs before the parsing. The problem
is that I cannot assign the text to a variable (an object) if the text is
too large.

This problem has been mentioned before, which I found using the RSiteSearch.
However, the post is from 2006, and I thought it might have improved by now.
Is there any way to increase the limit or to get around this problem?

x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island,
Tristan da Cunha"

#What I want to achieve is to parse the text for the number of occurrences
#of a certain character string within the text.

#This is done using:

n=100 #choose n large enough
length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))

But again, if the text is large, I cannot assign it to x. I'd be grateful
for any suggestions.

Cheers,
Daniel


-
cuncta stricte discussurus

[R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Stephan Kolassa

Dear guRus,

I would like to loop over a medium amount of Sweave code, including both R and 
LaTeX chunks. Is there any way to do so? As an illustration, can I create a 
.tex file like this using a loop within a .Rnw file, where the "1,2,3" comes 
from some iteration variable in R?


\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}


Right now, I do have a working but painful solution. I put the loop contents in 
a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
   Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
   ...
   Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw

This does what I need, however, it is a major pain code-wise, e.g., there 
appears to be no way to control the loop during execution (n must be known in 
advance), and I need to control all graphics using \includegraphics with the 
iteration counter paste()d into the filename.

An alternative may be not using Sweave and working with one giant sink() and 
lots of print()s, letting R just write the entire .tex file. This also appears 
inelegant to me.

Is there a better way to do this?

I have tried to do my homework, see below. Do I get partial credit ;-) ?

Thank you all for your time!
Stephan


#


I can't simply start a for loop within an R chunk and finish it in another one.

whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it 
would simply reuse the R chunks if it did work, without changing things between 
loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...} 
with the entire loop contents and then simply write \loopcontent \loopcontent 
... or \input or \include the loop content from an external file.

Of course it would be possible to not use Sweave and just use the output from 
the R console, but there are a couple of figures I would really like to see 
close to the relevant portions of the calculations.

I also thought about putting the entire loop in *one* R chunk, but then I see 
no way to include LaTeX chunks *within* this R chunk. I can't just sink() to 
the .tex file in the middle of the R chunk (as the sink() gets appended to the 
.tex file only after Sweave is done with it). 

I have read the Sweave manual and FAQs and the R/R Windows FAQ, I did both 
RSiteSearches and RSeek searches for all combinations of "Sweave" and "loop", 
"for", "while" I could think of.

For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  tcltk methods   base  
   

other attached packages:
[1] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] svMisc_0.9-5

--

[R] adding custom axis to image.plot() and strange clipping behavior

2008-06-13 Thread Stephen Tucker

Hi list,

I wanted to plot an image with a colorbar to the right of the plot, but set my 
own axis labels (text rather than numbers) to the image. I have previously 
accomplished this with two calls to image(), but the package 'fields' has a 
wrapper function, image.plot(), which does this task conveniently.

However, I could not add axes to the original image after a call to 
image.plot(); I have found that I needed to set par(xpd=TRUE) within the 
function to allow this to happen:

###=== begin code
library(fields)

## make data matrix
m <- matrix(1:15,ncol=3)

## plot
image.plot(m,axes=FALSE)
axis(1) # doesn't work

par(xpd=TRUE)
axis(1) # still doesn't work

## replace the 28th element of the body of image.plot()
## and assign to new function called 'imp'
## here I just use the second condition of 'if' statement
## and set 'xpd = TRUE'
imp <- `body<-`(image.plot,value=`[[<-`(body(image.plot),28,
quote({par(big.par)
  par(plt = big.par$plt, xpd = TRUE)
  par(mfg = mfg.save, new = FALSE)
  invisible()})))
imp(m,axes=FALSE)
box()
axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
## clip to plotting region for additional
## graphical elements to be added:
par(xpd=FALSE)
abline(v=0.5)
###=== end code

Does anyone have any insight into this behavior? The axis() documentation
says:
"Note that xpd is not accepted as clipping is always to the device region"
so I am surprised to find (1) that par(xpd=TRUE) works in the case above, and 
(2) that it must be called before the function call terminates.

I have reproduced this on both my Linux box (Ubuntu Gutsy Gibbon 64-bit,
R 2.7.0, fields package version 4.1) and Windows machine (32-bit XP Pro,
R 2.7.0, fields package version 4.1).

Thanks very much,

Stephen

Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Delphine Fontaine

Dear Stephan,

I have the same problem as you. My solution is a bit different but not very 
elegant.
I have a master document (let's say master.Snw) and a file containing the code
to repeat (which would be in the loop).
In the master document I start a counter at 0, and I copy
"\SweaveInput{loop.Snw}" as many times as the n of the loop.
And in my loop.Snw, I don't forget to increment the counter by 1.
Not marvelous, but it works...

Delphine




Delphine Fontaine
Statistician
Data & Statistics Department
Genexion SA


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> project.org] On Behalf Of Stephan Kolassa
> Sent: Friday 13 June 2008 10:22
> To: r-help@r-project.org
> Subject: [R] Sweave: looping over mixed R/LaTeX code
> 
> 
> Dear guRus,
> 
> I would like to loop over a medium amount of Sweave code, including
> both R and LaTeX chunks. Is there any way to do so? As an illustration,
> can I create a .tex file like this using a loop within a .Rnw file,
> where the "1,2,3" comes from some iteration variable in R?
> 
> 
> \documentclass{article}
> \usepackage{Sweave}
> \begin{document}
> Iteration 1
> Iteration 2
> Iteration 3
> \end{document}
> 
> 
> Right now, I do have a working but painful solution. I put the loop
> contents in a separate loop.Rnw file, then:
> 1. run everything before the loop through R for initialization
> 2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
>    Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
>    ...
>    Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
> 3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw
> 
> This does what I need, however, it is a major pain code-wise, e.g.,
> there appears to be no way to control the loop during execution (n must
> be known in advance), and I need to control all graphics using
> \includegraphics with the iteration counter paste()d into the filename.
> 
> An alternative may be not using Sweave and working with one giant
> sink() and lots of print()s, letting R just write the entire .tex file.
> This also appears inelegant to me.
> 
> Is there a better way to do this?
> 
> I have tried to do my homework, see below. Do I get partial credit ;-)
> ?
> 
> Thank you all for your time!
> Stephan
> 
> 
> #
> 
> 
> I can't simply start a for loop within an R chunk and finish it in
> another one.
> 
> whiledo in the ifthen.sty package doesn't like Sweave at all. And of
> course, it would simply reuse the R chunks if it did work, without
> changing things between loops. For the same reason, I cannot define a
> \newcommand{\loopcontent}{...} with the entire loop contents and then
> simply write \loopcontent \loopcontent ... or \input or \include the
> loop content from an external file.
> 
> Of course it would be possible to not use Sweave and just use the
> output from the R console, but there are a couple of figures I would
> really like to see close to the relevant portions of the calculations.
> 
> I also thought about putting the entire loop in *one* R chunk, but then
> I see no way to include LaTeX chunks *within* this R chunk. I can't
> just sink() to the .tex file in the middle of the R chunk (as the
> sink() gets appended to the .tex file only after Sweave is done with
> it).
> 
> I have read the Sweave manual and FAQs and the R/R Windows FAQ, I did
> both RSiteSearches and RSeek searches for all combinations of "Sweave"
> and "loop", "for", "while" I could think of.
> 
> For what it's worth, here's my sessionInfo():
> 
> R version 2.7.0 (2008-04-22)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY
> =German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  tcltk methods
> base
> 
> other attached packages:
> [1] svIDE_0.9-5
> 
> loaded via a namespace (and not attached):
> [1] svMisc_0.9-5
> 
> --
> 
[R] Looping, Control Flow & Conditional Statements

2008-06-13 Thread Garth.Warren

Dear R Group:

 

I have little experience using R and even less experience with control
flow type questions.

See the following code:

a1 = c(0, 1, 1, 1,
       0, 0, 0, 0, 0,
       0, 0, 1,
       1, 1, 1, 0, 0)

for(i in 1:1){
  sx <- paste("a",i,sep="")
  s <- eval(parse(text = paste("a",i,sep="")))
  {g = numeric(length(s))
   k = numeric(length(s))
  {for (i in 1:length(s))
  {for (j in 1:length(s))
  ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
  }}
  h1 <- hist(g,freq=TRUE)
  h <- h1$counts[4]
  cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
}}

 

 

The output is:

> g
 [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
> k
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> h
[1] 7

& a text file, which has:

a1 : 7

 

k is a by-product of the ifelse statement and is of no interest, and g and
h only go part-way to answering my question, which is:

Each time an object such as a1 (which is actually a time series) - 0 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 - has a value over 0, how long do the
values stay above 0? In this case a1 has two groups or events where
the value is above zero: the first event lasts for 3 'days' and the
second event lasts for 4 'days'. My code tells me that there was a total
of 7 'days' in an event (above 0), but what I need to know is that there
were two 'events', the 1st lasting 3 'days' and the 2nd lasting 4 'days'.
Essentially I want a text file output to say:

a1.1 : 3
a1.2 : 4

 

My thinking is that I need to somehow get the code working through each
vector one value at a time, and when a value is found to meet the criterion
of > 0, R creates a new vector. To use the above example, it would come
to the first value > 0 and then create the new vector a1.1 = (1,1,1); then,
as the next value in the series is 0, it would close this new vector
'a1.1'. It would then continue until it reaches the next value > 0 and
create the vector a1.2 = (1,1,1,1); again, as the next value in
the series is 0, it would close this new vector, and so on.

Then all I need to do is perform a count of '1's in these new vectors to
find how many days they met this criterion of being greater than 0.
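
The procedure described above (open a run at the first value > 0, close it at
the next 0, then count) is what run-length encoding does, so one possible
sketch uses base R's rle(). The variable names runs and event_lengths are
illustrative, and the result is printed to the console here rather than
written to a file.

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

runs <- rle(a1 > 0)                         # consecutive runs of TRUE / FALSE
event_lengths <- runs$lengths[runs$values]  # keep only the runs above 0

# One line per event, labelled a1.1, a1.2, ...
out <- paste0("a1.", seq_along(event_lengths), " : ", event_lengths)
cat(out, sep = "\n")
```

Passing file = and append = TRUE to cat() would send the same lines to a text
file instead.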

 

I hope the above makes sense and I really hope there is someone willing
and able to help. I don't know how to proceed.

 

Thanks,

Garth

Re: [R] parsing - input buffer overflow


On Fri, 13 Jun 2008, Daniel Malter wrote:


Hi,

I am trying to parse a large amount of text using gregexpr(). Unfortunately,
I get an "input buffer overflow" message when I attempt that with too large
an amount of text. The error message occurs before the parsing. The problem
is that I cannot assign the text to a variable (an object) if the text is
too large.


R does have limits on the command line length (1024 bytes up to R-devel, 
4096 bytes there).  What happens if you exceed that depends on the 
interface you are using (and you have not told us).  Beyond that, the 
parser has a limit of MAXELTSIZE (8192 bytes) on strings.


I don't see any need for 'improvement' though: why are you entering very 
long strings as part of the R program?  They are data, and e.g. 
readLines() and scan() have no limits on string length beyond those 
imposed by R's internals (2^31-1 bytes).
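
A minimal sketch of treating the text as data rather than as part of the
program: the text is read from a file with readLines() and the matches are
counted per line. The temporary file and the variable names are made up for
this example; a real data file would take their place.

```r
# Write some text to a temporary file to stand in for the real data file
txt_file <- tempfile(fileext = ".txt")
writeLines(c("Saint Lucia, Saint Kitts and Nevis, Saint Helena,",
             "Clipperton Island, Tristan da Cunha"), txt_file)

x <- readLines(txt_file)     # one element per line of the file
matches <- gregexpr("Saint", x, ignore.case = TRUE)

# gregexpr() returns -1 for a line without a match, so count positions > 0
n_saint <- sum(sapply(matches, function(m) sum(m > 0)))
n_saint
```

Counting per line will miss a pattern that is split across a line break; if
that can happen, the lines could first be joined with paste(x, collapse = " ").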



This problem has been mentioned before, which I found using the RSiteSearch.
However, the post is from 2006, and I thought it might have improved by now.
Is there any way to increase the limit or to get around this problem?

x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island,
Tristan da Cunha"


I presume that is not an example?  It looks like a character vector which 
has been collapsed by paste(x, collapse = ", ") and would be better
strsplit() into its components than using gregexpr.
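
For illustration, splitting such a collapsed string back into its components
might look like the sketch below. It assumes the separator is exactly a comma
followed by a space, and it counts components that start with "Saint", which
is not quite the same as counting every occurrence of the string.

```r
x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

places <- strsplit(x, ", ", fixed = TRUE)[[1]]  # one element per place name
places
n_saint <- sum(grepl("^Saint", places))         # components starting with "Saint"
n_saint
```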



#What I want to achieve is to parse the text for the number of occurrences
#of a certain character string within the text.

#This is done using:

n=100 #choose n large enough
length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))

But again, if the text is large, I cannot assign it to x. I'd be grateful
for any suggestions.

Cheers,
Daniel


-
cuncta stricte discussurus



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [R] How to increase the for() loop speed?

2008-06-13 Thread Karl Ove Hufthammer

Rafael Barros de Rezende:

> I would like to know if there is a way to increase the for() loop speed
> because in my routine the calculations are too slow.

Read the article 'How Can I Avoid This Loop or Make It Faster?' on page 46
in the latest R News "http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf".

-- 
Karl Ove Hufthammer

[R] Output of silhouette (cluster package)

2008-06-13 Thread Cristiano Varin

Dear R users,
I am mailing you about the graphical output of silhouette (cluster package).

From the example of silhouette in help(silhouette):

 > ar <- agnes(ruspini)
 > si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
 +                   daisy(ruspini))
 > plot(si3, nmax = 80, cex.names = 0.5)

from which one may conclude that group 1 is composed of units 1 to 20,
group 2 of units 21 to 43, group 3 of units 44 to 57, group 4 of units
58 to 60 and, finally, group 5 of units 61 to 75.

However, this seems to contradict the printed output of silhouette, where
the fourth group consists of units 46 to 48 rather than units 58 to 60
(which belong to the third cluster); see
 > si3
      cluster neighbor   sil_width
 [1,]       1        5 0.679838078
 [2,]       1        5 0.745615002
 [3,]       1        5 0.758796123
 [4,]       1        4 0.715554768
 [5,]       1        5 0.664657114
 [6,]       1        4 0.783993831
 [7,]       1        2 0.590057470
 [8,]       1        4 0.747969458
 [9,]       1        5 0.792304760
[10,]       1        4 0.803547635
[11,]       1        4 0.742402051
[12,]       1        4 0.722302731
[13,]       1        4 0.665412622
[14,]       1        5 0.756910666
[15,]       1        5 0.700685403
[16,]       1        5 0.743601834
[17,]       1        5 0.614854124
[18,]       1        5 0.708007860
[19,]       1        5 0.700093839
[20,]       1        4 0.568989067
[21,]       2        4 0.751866935
[22,]       2        4 0.790783667
[23,]       2        4 0.802659788
[24,]       2        4 0.785895823
[25,]       2        4 0.822943473
[26,]       2        4 0.831313347
[27,]       2        4 0.818043337
[28,]       2        4 0.805454305
[29,]       2        4 0.770547118
[30,]       2        4 0.768289979
[31,]       2        3 0.794485567
[32,]       2        4 0.829925955
[33,]       2        4 0.807379640
[34,]       2        4 0.790626589
[35,]       2        4 0.817427927
[36,]       2        3 0.793572412
[37,]       2        4 0.760561408
[38,]       2        4 0.743170109
[39,]       2        3 0.761413953
[40,]       2        3 0.704193051
[41,]       2        4 0.297007126
[42,]       2        4 0.522049838
[43,]       2        3 0.488556828
[44,]       3        4 0.377632488
[45,]       3        4 0.007214464
[46,]       4        3 0.699407534
[47,]       4        3 0.837451212
[48,]       4        3 0.794349431
[49,]       3        4 0.632862996
[50,]       3        4 0.586149139
[51,]       3        4 0.647326133
[52,]       3        4 0.650020368
[53,]       3        4 0.629131005
[54,]       3        4 0.618843633
[55,]       3        4 0.586439350
[56,]       3        4 0.586788051
[57,]       3        4 0.668108812
[58,]       3        4 0.650074540
[59,]       3        4 0.628444500
[60,]       3        4 0.591393005
[61,]       5        1 0.770110294
[62,]       5        1 0.815309198
[63,]       5        4 0.771622667
[64,]       5        1 0.806125429
[65,]       5        1 0.850310507
[66,]       5        1 0.822984066
[67,]       5        1 0.852743923
[68,]       5        1 0.762055943
[69,]       5        1 0.839180986
[70,]       5        1 0.854894699
[71,]       5        1 0.838106473
[72,]       5        1 0.774812117
[73,]       5        1 0.795021304
[74,]       5        1 0.759681469
[75,]       5        1 0.742553847
attr(,"Ordered")
[1] FALSE
attr(,"call")
silhouette.default(x = cutree(ar, k = 5), dist = daisy(ruspini))
attr(,"class")
[1] "silhouette"

Thanks for your attention,
Cristiano
-
Cristiano Varin
[EMAIL PROTECTED]
http://www.dst.unive.it/~sammy/

Re: [R] model simplification using Crawley as a guide

2008-06-13 Thread Jim Lemon


Peter Dalgaard wrote:

...
That'll be anti-hist()-amine, I presume?


I would think p-necillin a more appropriate treatment.

Jim

[R] uncertainty bounds for a weighted moving average

2008-06-13 Thread jgarcia

Hi,
Well, this is not an R-specific question, but perhaps you can help.
If I have an irregularly sampled time series and apply a moving
average filter (e.g., with a triangular kernel), how can the uncertainty
bounds be calculated?

Thanks and best regards

J.
---

[R] Writing a new link for a GLM.

2008-06-13 Thread Jan Graffelman


Hi,

I wish to write a new link function for a GLM. R's glm routine does
not supply the "loglog" link. I modified the make.link function adding
the code:

    }, loglog = {
        linkfun <- function(mu) -log(-log(mu))
        linkinv <- function(eta) exp(-exp(-eta))
        mu.eta <- function(eta) exp(-exp(-eta)-eta)
        valideta <- function(eta) all(eta != 0)
    }, stop(sQuote(link), " link not recognised"))
    structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                   valideta = valideta, name = link), class = "link-glm")
}


and then call glm with argument

glm(y~x1+x2+x3,family=binomial(link=make.link("loglog")),data=X)

and that seems to work.

Is this the way to include a new link function? Any other suggestions?
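
One alternative that avoids editing make.link, sketched here under stated
assumptions: as far as I know, current versions of R let a family accept an
object of class "link-glm" directly as its link argument, so the link can be
built as a plain list. The clamping with .Machine$double.eps and the
always-TRUE valideta are choices made for this sketch, and the simulated data
are made up.

```r
eps <- .Machine$double.eps

loglog <- structure(
  list(
    linkfun  = function(mu) -log(-log(mu)),
    # keep fitted probabilities strictly inside (0, 1)
    linkinv  = function(eta) pmax(pmin(exp(-exp(-eta)), 1 - eps), eps),
    # derivative of exp(-exp(-eta)) with respect to eta
    mu.eta   = function(eta) pmax(exp(-exp(-eta) - eta), eps),
    valideta = function(eta) TRUE,
    name     = "loglog"
  ),
  class = "link-glm"
)

# Simulated data, only to show the call
set.seed(42)
x1 <- rnorm(200)
y  <- rbinom(200, 1, loglog$linkinv(0.3 + 0.8 * x1))

fit <- glm(y ~ x1, family = binomial(link = loglog))
coef(fit)
```

Checking mu.eta against a numerical derivative of linkinv is a cheap way to
catch sign or algebra mistakes in a hand-written link.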

Jan.

--

|Jan Graffelman  |tel:   +34-93-4011739|
|Dpt. of Statistics & Operations Research|fax:   +34-93-4016575|
|Universitat Politecnica de Catalunya|email: [EMAIL PROTECTED]|
|Av. Diagonal 647, 6th floor |www: |
|08028 Barcelona, Spain  |  http://www-eio.upc.es/~jan/|

Re: [R] adding custom axis to image.plot() and strange clipping behavior

2008-06-13 Thread Katharine Mullen

I also noticed that adding a custom axis with image.plot was a problem;
you can also do:

library(fields)

m <- matrix(1:15,ncol=3)

par(mar=c(5,5,5,7))

image(m, axes=FALSE)

# add axis
axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
box()

## add legend
image.plot(m, legend.only=TRUE)

On Fri, 13 Jun 2008, Stephen Tucker wrote:

> Hi list,
>
> I wanted to plot an image with a colorbar to the right of the plot, but
> set my own axis labels (text rather than numbers) to the image. I have
> previously accomplished this with two calls to image(), but the package
> 'fields' has a wrapper function, image.plot(), which does this task
> conveniently.
>
> However, I could not add axes to the original image after a call to
> image.plot(); I have found that I needed to set par(xpd=TRUE) within the
> function to allow this to happen:
>
> ###=== begin code
> library(fields)
>
> ## make data matrix
> m <- matrix(1:15,ncol=3)
>
> ## plot
> image.plot(m,axes=FALSE)
> axis(1) # doesn't work
>
> par(xpd=TRUE)
> axis(1) # still doesn't work
>
> ## replace the 28th element of the body of image.plot()
> ## and assign to new function called 'imp'
> ## here I just use the second condition of 'if' statement
> ## and set 'xpd = TRUE'
> imp <- `body<-`(image.plot,value=`[[<-`(body(image.plot),28,
> quote({par(big.par)
>   par(plt = big.par$plt, xpd = TRUE)
>   par(mfg = mfg.save, new = FALSE)
>   invisible()})))
> imp(m,axes=FALSE)
> box()
> axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
> ## clip to plotting region for additional
> ## graphical elements to be added:
> par(xpd=FALSE)
> abline(v=0.5)
> ###=== end code
>
> I wonder if anyone has any insights into this behavior? Since in the axis() 
> documentation, it says:
> "Note that xpd is not accepted as clipping is always to the device region"
> I am surprised to find (1) that the par(xpd=TRUE) works in the case above, 
> and (2) that it must be called before the function call is terminated.
>
> I wonder if anyone has any insights into this behavior. I have reproduced 
> this on both my Linux box (Ubuntu Gutsy Gibbon 64-bit, R 2.7.0, fields 
> package version 4.1) and Windows machine (32-bit XP Pro, R 2.7.0, fields 
> package version 4.1).
>
> Thanks very much,
>
> Stephen

Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Dieter Menne

Stephan Kolassa  gmx.de> writes:

> I would like to loop over a medium amount of Sweave code, including both
> R and LaTeX chunks. Is there any way to do so? As an illustration, can I
> create a .tex file like this using a loop within a .Rnw file, where the
> "1,2,3" comes from some iteration variable in R?
> 
> 
> \documentclass{article}
> \usepackage{Sweave}
> \begin{document}
> Iteration 1
> Iteration 2
> Iteration 3
> \end{document}
> 

I normally do this with a \newcommand: all latex stuff in the newcommand{},
passing parameters created by R.
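
A rough sketch of the R side of this idea: the loop only prints macro calls,
and the LaTeX lives in the macro. The macro name \iteration and the helper
emit_iterations are hypothetical. Inside an .Rnw file the call would sit in a
code chunk with the results=tex option, so that Sweave copies the printed
lines into the .tex file as LaTeX.

```r
# Hypothetical macro, defined once in the LaTeX preamble, for example:
#   \newcommand{\iteration}[1]{Iteration #1\par}
emit_iterations <- function(n) {
  for (i in seq_len(n)) {
    # print one macro call per iteration, with the R value as its argument
    cat("\\iteration{", i, "}\n", sep = "")
  }
}

emit_iterations(3)
```

Because the loop runs in R, the number of iterations does not have to be
known before the document is processed.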

Dieter

Re: [R] MCA in R

2008-06-13 Thread John Fox

Dear Kimmo,

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of K. Elo
> Sent: June-13-08 1:43 AM
> To: r-help@r-project.org
> Subject: Re: [R] MCA in R
> 
> Dear John,
> 
> thanks for Your quick reply.
> 
> John Fox wrote:
> > Dear Kimmo,
> >
> > MCA is a rather old name (introduced, I think, in the 1960s by
> > Sonquist and Morgan in the OSIRIS package) for a linear model
> > consisting entirely of factors and with only additive effects --
> > i.e., an ANOVA model with no interactions.
> 
> It is true, that MCA is an old name, but the technique itself is still
> robust, I think. The problem I am facing is that I have a research
> project where I try to find out which factors affect measured knowledge
> of a specific issue. As predictors I have formal education, interest,
> gender and consumption of different media (TV, newspapers etc.). Now,
> these are correlated predictors and running e.g. a simple anova
> (anova(lm(...)) as You suggested) won't - if I have understood correctly
> - consider the problem of correlated predictors. MCA would do this.

That's because anova() calculates sequential ("type-I") sums of squares; if
you use the Anova() function in the car package, for example, you'll get
so-called type-II sums of squares -- for each factor after the others. You
could also more tediously do these tests directly using the anova()
function, by contrasting alternative models: the full model and the model
deleting each factor in turn.
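
The "more tedious" route can be sketched as follows. This is an illustration only: the built-in mtcars data stand in for Kimmo's survey data, with cyl and am treated as two correlated factors, and all object names are made up.

```r
# mtcars ships with R; cyl and am are unbalanced, correlated factors.
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
full <- lm(mpg ~ cyl + am, data = d)

# Sequential ("type-I") table: cyl is tested ignoring am, am after cyl.
seq_tab <- anova(full)

# Each factor after the other: compare the full model with the model
# that deletes that factor.
cyl_after_am <- anova(update(full, . ~ . - cyl), full)
am_after_cyl <- anova(update(full, . ~ . - am), full)

print(seq_tab)
print(cyl_after_am)
print(am_after_cyl)
```

The sum of squares for the last term in the sequential table agrees with its model-comparison value; the one for the first term generally does not when the factors are correlated.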

> 
> A colleague of mine has run anova and MCA in SPSS and the results differ
> significantly.

Yes, see above.

>  Because I am more familiar with R, I just hoped that this
> marvelous statistical package could handle MCA, too :)
> 
> > Typically, the results of
> > an MCA are reported using "adjusted means." You could compute these
> > manually, or via the effects package.
> 
> Well, I am interested in the eta and beta values, too. 

Aren't the eta values just the square-roots of the R^2's from the individual
one-way ANOVAs? I don't remember how the betas are defined, but do recall
that they are a peculiar attempt to define standardized partial regression
coefficients for factors that combine all of the levels.
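
If that recollection of eta is right (it is stated above as a question, so treat it as an assumption), the values could be computed along these lines. The mtcars data and the helper name eta are stand-ins for illustration.

```r
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Eta for one factor: square root of R^2 from the one-way ANOVA on it.
eta <- function(factor_name, response = "mpg", data = d) {
  fit <- lm(reformulate(factor_name, response = response), data = data)
  sqrt(summary(fit)$r.squared)
}

etas <- sapply(c("cyl", "am"), eta)
print(round(etas, 3))
```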

> I have tried to
> use the effects package but my attempts with all.effects resulted in
> errors. I have to figure out what's going wrong here :)

If you tell me what you did, ideally including an example that I can
reproduce, I can probably tell you what's wrong.

Regards,
 John

> 
> Kind regards,
> Kimmo Elo
> 
> --
> University of Turku, Finland
> Dep. of political science
> 

[R] Switching the order of legend boxes in a lattice bar graph

2008-06-13 Thread Bob Green



I suspect there is a simple solution to this problem, but have been 
unable to find it. Below is some code that I have run to create 3 
lattice graphs. I have been asked to change the legend so that the 
'No' and dark blue are above "Y" and light blue in the legend to 
mirror the stacked bars in the graph which feature dark blue above light blue.


I have tried changing the data as well as the order of the legend 
text, without success.  Any assistance is much appreciated,


regards

Bob Green



library(lattice)
SNFP1 <- as.table(matrix(c(4,1, 4,4, 1,3, 2,7, 1,6, 0,4), ncol = 6,
  dimnames = list(group=c("Y","No"), Status=c("A","B", "C", "D", "E", "F"))))
barplot(SNFP1, beside=FALSE, legend=TRUE, ylim=c(0, 60),
  ylab="N of patients", main ="district 1", col=c("light blue", "dark blue"))


# "A","B", "C", "D", "E", "F"

SNFP2 <- as.table(matrix(c(3,7, 1,5, 0,1, 0,1), ncol = 4,
  dimnames = list(group=c("Y","No"), Status=c("G","H", "I", "J"))))
barplot(SNFP2, beside=FALSE, legend=TRUE, ylim=c(0, 60),
  ylab="N of patients", main ="district 2", col=c("light blue", "dark blue"))


# "G", "H", "I", "J",

SNFP3 <- as.table(matrix(c(3,0, 0,2, 3,4), ncol = 3,
  dimnames = list(group=c("Y","No"), Status=c("K","L", "M"))))
barplot(SNFP3, beside=FALSE, legend=TRUE, ylim=c(0, 60),
  ylab="N of patients", main ="district 3", col=c("light blue", "dark blue"))



df1 <- as.data.frame(t(SNFP1))
df2 <- as.data.frame(t(SNFP2))
df3 <- as.data.frame(t(SNFP3))
stuff <- make.groups(A=df1, B=df2, C=df3)

# simple version
barchart(Freq ~ Status | which, groups=group, data=stuff, 
stack=TRUE,scales=list(x=list(relation="free")), auto.key=TRUE)


# advanced version
barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE,
as.table=TRUE, layout=c(2,2),
skip=c(FALSE,TRUE,FALSE,FALSE), scales=list(x=list(relation="free")),
ylab="patients", main="Figure 1: X by district",
par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))),
auto.key=list(x = .6, y = .7, corner = c(0, 0)))



Re: [R] Switching the order of legend boxes in a lattice bar graph

2008-06-13 Thread Markus Gesmann

Hi Bob,

Would this:

mykey <- list(
 rectangles = list(col=c("dark blue","light blue") ),
 text=list(lab=c("No","Yes")),x = .6, y = .7, corner = c(0, 0))

barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE, 
as.table=TRUE, layout=c(2,2), 
skip=c(F,T,F,F),scales=list(x=list(relation="free")), ylab="patients", 
main="Figure 1: X by district",  
par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))), 
key=mykey)

solve your problem?
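
Another possibility, offered as a sketch rather than a tested answer to the post: the key component list in lattice documents a reverse.rows flag, so the automatic legend can be flipped without rebuilding it by hand. The small table below is a cut-down, made-up stand-in for SNFP1.

```r
library(lattice)

# Cut-down stand-in for SNFP1 (three status levels only).
tab <- as.table(matrix(c(4,1, 4,4, 1,3), ncol = 3,
  dimnames = list(group = c("Y", "No"), Status = c("A", "B", "C"))))
df <- as.data.frame(tab)   # columns: group, Status, Freq

# reverse.rows = TRUE lists the last group ("No") first in the legend,
# matching the stacking order of the bars (dark blue above light blue).
p <- barchart(Freq ~ Status, groups = group, data = df, stack = TRUE,
  ylab = "patients",
  par.settings = list(superpose.polygon =
                        list(col = c("light blue", "dark blue"))),
  auto.key = list(reverse.rows = TRUE))
```

Calling print(p) draws the chart. The same auto.key component should also accept the x, y and corner settings used in the advanced version.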

Regards,

Markus

Markus Gesmann │Associate Director│Libero Ventures Ltd, One Broadgate, London 
EC2M 2QS
tel: +44 (0)207 826 9080│ dir: +44 (0)207 826 9085│fax: +44 (0)207 826 9090 
│www.libero.uk.com

A Lehman Brothers Company

AUTHORISED AND REGULATED BY THE FINANCIAL SERVICES AUTHORITY




Re: [R] Writing a new link for a GLM.

2008-06-13 Thread roger koenker


I wrote an R-news note about this sort of thing in 2006, you can
navigate there via CRAN...

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jun 13, 2008, at 4:54 AM, Jan Graffelman wrote:


Hi,

I wish to write a new link function for a GLM. R's glm routine does
not supply the "loglog" link. I modified the make.link function adding
the code:

   }, loglog = {
   linkfun <- function(mu) -log(-log(mu))
   linkinv <- function(eta) exp(-exp(-eta))
   mu.eta <- function(eta) exp(-exp(-eta)-eta)
   valideta <- function(eta) all(eta != 0)
   }, stop(sQuote(link), " link not recognised"))
   structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
   valideta = valideta, name = link), class = "link-glm")
}


and then call glm with argument

glm(y~x1+x2+x3,family=binomial(link=make.link("loglog")),data=X)

and that seems to work.

Is this the way to include a new link function? Any other suggestions?
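
One alternative that avoids editing make.link, sketched here under assumptions: build the link as an object of class "link-glm" and hand it straight to the family function. The simulated data and the object name loglog are made up, and valideta is simplified to always return TRUE, which differs from the code above.

```r
# A log-log link as a stand-alone "link-glm" object.
loglog <- structure(
  list(
    linkfun  = function(mu) -log(-log(mu)),
    linkinv  = function(eta) exp(-exp(-eta)),
    mu.eta   = function(eta) exp(-exp(-eta) - eta),  # d mu / d eta
    valideta = function(eta) TRUE,                   # simplification
    name     = "loglog"
  ),
  class = "link-glm"
)

# Simulated binary data whose true model uses the log-log link.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, loglog$linkinv(0.5 + x))

fit <- glm(y ~ x, family = binomial(link = loglog))
print(coef(fit))
```

Because the object carries its own name, summary(fit) reports the link as "loglog" and no copy of make.link has to be maintained.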

Jan.

--

|Jan Graffelman                          |tel: +34-93-4011739          |
|Dpt. of Statistics & Operations Research|fax: +34-93-4016575          |
|Universitat Politecnica de Catalunya    |email: [EMAIL PROTECTED]     |
|Av. Diagonal 647, 6th floor             |www:                         |
|08028 Barcelona, Spain                  |  http://www-eio.upc.es/~jan/|



[R] R and Brownian Motion/ Langevin Equation package

2008-06-13 Thread Peter Mueller

Hi,

I'm writing a short course tutorial on Brownian motion and the Langevin equation.
At the end of the theory section I want to add a short GNU R example, so that the
students can play around a little.

I already looked in the MASS book (by Venables and Ripley) but I couldn't find
any Brownian motion / Langevin equation package.
Are there any good packages or tutorials available which cover R and Brownian
motion / the Langevin equation?

Thanks
Peter
-- 

Jetzt dabei sein: http://www.shortview.de/[EMAIL PROTECTED]


Re: [R] MCA in R

Although John Fox naturally mentions his Anova function, I would like to 
point out that drop1() (and MASS::dropterm) also does the tests of Type-II 
ANOVA of which John says 'more tediously do these tests directly'.
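
For an additive model the equivalence can be checked directly. A small sketch, with the built-in mtcars data standing in for the survey data and all object names made up:

```r
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
full <- lm(mpg ~ cyl + am, data = d)

# drop1() deletes each term in turn and tests it against the full model.
d1 <- drop1(full, test = "F")

# The same test for cyl, done by hand as a model comparison.
by_hand <- anova(update(full, . ~ . - cyl), full)

print(d1)
print(by_hand)
```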


It seems a lot easier to teach newcomers about drop1() than to introduce 
the SAS terminology and then say (to quote ?Anova)


  'the definitions used here do not correspond precisely to those
   employed by SAS'

(I would welcome a description of the precise differences on the Anova 
help page.)






--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] MCA in R

2008-06-13 Thread John Fox

Dear Brian,

> -Original Message-
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> Sent: June-13-08 8:13 AM
> To: John Fox
> Cc: 'K. Elo'; r-help@r-project.org
> Subject: Re: [R] MCA in R
> 
> Although John Fox naturally mentions his Anova function, I would like to
> point out that drop1() (and MASS::dropterm) also does the tests of Type-II
> ANOVA of which John says 'more tediously do these tests directly'.

It's true that for an additive model (such as Kimmo's), drop1() and Anova()
produce the same sums of squares, but for a model in which some terms are
marginal to others, drop1() produces tests only for the high-order terms.
One could specify scope = ~ . to drop1(), but that produces so-called
"type-III" tests. Perhaps there's some convenient way around this of which
I'm unaware.
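
The behaviour described here can be seen in a small sketch (mtcars again as stand-in data, object names made up):

```r
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
inter <- lm(mpg ~ cyl * am, data = d)

# Default scope respects marginality: only the interaction is tested.
default_scope <- drop1(inter, test = "F")

# scope = ~ . forces a test for every term, main effects included.
all_terms <- drop1(inter, scope = ~ ., test = "F")

print(default_scope)
print(all_terms)
```

The main-effect rows in the second table are the ones that depend on the contrast coding in use.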

> 
> It seems a lot easier to teach newcomers about drop1() than to introduce
> the SAS terminology and then say (to quote ?Anova)
> 
>'the definitions used here do not correspond precisely to those
> employed by SAS'
> 
> (I would welcome a description of the precise differences on the Anova
> help page.)

As I recall, the differences are for "type-III" tests, where in Anova()
these are dependent upon contrast coding.

Regards,
 John


Re: [R] Problems with mars in R in the case of nonlinear functions

2008-06-13 Thread Stephen Milborrow

| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and it underfits very badly.

Try the "earth" package which extends the mars function in the mda package.

Your example becomes

library(earth) # was mda
f <- function(x,y) { x^2-y^2 }
x <- seq(-1,1,length=10)
x <- outer(x*0,x,FUN="+")
y <- t(x)
X <- cbind(as.vector(x),as.vector(y))
z <- f(x,y)
fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit) # note better fit than before
# your original plotting code could be used too

For this kind of data, you could possibly use the minspan parameter.  MARS
by default does not allow every observation to be used as a knot in the
generated basis functions. This strategy increases resistance to runs of
correlated noise in the data.  For non-noisy data, you can set minspan=1 to
allow MARS to consider every observation as a potential knot.  If your data
were noisy then minspan=1 could overfit the data.  With earth, you can use
trace=2 to see the calculated minspan value.

If you run the above example with the earth parameter trace=1, you will see
that the stopping condition for the forward pass is:

Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)

To make the forward pass continue further, change the "delta RSq threshold"
by using the thresh parameter:

fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model "looks" better when plotted, but note that using thresh
here makes almost no change to the GRSq.  That is, with the lower threshold
the model is more complicated (has more terms) but does not have a greater
predictive power.  The threshold is just one of the reasons that the forward
pass can terminate (reaching the maximum number of terms nk is another).
AFAIK Friedman's code (that you ran from Matlab) does not use the threshold
but instead just continues forward stepping until nk is reached.  In this
case the Matlab model is arguably more complicated than it need be.  I
believe the forward threshold for MARS was an innovation of Hastie and
Tibshirani, but I could be wrong.

To reduce mailing list traffic, let's continue this discussion off-line i.e.
by direct mail to each other, and if necessary I will summarize results of
our discussions in the earth documentation.

Regards
Steve

| Message: 76
| Date: Thu, 12 Jun 2008 13:35:35 -0700
| From: Janne Huttunen <[EMAIL PROTECTED]>
| Subject: [R] Problems with mars in R in the case of nonlinear
| functions
| To:
| Message-ID: <[EMAIL PROTECTED]>
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| Hi,
|
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and
| it underfits very badly.
|
| For example, I have tried the following code to test mars:
|
| require("mda")
|
| f <- function(x,y) { x^2-y^2 };
| #f <- function(x,y) { x+2*y };
|
| # Grid
| x <- seq(-1,1,length=10);
| x <- outer(x*0,x,FUN="+"); y <- t(x);
| X <- cbind(as.vector(x),as.vector(y));
|
| # Data
| z <- f(x,y);
|
| fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);
|
| # Plotting
| par(mfrow=c(1,2),pty="s")
| lims <- c(min(c(min(z),min(fit$fitted.values))),max(c(max(z),max(fit$fitted.values))))
| persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
|   xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
|
| persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
|   col='lightblue',
|   xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
|   phi=25,theta=55,zlim=lims)
|
| (the code is also here if someone wants to try it:
| http://venda.uku.fi/~jmhuttun/R/marstest.R)
|
| The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The
| fitted model contains only 5 terms, which is not enough in this case.
| Adjusting parameters like nk, thresh, penalty and degree seems to have
| only a minor effect or no effect at all. It's also strange that when I
| increase the number of points in the grid, the results are even worse:
| see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid.
| However, mars seems to work well with linear functions (e.g. with the
| function which is commented out in the above code).
|
| Does anyone know what is wrong in this case? Am I missing something, or
| is there something wrong in my code?
|
| This seems not to be a problem with MARS method in general. For example,
| Friedman's MARS implementation (ran in Matlab) gives a rather good fit:
| see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
|
| Thank you
|
| Janne
|
| -- 
| Janne Huttunen
| University of California
| Department of Statistics
| 367 Evans Hall, Berkeley, CA 94720-3860
| email: [EMAIL PROTECTED]
| phone: +1-510-502-5205
| office room: 449 Evans Hall


Re: [R] R and Brownian Motion/ Langevin Equation package

Google "R and Brownian Motion". It turned up this link:
http://landshape.org/enm/r-code-for-brownian-motion/

Maybe this will help.
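
For a tutorial, base R may already be enough. The following is a minimal sketch, not taken from the linked page: it simulates a Brownian path and integrates a Langevin equation for a velocity with the Euler-Maruyama scheme. The parameter values (gamma, sigma, dt) are arbitrary choices for illustration.

```r
set.seed(42)
n  <- 1000      # number of time steps
dt <- 0.01      # step size

# Brownian motion: cumulative sum of independent N(0, dt) increments.
dW <- rnorm(n, mean = 0, sd = sqrt(dt))
W  <- c(0, cumsum(dW))

# Langevin equation for a velocity v, dv = -gamma * v * dt + sigma * dW,
# integrated with the Euler-Maruyama scheme.
gamma <- 1.0
sigma <- 0.5
v <- numeric(n + 1)
for (i in seq_len(n)) {
  v[i + 1] <- v[i] - gamma * v[i] * dt + sigma * dW[i]
}

time <- (0:n) * dt
```

Something like plot(time, W, type = "l") followed by lines(time, v, col = "red") shows both paths; students can then vary gamma and sigma.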


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] piper diagram

2008-06-13 Thread stephen sefick

RSEIS, I think, may have a Piper diagram.

On Thu, Jun 12, 2008 at 8:39 PM, Michael Grant <[EMAIL PROTECTED]> wrote:

> Sorry no previous message text or addresses, but I just cleaned my mailbox
> and then found something relevant. Regarding the Piper diagram. I just
> noticed the 'hydrogeo' package on CRAN, courtesy of one Myles English. That
> should be what you need or close to it.
>
>
>
> Best regards,
>
> Michael Grant
>
>
>



-- 
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.

-K. Mullis


[R] C# and R

2008-06-13 Thread Neil Gupta

Hello R-Users,

I came across this link on CodeProject.com and was wondering, if anyone has
implemented this and the benefits of doing so.
This may also be of some help for others. Here is a link to the project:
http://www.codeproject.com/KB/cs/RtoCSharp.aspx

Regards,

Neil Gupta


Re: [R] C# and R

This is about Windows, C# and R-(D)COM.  The latter has its own list which 
would be much more appropriate.  See http://sunsite.univie.ac.at/rcom/

(Linked from CRAN->Software->Other.)




--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] Problem with Freq function {prettyR}

2008-06-13 Thread ukoenig

Does someone have an idea?
Thanks a lot!

Udo


Quoting Udo <[EMAIL PROTECTED]>:

> Dear list,
> I have a problem with freq from prettyR.
>
> Please have a look at my syntax with a little example:
>
>
> library(prettyR)
>
> #Version 1
> test.df<-data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
> test.df
> freq(test.df) #No error message
>
> #Version 2
> test.df<-data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
> test.df
> freq(test.df)
>
> Error message: "Error in vector("integer", length) : Vector size can´t be NA"
>
> Can someone tell me, why an error message occurs in version two? I am
> helpless...
>
> Thanks in advance!
>
> Udo K ö n i g
>
> 
>
> Clinic for Child and Adolescent Psychiatry
> Philipps University of Marburg / Germany
>


[R] Access violation when calling Front41

2008-06-13 Thread Siyi FENG

  Hello! When I tried to call Front41 from R, I ran into a problem. After I
entered system('front41.exe'), an error occurred:

"jwe0019i-u The program was terminated abnormally with Exception Code
EXCEPTION_ACCESS_VIOLATION.
error summary (Fortran)
error number  error level  error count
  jwe0019i u   1
total error count = 1

FRONTIER - Version 4.1c
***
"
How can I deal with it?




-- 
Siyi FENG
Department of Agricultural Economics
Texas A&M University, 2124 TAMU
College Station, TX 77843-2124
[EMAIL PROTECTED]


[R] subsetting data-frame by vector of characters

2008-06-13 Thread james perkins


Hi,

I have a very simple problem but I can't think how to solve it without 
using a for loop and creating a large logical vector. However given the 
nature of the problem I am sure there is a "1-liner" that could do the 
same thing much more efficiently.


bascially I have a dataframe with characters in, eg

>names.and.numbers

(index)NameFave.Number
1John7
2Tony12
3Phil14
4Adam22
5Robert23


Now, imagine I have a vector of names, ie:

>names = c("John,Phil,Robert")

All I want to do is get the subset of the data frame which corresponds to 
the names in the vector "names", i.e.


(index)  Name    Fave.Number
1        John    7
2        Phil    14
3        Robert  23

Sorry, I know it's trivial, but I'm new to R and it's hard to start 
thinking in R. As I say, I've written a complicated for loop using 
intersect and creating a logical table, but this is very long-winded!


Regards,

Jim


[R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Bluder Olivia

Hello,

 

I'm trying to calculate the Maximum likelihood estimators for a dataset
which contains censored data.

 

I started by using the function "nlm", but isn't there a separate method
for doing this for e.g. the "weibull" and the "log-normal" distribution?

 

Thanks,

Olivia

 



Re: [R] Package Installation produces "linux/limits.h: No such file or directory" error when installing the lpSolve package

2008-06-13 Thread Basileis


Hi,
I had the same problem too; it was resolved by installing two
packages:

1. kernel-headers
2. kernel-devel

I hope this helps in your case.

Regards
Sharwan

Joe_K wrote:
> 
> Dear Friends,
> 
> I am trying to install a few packages in R and am receiving error
> messages.  Since the error messages are different, I am posting them
> separately.  The second error is with the installation of lpSolve.
> 
> The core error message is:
> 
> In file included from /usr/include/bits/posix1_lim.h:153,
>  from /usr/include/limits.h:145,
>  from
> /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122,
>  from
> /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/syslimits.h:7,
>  from
> /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:11,
>  from colamd.c:677:
> /usr/include/bits/local_lim.h:36:26
>  error: linux/limits.h: No such file or directory
> make: *** [colamd.o] Error 1
> ERROR: compilation failed for package 'lpSolve'
> 
> 
> The first thing that I tried was to figure out where linux/limits.h was. 
> I discovered that there are seven versions of limits.h on the system and
> they are not identical.
> 
> /usr/include/limits.h
> /usr/src/linux-2.6.22.13-0.3/Documentation/i2c/chips/limits.h
> /usr/include/c++/4.2.1/tr1/limits.h
> /usr/lib64/qt4/demos/qtdemo/xml/limits.h
> /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h
> /usr/src/linux-2.6.22.13-0.3/include/asm-arm/limits.h
> /usr/src/linux-2.6.22.13-0.3/include/asm-arm26/limits.h
> 
> Only one has "linux" immediately preceding it in the path:
> /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h
> 
> I assume that /usr/include/bits/local_lim.h is trying to use a relative
> path.  The only line in local_lim.h with limits.h in it is:
> 
> #include <linux/limits.h>
> 
> So, I tried modifying the line to read:
> 
> #include 
> 
> That did not work, so I changed it back again.  I guess my theory about it
> looking for a relative path was wrong.
> 
> Since then, I have been Googling the issue all weekend and have found
> similar errors, but not exactly the same.  Some are suggesting changing
> kernel headers and other files.  Since the context of these other posts
> are dissimilar, I figured it best not to mess with kernel headers or some
> of the other radical solutions offered.
> 
> There was one suggestion in a post to install glibc-headers, however, I
> cannot seem to find that for Suse 10.3.  Is it something included in
> another package?  Is it something that is now obsolete?
> 
> CAN ANYONE HELP ME DEBUG THIS?
> 
> I am running R version 2.6.1 (2007-11-26) on Suse Linux 10.3 64-bit x86_64
> on a Boxx Technologies computer with a TYAN Thunder K8WE S2895 Motherboard
> with 4Gb Ram and 2 dual CPUs (total of 4 CPUs).  The CPUs are AMD Opteron. 
> Hard Disk Usage is 4 150 Gb SATA drives array with a Com3 9550SX
> Controller set at RAID 5.
> 
> The full error message received from Rkward upon the package installation
> attempt was:
> 
> R version 2.6.1 (2007-11-26)
> Copyright (C) 2007 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
> 
>   Natural language support but running in an English locale
> 
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> 
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>>
> options (repos=c (CRAN="http://lib.stat.cmu.edu/R/CRAN"))
>> install.packages (pkgs=c ("lpSolve"),
>> lib="/home/joe/R/x86_64-unknown-linux-gnu-library/2.6",
>> destdir="/home/joe/.rkward/package_archive", dependencies=TRUE)
> trying URL
> 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/lpSolve_5.5.8.tar.gz'
> Content type 'application/x-gzip' length 449804 bytes (439 Kb)
> opened URL
> 
> downloaded 439 Kb
> /home/joe/R/x86_64-unknown-linux-gnu-library/2.6
> * Installing *source* package 'lpSolve' ...
> ** libs
> gcc -std=gnu99 -I/usr/lib64/R/include -I/usr/lib64/R/include -I .
> -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL
> -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL
> -DINLINE=static -DParanoia -I/usr/local/include -fpic  -g -O2 -c
> colamd.c -o colamd.o
> In file included from /usr/include/bits/posix1_lim.h:153,
>  from /usr/include/limits.h:145,
>  from
> /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122,
>  from
> /usr/lib64/gcc/x86_64-suse-linux/4

Re: [R] Problem with Freq function {prettyR}

2008-06-13 Thread James W. MacDonald

Since this is a contributed package, you should be contacting the 
maintainer (as mentioned in the posting guide).


Anyway, the problem occurs because in the second case you have a factor 
in the first column and numeric in the second. This part of the code 
will illustrate what I mean:


for (i in 1:nfreq) {
    if (display.na)
        nna <- sum(is.na(x[[i]]))
    else nna <- 0
    xt <- na.omit(x[[i]])
    if (is.null(levels))
        levels <- unique(xt)
    if (is.numeric(x[[i]]))
        xt <- factor(xt, levels = levels)

So the first time through this loop the levels variable is set to 
c("m","f"). On the second time levels is no longer NULL, so when the xt 
variable is created it is essentially this:


xt <- factor(xt, levels = c("m","f"))

and since xt contains only numbers you get

[1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Levels: m f
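
The effect can be reproduced in isolation. The following is only a minimal
sketch with made-up numbers (the vector xt and its values are assumptions for
illustration, not taken from freq itself):

```r
# Numeric data, but with levels left over from a character column
xt <- c(3, 1, 4, 2)
f <- factor(xt, levels = c("m", "f"))

# None of the numbers matches a level, so every element becomes NA
all(is.na(f))

# Both levels still exist, but neither has any observations
table(f)
```

Resetting levels to NULL before each column is processed would avoid the
carry-over in a sketch like this; whether that is the right fix for the
package is for the maintainer to decide.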

Best,

Jim



[EMAIL PROTECTED] wrote:

Does someone have an idea?
Thanks a lot!

Udo


Quoting Udo <[EMAIL PROTECTED]>:


Dear list,
I have a problem with freq from prettyR.

Please have a look at my syntax with a little example:


library(prettyR)

#Version 1
test.df<-data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
test.df
freq(test.df) #No error message

#Version 2
test.df<-data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
test.df
freq(test.df)

Error message: "Error in vector("integer", length) : Vector size can´t be NA"

Can someone tell me, why an error message occurs in version two? I am
helpless...

Thanks in advance!

Udo König

Clinic for Child and Adolescent Psychiatry
Philipps University of Marburg / Germany





--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Chuck Cleland

On 6/13/2008 10:07 AM, james perkins wrote:

> Hi,
>
> I have a very simple problem but I can't think how to solve it without 
> using a for loop and creating a large logical vector. However given the 
> nature of the problem I am sure there is a "1-liner" that could do the 
> same thing much more efficiently.
>
> Basically I have a data frame with character data in it, e.g.
>
> >names.and.numbers
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Tony    12
> 3        Phil    14
> 4        Adam    22
> 5        Robert  23
>
> Now, imagine I have a vector of names, ie:
>
> >names = c("John,Phil,Robert")
>
> All I want to do is get the subset of the data frame which corresponds to 
> the names in the vector "names", i.e.
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Phil    14
> 3        Robert  23
>
> Sorry, I know it's trivial, but I'm new to R and it's hard to start 
> thinking in R. As I say, I've written a complicated for loop using 
> intersect and creating a logical table, but this is very long-winded!

  How about this:

subset(names.and.numbers, Name %in% mynames)

  where mynames is the vector of names you want?

?subset

?is.element
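
For illustration, here is a minimal, self-contained sketch of the same idea.
The data frame is rebuilt from the table in the question, and mynames is
assumed to be a proper three-element character vector:

```r
# Toy data frame mirroring the example in the question
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
mynames <- c("John", "Phil", "Robert")

# Keep only the rows whose Name appears in mynames
picked <- subset(names.and.numbers, Name %in% mynames)
picked

# is.element(x, y) performs the same test as x %in% y
same <- names.and.numbers[is.element(names.and.numbers$Name, mynames), ]
```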

> Regards,
>
> Jim


--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


[R] histogram

2008-06-13 Thread Paul Adams

Hello everyone,
I am trying to plot a histogram from the following code:
dat<-read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt")
x<-dat[2,23:46]
y=mean(x,trim=0,na.rm=T)
colMeans(dat[2,23:46])
boxplot(dat[2,23:46])
hist(dat[2,23:46])
The box plot is fine, but the histogram keeps giving me the error that x
must be numeric. I am not sure what is wrong here with the instructions
for the histogram plot.
Any help would be appreciated.
Paul


  

Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Wacek Kusnierczyk

james perkins wrote:
> Hi,
>
> I have a very simple problem but I can't think how to solve it without
> using a for loop and creating a large logical vector. However given
> the nature of the problem I am sure there is a "1-liner" that could do
> the same thing much more efficiently.
>
> Basically I have a data frame with character data in it, e.g.
>
> >names.and.numbers
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Tony    12
> 3        Phil    14
> 4        Adam    22
> 5        Robert  23
>
>
> Now, imagine I have a vector of names, ie:
>
> >names = c("John,Phil,Robert")

this is a one-element vector: a single string containing the names
separated by commas.
or do you mean:  names = c("John", "Phil", "Robert")

>
> All I want to do is get the subset of the data frame which corresponds
> to the names in the vector "names", i.e.
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Phil    14
> 3        Robert  23

this should do:
names.and.numbers[names.and.numbers$Name %in% names,]

if names is as you say above, do
names.and.numbers[names.and.numbers$Name %in% unlist(strsplit(names,",")), ]

you do create a logical vector here (what does 'large' mean?), but no
loop is involved at the surface.
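
a small sketch of the comma-separated case, assuming the data frame from the
question. strsplit() returns a list with one element per input string, so the
pieces are pulled out with unlist() before they are handed to %in%:

```r
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John,Phil,Robert")   # one string, as in the original post

parts <- strsplit(names, ",")    # a list holding one character vector
wanted <- unlist(parts)          # plain character vector of three names

hits <- names.and.numbers[names.and.numbers$Name %in% wanted, ]
hits
```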

vQ


Re: [R] histogram

It is hard to respond without reproducible examples.  Do
str(dat[2,23:46]) and see what it reports.  My guess is that one of
the columns is not numeric.  Find out which one it is, fix it and then
try 'hist' again.

On Fri, Jun 13, 2008 at 10:21 AM, Paul Adams <[EMAIL PROTECTED]> wrote:
> Hello everyone,
> I am trying to plot a histogram from the following code:
> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My 
> Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
> file.show(file="C:\\Documents and Settings\\Owner\\My 
> Documents\\Yeast\\Yeast.txt")
> x<-dat[2,23:46]
> y=mean(x,trim=0,na.rm=T)
> colMeans(dat[2,23:46])
> boxplot(dat[2,23:46])
> hist(dat[2,23:46])
> The box plot is fine but the histogram keeps giving me the error that x
> must be numeric.I am not sure what is wrong here with the instructions
> for the histogram plot.
> Any help would be appreciated
> Paul

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] histogram

2008-06-13 Thread Erik Iverson




Paul Adams wrote:

> Hello everyone,
> I am trying to plot a histogram from the following code:
> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
> file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt")
> x<-dat[2,23:46]
> y=mean(x,trim=0,na.rm=T)
> colMeans(dat[2,23:46])
> boxplot(dat[2,23:46])
> hist(dat[2,23:46])


Check what the class of your object is:

class(dat[2, 23:46])

It may be a data.frame. If so, you can try to convert it accordingly (see 
?as.numeric)


Erik


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread james perkins


Thanks a lot for that. It's the %in% I needed to work out, mainly.

"Large" didn't mean anything in particular, just that it gets quite long 
with the real data.

I did mean: names = c("John", "Phil", "Robert")

The only problem with the method you suggest is that I lose the 
indexing, i.e. in the example, instead of:


(index)  Name    Fave.Number
1        John    7
2        Phil    14
3        Robert  23


I end up with


(index) Name Fave.Number
1 John 7
3 Phil 14
5 Robert 23

This isn't a problem at the moment, but I guess it could be if I used the 
table later in loops. Is there an easy way to re-index the table?


Kind regards

Jim


Re: [R] histogram

2008-06-13 Thread Lars Fischer

Hi, 

please someone correct me, but
On 13/06/2008, 07:21, [EMAIL PROTECTED] wrote:
> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My 
> Documents\\Yeast\\Yeast.txt",header=T,row.names=1)

Check mode and class of dat. read.table provided you with a dataframe
of, essentially, string data. You have to apply as.numeric where it
fits.

> x<-dat[2,23:46]
 ^
most probably here.

Regards
Lars

p.s. Your code is awful to read, please add some spaces where
appropriate.

-- 
Lars Fischer    tel: +49 (0)6151 16-2889
Technische Universität Darmstadt
Fachbereich Informatik/ FG Sicherheit in der Informationstechnik
PGP FPR: A197 CBE1 91FC 0CE3 A71D  77F2 1094 CB6E CEE3 7111


Re: [R] histogram

2008-06-13 Thread Peter Dalgaard

jim holtman wrote:
> It is hard to respond without reproducible examples.  Do
> str(dat[2,23:46]) and see what it reports.  My guess is that one of
> the columns is not numeric.  Find out which one it is, fix it and then
> try 'hist' again.
>   
No, this will be wrong whatever the data are. The problem is that
dat[2,23:46] is a one-row dataframe, i.e. a list, which is not a numeric
vector. Possibly
hist(unlist(dat[2,23:46])) is what is wanted. I don't think the boxplot
is "fine" either, except in the sense that it does not give an error
(try boxplot(airquality[2,])).
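
To make the point concrete, here is a minimal sketch using the built-in
airquality data in place of the poster's file (an assumption made purely for
illustration; the variable names are made up):

```r
row2 <- airquality[2, ]        # a one-row data frame
is.data.frame(row2)            # TRUE
is.numeric(row2)               # FALSE: a data frame is a list of columns

x <- unlist(row2)              # flatten the row into a named numeric vector
is.numeric(x)                  # TRUE

# plot = FALSE computes the bins without opening a graphics device
h <- hist(x, plot = FALSE)
h$counts
```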

> On Fri, Jun 13, 2008 at 10:21 AM, Paul Adams <[EMAIL PROTECTED]> wrote:
>   
>> Hello everyone,
>> I am trying to plot a histogram from the following code:
>> dat<-read.table(file="C:\\Documents and Settings\\Owner\\My 
>> Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
>> file.show(file="C:\\Documents and Settings\\Owner\\My 
>> Documents\\Yeast\\Yeast.txt")
>> x<-dat[2,23:46]
>> y=mean(x,trim=0,na.rm=T)
>> colMeans(dat[2,23:46])
>> boxplot(dat[2,23:46])
>> hist(dat[2,23:46])
>> The box plot is fine but the histogram keeps giving me the error that x
>> must be numeric.I am not sure what is wrong here with the instructions
>> for the histogram plot.
>> Any help would be appreciated
>> Paul


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Peter Dalgaard

james perkins wrote:
> Thanks a lot for that. Its the %in% I needed to work out mainly
>
> large didn't mean anything in particular, just that it gets quite long
> with the real data.
> I did mean: names = c("John", "Phil", "Robert")
>
> The only problem is that using the method you suggest is that I lose
> the indexing, ie in the example, instead of:
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Phil    14
> 3        Robert  23
>
>
> I end up with
>
>
> (index) Name Fave.Number
> 1 John 7
> 3 Phil 14
> 5 Robert 23
>
> This isnt a problem at the moment but I guess it could be if I used
> the table later in loops. Is there an easy way to re-index the table?
>
Notice that these are names, not numbers:  result[2,1] is "Phil" in both
cases. If it bothers you, just set rownames(result) <- NULL

BTW, are your names unique? In that case you could set them as rownames
and use them for indexing:

rownames(names.and.numbers) <- names.and.numbers$Name
names.and.numbers[names, ]
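
Both suggestions can be tried on a toy copy of the data. This is only a
sketch: the data frame is rebuilt from the table in the question, and by.name
is a hypothetical name introduced here so that the original is left alone.

```r
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John", "Phil", "Robert")

result <- names.and.numbers[names.and.numbers$Name %in% names, ]
rownames(result)            # "1" "3" "5": labels carried over from the source
result[2, 1]                # "Phil": positional indexing ignores the labels

rownames(result) <- NULL    # relabel the rows as 1, 2, 3

# Alternative: use the (unique) names themselves as row labels
by.name <- names.and.numbers
rownames(by.name) <- by.name$Name
by.name[names, ]
```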

> Kind regards
>
> Jim
>


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


[R] package under unix

2008-06-13 Thread cgenolin


Hi list,

I am writing a package for clustering longitudinal data using a 
non-parametric algorithm. I develop the package under Windows. To be as 
user-friendly as possible, the package uses some graphical procedures to 
"show" the user the evolution of the cluster construction, and to 
export the graphs in a friendly way.


Here are some examples: http://christophe.genolini.free.fr/kml

Everything works fine... under Windows.

Unfortunately, it seems it does not work under Linux. I first used the 
instruction:



windows(5,5,xpos=0)


which seems to be incompatible. Then I used:


if(getOption("device")=="windows"){windows(5,5,xpos=0)}else{}


but it is not portable either.
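
One possible direction, given only as a hedged sketch: branch on the operating
system rather than on the device option. The helper open_device() is a
hypothetical name, and the sketch assumes a version of R that provides
dev.new(), which opens the default device for the platform.

```r
open_device <- function(width = 5, height = 5) {
  if (.Platform$OS.type == "windows") {
    # windows() only exists in Windows builds of R
    grDevices::windows(width = width, height = height, xpos = 0)
  } else {
    # dev.new() opens whatever device is the default on this platform
    grDevices::dev.new(width = width, height = height)
  }
}
```

The window position (xpos) is dropped on other platforms in this sketch,
since not every device accepts that argument.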

I do not know Linux, so it will be very hard for me to test and change my code.
On the other hand, I spent a lot of time developing a graphical 
interface for exporting the results in an easy way, so it would be a pity 
to remove the code that deals with graphics.


Can someone help?

Christophe


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Wacek Kusnierczyk

james perkins wrote:
> Thanks a lot for that. Its the %in% I needed to work out mainly
>
> large didn't mean anything in particular, just that it gets quite long
> with the real data.
> I did mean: names = c("John", "Phil", "Robert")
>
> The only problem is that using the method you suggest is that I lose
> the indexing, ie in the example, instead of:
>
> (index)  Name    Fave.Number
> 1        John    7
> 2        Phil    14
> 3        Robert  23
>
>
> I end up with
>
>
> (index) Name Fave.Number
> 1 John 7
> 3 Phil 14
> 5 Robert 23
>
> This isnt a problem at the moment but I guess it could be if I used
> the table later in loops. Is there an easy way to re-index the table?
strange.  i ran this simulated example, and it's ok:

d = data.frame(a=letters[rep(1:5,2)], b=letters[10:1])
d[d$a %in% letters[1:3], ]

you can always add an index column:

d = data.frame(index=1:dim(d)[[1]],d)

vQ


[R] Wanted: your examples of logged axes with custom tick marks

Dear all,

I'm trying to improve the default layout of tick marks for log scaled
axes in ggplot2.  To this end, it would be really useful to see what
people actually do in practice.  If you've ever made a log-log (or
semi-log) plot and customised the location of the ticks, I'd really
appreciate a copy of your graph (if it's publicly available) or a
statement of the range of the data, and the tick marks you used.

I'm not aware of any published research on this topic, but if I've
missed something, a pointer to relevant work would be greatly
appreciated.

Thanks!

Hadley

-- 
http://had.co.nz/


[R] CRAN package XML (omegahat)

2008-06-13 Thread David Keegan

Hi,

I'm having issues using this package to parse large XML files.
Where should bugs be reported? The omegahat website has several
broken links.

Regards
David Keegan.
--


[R] Rest of a division

2008-06-13 Thread Eric Ferreira

Dear useRs,

How do I ask for the remainder of a division?

For instance, in C it is like:

4%2 = 0

Best regards,

-- 
Eric B Ferreira
Exact Sciences Department
Federal University of Lavras
Brasil


Re: [R] Rest of a division

2008-06-13 Thread Peter Dalgaard

Eric Ferreira wrote:
> Dear useRs,
>
> How do I ask for the remainder of a division?
>
> For instance, in C it is like:
>
> 4%2 = 0
>
> Best regards,
>
>   
> 4%%2
[1] 0
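
A slightly longer sketch of the two related operators, with values chosen only
for illustration:

```r
r <- 7 %% 2      # remainder
q <- 7 %/% 2     # integer quotient
c(q, r)

# The two operators are consistent with each other
7 == 2 * q + r

# For a negative dividend the result takes the sign of the divisor,
# which differs from C's % operator
(-7) %% 2
```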

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


Re: [R] Rest of a division

2008-06-13 Thread Charilaos Skiadas


?"%%"

On Jun 13, 2008, at 11:23 AM, Eric Ferreira wrote:


> Dear useRs,
>
> How do I ask for the remainder of a division?
>
> For instance, in C it is like:
>
> 4%2 = 0
>
> Best regards,
>
> --
> Eric B Ferreira
> Exact Sciences Department
> Federal University of Lavras
> Brasil



Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


Re: [R] CRAN package XML (omegahat)

2008-06-13 Thread Martin Morgan


Bugs should go to the package maintainer, for this and all packages:

> packageDescription('XML')[['Maintainer']]
[1] "Duncan Temple Lang <[EMAIL PROTECTED]>"
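
The same lookup works for any installed package. A minimal sketch, using the
utils package only because it ships with every R installation:

```r
# packageDescription() reads the DESCRIPTION file of an installed package
desc <- utils::packageDescription("utils")

m <- desc[["Maintainer"]]   # who to contact
v <- desc[["Version"]]      # worth quoting in a bug report
m
v
```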

You will have the best luck with the usual: sessionInfo() output, an easily 
reproducible and compact example, use of current software versions, etc.


Martin

David Keegan wrote:

> Hi,
>
> I'm having issues using this package to parse large XML files.
> Where should bugs be reported? The omegahat website has several
> broken links.
>
> Regards
> David Keegan.



--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Jeffrey Horner

Stephan Kolassa wrote on 06/13/2008 03:22 AM:

Dear guRus,

I would like to loop over a medium amount of Sweave code, including both R and LaTeX
chunks. Is there any way to do so? As an illustration, can I create a .tex file like this
using a loop within a .Rnw file, where the "1,2,3" comes from some iteration
variable in R?

\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}

Another alternative would be to use the brew package from CRAN:

http://cran.r-project.org/web/packages/brew/index.html

While the disadvantage would be a change of syntax from Sweave to brew,
you would gain the advantage of looping over code chunks.

brew also installs a collection of example files, one being a conversion
of the Sweave test file to brew. Scope out the 'Examples' section from
the brew help page.
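
Independently of brew, the looping idea itself can be sketched in plain R.
The following assumes the code sits inside a single Sweave chunk declared with
the results=tex option, so that whatever cat() prints is passed through as
LaTeX source; tex_lines and n are made-up names for illustration:

```r
n <- 3
tex_lines <- character(n)
for (i in seq_len(n)) {
  # Build one line of LaTeX per iteration
  tex_lines[i] <- sprintf("Iteration %d", i)
}

# Blank lines between the entries start a new LaTeX paragraph each time
cat(tex_lines, sep = "\n\n")
```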

Best,

Jeff

Right now, I do have a working but painful solution. I put the loop contents in
a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
...
Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw

This does what I need, however, it is a major pain code-wise, e.g., there
appears to be no way to control the loop during execution (n must be known in
advance), and I need to control all graphics using \includegraphics with the
iteration counter paste()d into the filename.

An alternative may be not using Sweave and working with one giant sink() and
lots of print()s, letting R just write the entire .tex file. This also appears
inelegant to me.

Is there a better way to do this?

I have tried to do my homework, see below. Do I get partial credit ;-) ?

Thank you all for your time!
Stephan

I can't simply start a for loop within an R chunk and finish it in another one.

whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it
would simply reuse the R chunks if it did work, without changing things between
loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...}
with the entire loop contents and then simply write \loopcontent \loopcontent
... or \input or \include the loop content from an external file.

Of course it would be possible to not use Sweave and just use the output from
the R console, but there are a couple of figures I would really like to see
close to the relevant portions of the calculations.

I also thought about putting the entire loop in *one* R chunk, but then I see no way to include LaTeX chunks *within* this R chunk. I can't just sink() to the .tex file in the middle of the R chunk (as the sink() gets appended to the .tex file only after Sweave is done with it).
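
One way to keep the whole loop in a single R chunk is to let the chunk print its LaTeX itself. The sketch below is plain R; iteration_tex() is a hypothetical helper name, and the chunk header shown in the comment assumes Sweave's results=tex option, which copies printed output into the .tex file unchanged.

```r
# Hypothetical helper: build one LaTeX paragraph per iteration.
iteration_tex <- function(n) {
  sprintf("Iteration %d\n", seq_len(n))
}

# Inside a .Rnw file this call would sit in a chunk opened with
#   <<echo=FALSE, results=tex>>=
# so that the printed lines end up in the generated .tex file as LaTeX.
cat(iteration_tex(3), sep = "\n")
```

Because the loop now lives in ordinary R code, the number of iterations does not have to be known in advance.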

I have read the Sweave manual and FAQs and the R/R Windows FAQ, I did both RSiteSearches and RSeek searches for all
combinations of "Sweave" and "loop", "for", "while" I could think of.

For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets tcltk methods base

other attached packages:
[1] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] svMisc_0.9-5


--
http://biostat.mc.vanderbilt.edu/JeffreyHorner

Re: [R] Rest of a division

2008-06-13 Thread Erik Iverson


?Arithmetic
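
A short illustration of the operators documented on that help page; the variable names are only for this example.

```r
r1 <- 4 %% 2      # remainder of 4 / 2, i.e. 0
r2 <- 7 %% 3      # remainder of 7 / 3, i.e. 1
q  <- 7 %/% 3     # integer quotient, i.e. 2
r3 <- (-7) %% 3   # 2 in R: the result takes the sign of the divisor,
                  # whereas C's % operator would give -1 here
print(c(r1, r2, q, r3))
```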

Eric Ferreira wrote:

Dear useRs,

How do I ask for the remainder of a division?

For instance, in C it is like:

4%2 = 0

Best regards,




[R] help with colsplit (reshape)

2008-06-13 Thread Ista Zahn


Dear list,

I'm trying to figure out how to use the reshape package to reshape  
data from a "wide" format to a "long" format. I have data like this


pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2, out.1, out.2)


and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8),  
idvar="pid", v.names=c("PredA", "PredB", "Out"),  
timevar="measure.num", times=c(1,2), direction="long"))

    pid predA measure.num PredA PredB Out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package"  
as a guide, I tried the following:


M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names  
= c("treatment", "time")))


but this gave a warning and resulted in

head(M.Data2)
  pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html 
 which led me to try


M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",  
names = c("treatment", "time")))


which gave:

head(M.Data2)
  pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.

I would be grateful if someone would tell me (a) how to reshape the  
data as described above using the reshape package, (b) what the difference  
between split = "." and split = "\\." is, and (c) whether more information  
about the colsplit command is available anywhere.


Thank you very much in advance,
Ista


Re: [R] Looping, Control Flow & Conditional Statements

2008-06-13 Thread Charles C. Berry




See

?rle

Start with this:


a1.runs <- rle( a1 )
a1.runs$lengths[ a1.runs$values>0 ]

[1] 3 4
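
Building on the rle() call above, a minimal sketch of how the requested "a1.1 : 3" style lines could be produced. The names event_lengths and event_lines are made up for this example; the a1 vector is the one from the original question.

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# Run-length encode the logical series "value is above zero".
runs <- rle(a1 > 0)

# Keep only the runs where the condition is TRUE: one entry per event.
event_lengths <- runs$lengths[runs$values]

# One output line per event, numbered in order of occurrence.
event_lines <- sprintf("a1.%d : %d", seq_along(event_lengths), event_lengths)
cat(event_lines, sep = "\n")
```

To write to a file instead of the console, the same cat() call accepts file = and append = TRUE, as in the original code.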




HTH,

Chuck

p.s.


library(fortunes)
fortune(106)


If the answer is parse() you should usually rethink the question.
   -- Thomas Lumley
  R-help (February 2005)
--

see

?get
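
A small sketch of what get() buys you here: it looks an object up by its name, so the eval(parse(text = ...)) step is not needed. The vectors a1 and a2 and the result name days_above_zero are invented for illustration.

```r
a1 <- c(0, 1, 1, 1, 0)
a2 <- c(1, 1, 0, 0, 0)

days_above_zero <- integer(0)
for (i in 1:2) {
  sx <- paste("a", i, sep = "")
  s <- get(sx)                       # same object eval(parse(text = sx)) would return
  days_above_zero[sx] <- sum(s > 0)  # count of values above zero, stored by name
}
print(days_above_zero)
```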

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:


Dear R Group:



I have little experience using R and even less experience with control
flow type questions.



See the following code:



a1 = c(0, 1, 1, 1,
0, 0, 0, 0, 0,
0, 0, 1,
1, 1, 1, 0, 0)

for(i in 1:1){
   sx <- paste("a",i,sep="")
   s <- eval(parse(text = paste("a",i,sep="")))
{g = numeric(length(s))
k = numeric(length(s))
   {for (i in 1:length(s))
   {for (j in 1:length(s))
   ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
}}
h1 <- hist(g,freq=TRUE)
h <- h1$counts[4]
cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
}}





The output is:

g
[1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
k
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h
[1] 7

& a text file, which has:

   a1 : 7



k is a by-product of the ifelse statement and is of no interest & g and
h only go part-way to answering my question, which is:



Every time an object, i.e. a1 (which is actually a time series) - 0 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 - has a value over 0, how long do the
values stay above 0? So in this case a1 has two groups or events where
the value is above zero; the first event lasts for 3 'days' and the
second event lasts for 4 'days'. My code tells me that there
was a total of 7 'days' in an event or above 0, but what I need to know is
that there were two 'events' and that the 1st lasted 3 'days' and the 2nd
lasted 4 'days'. Essentially I want a text file output to say:


a1.1 : 3


a1.2 : 4



My thinking is that I need to somehow get the code working through each
vector one value at a time, and when a value is found to meet the criterion
of > 0, R creates a new vector; to use the above example, it would come
to the first value > 0 and then create the new vector a1.1 = (1,1,1); then,
as the next value in the series is 0, it would close this new vector
'a1.1'. It would then continue until it reaches the next value > 0 and
then create the vector a1.2 = (1,1,1,1); then again, as the next value in
the series is 0, it would close this new vector, and so on.



Then all I need to do is perform a count of '1's in these new vectors to
find how many days they met this criteria of being greater than 0



I hope the above makes sense and I really hope there is someone willing
and able to help. I don't know how to proceed.



Thanks,

Garth

















Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: [R] alternative to matching/merge?

2008-06-13 Thread Lana Schaffer

Jim,
My code is this:
 mergefunc <- function(x,seqFile){
# merge(seqFile,x)
cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)),
])
}
LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
Each matrix/data.frame takes 0.2 seconds and then to do this
1240 times takes ~4 minutes.
Thanks,
Lana

-Original Message-
From: jim holtman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 12, 2008 6:40 PM
To: Lana Schaffer
Cc: r-help@r-project.org
Subject: Re: [R] alternative to matching/merge?

It would be nice if you at least included the code that you are using
and a subset of the data.  Have you run Rprof to determine which of the
functions is consuming the time?

On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer <[EMAIL PROTECTED]>
wrote:
>
> Greetings,
> I am doing matching/merge for a table (40919x3) to data which is in 
> the form of a list of 1268 data.frames.  Using lapply this is taking 
> ~5 minutes.  I know that the match/merge functions are time consuming,

> so is there an alternative to this accomplish this goal?  is lapply 
> not efficient?
>
> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> [EMAIL PROTECTED]
>
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Problem with Freq function {prettyR}

2008-06-13 Thread ukoenig

Thanks a lot, Jim!

> Since this is a contributed package, you should be contacting the
> maintainer (as mentioned in the posting guide).
sorry


>
> Anyway, the problem occurs because in the second case you have a factor
> in the first column and numeric in the second. This part of the code
> will illustrate what I mean:
>
> for (i in 1:nfreq) {
>  if (display.na)
>  nna <- sum(is.na(x[[i]]))
>  else nna <- 0
>  xt <- na.omit(x[[i]])
>  if (is.null(levels))
>  levels <- unique(xt)
>  if (is.numeric(x[[i]]))
>  xt <- factor(xt, levels = levels)
>
> So the first time through this loop the levels variable is set to
> c("m","f"). On the second time levels is no longer NULL, so when the xt
> variable is created it is essentially this:
>
> xt <- factor(xt, levels = c("m","f"))
>
> and since xt contains only numbers you get
>
> [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> Levels: m f
>
> Best,
>
> Jim
>
>
>
> [EMAIL PROTECTED] wrote:
> > Does someone have an idea?
> > Thanks a lot!
> >
> > Udo
> >
> >
> > Quoting Udo <[EMAIL PROTECTED]>:
> >
> >> Dear list,
> >> I have a problem with freq from prettyR.
> >>
> >> Please have a look at my syntax with a little example:
> >>
> >>
> >> library(prettyR)
> >>
> >> #Version 1
> >> test.df<-data.frame(q1=sample(1:4,8,TRUE),
> gender=sample(c("f","m"),8,TRUE))
> >> test.df
> >> freq(test.df) #No error message
> >>
> >> #Version 2
> >> test.df<-data.frame(gender=sample(c("f","m"),8,TRUE),
> q1=sample(1:4,8,TRUE))
> >> test.df
> >> freq(test.df)
> >>
> >> Error message: "Error in vector("integer", length) : Vector size can´t be
> NA"
> >>
> >> Can someone tell me, why an error message occurs in version two? I am
> >> helpless...
> >>
> >> Thanks in advance!
> >>
> >> Udo K ö n i g
> >>
> >> 
> >>
> >> Clinic for Child and Adolescent Psychiatry
> >> Philipps University of Marburg / Germany
> >>
> >
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>


Re: [R] alternative to matching/merge?

What is the structure of 'd.frame' and 'seqFile'?  Run Rprof so that
we can see which of the functions it is spending its time in.  What
happens if x$index is not in seqFile$index?  Are the values in
'index' unique in both structures?  Subsetting a data frame can be
expensive compared to subsetting a matrix.  Could you use a matrix
instead of a data frame; are all the columns the same mode?  Again,
either a subset of the data or the output of str() on the data
objects being used would help us understand what they are.
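
One idea that might reduce the per-data-frame overhead, sketched here on made-up data: stack the list once, call match() a single time, and split the result afterwards. The objects frames, seqFile and LIX below are tiny stand-ins for the real ones, and whether this is actually faster on the real data would have to be checked with Rprof or system.time.

```r
# Toy stand-ins for the real objects (assumed shapes, not the poster's data).
seqFile <- data.frame(index = 1:5, seq = letters[1:5], stringsAsFactors = FALSE)
frames <- list(
  data.frame(index = c(2, 4), value = c(0.1, 0.2)),
  data.frame(index = c(5, 1, 3), value = c(0.3, 0.4, 0.5))
)

# Stack all data frames, remembering how many rows each contributed.
sizes <- vapply(frames, nrow, integer(1))
all_rows <- do.call(rbind, frames)

# A single match() call over every row instead of one call per data frame.
hit <- match(all_rows$index, seqFile$index)
extra <- seqFile[hit, setdiff(names(seqFile), "index"), drop = FALSE]
merged <- cbind(all_rows, extra)

# Split back into one data frame per original list element.
LIX <- split(merged, rep(seq_along(frames), sizes))
```

Rows whose index is missing from seqFile$index get NA in the added columns, which is the same behaviour as the match()-based version in the original function.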


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] alternative to matching/merge?

2008-06-13 Thread Lana Schaffer

Jim,
d.frame[[i]] is a list of data.frames and seqFile is a
data.frame.  I have converted them to vectors/matrices and
the timing is the same as with data.frames.  'index' is unique
in both structures.  The list is subset into data.frame/matrix
structures.  
Lana


[R] Level Plot and Scale of Colorkey

2008-06-13 Thread emma hartnett

I am drawing level plots but I would like to specify the range of the colorkey, 
I am not having any success figuring this out so any help would be greatly 
appreciated!

Here is an example of what I am trying to do:

disp<-1

x <- seq(1, 10,by=1) 
y <- seq(1,10,by=1)
g <- expand.grid(x = x, y = y)
g$z <- 1/exp((abs(g$x-5)+abs(g$y-5))*disp)
g$z<-g$z/sum(g$z)

levelplot(z ~ x * y, g,xlab="x co-ordinate", ylab="y co-ordinate" 
,colorkey=TRUE,col.regions=(col=gray((0:32)/32)))

I would like to enforce the number of divisions on the colorkey scale and the 
size – so for example from 0 to 0.1 in increments of 0.02 (just as an example). 
 

I apologize if this is an obvious question but I have read the documentation 
and scoured the archives and cannot figure it out.





[R] cluster.stats

2008-06-13 Thread Laura Poggio

Dear list,
I just tried to use the function cluster.stats in the package fpc.
I just have a couple of questions about the syntax:

cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)

1) the distance object (d) is an object obtained by the function dist() on
my own original matrix?
2) clustering is the clusters vector as result of one of the many clustering
methods?

Thank you very much in advance and sorry for such a basic question, but I
could not work it out myself.

Laura


Re: [R] Level Plot and Scale of Colorkey

2008-06-13 Thread Toby Marthews

Try

colscaledivs <- 100  # colscaledivs = 15 here is the R default
levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate",
  colorkey=TRUE, at=seq(from=-0.01, to=0.25, length=colscaledivs),
  col.regions=gray((0:colscaledivs)/colscaledivs))
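
For fixed increments on the key, a sketch along the same lines using explicit breakpoints. It assumes lattice's `at` argument and the `at`/`labels` components of a `colorkey` list; the breakpoints in brks are my own choice and run to 0.25 rather than 0.1, because values outside the range of `at` are not drawn and the largest z in this example is about 0.22.

```r
library(lattice)

# Same example surface as in the original question.
disp <- 1
g <- expand.grid(x = 1:10, y = 1:10)
g$z <- 1 / exp((abs(g$x - 5) + abs(g$y - 5)) * disp)
g$z <- g$z / sum(g$z)

# Explicit breakpoints: fixed range and fixed increment (assumed values).
brks <- seq(0, 0.25, by = 0.05)

p <- levelplot(z ~ x * y, data = g,
               xlab = "x co-ordinate", ylab = "y co-ordinate",
               at = brks,                                   # cut points for the regions
               col.regions = gray(seq(1, 0, length.out = length(brks) - 1)),
               colorkey = list(at = brks,                   # same cut points on the key
                               labels = list(at = brks)))   # tick labels at each cut point
print(p)
```

One colour per interval is supplied here, so the key shows exactly length(brks) - 1 bands.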

Toby Marthews


Le Ven 13 juin 2008 18:50, emma hartnett a écrit :
> I am drawing level plots but I would like to specify the range of the
> colorkey, I am not having any success figuring this out so any help would
> be greatly appreciated!
>
> Here is an example of what I am trying to do:
>
> disp<-1
>
> x <- seq(1, 10,by=1)
> y <- seq(1,10,by=1)
> g <- expand.grid(x = x, y = y)
> g$z <- 1/exp((abs(g$x-5)+abs(g$y-5))*disp)
> g$z<-g$z/sum(g$z)
>
> levelplot(z ~ x * y, g,xlab="x co-ordinate", ylab="y co-ordinate"
> ,colorkey=TRUE,col.regions=(col=gray((0:32)/32)))
>
> I would like to enforce the number of divisions on the colorkey scale and
> the size – so for example from 0 to 0.1 in increments of 0.02 (just as an
> example).
>
> I apologize if this is an obvious question but I have read the
> documentation and scoured the archives and cannot figure it out.
>
>
>
>


[R] restricted coefficient and factor for linear regression.

2008-06-13 Thread Oh Dong-hyun


Hi,

my data set is data.frame(id, yr, y, l, e, k).

I would like to estimate Lee and Schmidt's (1993, OUP) model in R.

My colleague wrote SAS code as follows:
** procedures for creating dummy variables are omitted **
** di# and dt# are dummy variables for industry and time **
data a2; merge a1 a2 a; by id yr;
 proc sysnlin maxit=100 outest=beta2;
 endogenous y;
 exogenous  l e k di1-di12 dt2-dt10;
 parms a0 0.94 al -0.14 ae 1.8 ak -0.9
 b1 0 b2 0 b3 0 b4 0 b5 0 b6 0 b7 0 b8 0 b9 0 b10 0 b11 0
 b12 0 c2 0 c3 0 c4 0 c5 0 c6 0 c7 0 c8 0 c9 0 c10 0;
 y=a0+al*l+ae*e+ak*k
 +(b1*di1+b2*di2+b3*di3+b4*di4+b5*di5+b6*di6
 +b7*di7+b8*di8+b9*di9+b10*di10+b11*di11+b12*di12)*
 (1*dt1+c2*dt2+c3*dt3+c4*dt4+c5*dt5+c6*dt6+c7*dt7
 +c8*dt8+c9*dt9+c10*dt10);
 title '* lee/schmidt parameter estimates *';

My R code is as follows:
##
library(plm)
dt <- read.table("dt.dta", sep = "\t", header= T)
dt$id <- factor(dt$id)
dt$yr <- factor(dt$yr)
fit.model <- I(log(y)) ~ I(log(l)) + I(log(e)) + yr * id
re.fit.gls <- pggls(fit.model, data = dt)
#

I've got the following error message:
# Error message ###
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent
 End of Error message

I would like to figure out three things.
1. How can I restrict a coefficient in the model? As you can see in the SAS  
code, the coefficient of dt1 is restricted to 1.
2. If it is possible to restrict coefficients, is it possible to  
restrict coefficients of factors? If so, how?


Thanks in advance.

Best,


=
Dong-hyun Oh
Center of Excellence for Science and Innovation Studies
Royal Institute of Technology, Sweden
e-mail: [EMAIL PROTECTED]
cel: +46 73 563 45 22


Re: [R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Ben Bolker

Bluder Olivia  k-ai.at> writes:

> 
> Hello,
> 
> I'm trying to calculate the Maximum likelihood estimators for a dataset
> which contains censored data.
> 
> I started by using the function "nlm", but isn't there a separate method
> for doing this for e.g. the "weibull" and the "log-normal" distribution?
> 
> Thanks,
> 
> Olivia

  This is not *quite* enough detail about what you
want to do.  Can you (as the posting guide suggests!)
give us a small example of what you want to do?  You may be able
to do this via the survreg() command in the survival
package, or you may want to do it yourself by constructing
a log-likelihood function with dweibull() for uncensored
data and pweibull() for censored data [or dlnorm/plnorm].
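
A minimal sketch of the do-it-yourself route, under my own assumptions: the data are simulated, censoring is right-censoring at a fixed time, and the parameters are optimised on the log scale so they stay positive. Observed times contribute the density, censored times the survival probability.

```r
set.seed(1)
n <- 500
true_time <- rweibull(n, shape = 2, scale = 3)   # simulated lifetimes (assumed truth)
cens_time <- 4                                   # hypothetical fixed censoring time
time  <- pmin(true_time, cens_time)              # what is actually recorded
event <- true_time <= cens_time                  # TRUE = observed, FALSE = right-censored

# Negative log-likelihood; par holds log(shape) and log(scale).
negloglik <- function(par) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  ll_obs  <- dweibull(time[event], shape, scale, log = TRUE)
  ll_cens <- pweibull(time[!event], shape, scale,
                      lower.tail = FALSE, log.p = TRUE)  # log P(T > censoring time)
  -(sum(ll_obs) + sum(ll_cens))
}

fit <- optim(c(0, 0), negloglik)                 # default Nelder-Mead search
est <- exp(fit$par)
names(est) <- c("shape", "scale")
print(est)
```

For a log-normal model the same structure should work with dlnorm()/plnorm() in place of dweibull()/pweibull(); survreg() from the survival package could serve as a cross-check.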

  Ben Bolker


[R] nls() vs lm() estimates

2008-06-13 Thread Héctor Villalobos

Hi,

I'm trying to understand why the coefficients "a" and "b" for the model W = a*L^b,
estimated via nls(), differ from those obtained for the log-transformed model
log(W) = log(a) + b*log(L), estimated via lm(). Also, if I didn't make a mistake,
R-squared suggests a "better" fit for the model using the coefficients estimated
by lm(). Perhaps I'm doing something wrong in nls()?

I hope the code below explains this better. Thanks in advance for any hints.

Héctor


L <-
c(8,8.1,8.5,9,9.4,9.4,9.5,9.5,9.5,9.6,9.8,10,10,10,10,10,10,10,10,10,10,10.2,10.3,10.4,10.4,
10.4,10.4,10.5,10.5,10.5,10.5,10.5,10.5,10.5,10.5,10.7,10.7,10.8,10.9,10.9,10.9,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11.1,11.1,11.2,11.2,11.2,11.3,11.3,11.3,11.3,11.3,
11.4,11.4,11.4,11.4,11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.6,11.6,11.6,11.6,11.6,11.6,11.6,
11.6,11.7,11.7,11.7,11.7,11.7,11.8,11.8,11.8,11.8,11.8,11.9,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12,12.1,12.2,12.2,12.2,12.3,12.3,12.3,12.3,12.3,12.3,
12.3,12.3,12.3,12.4,12.4,12.4,12.4,12.4,12.4,12.5,12.5,12.5,12.5,12.5,12.5,12.5,12.5,12.6,
12.6,12.7,12.7,12.8,12.8,12.8,12.9,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13.2,13.2,13.3,13.5,13.5,13.5,13.5,13.5,13.5,14)

W <-
c(11,13,13.45,21.66,19.5,19.73,19.74,19.42,21.48,20.47,23.02,22.7,20.19,23.3,27.05,19.81,
20.01,26,24,25,20,25,26.29,31.26,23.08,29.85,24.27,27.49,25,26.03,24,26,28.21,24.62,
21.69,24.68,23.6,25.42,26.7,30.25,30.06,33.62,32,30,32.46,30,30,28.8,30.2,31.44,32.84,33.04,
35,28,29,33,34,28,28.51,35.67,33.72,33,28.53,34.85,34.5,37.44,37.74,31.36,30.12,36.03,33.4,
33.51,34,33,33.79,34.93,35,34.13,35.65,34,32.77,41.71,31.26,32.4,28.81,35.63,34.96,36.74,
32.38,38.14,34.12,40.26,40.27,36.96,38.35,42.36,40.33,31.59,34.44,38,42.63,40,36.28,37,
34.4,34,33.64,39.05,40.46,35.45,38.72,35,33,35,33,40,35,37,36,32,43,35,40,33.54,40.06,
43.38,40.3,44.81,43,46.32,37.45,37.71,45.9,36.1,44.78,43.12,45.5,41.62,38,37,43.08,43.82,
47.25,43,41.59,43.58,41,44,48,43,45.46,43.5,43.38,47.54,45,46.92,44.75,49.02,43.37,43.44,48,
43,46,42,48,45,48,43,45,46,43,40,42,40,43,43,50,44,50.65,42.11,50,51.44,53.1,52,56.2,45,49,
55)


## Using nls() to find "a" and "b" for model:  W = a*L^b
 WL.nls <- nls((W ~ a * L^b), start = list(a = 0.02, b = 1),
   trace = TRUE, algorithm = "default", model = TRUE)
  summary(WL.nls)

## Scatterplot with fitted model
 plot(L, W)
 lines(L, predict(WL.nls), col = "blue", lwd = 2)

## Finding "log(a)" and "b" for log transformed model: log(W) = log(a)+ b*log(L)
 logWL.lm <- lm(log10(W) ~ log10(L))
  summary(logWL.lm)

## Adding model to plot
 lines(L, 10^coef(logWL.lm)[1]*L^coef(logWL.lm)[2], col="red", lwd=2)

## R-squared for W = a*L^b
 Rsq.nls <- sum((predict(WL.nls) - mean(W))^2) / sum((W - mean(W))^2)

## R-squared for W = a*L^b with coefs from log(W) = log(a)+ b*log(L)
 pred <- 10^coef(logWL.lm )[1]*L^coef(logWL.lm )[2]
  Rsq.lm <- sum((pred - mean(W))^2) / sum((W - mean(W))^2)

  text(c(9, 13), c(50, 20), paste("R-squared:", formatC(c(Rsq.nls, Rsq.lm), 
digits=4)),
col=c("blue", "red"))



Re: [R] cluster.stats

2008-06-13 Thread Christian Hennig


Dear Laura,


Dear list,
I just tried to use the function cluster.stat in the package fpc.
I just have a couple of questions about the syntax:

cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)

1) the distance object (d) is an object obtained by the function dist() on
my own original matrix?


d is allowed to be an object of class dist or a dissimilarity matrix.
The answer to your question depends on what your "original matrix" is. If 
it is something on which you can compute a distance by dist(), you're 
right, at least if dist() delivers the distance you are interested in.



2) clustering is the clusters vector as result of one of the many clustering
methods?


The help page tells you what clustering can be. So it could be the 
clustering/partition vector of a clustering method or it could be something 
else. Note that cluster.stats doesn't depend on any particular clustering 
method. It computes the statistics regardless of where the clustering 
vector comes from.
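
To make the two arguments concrete, a small sketch on simulated data. The matrix x and the choice of hclust()/cutree() are only an illustration; any method that yields one integer cluster label per row would do. The call to cluster.stats() is guarded because it needs the fpc package to be installed.

```r
set.seed(42)
# Two well-separated groups of 20 points in two dimensions (made-up data).
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

d <- dist(x)                              # distance object for the first argument
clustering <- cutree(hclust(d), k = 2)    # integer vector of cluster labels

if (requireNamespace("fpc", quietly = TRUE)) {
  stats <- fpc::cluster.stats(d, clustering)
  print(stats$avg.silwidth)               # average silhouette width
}
```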


Best regards,
Christian



Thank you very much in advance and sorry for such basic question, but I did
not manage to clarify my mind.

Laura




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche


Re: [R] help with colsplit (reshape)

> M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
> = c("treatment", "time")))
>
> which gave:
>
> head(M.Data2)
>   pid variable value treatment  time
> 1   1    predA    -1     predA predA
> 2   2    predA    -2     predA predA
> 3   3    predA    -1     predA predA
> 4   4    predA    -2     predA predA
> 5   5    predA    -1     predA predA
> 6   6    predA    -2     predA predA
>
> Closer but no cigar.

Have a look at the whole thing - it's getting it right most of the
time.  Going back to the original variable names, I see that "PredA"
does not have a time associated with it.  What do you expect the time
to be?

> I would be grateful if someone will tell me (a) how to reshape the data as
> described above using the reshape package, (b) what difference between split
> = "." and split = "\\." is,

The splitting argument is a regular expression, and in regular
expression speak "." means to match any one character.  "\\." escapes
the full stop, so it only matches full stops.
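
The difference is easy to see with strsplit() directly. This is a small base-R illustration; the variable names are made up, and fixed = TRUE is shown as a further way to treat the dot literally.

```r
x <- c("predB.1", "out.2")

any_char  <- strsplit(x, ".")                 # "." matches every character
literal   <- strsplit(x, "\\.")               # escaped: split at the full stop only
fixed_dot <- strsplit(x, ".", fixed = TRUE)   # no regular expression at all

print(any_char[[1]])   # only empty strings are left
print(literal[[1]])    # "predB" "1"
```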

> and (c) if more information about the colsplit
> command is available anywhere.

Probably the best way is just to look at the code (it's pretty simple):

> colsplit.character
function (x, split = "", names)
{
vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
names(vars) <- names
as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own
function following those lines.
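
As a rough sketch of such a function (split_variable is a hypothetical name, not part of reshape): it handles names like "predA" that carry no time suffix by returning NA for the time instead of recycling the treatment name.

```r
# Split names of the form "treatment.time"; names without a dot get time = NA.
split_variable <- function(x) {
  x <- as.character(x)
  has_time <- grepl(".", x, fixed = TRUE)
  time <- rep(NA_integer_, length(x))
  # Convert only the entries that really have a suffix after the last dot.
  time[has_time] <- as.integer(sub("^.*\\.", "", x[has_time]))
  data.frame(treatment = sub("\\..*$", "", x),
             time = time,
             stringsAsFactors = FALSE)
}

res <- split_variable(c("predA", "predB.1", "out.2"))
print(res)
```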

Hadley

-- 
http://had.co.nz/


Re: [R] nls() vs lm() estimates

2008-06-13 Thread Janne Huttunen


Héctor Villalobos wrote:

Hi,

I'm trying to understand why the coefficients "a" and "b" for the model: W = 
a*L^b estimated
via nls() differs from those obtained for the log transformed model: log(W) = 
log(a) + b*log(L)
estimated via lm(). Also, if I didn't make a mistake, R-squared suggests a 
"better" adjustment
for the model using coefficients estimated by lm() . Perhaps I'm doing 
something wrong in
nls()?


I haven't tried your code, but in general these estimates are different: 
for the former estimate you minimize the norm of the difference W-a*L^b 
(W are ) and for the latter you minimize the norm of the difference 
log(W)-(log(a)+b*log(L)). The solutions of these problems are equal. 
Which approach you should choose depends on the errors; for an additive 
error model the former is the better choice.




--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall


Re: [R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Vincent Goulet



Le ven. 13 juin à 13:55, Ben Bolker a écrit :


Bluder Olivia writes:



Hello,

I'm trying to calculate the Maximum likelihood estimators for a  
dataset

which contains censored data.

I started by using the function "nlm", but isn't there a separate  
method
for doing this for e.g. the "weibull" and the "log-normal"  
distribution?


Thanks,

Olivia


 This is not *quite* enough detail about what you
want to do.  Can you (as the posting guide suggests!)
give us a small example of what you want to do?  You may be able
to do this via the survreg() command in the survival
package, or you may want to do it yourself by constructing
a log-likelihood function with dweibull() for uncensored
data and pweibull() for censored data [or dlnorm/plnorm].


If you want to go the second route, function coverage() in package  
actuar will build the censored density function for you. You can then  
feed this function to fitdistr() just like for "usual" ML estimation.
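
For the do-it-yourself route, a minimal sketch in base R might look like the
following. It assumes right-censored Weibull data; the simulated sample, the
censoring time of 12, the variable names and the log-scale parameterisation
are all illustrative choices, not part of the original question.

```r
set.seed(1)
# Simulated example: true shape 2, scale 10, right-censored at 12
t_true <- rweibull(200, shape = 2, scale = 10)
cens   <- 12
time   <- pmin(t_true, cens)
event  <- t_true <= cens          # TRUE = observed, FALSE = censored

# Negative log-likelihood; parameters are optimised on the log scale
# so that shape and scale stay positive
negll <- function(par) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  # density contribution for observed times
  ll_obs  <- dweibull(time[event], shape, scale, log = TRUE)
  # survival-function contribution for censored times
  ll_cens <- pweibull(time[!event], shape, scale,
                      lower.tail = FALSE, log.p = TRUE)
  -(sum(ll_obs) + sum(ll_cens))
}

fit <- optim(c(0, log(mean(time))), negll)
est <- exp(fit$par)               # c(shape, scale)
print(est)
```

Swapping dweibull/pweibull for dlnorm/plnorm (with meanlog and sdlog as the
parameters) gives the log-normal version of the same idea.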


HTH  Vincent




 Ben Bolker


Re: [R] nls() vs lm() estimates

2008-06-13 Thread Janne Huttunen



Janne Huttunen wrote:

Héctor Villalobos wrote:

Hi,

I'm trying to understand why the coefficients "a" and "b" for the 
model: W = a*L^b estimated
via nls() differs from those obtained for the log transformed model: 
log(W) = log(a) + b*log(L)
estimated via lm(). Also, if I didn't make a mistake, R-squared 
suggests a "better" adjustment
for the model using coefficients estimated by lm() . Perhaps I'm doing 
something wrong in

nls()?


I haven't tried your code, but in general these estimates are different: 
for the former estimate you minimize the norm of the difference W-a*L^b 
(W are ) and for the latter you minimize the norm of the difference 
log(W)-(log(a)+b*log(L)). The solutions of these problems are equal. 
Which approach you should choose depends on the errors; for an additive 
error model the former is the better choice.


I should read what I have written before sending my message. I meant 
that the solutions of these problems are NOT equal (in general) and 
therefore estimates differ.
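
A small simulation illustrates the point. This is only a sketch: the data,
the true values a = 2 and b = 3, and the multiplicative error are assumptions
chosen for illustration, not the original poster's data.

```r
set.seed(42)
L <- runif(100, 5, 50)
W <- 2 * L^3 * exp(rnorm(100, sd = 0.2))   # multiplicative error

# Least squares on the log scale:
# minimises sum((log(W) - log(a) - b*log(L))^2)
fit_lm <- lm(log(W) ~ log(L))
ab_lm  <- c(a = exp(coef(fit_lm)[[1]]), b = coef(fit_lm)[[2]])

# Least squares on the original scale:
# minimises sum((W - a*L^b)^2), started from the lm() estimates
fit_nls <- nls(W ~ a * L^b, start = as.list(ab_lm))
ab_nls  <- coef(fit_nls)

print(rbind(lm = ab_lm, nls = ab_nls))   # close, but not identical
```

The two fits weight the observations differently: on the original scale the
largest fish dominate the sum of squares, while on the log scale relative
errors count equally.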



--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall


[R] Quartile regression question

2008-06-13 Thread Ranney, Steven

I have data that looks like

lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
...
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099


I can regress log weight on log length simply enough, but how would I model the 
third quartile of log weights?  In other words, rather than finding a 2nd 
quartile (or 50th percentile) regression line, 

e.g., mod=lm(logweight~loglength)

can R find a 75th percentile line?  Further, since my data is lake>1, is there 
a way to run 3rd quartile regressions on each lake?  I would imagine that 
regressing each population would require some call of the subset function, but 
I cannot figure out how to call it.

Thanks in advance, 

SR 

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479



Re: [R] help with colsplit (reshape)

2008-06-13 Thread Ista Zahn


Thanks Hadley, with your help I'm getting things figured out.
On Jun 13, 2008, at 2:09 PM, hadley wickham wrote:

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names
= c("treatment", "time")))

which gave:

head(M.Data2)
pid variable value treatment  time
1   1predA-1 predA predA
2   2predA-2 predA predA
3   3predA-1 predA predA
4   4predA-2 predA predA
5   5predA-1 predA predA
6   6predA-2 predA predA

Closer but no cigar.


Have a look at the whole thing - it's getting it right most of the
time.  Going back to the original variable names, I see that "PredA"
does not have a time associated with it.  What do you expect the time
to be?

Right, there is no time associated with this variable. So I tried
again, treating it as an id:


M.Data <- melt(Data, id = c("pid", "predA"))

From here I was able to achieve the desired result, as follows:

M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",  
names=c("measure", "time")))

M.Data$variable <- M.Data$measure
M.Data <- M.Data[-5]
L.Data <- cast(M.Data, ... ~ variable)

This is perhaps a bit inelegant but it works! I'm interested in  
knowing if there is a better way to do it, but I'm happy that I've at  
least figured out this much. As always I'm humbled by the generosity  
of people who not only make their software available but also take the  
time to answer questions on this list. Thank you!


-Ista



I would be grateful if someone will tell me (a) how to reshape the  
data as
described above using the reshape package, (b) what difference  
between split

= "." and split = "\\." is,


The splitting argument is a regular expression, and in regular
expression speak "." means to match any one character.  "\\." escapes
the full stop, so it only matches full stops.


and (c) if more information about the colsplit
command is available anywhere.


Probably the best way is just to look at the code (it's pretty  
simple):



colsplit.character

function (x, split = "", names)
{
  vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
  names(vars) <- names
  as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own
function following those lines.

Hadley

--
http://had.co.nz/



Re: [R] Quartile regression question

2008-06-13 Thread Philippe Grosjean

Hello,

Look at package quantreg.
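
As a rough sketch of what that could look like (assuming quantreg is
installed; the data frame 'fish' is simulated stand-in data with the same
column names as in the question, not the real lake data):

```r
# Runs only if the quantreg package is available
if (requireNamespace("quantreg", quietly = TRUE)) {
  set.seed(1)
  fish <- data.frame(lake = rep(1:3, each = 50),
                     loglength = runif(150, 2.3, 2.9))
  fish$logweight <- -5 + 3 * fish$loglength + rnorm(150, sd = 0.1)

  # 75th-percentile regression line over all lakes
  mod75 <- quantreg::rq(logweight ~ loglength, tau = 0.75, data = fish)

  # One fit per lake: split the data frame, then fit each piece
  by_lake <- lapply(split(fish, fish$lake), function(d)
    coef(quantreg::rq(logweight ~ loglength, tau = 0.75, data = d)))

  print(coef(mod75))
  print(do.call(rbind, by_lake))
}
```

Using split() and lapply() avoids calling subset() once per lake by hand.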

Philippe Grosjean

Ranney, Steven wrote:

I have data that looks like

lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
...
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099

I can regress log weight on log length simply enough, but how would I model the third quartile of log weights? In other words, rather than finding a 2nd quartile (or 50th percentile) regression line,

e.g., mod=lm(logweight~loglength)

can R find a 75th percentile line? Further, since my data is lake>1, is there
a way to run 3rd quartile regressions on each lake? I would imagine that
regressing each population would require some call of the subset function, but I
cannot figure out how to call it.

Thanks in advance,

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax: (406) 994-7479


Re: [R] Quartile regression question

2008-06-13 Thread Ranney, Steven

Thanks for your help.  Worked great.

SR

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479





Re: [R] help with colsplit (reshape)

> Right, there is no time associated with this variable. So I tried again,
> treating it as an id:
>
> M.Data <- melt(Data, id = c("pid", "predA"))
>
> From here I was able to achieve the desired result, as follows:
>
> M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
> names=c("measure", "time")))
> M.Data$variable <- M.Data$measure
> M.Data <- M.Data[-5]
> L.Data <- cast(M.Data, ... ~ variable)
>
> This is perhaps a bit inelegant but it works! I'm interested in knowing if
> there is a better way to do it, but I'm happy that I've at least figured out
> this much. As always I'm humbled by the generosity of people who not only
> make their software available but also take the time to answer questions on
> this list. Thank you!

You're welcome.  And don't worry too much about data cleaning routines
being elegant - it's very very hard to write elegant code to clean up
something that's not at all elegant.

Hadley


-- 
http://had.co.nz/


Re: [R] alternative to matching/merge?

On Fri, Jun 13, 2008 at 11:45 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> What is the structure of 'd.frame' and 'segFile'?  Run Rprof so that
> we can see which of the functions it is spending its time in.  What
> happens if x$index is not in seqFile$index?  Are the values in the
> 'index' unique in both structures?  Subsetting a data frame can be
> expensive when compared to using a matrix.  Could you use a matrix
> instead of a data frame; are all the columns the same mode?  Again
> either a subset of data would be helpful or an 'str' on the data
> objects being used so that we can understand what they are.

A few other ideas to try:

 * try merging do.call("rbind", d.frame) and seqFile, and then
splitting the results back up

 * try giving seqFile rownames (rownames(seqFile) <-
seqFile$index) and then use character matching:  cbind(x, seqFile[
as.character(x$index), ])

 * if there is a one-to-one correspondence between index in seqFile and
all data.frames in d.frame, merge all of the d.frames together, order
both by index then just cbind
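
The rowname-matching idea can be sketched as follows. The structure of
'd.frame' and 'seqFile' was never posted, so the two small data frames below
are hypothetical stand-ins, and the match() variant is an extra alternative
rather than something from the thread.

```r
# Hypothetical stand-ins for one element of 'd.frame' and for 'seqFile'
x <- data.frame(index = c(3, 1, 2), value = c("c", "a", "b"))
seqFile <- data.frame(index = 1:4, seq = c("AAA", "CCC", "GGG", "TTT"))

# Character matching on rownames
rownames(seqFile) <- seqFile$index
res1 <- cbind(x, seqFile[as.character(x$index), "seq", drop = FALSE])

# Equivalent positional lookup with match(); unmatched indices give NA
res2 <- cbind(x, seq = seqFile$seq[match(x$index, seqFile$index)])

print(res1)
print(res2)
```

Both lookups avoid the sorting and copying that merge() does, which is where
the time tends to go on large inputs.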

Hadley


-- 
http://had.co.nz/


[R] stretching text vertically

2008-06-13 Thread Alex Reynolds

I'd like to stretch a plotted character vertically, to create a 
"sequence logo".


Is there a parameter to allow stretching text() output vertically or 
squeeze horizontally?


I know about Oliver Bembom's seqLogo library, but this generates a 
sequence logo plot using a separate bitmap device. I want to recreate 
the sequence logo *inside* an existing plot.


Alternatively, is there a way to embed one plot inside another?

I could use imagemagick outside R to 'montage' separate bitmaps, but 
then the sequence logo is going to be very difficult to align (base for 
base) with the plot I'm trying to join it to.


Thanks for any tips,
Alex


[R] Importing data with different delimters

2008-06-13 Thread David Arnold


All,

I have a data file with 56 entries that looks like this:

City  State  JanTemp  Lat  Long
Mobile, AL  44  31.2  88.5
Montgomery, AL  38  32.9  86.8
Phoenix, AZ  35  33.6  112.5
Little Rock, AR  31  35.4  92.8
Los Angeles, CA  47  34.3  118.7
San Francisco, CA  42  38.4  123.0

I would like to "read" this data into a dataframe. Is it possible to  
do without editing the datafile?


D.


[R] Help with stat.table in Epi package,

2008-06-13 Thread Troy S

R Fans--

I am having problems with the following code.  It worked under R 2.6.0 but not 
in 2.7.0.


> library(Epi)
> df <- read.table( "c:/Documents and Settings/Troy S/My 
> Documents/debug_chisq_080613b.txt")
> summary(df)
      cvd             agecat 
 Min.   :0.0000   (0,40] :1  
 1st Qu.:0.0000   (40,60]:2  
 Median :0.0000              
 Mean   :0.3333              
 3rd Qu.:0.5000              
 Max.   :1.0000              
> fa <- as.factor(df$cvd)
> fb <- as.factor(df$agecat)
> stat.table(index=list("a"=fa, "b"=fb))
Error in eval(expr, envir, enclos) : could not find function "count"

The file contents are

"cvd" "agecat"
"1" 0 "(0,40]"
"2" 1 "(40,60]"
"3" 0 "(40,60]"

My sessionInfo is

R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Epi_1.0.8         coin_0.6-9        modeltools_0.2-15 mvtnorm_0.9-0    
[5] survival_2.34-1  

loaded via a namespace (and not attached):
[1] tools_2.7.0
> 

Any help would be great!

Troy

Re: [R] Importing data with different delimters

Assuming that the only problem is the blank in the city names, here is
one way of doing it:

> inFile <- textConnection("City State  JanTemp Lat Long
+ Mobile, AL  44  31.2  88.5
+ Montgomery, AL  38  32.9  86.8
+ Phoenix, AZ  35  33.6  112.5
+ Little Rock, AR  31  35.4  92.8
+ Los Angeles, CA  47  34.3  118.7
+ San Francisco, CA  42  38.4  123.0")
> lines <- readLines(inFile)
> # get rid of blanks in city names
> newLines <- sub("(.*?) +(.*),", "\\1_\\2,", lines)
>
> x <- read.table(textConnection(newLines), header=TRUE)
> closeAllConnections()
> x
            City State JanTemp  Lat  Long
1        Mobile,    AL      44 31.2  88.5
2    Montgomery,    AL      38 32.9  86.8
3       Phoenix,    AZ      35 33.6 112.5
4   Little_Rock,    AR      31 35.4  92.8
5   Los_Angeles,    CA      47 34.3 118.7
6 San_Francisco,    CA      42 38.4 123.0
>
>

If you want, you can then go back and replace the "_" with a blank in
the city name.

On Fri, Jun 13, 2008 at 7:14 PM, David Arnold <[EMAIL PROTECTED]> wrote:
> All,
>
> I have a data file with 56 entries that looks like this:
>
> City  State  JanTemp  Lat  Long
> Mobile, AL  44  31.2  88.5
> Montgomery, AL  38  32.9  86.8
> Phoenix, AZ  35  33.6  112.5
> Little Rock, AR  31  35.4  92.8
> Los Angeles, CA  47  34.3  118.7
> San Francisco, CA  42  38.4  123.0
>
> I would like to "read" this data into a dataframe. Is it possible to do
> without editing the datafile?
>
> D.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


[R] rbind() problem

2008-06-13 Thread array chip

Hi, I would like to rbind 2 data frames. They both have some common column 
names, but each also has some unique column names. Is there any simple function 
that rbinds these 2 data frames, filling in NAs for the columns with unique names?

thanks


[R] Correcting the display of colnames and rownames

2008-06-13 Thread Steve Murray


Dear all,

I have a data frame of dimension 720 columns by 360 rows, to which I am trying 
to add numerical row and column labels to, using the 'sequence' command. The 
original data, which I read in using 'read.table', had no such labels at all.

I've got as far as successfully using the sequence command and getting the 
labels to display. However, I'm finding that for the minus numbers in 
particular, the values aren't displaying correctly. For the value '-179.75' for 
example, it displays as 'X.179.75'. Even for positive numbers, the 'X' prefix 
appears at the start of the label (but without the '.').

I have tried numerous attempts at addressing this. I'm currently as far as 
adopting the following approach; I'll show what I've done for just the column 
headings - I've adopted the same approach for row headings, with the same 
results/problem so far.

columnnames <- seq(from = -179.75, to = 179.75, length = 720)
as.numeric <- colnames(Jan)
colnames(Jan) <- make.names(columnnames)

N.B. 'Jan' (as in January) refers to the data frame in question.

So my thinking here is to assign the values to be used as column labels to 
'columnnames', and use 'make.names' to assign these values to the column names 
of the data frame. I've also tried changing 'colnames(Jan)' to be a numeric 
class, as I was previously having problems assigning the values to the labels - 
I think because by default 'colnames' is of class 'character vector'?

If anyone is able to suggest a way how I can solve the problem of the values 
not being displayed as I'd hoped (namely, removing the 'X' and displaying '-' 
for minus numbers), then I'd be very grateful.

Many thanks,

Steve


Re: [R] Using lm with a matrix?

2008-06-13 Thread jonboym

Many thanks, works great!

Charilaos Skiadas-3 wrote:
> 
> Try this:
> 
> lapply( 1:2, function(i) lm( y~x, data=list(x=xdat[,i], y=ydat[,i]) ) )
> 
> Haris Skiadas
> 


[R] Weights and coxph

2008-06-13 Thread mah

I am confused by the results of the weights option for coxph.  I
replicated each row three times from the help page for coxph in the
data frame test_freq.  I had expected that the coefficients,
significance tests, and tests of non-proportionality would yield the
same results for the replicated and non-replicated data, but the
output below shows differences in all three metrics.  Is this the
result of a curved response variable?  This is likely more of a
conceptual question than a language question, but all help is
sincerely appreciated.

Mike

> test1
$time
[1] 4 3 1 1 2 2 3

$status
[1]  1 NA  1  0  1  1  0

$x
[1] 0 2 1 1 1 0 0

$sex
[1] 0 0 0 0 1 1 1

$wt
[1] 3 3 3 3 3 3 3

> test_freq
   time status x sex
1     4      1 0   0
2     4      1 0   0
3     4      1 0   0
4     3     NA 2   0
5     3     NA 2   0
6     3     NA 2   0
7     1      1 1   0
8     1      1 1   0
9     1      1 1   0
10    1      0 1   0
11    1      0 1   0
12    1      0 1   0
13    2      1 1   1
14    2      1 1   1
15    2      1 1   1
16    2      1 0   1
17    2      1 0   1
18    2      1 0   1
19    3      0 0   1
20    3      0 0   1
21    3      0 0   1
> t1 <- coxph( Surv(time, status) ~ x + strata(sex), data=test1, weights=wt)
> summary(t1)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1,
weights = wt)

  n=6 (1 observation deleted due to missingness)
  coef exp(coef) se(coef)    z    p
x 1.17      3.22    0.744 1.57 0.12

  exp(coef) exp(-coef) lower .95 upper .95
x  3.22  0.311 0.749  13.8

Rsquare= 0.353   (max possible= 0.999 )
Likelihood ratio test= 2.61  on 1 df,   p=0.106
Wald test= 2.47  on 1 df,   p=0.116
Score (logrank) test = 2.67  on 1 df,   p=0.102

> cox.zph(t1)
  rho   chisq p
x -0.0716 0.00598 0.938
> t_freq <- coxph( Surv(time, status) ~ x + strata(sex), data=test_freq)
> summary(t_freq)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data =
test_freq)

  n=18 (3 observations deleted due to missingness)
  coef exp(coef) se(coef)    z     p
x 1.41      4.09    0.756 1.86 0.063

  exp(coef) exp(-coef) lower .95 upper .95
x  4.09  0.245 0.929  18.0

Rsquare= 0.185   (max possible= 0.879 )
Likelihood ratio test= 3.69  on 1 df,   p=0.0549
Wald test= 3.47  on 1 df,   p=0.0626
Score (logrank) test = 3.84  on 1 df,   p=0.0499

> cox.zph(t_freq)
  rho  chisq p
x -0.0697 0.0526 0.819


[R] overlaid transparent histograms

2008-06-13 Thread Austin Frank

Hello all--

I'm attempting to produce overlaid histograms with partially transparent
columns.  Whether this display will end up being useful, I can't say.
But I do want to get it right.

I've already got one solution (shown below), but I tried some other
versions and had questions about my results.  (Note:  I'm using a quartz
device, so transparency shows up correctly.  You might need to print to
a pdf device to get transparency, according to the docs I've read)

--8<---cut here---start->8---
## Working version:
data(lexdec, package="languageR")
attach(lexdec)

x <- log(c(BNCw, Frequency))
label <-  c(rep("BNCw", length(BNCw)),
rep("CELEX", length(Frequency)))
h <- data.frame(x, label)

g <- ggplot(h, aes(x=x, fill=label))
g +
  geom_bar(position="identity") +
  scale_fill_manual(values = c(
  alpha("red", 0.5),
  alpha("blue", 0.5)))
detach(lexdec)  
--8<---cut here---end--->8---


Three questions:
1a)  Why does the following code not produce transparent bars?
1b)  How can I manually specify the elements of the legend for this
 version of the plot?

--8<---cut here---start->8---
## Non-working version
data(lexdec, package="languageR")

g <- ggplot(lexdec)
g +
  geom_histogram(aes(x=log(BNCw), fill = alpha("red", .5))) +
  geom_histogram(aes(x=log(BNCc), fill = alpha("blue", .5)))
--8<---cut here---end--->8---

2) Does anyone have a way to accomplish the same thing in lattice?  I
   saw the post at
   
http://www.nabble.com/Overlay-plots-from-different-data-sets-using-the-Lattice-package-tp14824421p14824421.html,
   but couldn't figure out how to extend these suggestions to overlaid
   transparent histograms.

Thanks in advance for any help,
/au

> sessionInfo()
R version 2.7.0 (2008-04-22) 
powerpc-apple-darwin8.10.1 

locale:
C

attached base packages:
[1] grid  splines   stats graphics  grDevices utils datasets 
[8] methods   base 

other attached packages:
 [1] ggplot2_0.6        colorspace_0.95    RColorBrewer_1.0-2 MASS_7.2-42       
 [5] proto_0.3-8        reshape_0.8.0      languageR_0.92     coda_0.13-2       
 [9] lme4_0.999375-15   Matrix_0.999375-10 zipfR_0.6-0        lattice_0.17-8    
[13] Design_2.1-1       survival_2.34-1    Hmisc_3.4-3       

-- 
Austin Frank
http://aufrank.net
GPG Public Key (D7398C2F): http://aufrank.net/personal.asc



[R] "False convergence" in LME

2008-06-13 Thread Rebecca Sela

I tried to use LME (on a fairly large dataset, so I am not including it), and I 
got this error message:

Error in lme.formula(formula(paste(c(toString(TargetName), 
"as.factor(nodeInd)"),  : 
  nlminb problem, convergence error code = 1
  message = false convergence (8)

Is there any way to get more information or to get the potentially wrong 
estimates from LME?

(Also, the page in the NLMINB documentation,  
http://netlib.bell-labs.com/cm/cs/cstr/153.pdf, has errors in it, which makes 
it harder to check on what is happening.)

Thank you in advance!

Rebecca


[R] Subset by Factor by date

2008-06-13 Thread T.D.Rudolph


I have a dataframe, x, with over 60,000 rows that contains one Factor, "id",
with 27 levels.  
The dataframe contains numerous continuous values (along column "diff") per
day (column "date") for every level of id.  I would like to select only one
row per animal per day, i.e. that containing the minimum value of "diff",
along the full length of 1:nrow(x).  I am not yet able to conduct anything
beyond the simplest of functions and I was hoping someone could suggest an
effective way of producing this output.

e.g. given this input:

id  day diff
1  01-01-09  0.5
1  01-01-09  0.7
2  01-01-09  0.2
2  01-01-09  0.4
1  01-02-09  0.1
1  01-02-09  0.3
2  01-02-09  0.3
2  01-02-09  0.4

I would like to produce this output:
id day  diff
1  01-01-09  0.5
2  01-01-09  0.2
1  01-02-09  0.1
2  01-02-09  0.3

It doesn't seem extremely difficult but I'm sure there are easier ways than
how I am currently approaching it!

Re: [R] Subset by Factor by date

2008-06-13 Thread Marc Schwartz

on 06/13/2008 11:10 PM T.D.Rudolph wrote:

I have a dataframe, x, with over 60,000 rows that contains one Factor, "id",
with 27 levels.  
The dataframe contains numerous continuous values (along column "diff") per

day (column "date") for every level of id.  I would like to select only one
row per animal per day, i.e. that containing the minimum value of "diff",
along the full length of 1:nrow(x).  I am not yet able to conduct anything
beyond the simplest of functions and I was hoping someone could suggest an
effective way of producing this output.

e.g. given this input:

id  day diff
1  01-01-09  0.5
1  01-01-09  0.7
2  01-01-09  0.2
2  01-01-09  0.4
1  01-02-09  0.1
1  01-02-09  0.3
2  01-02-09  0.3
2  01-02-09  0.4

I would like to produce this output:
id day  diff
1  01-01-09  0.5
2  01-01-09  0.2
1  01-02-09  0.1
2  01-02-09  0.3

It doesn't seem extremely difficult but I'm sure there are easier ways than
how I am currently approaching it!

See ?aggregate

> DF
  id  day diff
1  1 01-01-09  0.5
2  1 01-01-09  0.7
3  2 01-01-09  0.2
4  2 01-01-09  0.4
5  1 01-02-09  0.1
6  1 01-02-09  0.3
7  2 01-02-09  0.3
8  2 01-02-09  0.4

> aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
  id  day   x
1  1 01-01-09 0.5
2  2 01-01-09 0.2
3  1 01-02-09 0.1
4  2 01-02-09 0.3

Note that I have not converted the 'day' column to a 'date' class. You 
would need to do that to perform any other date related operations 
(including chronological sorting) on that column. See ?as.Date for more 
information. For example:

  DF$day <- as.Date(DF$day, format = "%m-%d-%y")
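
If the whole row is wanted rather than just the minimum, one possible sketch
uses ave() to flag the per-group minima. DF is re-created here from the
example above so the snippet runs on its own; note that ties would keep
every tied row.

```r
DF <- data.frame(id   = c(1, 1, 2, 2, 1, 1, 2, 2),
                 day  = rep(c("01-01-09", "01-02-09"), each = 4),
                 diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4))
DF$day <- as.Date(DF$day, format = "%m-%d-%y")

# Keep rows whose diff equals the minimum within their id/day group
is_min <- DF$diff == ave(DF$diff, DF$id, DF$day, FUN = min)
res <- DF[is_min, ]

# Sort chronologically, then by animal
print(res[order(res$day, res$id), ])
```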

HTH,

Marc Schwartz


Re: [R] Looping, Control Flow & Conditional Statements

2008-06-13 Thread Garth.Warren


Thanks Chuck, 'rle' was just what I needed.

G

-Original Message-
From: Charles C. Berry [mailto:[EMAIL PROTECTED] 
Sent: Saturday, 14 June 2008 02:00
To: Warren, Garth (CSE, Gungahlin)
Cc: r-help@r-project.org
Subject: Re: [R] Looping, Control Flow & Conditional Statements



See

?rle

Start with this:

> a1.runs <- rle( a1 )
> a1.runs$lengths[ a1.runs$values>0 ]
[1] 3 4
>
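
To get from there to the requested "a1.1 : 3" style of output, something
along these lines might do. It is a sketch: it runs rle() on the logical
vector a1 > 0 (a small variation on the call above, so that runs of differing
positive values still count as one event), and the output format is simply
copied from the question.

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

a1.runs   <- rle(a1 > 0)                       # runs of above-zero / not
event.len <- a1.runs$lengths[a1.runs$values]   # lengths of above-zero runs

# One line per event, in the form "a1.<event number> : <length>"
out <- paste0("a1.", seq_along(event.len), " : ", event.len)

# Add file = "..." and append = TRUE to cat() to write to disk instead
cat(out, sep = "\n")
```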

HTH,

Chuck

p.s.

> library(fortunes)
> fortune(106)

If the answer is parse() you should usually rethink the question.
-- Thomas Lumley
   R-help (February 2005)
--

see

?get

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:

> Dear R Group:
>
>
>
> I have little experience using R and even less experience with control
> flow type questions.
>
>
>
> See the following code:
>
>
>
> a1 = c(0, 1, 1, 1,
>
> 0, 0, 0, 0, 0,
>
> 0, 0, 1,
>
> 1, 1, 1, 0, 0)
>
>
>
> for(i in 1:1){
>
>sx <- paste("a",i,sep="")
>
>s <- eval(parse(text = paste("a",i,sep="")))
>
> {g = numeric(length(s))
>
> k = numeric(length(s))
>
>{for (i in 1:length(s))
>
>{for (j in 1:length(s))
>
>ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
>
> }}
>
> h1 <- hist(g,freq=TRUE)
>
> h <- h1$counts[4]
>
> cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
>
> }}
>
>
>
>
>
> The output is:
>
>> g
>
> [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
>
>> k
>
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>>
>
>> h
>
> [1] 7
>
>
>
> & a text file, which has:
>
>a1 : 7
>
>
>
> k is a by-product of the ifelse statement and is of no interest & g and
> h only go part-way to answering my question, which is:
>
>
>
> For every time an object, i.e. a1 (which is actually a time series) - 0 1
> 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 - has a value over 0, how long do the
> values stay above 0? So in this case a1 has two groups or events where
> the value is above zero; the first event lasts for 3 'days' and the
> second event lasts for 4 'days'. I have my code telling me that there
> was a total of 7 'days' in event or above 0, but what I need to know is
> that there were two 'events' and the 1st lasted 3 'days' and the 2nd
> lasted 4 'days'. Essentially I want a text file output to say:
>
>
> a1.1 : 3
>
>
> a1.2 : 4
>
>
>
> My thinking is that I need to somehow get the code working through each
> vector one value at a time, and when a value is found to meet the criterion
> of > 0, R creates a new vector; to use the above example it would come
> to the first value >0 and then create the new vector a1.1 = (1,1,1), then
> as the next value in the series is 0 it would close this new vector
> 'a1.1'. It would then continue until it reaches the next value >0 and
> then create the vector a1.2 = (1,1,1,1), then again as the next value in
> the series is 0 it would close this new vector, and so on.
>
>
>
> Then all I need to do is perform a count of '1's in these new vectors to
> find how many days they met this criterion of being greater than 0.
>
>
>
> I hope the above makes sense and I really hope there is someone willing
> and able to help. I don't know how to proceed.
>
>
>
> Thanks,
>
> Garth
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: [R] overlaid transparent histograms

> Three questions:
> 1a)  Why does the following code not produce transparent bars?

Because you're setting the fill colour to a fixed value (not mapping it
to a variable in your dataset), the fill needs to be outside of aes():

g +
 geom_histogram(aes(x=log(BNCw)), fill = alpha("red", .5)) +
 geom_histogram(aes(x=log(BNCc)), fill = alpha("blue", .5))


> 1b)  How can I manually specify the elements of the legend for this
> version of the plot?

Use the "manual" scale:

g +
geom_histogram(aes(x=log(BNCw), fill = "w")) +
geom_histogram(aes(x=log(BNCc), fill = "c")) +
scale_fill_manual("BNC type", values = alpha(c("red","blue"), 0.5))

Hadley

-- 
http://had.co.nz/


Re: [R] Subset by Factor by date

2008-06-13 Thread T.D.Rudolph


aggregate() is indeed a useful function in this case, but it only returns the
grouping columns and the aggregated value.  Is there a way I can use this
while simultaneously retaining all the other column values in the dataframe?

e.g. the dataframe may have an extra column, superfluous for the grouping
(yet pertinent for later), that I would like to retain in the final output
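
One possible approach, sketched here as an assumption rather than taken from a reply in the thread: compute the per-group minimum with ave(), which returns a value for every row, and use it to subset the original data frame so that all columns survive. The note column and the names is.min and DF.min are hypothetical:

```r
DF <- data.frame(id   = c(1, 1, 2, 2, 1, 1, 2, 2),
                 day  = rep(c("01-01-09", "01-02-09"), each = 4),
                 diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
                 note = letters[1:8],   # hypothetical extra column to retain
                 stringsAsFactors = FALSE)

# ave() returns, for every row, the minimum of diff within its id/day group
is.min <- DF$diff == ave(DF$diff, DF$id, DF$day, FUN = min)

# Subsetting by row keeps every column of the original data frame
DF.min <- DF[is.min, ]
print(DF.min)
```

If two rows in a group tie for the minimum, both are kept; an extra step (for example !duplicated() on id and day) would be needed to force exactly one row per group.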


Marc Schwartz wrote:
> 
> on 06/13/2008 11:10 PM T.D.Rudolph wrote:
>> I have a dataframe, x, with over 60,000 rows that contains one Factor,
>> "id", with 27 levels.
>> The dataframe contains numerous continuous values (along column "diff")
>> per day (column "date") for every level of id.  I would like to select
>> only one row per animal per day, i.e. that containing the minimum value
>> of "diff", along the full length of 1:nrow(x).  I am not yet able to
>> conduct anything beyond the simplest of functions and I was hoping
>> someone could suggest an effective way of producing this output.
>> 
>> e.g. given this input:
>> 
>> id day       diff
>> 1  01-01-09  0.5
>> 1  01-01-09  0.7
>> 2  01-01-09  0.2
>> 2  01-01-09  0.4
>> 1  01-02-09  0.1
>> 1  01-02-09  0.3
>> 2  01-02-09  0.3
>> 2  01-02-09  0.4
>> 
>> I would like to produce this output:
>> id day       diff
>> 1  01-01-09  0.5
>> 2  01-01-09  0.2
>> 1  01-02-09  0.1
>> 2  01-02-09  0.3
>> 
>> It doesn't seem extremely difficult but I'm sure there are easier ways
>> than how I am currently approaching it!
> 
> See ?aggregate
> 
>  > DF
>   id      day diff
> 1  1 01-01-09  0.5
> 2  1 01-01-09  0.7
> 3  2 01-01-09  0.2
> 4  2 01-01-09  0.4
> 5  1 01-02-09  0.1
> 6  1 01-02-09  0.3
> 7  2 01-02-09  0.3
> 8  2 01-02-09  0.4
> 
> 
>  > aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
>   id      day   x
> 1  1 01-01-09 0.5
> 2  2 01-01-09 0.2
> 3  1 01-02-09 0.1
> 4  2 01-02-09 0.3
> 
> 
> Note that I have not converted the 'day' column to a 'date' class. You 
> would need to do that to perform any other date related operations 
> (including chronological sorting) on that column. See ?as.Date for more 
> information. For example:
> 
>    DF$day <- as.Date(DF$day, format = "%m-%d-%y")
> 
> 
> HTH,
> 
> Marc Schwartz
> 



[R] strsplit, keeping delimiters