Re: [R] Rprintf and C stack usage is too close to the limit

2008-06-13 Thread Prof Brian Ripley
Rprintf is the right function, but this is the wrong list (please see the
posting guide).  The issue is related to your 'C++ program', not to R
itself, and we have no details.  Questions about C/C++ code called from R
belong on R-devel, as the posting guide says.
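
For readers who hit the same message, here is a minimal sketch of how to inspect the C stack from the R side. Cstack_info() is part of base R. What actually exhausts the stack in the poster's C++ code is unknown, since no details were given; deep recursion and large automatic arrays are only common suspects, not a diagnosis.

```r
# Report the C stack size and current usage as seen by R.
# 'size' may be NA on platforms where R cannot determine the limit.
info <- Cstack_info()
print(info)

# Fraction of the stack in use at this point (NA if the size is unknown).
used_fraction <- unname(info["current"] / info["size"])
print(used_fraction)
```

Calling this just before the .C/.Call entry point, in verbose and in non-verbose mode, would at least show how much headroom R believes it has.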


On Thu, 12 Jun 2008, Youyi Fong wrote:


Hi,

I would appreciate if someone could comment on this problem I am
experiencing. I am writing a C++ program to be called from R. In this
program, there is a verbose switch that decides whether to print some
debugging info using Rprintf. On windows, things work ok. On linux, things
are fine in non-verbose mode, but in verbose mode, I get error saying C
stack usage is too close to the limit after a few lines are printed.

Is Rprintf the right function to use for showing message on R console? If
yes, what should I do about the error message?

Thank you very much in advance! This problem has been bugging me for a few
days now.

Youyi

--
Youyi Fong, Graduate Student, Department of Biostatistics
University of Washington, Box 357232, Seattle, WA 98195


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Regex for Special Characters under Grep

2008-06-13 Thread Prof Brian Ripley

On Thu, 12 Jun 2008, Henrik Bengtsson wrote:


A regular set is given by [set].  The complementary set is given
by [^set] where set is a set of symbols.  I don't think you have
to escape symbols in set (but I might be wrong).


This is covered in ?regexp.  The metacharacters in character classes (the
official name for your 'regular set') are ^]-\.



In any case, this does what you want:


lines <- c("abc", "!abc", "#abc", "^abc", " #abc")
pattern <- "^[^!#^]";
grep(pattern, lines, value=TRUE)

[1] "abc"   " #abc"

/Henrik


On Thu, Jun 12, 2008 at 8:06 PM, Marc Schwartz
[EMAIL PROTECTED] wrote:

on 06/12/2008 08:42 PM Gundala Viswanath wrote:


Hi all,

I am trying to capture lines of a file that DO NOT
start with the following header: !, #, ^

But somehow my regex used under grep doesn't
work.

Please advice what's wrong with my code below.

__BEGIN__
in_fname <- paste("mydata.txt",".soft",sep="")
data_for_R <- paste("data_for_R/", args[3], ".softR", sep="")

# my regex construction
cat(temp[-grep("^[\^\!\#]",temp,perl=TRUE)], file=data_for_R, sep="\n")


dat <- read.table(data_for_R)
___END__



You need to double the escape character when it is used to escape a
meta-character in a regex, because R's string parser consumes one backslash.
Note also that the only meta-character in your sequence is the caret ('^').

Lines <- c("! Not This Line", "# Not This Line", "^ Not This Line",
           "This Line")


> Lines
[1] "! Not This Line" "# Not This Line" "^ Not This Line"
[4] "This Line"


> grep("^[!#\\^]", Lines)
[1] 1 2 3


> Lines[-grep("^[!#\\^]", Lines)]
[1] "This Line"
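
One caveat with the negative-index idiom above, sketched here under the assumption that the input may contain no header lines at all: when grep() finds nothing it returns integer(0), and x[-integer(0)] selects nothing rather than everything. Filtering with grepl(), or with grep(..., invert = TRUE, value = TRUE), sidesteps this. The vector names below are illustrative only.

```r
with_headers    <- c("! Not This Line", "# Not This Line", "^ Not This Line", "This Line")
without_headers <- c("This Line", "That Line")

# Inside a character class ^ is only special when it comes first,
# so putting it last avoids the need for any escaping.
header_pattern <- "^[!#^]"

# Logical filtering keeps every non-matching line, even when nothing matches.
kept_a <- with_headers[!grepl(header_pattern, with_headers)]
kept_b <- without_headers[!grepl(header_pattern, without_headers)]

# Equivalent formulation using grep() itself.
kept_c <- grep(header_pattern, without_headers, invert = TRUE, value = TRUE)

# The negative-index idiom silently drops everything when there is no match.
dropped <- without_headers[-grep(header_pattern, without_headers)]

print(kept_a)
print(kept_b)
print(length(dropped))
```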


HTH,

Marc Schwartz



Re: [R] Getting Batch mode to continue running a script after running into errors

2008-06-13 Thread Prof Brian Ripley

?stop explains why this happens and how to change it.

You can also set options(error=expression(NULL)) to ignore all errors, and
use tryCatch() (or its wrapper try()) to skip particular expressions if they
fail.


But surely in your example your script should check for existence of the 
file by file.exists() or file.access()?
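
A minimal sketch of both suggestions, assuming the script reads one file per experiment. The helper name read_or_skip() and the file paths are hypothetical.

```r
# Read a table if possible; otherwise report the problem and return NULL
# so that the rest of a batch script can carry on.
read_or_skip <- function(path) {
  if (!file.exists(path)) {
    message("skipping missing file: ", path)
    return(NULL)
  }
  tryCatch(
    read.table(path),
    error = function(e) {
      message("could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical path that does not exist: the call returns NULL instead of halting.
missing_result <- read_or_skip(file.path(tempdir(), "no-such-file.txt"))

# A file that does exist is read normally.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2", "3 4"), tmp)
ok_result <- read_or_skip(tmp)
print(ok_result)
```

Checking for NULL before using the result keeps later lines of the script from failing on a missing input.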



On Thu, 12 Jun 2008, Josh wrote:


I'm invoking R in batch mode from a bash script as follows:

R --no-restore --no-save --vanilla \
  < $TARGET/$directory/o2sat-$VERSION.R \
  > $TARGET/$directory/o2sat-$VERSION.Routput

When R comes across some error in the script however it seems to halt
instead of running subsequent lines in the script:

Error in file(file, "r") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "r") :
 cannot open file '/datapool/experiments/ois/080502/petri': No such
file or directory
Execution halted


How can I get R to continue running the script even if it comes across
errors? Thanks in advance



Re: [R] x86 SSE* Pointer Favors

2008-06-13 Thread Ivan Adzhubey
Hi Ivo,

On Friday 13 June 2008 12:23:06 am ivo welch wrote:
> Dear Statisticians--- This is not even an R question, so please
> forgive me.  I have so much ignorance in this matter that I do not
> know where to begin.  I hope someone can point me to documentation
> and/or a sample.

You will surely find some answers to your questions if you look in the
R-admin.html file, under the section on building from source. Search for BLAS
and you will be presented with some options. Searching the R web site for
the same keyword will give you even more food for thought.

> I want to compute a covariance as quickly as non-humanly possible on
> an Intel core processor (up to SSE4) under linux.  Alas, I have no
> idea how to engage CPU vectorization.  Do I need to use special data
> types, or is double correct?  Does SSE* understand NaN?  Should I
> rely on gcc autodetection of the vectorized meaning of my code, or are
> there specific libraries that I should call?

I use the Goto BLAS library and it works great. It usually runs 3 to 30 times
faster than the stock R BLAS library, depending on your code. Enabling SSE
instructions in addition while building R (yes, you have to enable them
explicitly, see man gcc) is possible but does not help much since all maths
is mostly done in BLAS.

That said, optimized BLAS libraries give the largest speed increase on older
processors. The newer crop of multi-core CPUs with large shared caches is much
more difficult to hand-tune code for. You may want to subscribe to the Goto BLAS
mailing list for an in-depth discussion. The ATLAS community is also very helpful
(I use their code with our AMD CPUs).
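
As a rough illustration of why the BLAS matters for the covariance question: the sketch below centers the columns and forms the cross-product, a matrix product that R normally passes to whichever BLAS it was built or linked against. The function name cov_crossprod() is made up for this example, and it assumes a numeric matrix with no missing values.

```r
# Covariance matrix via a centered cross-product.
# crossprod(Xc) computes t(Xc) %*% Xc as one matrix product.
cov_crossprod <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X), !anyNA(X), nrow(X) > 1)
  Xc <- sweep(X, 2, colMeans(X))   # subtract each column's mean
  crossprod(Xc) / (nrow(X) - 1)
}

set.seed(1)
X <- matrix(rnorm(1000 * 5), ncol = 5)

# Agrees with the built-in cov() up to floating-point error.
print(all.equal(cov_crossprod(X), cov(X)))
```

Timing this against cov() before and after switching BLAS libraries is one simple way to see whether the optimized library helps on a given machine.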

> What I want to learn about is as simple as it gets:
>   typedef double Double;  // or whatever SSE* needs as close equivalent
>   Double vector1[N], vector2[N];
>   // then fill them with stuff.

R has no separate single-precision type: any number that is not explicitly
an integer is stored as a double, and all floating-point arithmetic is done
in double precision.

>   vector3= vector_mult(vector1,vector2, N);
>   vector4= sum(vector1, N);
>
> I just need a pointer and/or primer.  PS: If someone knows of a
> superfast vectorized implementation of Gentleman's WLS algorithm,
> please point me to it, too.  I am still using my old non-vectorized C
> routines.

HTH,
Ivan



Re: [R] x86 SSE* Pointer Favors

2008-06-13 Thread Prof Brian Ripley

Let me pick up on

> Enabling SSE instructions in addition while building R (yes, you have to
> enable them explicitly, see man gcc) is possible but does not help much
> since all maths is mostly done in BLAS.


The final part is not true for my 'maths', only for those doing linear 
algebra.  Enabling use of SSE registers can help with CPU scheduling, and 
so can have a surprisingly large effect, so if you only run R on a single
CPU type it is worth tuning the code to that CPU (e.g. -mtune=core2) 
alongside turning up optimization levels.





Re: [R] Problem with rowMeans()

2008-06-13 Thread Wacek Kusnierczyk
Erik Iverson wrote:


> ss wrote:
> It is:
>
>   data <-
> read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> row.names = NULL ,header=TRUE, fill=TRUE)
>   class(data[3])
> [1] "data.frame"
>
>
> Oops, should have said  class(data[[3]]) and
> is.numeric(data[[3]])

oops, my typo.  of course, data[3] is a *data frame* (if data is one),
so is.numeric(data[3]) must be FALSE.  but clearly if column 3 was
excluded, is.numeric(data[[3]]) must have been FALSE.
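
The distinction can be seen on a small example. The data frame below is a stand-in for the poster's table, not the real data.

```r
# A small data frame standing in for the poster's table (illustrative only).
df <- data.frame(id = c("a", "b"), x = c(1.5, 2.5), y = c(10, 20))

# Single brackets return a one-column data frame ...
print(class(df[3]))
print(is.numeric(df[3]))

# ... double brackets return the column itself.
print(class(df[[3]]))
print(is.numeric(df[[3]]))

# rowMeans() needs numeric columns, so select them explicitly.
numeric_cols <- vapply(df, is.numeric, logical(1))
row_means <- rowMeans(df[numeric_cols])
print(row_means)
```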

vQ



[R] parsing - input buffer overflow

2008-06-13 Thread Daniel Malter
Hi,

I am trying to parse a large amount of text using gregexpr(). Unfortunately,
I get an input buffer overflow message when I attempt that with too large
an amount of text. The error message occurs before the parsing. The problem
is that I cannot assign the text to a variable (an object) if the text is
too large.

This problem has been mentioned before, which I found using the RSiteSearch.
However, the post is from 2006, and I thought it might have improved by now.
Is there any way to increase the limit or to get around this problem?

x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island,
Tristan da Cunha"

#What I want to achieve is to parse the text for the number of occurrences
of a certain character string within the text.

#This is done using:

n=100 #choose n large enough
length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))

But again, if the text is large, I cannot assign it to x. I'd be grateful
for any suggestions.

Cheers,
Daniel


-
cuncta stricte discussurus



[R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Stephan Kolassa
Dear guRus,

I would like to loop over a medium amount of Sweave code, including both R and 
LaTeX chunks. Is there any way to do so? As an illustration, can I create a 
.tex file like this using a loop within a .Rnw file, where the 1,2,3 comes 
from some iteration variable in R?


\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}


Right now, I do have a working but painful solution. I put the loop contents in 
a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell("move loop.tex loop_1.tex")
   Sweave loop.Rnw; shell("move loop.tex loop_2.tex")
   ...
   Sweave loop.Rnw; shell("move loop.tex loop_n.tex")
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw

This does what I need, however, it is a major pain code-wise, e.g., there 
appears to be no way to control the loop during execution (n must be known in 
advance), and I need to control all graphics using \includegraphics with the 
iteration counter paste()d into the filename.

An alternative may be not using Sweave and working with one giant sink() and 
lots of print()s, letting R just write the entire .tex file. This also appears 
inelegant to me.

Is there a better way to do this?

I have tried to do my homework, see below. Do I get partial credit ;-) ?

Thank you all for your time!
Stephan


#


I can't simply start a for loop within an R chunk and finish it in another one.

whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it 
would simply reuse the R chunks if it did work, without changing things between 
loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...} 
with the entire loop contents and then simply write \loopcontent \loopcontent 
... or \input or \include the loop content from an external file.

Of course it would be possible to not use Sweave and just use the output from 
the R console, but there are a couple of figures I would really like to see 
close to the relevant portions of the calculations.

I also thought about putting the entire loop in *one* R chunk, but then I see 
no way to include LaTeX chunks *within* this R chunk. I can't just sink() to 
the .tex file in the middle of the R chunk (as the sink() gets appended to the 
.tex file only after Sweave is done with it). 
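
One approach that may cover this case, offered as a sketch rather than a tested recipe for the document above: Sweave chunks accept the option results=tex, in which case whatever the chunk writes with cat() is copied into the .tex output as LaTeX rather than as verbatim R output. The code below is meant as the body of such a chunk (declared as <<results=tex, echo=FALSE>>=). The helper iteration_tex() and the figure file names are hypothetical.

```r
# Build the LaTeX for one pass of the loop: a heading, a figure written to
# its own PDF file, and the matching \includegraphics line.
# Note: in a real document a relative path with forward slashes is safer
# for LaTeX than tempdir(), which is used here only to keep the sketch tidy.
iteration_tex <- function(i, fig_dir = tempdir()) {
  fig_file <- file.path(fig_dir, paste0("loop-fig-", i, ".pdf"))
  pdf(fig_file)
  plot(cumsum(rnorm(50)), type = "l", main = paste("Iteration", i))
  dev.off()
  c(
    paste0("Iteration ", i, "\\par"),
    paste0("\\includegraphics[width=0.6\\textwidth]{", fig_file, "}")
  )
}

# The loop bound is ordinary R, so it can be decided at run time.
n_iter <- 3
tex_lines <- unlist(lapply(seq_len(n_iter), iteration_tex))

# Inside a results=tex chunk this output becomes part of the .tex file.
cat(tex_lines, sep = "\n")
```

Because the figures are created and referenced inside the same loop, the iteration counter only has to be pasted into the file name in one place.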

I have read the Sweave manual and FAQs and the R/R Windows FAQ, and I did both
RSiteSearches and RSeek searches for all combinations of "Sweave" and "loop",
"for", "while" I could think of.

For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  tcltk methods   base  
   

other attached packages:
[1] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] svMisc_0.9-5

--



[R] adding custom axis to image.plot() and strange clipping behavior

2008-06-13 Thread Stephen Tucker
Hi list,

I wanted to plot an image with a colorbar to the right of the plot, but set my 
own axis labels (text rather than numbers) to the image. I have previously 
accomplished this with two calls to image(), but the package 'fields' has a 
wrapper function, image.plot(), which does this task conveniently.

However, I could not add axes to the original image after a call to 
image.plot(); I have found that I needed to set par(xpd=TRUE) within the 
function to allow this to happen:

###=== begin code
library(fields)

## make data matrix
m <- matrix(1:15,ncol=3)

## plot
image.plot(m,axes=FALSE)
axis(1) # doesn't work

par(xpd=TRUE)
axis(1) # still doesn't work

## replace the 28th element of the body of image.plot()
## and assign to new function called 'imp'
## here I just use the second condition of 'if' statement
## and set 'xpd = TRUE'
imp <- `body<-`(image.plot,value=`[[<-`(body(image.plot),28,
quote({par(big.par)
  par(plt = big.par$plt, xpd = TRUE)
  par(mfg = mfg.save, new = FALSE)
  invisible()})))
imp(m,axes=FALSE)
box()
axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
## clip to plotting region for additional
## graphical elements to be added:
par(xpd=FALSE)
abline(v=0.5)
###=== end code

I wonder if anyone has any insights into this behavior? The axis()
documentation says:
"Note that xpd is not accepted as clipping is always to the device region"
so I am surprised to find (1) that par(xpd=TRUE) works in the case above, and
(2) that it must be called before the function call terminates.

I have reproduced this on both my Linux box (Ubuntu Gutsy Gibbon 64-bit,
R 2.7.0, fields package version 4.1) and Windows machine (32-bit XP Pro,
R 2.7.0, fields package version 4.1).

Thanks very much,

Stephen



Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Delphine Fontaine
Dear Stephan,

I have the same problem as you. My solution is a bit different but not very
elegant.
I have a master document (let's say master.Snw) and a file containing the code to
repeat (which would be in the loop).
In the master document I start a counter at 0, and I copy
\SweaveInput{loop.Snw} as many times as the loop has iterations.
And in my loop.Snw, I don't forget to increment the counter by 1.
Not marvelous, but it works...
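
The copying step can itself be scripted. A minimal sketch, assuming the file names master.Snw and loop.Snw from the description above; the counter name iter and the helper write_master() are invented for this example.

```r
# Write a master file that initialises a counter and then pulls in
# loop.Snw once per iteration via \SweaveInput.
write_master <- function(n, path) {
  lines <- c(
    "\\documentclass{article}",
    "\\usepackage{Sweave}",
    "\\begin{document}",
    "<<echo=FALSE>>=",
    "iter <- 0   # loop.Snw is expected to do iter <- iter + 1",
    "@",
    rep("\\SweaveInput{loop.Snw}", n),
    "\\end{document}"
  )
  writeLines(lines, path)
  invisible(lines)
}

master_path <- file.path(tempdir(), "master.Snw")
master_lines <- write_master(4, master_path)
cat(master_lines, sep = "\n")
```

The number of iterations no longer has to be typed by hand, though it still has to be known before Sweave is run on the generated master file.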

Delphine




Delphine Fontaine
Statistician
Data  Statistics Department
Genexion SA 



[R] Looping, Control Flow Conditional Statements

2008-06-13 Thread Garth.Warren
Dear R Group:

I have little experience using R and even less experience with control
flow type questions.

See the following code:

a1 = c(0, 1, 1, 1,
       0, 0, 0, 0, 0,
       0, 0, 1,
       1, 1, 1, 0, 0)

for(i in 1:1){
sx <- paste("a",i,sep="")
s <- eval(parse(text = paste("a",i,sep="")))
{g = numeric(length(s))
 k = numeric(length(s))
{for (i in 1:length(s))
{for (j in 1:length(s))
ifelse(((j=i)>1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
}}
h1 <- hist(g,freq=TRUE)
h <- h1$counts[4]
cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
}}


The output is:

> g
 [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0
> k
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> h
[1] 7

and a text file, which has:

a1 : 7


k is a by-product of the ifelse statement and is of no interest; g and
h only go part-way to answering my question, which is:

Every time an object such as a1 (which is actually a time series),
0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0, has a value over 0, how long do the
values stay above 0? In this case a1 has two groups, or events, where
the value is above zero: the first event lasts for 3 'days' and the
second event lasts for 4 'days'. My code tells me that there
was a total of 7 'days' in an event (above 0), but what I need to know is
that there were two 'events' and that the 1st lasted 3 'days' and the 2nd
lasted 4 'days'. Essentially I want a text file output to say:

a1.1 : 3
a1.2 : 4

My thinking is that I need to somehow get the code working through each
vector one value at a time, and when a value is found to meet the criterion
of > 0, R creates a new vector. To use the above example, it would come
to the first value > 0 and then create the new vector a1.1 = (1,1,1); then,
as the next value in the series is 0, it would close this new vector
'a1.1'. It would then continue until it reaches the next value > 0 and
then create the vector a1.2 = (1,1,1,1); then again, as the next value in
the series is 0, it would close this new vector, and so on.

Then all I need to do is perform a count of '1's in these new vectors to
find how many days they met this criterion of being greater than 0.

 

I hope the above makes sense and I really hope there is someone willing
and able to help. I don't know how to proceed.
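
One way to get at this without building the intermediate vectors, sketched under the assumption that an 'event' is any unbroken run of values above 0: base R's rle() (run-length encoding) reports the length of each run directly. The output labels follow the a1.1, a1.2 pattern asked for above; the output goes to a temporary file rather than the path in the original code.

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# Encode the series as runs of TRUE (above 0) and FALSE (not above 0).
runs <- rle(a1 > 0)

# Keep only the lengths of the runs that are above 0: one entry per event.
event_lengths <- runs$lengths[runs$values]
print(event_lengths)   # 3 4

# Write one "name : length" line per event.
out_lines <- paste0("a1.", seq_along(event_lengths), " : ", event_lengths)
out_file <- tempfile(fileext = ".txt")
writeLines(out_lines, out_file)
cat(out_lines, sep = "\n")
```

length(event_lengths) gives the number of events and sum(event_lengths) recovers the total of 7 'days' that the original code reported.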

 

Thanks,

Garth 

 

 

 

 

 

 




Re: [R] parsing - input buffer overflow

2008-06-13 Thread Prof Brian Ripley

On Fri, 13 Jun 2008, Daniel Malter wrote:


> Hi,
>
> I am trying to parse a large amount of text using gregexpr(). Unfortunately,
> I get an input buffer overflow message when I attempt that with too large
> an amount of text. The error message occurs before the parsing. The problem
> is that I cannot assign the text to a variable (an object) if the text is
> too large.


R does have limits on the command line length (1024 bytes up to R-devel, 
4096 bytes there).  What happens if you exceed that depends on the 
interface you are using (and you have not told us).  Beyond that, the 
parser has a limit of MAXELTSIZE (8192 bytes) on strings.


I don't see any need for 'improvement' though: why are you entering very 
long strings as part of the R program?  They are data, and e.g. 
readLines() and scan() have no limits on string length beyond those 
imposed by R's internals (2^31-1 bytes).
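
A minimal sketch of treating the text as data, assuming it lives in a plain text file. The temporary file below stands in for the real one, and count_matches() is a name made up for this example.

```r
# Stand-in for a large text file (the real data would already be on disk).
text_file <- tempfile(fileext = ".txt")
writeLines(c("Saint Lucia, Saint Kitts and Nevis,",
             "Saint Helena, Clipperton Island, Tristan da Cunha"),
           text_file)

# Read the text as data instead of typing it into the script.
text_lines <- readLines(text_file)
x <- paste(text_lines, collapse = " ")

# gregexpr() returns the match positions, or -1 if there is no match,
# so counting the positive entries gives the number of occurrences.
count_matches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE)[[1]]
  sum(hits > 0)
}

print(count_matches("Saint", x))    # 3
print(count_matches("Zanzibar", x)) # 0
```

Counting the positive positions also removes the need to guess an n that is 'large enough'.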



> This problem has been mentioned before, which I found using the RSiteSearch.
> However, the post is from 2006, and I thought it might have improved by now.
> Is there any way to increase the limit or to get around this problem?
>
> x="Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island,
> Tristan da Cunha"


I presume that is not an example?  It looks like a character vector which
has been collapsed by paste(x, collapse = ", ") and would be better
strsplit() into its components than using gregexpr.
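
The strsplit() route might look like the following sketch, which assumes the components are separated by a comma and optional whitespace. Note that it counts components beginning with the string, which is not quite the same as counting every occurrence.

```r
x <- "Saint Lucia, Saint Kitts and Nevis, Saint Helena, Clipperton Island, Tristan da Cunha"

# Split on a comma followed by optional whitespace.
places <- strsplit(x, ",\\s*")[[1]]
print(places)

# Count the entries that start with "Saint".
n_saint <- sum(grepl("^Saint", places, ignore.case = TRUE))
print(n_saint)
```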



> #What I want to achieve is to parse the text for the number of occurrences
> of a certain character string within the text.
>
> #This is done using:
>
> n=100 #choose n large enough
> length(which(is.na(gregexpr("Saint",x,ignore.case=TRUE)[[1]][1:n])==FALSE))
>
> But again, if the text is large, I cannot assign it to x. I'd be grateful
> for any suggestions.
>
> Cheers,
> Daniel
>
>
> -
> cuncta stricte discussurus



Re: [R] How to increase the for() loop speed?

2008-06-13 Thread Karl Ove Hufthammer
Rafael Barros de Rezende:

> I would like to know if there is a way to increase the for() loop speed
> because in my routine the calculations are too slow.

Read the article 'How Can I Avoid This Loop or Make It Faster?' on page 46
in the latest R News <http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf>.
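
The original routine was not posted, so the following is only a generic illustration of the usual advice (preallocate the result, and vectorise where possible), not a summary of the article. The function names are made up for this sketch.

```r
set.seed(42)
x <- runif(1e5)

# Slow pattern: growing the result one element at a time.
squares_loop <- function(x) {
  out <- c()
  for (xi in x) out <- c(out, xi^2)
  out
}

# Better: preallocate the result and fill it in place.
squares_prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# Best here: let the vectorised operator do the whole job.
squares_vec <- function(x) x^2

small <- x[1:2000]
print(all.equal(squares_loop(small), squares_vec(small)))
print(all.equal(squares_prealloc(x), squares_vec(x)))
print(system.time(squares_prealloc(x)))
print(system.time(squares_vec(x)))
```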

-- 
Karl Ove Hufthammer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Output of silhouette (cluster package)

2008-06-13 Thread Cristiano Varin
Dear R users,
I am writing about the graphical output of silhouette (cluster package).

From the example of silhouette in help(silhouette):

  ar <- agnes(ruspini)
  si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                    daisy(ruspini))
  plot(si3, nmax = 80, cex.names = 0.5)

from which one may conclude that group 1 is composed of units 1 to 20,
group 2 of units 21 to 43, group 3 of units 44 to 57, group 4 of units
58 to 60 and, finally, group 5 of units 61 to 75.

However, this seems to contradict the printed output of silhouette,
where the fourth group is composed of units 46 to 48 rather than
units 58 to 60 (which belong to the third cluster); see
> si3
      cluster neighbor   sil_width
 [1,]       1        5 0.679838078
 [2,]       1        5 0.745615002
 [3,]       1        5 0.758796123
 [4,]       1        4 0.715554768
 [5,]       1        5 0.664657114
 [6,]       1        4 0.783993831
 [7,]       1        2 0.590057470
 [8,]       1        4 0.747969458
 [9,]       1        5 0.792304760
[10,]       1        4 0.803547635
[11,]       1        4 0.742402051
[12,]       1        4 0.722302731
[13,]       1        4 0.665412622
[14,]       1        5 0.756910666
[15,]       1        5 0.700685403
[16,]       1        5 0.743601834
[17,]       1        5 0.614854124
[18,]       1        5 0.708007860
[19,]       1        5 0.700093839
[20,]       1        4 0.568989067
[21,]       2        4 0.751866935
[22,]       2        4 0.790783667
[23,]       2        4 0.802659788
[24,]       2        4 0.785895823
[25,]       2        4 0.822943473
[26,]       2        4 0.831313347
[27,]       2        4 0.818043337
[28,]       2        4 0.805454305
[29,]       2        4 0.770547118
[30,]       2        4 0.768289979
[31,]       2        3 0.794485567
[32,]       2        4 0.829925955
[33,]       2        4 0.807379640
[34,]       2        4 0.790626589
[35,]       2        4 0.817427927
[36,]       2        3 0.793572412
[37,]       2        4 0.760561408
[38,]       2        4 0.743170109
[39,]       2        3 0.761413953
[40,]       2        3 0.704193051
[41,]       2        4 0.297007126
[42,]       2        4 0.522049838
[43,]       2        3 0.488556828
[44,]       3        4 0.377632488
[45,]       3        4 0.007214464
[46,]       4        3 0.699407534
[47,]       4        3 0.837451212
[48,]       4        3 0.794349431
[49,]       3        4 0.632862996
[50,]       3        4 0.586149139
[51,]       3        4 0.647326133
[52,]       3        4 0.650020368
[53,]       3        4 0.629131005
[54,]       3        4 0.618843633
[55,]       3        4 0.586439350
[56,]       3        4 0.586788051
[57,]       3        4 0.668108812
[58,]       3        4 0.650074540
[59,]       3        4 0.628444500
[60,]       3        4 0.591393005
[61,]       5        1 0.770110294
[62,]       5        1 0.815309198
[63,]       5        4 0.771622667
[64,]       5        1 0.806125429
[65,]       5        1 0.850310507
[66,]       5        1 0.822984066
[67,]       5        1 0.852743923
[68,]       5        1 0.762055943
[69,]       5        1 0.839180986
[70,]       5        1 0.854894699
[71,]       5        1 0.838106473
[72,]       5        1 0.774812117
[73,]       5        1 0.795021304
[74,]       5        1 0.759681469
[75,]       5        1 0.742553847
attr(,Ordered)
[1] FALSE
attr(,call)
silhouette.default(x = cutree(ar, k = 5), dist = daisy(ruspini))
attr(,class)
[1] silhouette

Thanks for your attention,
Cristiano
-
Cristiano Varin
[EMAIL PROTECTED]
http://www.dst.unive.it/~sammy/
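
A possible explanation, offered as a sketch rather than a confirmed answer: printing si3 lists the units in their original data order, whereas plot() appears to draw them grouped by cluster (and by decreasing silhouette width within each cluster), so a position on the plot is not a unit number. The code below assumes that sortSilhouette() from the cluster package applies the same ordering as the plot and that it records the permutation in the "iOrd" attribute; the variable names are illustrative.

```r
library(cluster)
data(ruspini)

ar  <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), daisy(ruspini))

# Units (original row numbers) that the printed output assigns to cluster 4
units_cl4 <- unname(which(si3[, "cluster"] == 4))

# Reorder by cluster, then by decreasing sil_width within each cluster
ssi <- sortSilhouette(si3)

# Positions occupied by cluster 4 after sorting
pos_cl4 <- unname(which(ssi[, "cluster"] == 4))

# Map those positions back to the original unit numbers
iord         <- attr(ssi, "iOrd")
units_at_pos <- sort(iord[pos_cl4])
```

If the assumption holds, pos_cl4 gives the bar positions of the fourth group on the plot, while units_cl4 and units_at_pos both give the unit numbers that appear in the printed matrix.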











Re: [R] model simplification using Crawley as a guide

2008-06-13 Thread Jim Lemon

Peter Dalgaard wrote:

> ...
> That'll be anti-hist()-amine, I presume?


I would think p-necillin a more appropriate treatment.

Jim



[R] uncertainty bounds for a weighted moving average

2008-06-13 Thread jgarcia
Hi,
Well, this is not an R-specific question, but perhaps you can help.
If I have an irregularly sampled time series and apply a moving-average
filter (e.g., with a triangular kernel), how can the uncertainty bounds
be calculated?

Thanks and best regards

J.
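
One common starting point, sketched here under strong assumptions: if the observation errors are independent with a common, known standard deviation sigma, a weighted average with normalised weights w_i has variance sigma^2 * sum(w_i^2), which gives approximate bounds of plus or minus two standard errors. The function name tri_smooth and all parameter values are hypothetical, and the sketch ignores smoothing bias and any autocorrelation in the series.

```r
# Triangular-kernel weighted moving average with approximate bounds.
# Assumes independent errors with a common, known sd `sigma`.
tri_smooth <- function(t, y, t0, h, sigma) {
  k <- pmax(0, 1 - abs(t - t0) / h)   # triangular kernel, half-width h
  w <- k / sum(k)                     # normalise weights to sum to 1
  est <- sum(w * y)
  se  <- sigma * sqrt(sum(w^2))       # Var(sum w_i y_i) = sigma^2 * sum w_i^2
  c(estimate = est, lower = est - 2 * se, upper = est + 2 * se)
}

set.seed(1)
t <- sort(runif(50, 0, 10))           # irregular sampling times
y <- sin(t) + rnorm(50, sd = 0.2)     # simulated series (made-up data)
res <- tri_smooth(t, y, t0 = 5, h = 1, sigma = 0.2)
```

With equal weights over n points the standard error reduces to sigma / sqrt(n), which is a quick sanity check on the formula.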


[R] Writing a new link for a GLM.

2008-06-13 Thread Jan Graffelman

Hi,

I wish to write a new link function for a GLM. R's glm routine does
not supply the loglog link. I modified the make.link function adding
the code:

    }, loglog = {
        linkfun <- function(mu) -log(-log(mu))
        linkinv <- function(eta) exp(-exp(-eta))
        mu.eta <- function(eta) exp(-exp(-eta) - eta)
        valideta <- function(eta) all(eta != 0)
    }, stop(sQuote(link), " link not recognised"))
    structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
        valideta = valideta, name = link), class = "link-glm")
}


and then call glm with argument

glm(y~x1+x2+x3, family=binomial(link=make.link("loglog")), data=X)

and that seems to work.

Is this the way to include a new link function? Any other suggestions?

Jan.
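
An alternative that avoids editing make.link, given as a minimal sketch: build the link as a stand-alone object of class "link-glm" and pass it to binomial(). This relies on binomial() accepting such objects, as documented in ?family. The simulated data and the choice valideta = TRUE are assumptions made for illustration, not part of the original code.

```r
# Log-log link as a stand-alone "link-glm" object
loglog <- structure(
  list(
    linkfun  = function(mu) -log(-log(mu)),
    linkinv  = function(eta) exp(-exp(-eta)),
    mu.eta   = function(eta) exp(-exp(-eta) - eta),  # d mu / d eta
    valideta = function(eta) TRUE,                   # assumption: any eta is valid
    name     = "loglog"
  ),
  class = "link-glm"
)

# Simulated data (hypothetical) to check that glm() accepts the link
set.seed(42)
x   <- rnorm(200)
eta <- 0.5 + 1.0 * x
y   <- rbinom(200, size = 1, prob = loglog$linkinv(eta))
fit <- glm(y ~ x, family = binomial(link = loglog))
coef(fit)
```

The derivative in mu.eta follows from mu = exp(-exp(-eta)), so d mu / d eta = exp(-exp(-eta)) * exp(-eta).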

--

|Jan Graffelman  |tel:   +34-93-4011739|
|Dpt. of Statistics & Operations Research|fax:   +34-93-4016575|
|Universitat Politecnica de Catalunya|email: [EMAIL PROTECTED]|
|Av. Diagonal 647, 6th floor |www: |
|08028 Barcelona, Spain  |  http://www-eio.upc.es/~jan/|



Re: [R] adding custom axis to image.plot() and strange clipping behavior

2008-06-13 Thread Katharine Mullen
I also noticed that adding a custom axis with image.plot was a problem;
you can also do:

library(fields)

m <- matrix(1:15,ncol=3)

par(mar=c(5,5,5,7))

image(m, axes=FALSE)

# add axis
axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
box()

## add legend
image.plot(m, legend.only=TRUE)

On Fri, 13 Jun 2008, Stephen Tucker wrote:

> Hi list,
>
> I wanted to plot an image with a colorbar to the right of the plot, but
> set my own axis labels (text rather than numbers) to the image. I have
> previously accomplished this with two calls to image(), but the package
> 'fields' has a wrapper function, image.plot(), which does this task
> conveniently.
>
> However, I could not add axes to the original image after a call to
> image.plot(); I have found that I needed to set par(xpd=TRUE) within the
> function to allow this to happen:
>
> ###=== begin code
> library(fields)
>
> ## make data matrix
> m <- matrix(1:15,ncol=3)
>
> ## plot
> image.plot(m,axes=FALSE)
> axis(1) # doesn't work
>
> par(xpd=TRUE)
> axis(1) # still doesn't work
>
> ## replace the 28th element of the body of image.plot()
> ## and assign to new function called 'imp'
> ## here I just use the second condition of 'if' statement
> ## and set 'xpd = TRUE'
> imp <- `body<-`(image.plot,value=`[[<-`(body(image.plot),28,
>          quote({par(big.par)
>                 par(plt = big.par$plt, xpd = TRUE)
>                 par(mfg = mfg.save, new = FALSE)
>                 invisible()})))
> imp(m,axes=FALSE)
> box()
> axis(1,axTicks(1),lab=letters[1:length(axTicks(1))])
> ## clip to plotting region for additional
> ## graphical elements to be added:
> par(xpd=FALSE)
> abline(v=0.5)
> ###=== end code
>
> I wonder if anyone has any insights into this behavior? Since the axis()
> documentation says:
> "Note that xpd is not accepted as clipping is always to the device region"
> I am surprised to find (1) that par(xpd=TRUE) works in the case above,
> and (2) that it must be called before the function call is terminated.
>
> I have reproduced this on both my Linux box (Ubuntu Gutsy Gibbon 64-bit,
> R 2.7.0, fields package version 4.1) and Windows machine (32-bit XP Pro,
> R 2.7.0, fields package version 4.1).
>
> Thanks very much,
>
> Stephen




Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Dieter Menne
Stephan Kolassa <Stephan.Kolassa at gmx.de> writes:

> I would like to loop over a medium amount of Sweave code, including both
> R and LaTeX chunks. Is there any way to do so? As an illustration, can I
> create a .tex file like this using a loop within a .Rnw file, where the
> 1,2,3 comes from some iteration variable in R?
>
> \documentclass{article}
> \usepackage{Sweave}
> \begin{document}
> Iteration 1
> Iteration 2
> Iteration 3
> \end{document}

I normally do this with a \newcommand: all the LaTeX goes inside the
\newcommand{}, and R creates the parameters that are passed to it.
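
A minimal sketch of that idea, assuming a macro such as \newcommand{\iteration}[1]{Iteration #1\par} has been defined in the LaTeX preamble (the macro name and the helper iteration_calls are hypothetical). The R code would sit in an Sweave chunk declared with results=tex, so that whatever it prints is passed through to the .tex file unchanged.

```r
# Body of an Sweave chunk declared as  <<results=tex, echo=FALSE>>=
# Builds one call to the (hypothetical) LaTeX macro \iteration per value.
iteration_calls <- function(values) {
  sprintf("\\iteration{%s}", values)
}

calls <- iteration_calls(1:3)
cat(calls, sep = "\n")   # emits one macro call per line into the .tex output
```

Keeping the LaTeX in the macro and only the loop in R means the formatting can be changed without touching the R code.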

Dieter



Re: [R] MCA in R

2008-06-13 Thread John Fox
Dear Kimmo,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of K. Elo
> Sent: June-13-08 1:43 AM
> To: r-help@r-project.org
> Subject: Re: [R] MCA in R
>
> Dear John,
>
> thanks for Your quick reply.
>
> John Fox wrote:
> > Dear Kimmo,
> >
> > MCA is a rather old name (introduced, I think, in the 1960s by
> > Sonquist and Morgan in the OSIRIS package) for a linear model
> > consisting entirely of factors and with only additive effects --
> > i.e., an ANOVA model with no interactions.
>
> It is true that MCA is an old name, but the technique itself is still
> robust, I think. The problem I am facing is that I have a research
> project where I try to find out which factors affect measured knowledge
> of a specific issue. As predictors I have formal education, interest,
> gender and consumption of different media (TV, newspapers etc.). Now,
> these are correlated predictors and running e.g. a simple anova
> (anova(lm(...)) as You suggested) won't - if I have understood correctly
> - consider the problem of correlated predictors. MCA would do this.

That's because anova() calculates sequential (type-I) sums of squares; if
you use the Anova() function in the car package, for example, you'll get
so-called type-II sums of squares -- for each factor after the others. You
could also more tediously do these tests directly using the anova()
function, by contrasting alternative models: the full model and the model
deleting each factor in turn.
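
The difference can be illustrated with a short sketch on a built-in, unbalanced data set; mtcars and the factors cyl and gear are stand-ins chosen for illustration, not Kimmo's data. It compares the sequential sums of squares from anova() with the "each factor after the others" sums of squares obtained by contrasting models, and with drop1() from the stats package, which reports the latter in one call for an additive model.

```r
# Unbalanced two-factor additive model on a built-in data set
d   <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
fit <- lm(mpg ~ cyl + gear, data = d)

seq_tab <- anova(fit)                      # sequential (type-I) sums of squares

# "Each factor after the others": compare the full model with the
# model that omits that factor.
ss_cyl  <- anova(update(fit, . ~ . - cyl),  fit)[2, "Sum of Sq"]
ss_gear <- anova(update(fit, . ~ . - gear), fit)[2, "Sum of Sq"]

# drop1() gives the same quantities for an additive model
d1 <- drop1(fit, test = "F")
```

Only the last term in the formula has the same sum of squares in both tables; the earlier term is not adjusted for the later one in the sequential table.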

 
> A colleague of mine has run anova and MCA in SPSS and the results differ
> significantly.

Yes, see above.

> Because I am more familiar with R, I just hoped that this
> marvelous statistical package could handle MCA, too :)
>
> > Typically, the results of
> > an MCA are reported using adjusted means. You could compute these
> > manually, or via the effects package.
>
> Well, I am interested in the eta and beta values, too.

Aren't the eta values just the square-roots of the R^2's from the individual
one-way ANOVAs? I don't remember how the betas are defined, but do recall
that they are a peculiar attempt to define standardized partial regression
coefficients for factors that combine all of the levels.
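
If that recollection about eta is right, the value for one factor could be computed as below. This is only a sketch of the definition as stated above (square root of the R^2 from a one-way ANOVA), using mtcars as stand-in data; it does not attempt the beta values.

```r
# Eta for a single factor: sqrt of R^2 from the one-way ANOVA
d       <- transform(mtcars, cyl = factor(cyl))
one_way <- lm(mpg ~ cyl, data = d)
eta     <- sqrt(summary(one_way)$r.squared)

# Same quantity from the ANOVA table: sqrt(SS_between / SS_total)
tab       <- anova(one_way)
eta_check <- sqrt(tab["cyl", "Sum Sq"] / sum(tab[, "Sum Sq"]))
```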

> I have tried to
> use the effects package but my attempts with all.effects resulted in
> errors. I have to figure out what's going wrong here :)

If you tell me what you did, ideally including an example that I can
reproduce, I can probably tell you what's wrong.

Regards,
 John

 
> Kind regards,
> Kimmo Elo
>
> --
> University of Turku, Finland
> Dep. of political science
 


[R] Switching the order of legend boxes in a lattice bar graph

2008-06-13 Thread Bob Green


I suspect there is a simple solution to this problem, but have been 
unable to find it. Below is some code that I have run to create 3 
lattice graphs. I have been asked to change the legend so that the 
'No' and dark blue are above Y and light blue in the legend to 
mirror the stacked bars in the graph which feature dark blue above light blue.


I have tried changing the data as well as the order of the legend 
text, without success.  Any assistance is much appreciated,


regards

Bob Green



library(lattice)
SNFP1 <- as.table(matrix(c(4,1, 4,4, 1,3, 2,7, 1,6, 0,4), ncol = 6,
    dimnames = list(group=c("Y","No"), Status=c("A","B","C","D","E","F"))))
barplot(SNFP1, beside=FALSE, legend=TRUE, ylim=c(0, 60),
    ylab="N of patients", main="district 1", col=c("light blue", "dark blue"))


# A, B, C, D, E, F

SNFP2 <- as.table(matrix(c(3,7, 1,5, 0,1, 0,1), ncol = 4,
    dimnames = list(group=c("Y","No"), Status=c("G","H","I","J"))))
barplot(SNFP2, beside=FALSE, legend=TRUE, ylim=c(0, 60),
    ylab="N of patients", main="district 2", col=c("light blue", "dark blue"))


# G, H, I, J

SNFP3 <- as.table(matrix(c(3,0, 0,2, 3,4), ncol = 3,
    dimnames = list(group=c("Y","No"), Status=c("K","L","M"))))
barplot(SNFP3, beside=FALSE, legend=TRUE, ylim=c(0, 60),
    ylab="N of patients", main="district 3", col=c("light blue", "dark blue"))



df1 <- as.data.frame(t(SNFP1))
df2 <- as.data.frame(t(SNFP2))
df3 <- as.data.frame(t(SNFP3))
stuff <- make.groups(A=df1, B=df2, C=df3)

# simple version
barchart(Freq ~ Status | which, groups=group, data=stuff,
    stack=TRUE, scales=list(x=list(relation="free")), auto.key=TRUE)


# advanced version
barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE,
    as.table=TRUE, layout=c(2,2),
    skip=c(F,T,F,F), scales=list(x=list(relation="free")), ylab="patients",
    main="Figure 1: X by district",
    par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))),
    auto.key=list(x = .6, y = .7, corner = c(0, 0)))




Re: [R] Switching the order of legend boxes in a lattice bar graph

2008-06-13 Thread Markus Gesmann
Hi Bob,

Would this:

mykey <- list(
    rectangles = list(col=c("dark blue","light blue")),
    text = list(lab=c("No","Yes")), x = .6, y = .7, corner = c(0, 0))

barchart(Freq ~ Status | which, groups=group, data=stuff, stack=TRUE,
    as.table=TRUE, layout=c(2,2),
    skip=c(F,T,F,F), scales=list(x=list(relation="free")), ylab="patients",
    main="Figure 1: X by district",
    par.settings=list(superpose.polygon=list(col=c("light blue", "dark blue"))),
    key=mykey)

solve your problem?
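
Another option that may be worth trying, sketched under the assumption that the installed lattice version supports the reverse.rows component of a key (it is described in the key section of ?xyplot): keep auto.key and ask for the legend rows to be reversed, so the legend order follows the stacking order without hard-coding colours and labels. The small data frame below is a made-up stand-in for 'stuff'.

```r
library(lattice)

# Small stand-in for the 'stuff' data frame (values are made up)
stuff_demo <- data.frame(
  Status = factor(rep(c("A", "B", "C"), each = 2)),
  group  = factor(rep(c("Y", "No"), times = 3), levels = c("Y", "No")),
  Freq   = c(4, 1, 4, 4, 1, 3)
)

p <- barchart(Freq ~ Status, groups = group, data = stuff_demo, stack = TRUE,
              par.settings = list(superpose.polygon =
                                    list(col = c("light blue", "dark blue"))),
              # reverse.rows flips the legend so 'No' is listed above 'Y'
              auto.key = list(reverse.rows = TRUE, x = .6, y = .7,
                              corner = c(0, 0)))
# print(p) draws the plot
```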

Regards,

Markus

Markus Gesmann │Associate Director│Libero Ventures Ltd, One Broadgate, London 
EC2M 2QS
tel: +44 (0)207 826 9080│ dir: +44 (0)207 826 9085│fax: +44 (0)207 826 9090 
│www.libero.uk.com

A Lehman Brothers Company

AUTHORISED AND REGULATED BY THE FINANCIAL SERVICES AUTHORITY




Re: [R] Writing a new link for a GLM.

2008-06-13 Thread roger koenker

I wrote an R-news note about this sort of thing in 2006, you can
navigate there via CRAN...

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820




[R] R and Browninan Motion/ Langevin Equation package

2008-06-13 Thread Peter Mueller
Hi,

I'm writing a short course tutorial on Brownian motion and the Langevin
equation. At the end of the theory section I want to add a short GNU R
example, so the students can play around a little.

I already looked in the MASS book (by Venables and Ripley) but I couldn't find
any Brownian motion/Langevin equation package.
Are there any good packages or tutorials available which cover R and Brownian
motion/the Langevin equation?

Thanks
Peter


Re: [R] MCA in R

2008-06-13 Thread Prof Brian Ripley
Although John Fox naturally mentions his Anova function, I would like to 
point out that drop1() (and MASS::dropterm) also does the tests of Type-II 
ANOVA of which John says 'more tediously do these tests directly'.


It seems a lot easier to teach newcomers about drop1() than to introduce 
the SAS terminology and then say (to quote ?Anova)


  'the definitions used here do not correspond precisely to those
   employed by SAS'

(I would welcome a description of the precise differences on the Anova 
help page.)






--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] MCA in R

2008-06-13 Thread John Fox
Dear Brian,

> -----Original Message-----
> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]]
> Sent: June-13-08 8:13 AM
> To: John Fox
> Cc: 'K. Elo'; r-help@r-project.org
> Subject: Re: [R] MCA in R
>
> Although John Fox naturally mentions his Anova function, I would like to
> point out that drop1() (and MASS::dropterm) also does the tests of Type-II
> ANOVA of which John says 'more tediously do these tests directly'.

It's true that for an additive model (such as Kimmo's), drop1() and Anova()
produce the same sums of squares, but for a model in which some terms are
marginal to others, drop1() produces tests only for the high-order terms.
One could specify scope = ~ . to drop1(), but that produces so-called
type-III tests. Perhaps there's some convenient way around this of which
I'm unaware.
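
The behaviour described above can be seen in a small sketch; mtcars with the factors cyl and am is a stand-in data set chosen for illustration. With the default scope, drop1() only offers terms whose removal respects marginality, so for a model with an interaction it tests the interaction alone; forcing scope = ~ . makes it drop each main effect while the interaction stays in the model.

```r
d       <- transform(mtcars, cyl = factor(cyl), am = factor(am))
fit_int <- lm(mpg ~ cyl * am, data = d)

# Default scope respects marginality: only the interaction can be dropped
r1 <- rownames(drop1(fit_int, test = "F"))

# Every term forced into the scope: main effects are dropped while the
# interaction is kept (the "so-called type-III" tests mentioned above)
r2 <- rownames(drop1(fit_int, scope = ~ ., test = "F"))
```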

 
> It seems a lot easier to teach newcomers about drop1() than to introduce
> the SAS terminology and then say (to quote ?Anova)
>
>   'the definitions used here do not correspond precisely to those
>    employed by SAS'
>
> (I would welcome a description of the precise differences on the Anova
> help page.)

As I recall, the differences are for type-III tests, where in Anova()
these are dependent upon contrast coding.

Regards,
 John

 
 

Re: [R] Problems with mars in R in the case of nonlinear functions

2008-06-13 Thread Stephen Milborrow
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and it underfits very badly.

Try the earth package which extends the mars function in the mda package.

Your example becomes

library(earth) # was mda
f <- function(x,y) { x^2-y^2 }
x <- seq(-1,1,length=10)
x <- outer(x*0,x,FUN="+")
y <- t(x)
X <- cbind(as.vector(x),as.vector(y))
z <- f(x,y)
fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit) # note better fit than before
# your original plotting code could be used too

For this kind of data, you could possibly use the minspan parameter.  By
default, MARS does not allow every observation to be used as a knot in the
generated basis functions. This strategy increases resistance to runs of
correlated noise in the data.  For non-noisy data, you can set minspan=1 to
allow MARS to consider every observation as a potential knot.  If your data
were noisy then minspan=1 could overfit the data.  With earth, you can use
trace=2 to see the calculated minspan value.

If you run the above example with the earth parameter trace=1, you will see
that the stopping condition for the forward pass is:

Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)

To make the forward pass continue further, change the delta RSq threshold
by using the thresh parameter:

fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model looks better when plotted, but note that using thresh
here makes almost no change to the GRSq.  That is, with the lower threshold
the model is more complicated (has more terms) but does not have greater
predictive power.  The threshold is just one of the reasons that the forward
pass can terminate (reaching the maximum number of terms nk is another).
AFAIK Friedman's code (that you ran from Matlab) does not use the threshold
but instead just continues forward stepping until nk is reached.  In this
case the Matlab model is arguably more complicated than it need be.  I
believe the forward threshold for MARS was an innovation of Hastie and
Tibshirani, but I could be wrong.

To reduce mailing list traffic, let's continue this discussion off-line i.e.
by direct mail to each other, and if necessary I will summarize results of
our discussions in the earth documentation.

Regards
Steve

| Message: 76
| Date: Thu, 12 Jun 2008 13:35:35 -0700
| From: Janne Huttunen [EMAIL PROTECTED]
| Subject: [R] Problems with mars in R in the case of nonlinear
| functions
| To:
| Message-ID: [EMAIL PROTECTED]
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| Hi,
|
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and
| it underfits very badly.
|
| For example, I have tried the following code to test mars:
|
| require(mda)
|
| f <- function(x,y) { x^2-y^2 };
| #f <- function(x,y) { x+2*y };
|
| # Grid
| x <- seq(-1,1,length=10);
| x <- outer(x*0,x,FUN="+"); y <- t(x);
| X <- cbind(as.vector(x),as.vector(y));
|
| # Data
| z <- f(x,y);
|
| fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);
|
| # Plotting
| par(mfrow=c(1,2),pty="s")
| lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
| persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
|   xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
|
| persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
|   col='lightblue',
|   xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
|   phi=25,theta=55,zlim=lims)
|
| (the code is also here if someone wants to try it:
| http://venda.uku.fi/~jmhuttun/R/marstest.R)
|
| The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The
| fitted model contains only 5 terms, which is not enough in this case.
| Adjusting parameters like nk, thresh, penalty and degree seems to have
| only a minor effect or no effect at all. It's also strange that when I
| increase the number of points in the grid, the results are even worse:
| see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid.
| However, mars seems to work well with linear functions (e.g. with the
| function which is commented out in the above code).
|
| Does anyone know what is wrong in this case? Am I missing something, or
| is there something wrong in my code?
|
| This does not seem to be a problem with the MARS method in general. For
| example, Friedman's MARS implementation (run in Matlab) gives a rather
| good fit: see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
|
| Thank you
|
| Janne
|
| -- 
| Janne Huttunen
| University of California
| Department of Statistics
| 367 Evans Hall Berkeley, CA 94720-3860
| email: [EMAIL PROTECTED]
| phone: +1-510-502-5205
| office room: 449 Evans Hall

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting 

Re: [R] R and Brownian Motion/ Langevin Equation package

2008-06-13 Thread jim holtman
Google "R and Brownian Motion". It turned up this link:
http://landshape.org/enm/r-code-for-brownian-motion/

Maybe this will help.
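
For readers who only want something to experiment with, here is a minimal base-R sketch. It is not taken from the linked page. The step size, the number of steps, the parameter names gamma and sigma, and the particular form dv = -gamma * v * dt + sigma * dW of the Langevin equation are all assumptions chosen for illustration.

```r
set.seed(1)                      # reproducible illustration
n  <- 1000                       # number of time steps (assumed)
dt <- 0.01                       # step size (assumed)

# Standard Brownian motion: cumulative sum of independent N(0, dt) increments
dW <- rnorm(n, mean = 0, sd = sqrt(dt))
W  <- c(0, cumsum(dW))

# Langevin equation for a velocity v, dv = -gamma * v * dt + sigma * dW,
# integrated with the Euler-Maruyama scheme
gamma <- 1.0                     # friction coefficient (assumed)
sigma <- 0.5                     # noise strength (assumed)
v <- numeric(n + 1)
for (i in seq_len(n)) {
  v[i + 1] <- v[i] - gamma * v[i] * dt + sigma * dW[i]
}

# Plot both paths against time
tt <- (0:n) * dt
plot(tt, W, type = "l", xlab = "t", ylab = "value", ylim = range(W, v))
lines(tt, v, col = "red")
```

Students can change dt, gamma and sigma to see how the damped path (red) differs from the free Brownian path.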

On Fri, Jun 13, 2008 at 8:08 AM, Peter Mueller [EMAIL PROTECTED] wrote:
 Hi,

 I'm writing a short course tutorial on Brownian Motion/ Langevin Equation.
 At the end of the theory section I wanted to add a short GNU R example, so 
 the students can play around a little.

 I already looked in the MASS book (by Venables and Ripley) but I couldn't 
 find any Brownian Motion/ Langevin Equation package.
 Are there any good packages or tutorials available which cover R and 
 Brownian Motion/ Langevin Equation?

 Thanks
 Peter




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] piper diagram

2008-06-13 Thread stephen sefick
RSEIS, I think, may have a Piper diagram.

On Thu, Jun 12, 2008 at 8:39 PM, Michael Grant [EMAIL PROTECTED] wrote:

 Sorry no previous message text or addresses, but I just cleaned my mailbox
 and then found something relevant. Regarding the Piper diagram. I just
 noticed the 'hydrogeo' package on CRAN, courtesy of one Myles English. That
 should be what you need or close to it.



 Best regards,

 Michael Grant






-- 
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.

-K. Mullis



[R] C# and R

2008-06-13 Thread Neil Gupta
Hello R-Users,

I came across this link on CodeProject.com and was wondering, if anyone has
implemented this and the benefits of doing so.
This may also be of some help for others. Here is a link to the project:
http://www.codeproject.com/KB/cs/RtoCSharp.aspx

Regards,

Neil Gupta



Re: [R] C# and R

2008-06-13 Thread Prof Brian Ripley
This is about Windows, C# and R-(D)COM.  The latter has its own list which 
would be much more appropriate.  See http://sunsite.univie.ac.at/rcom/

(Linked from CRAN-Software-Other.)




--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Problem with Freq function {prettyR}

2008-06-13 Thread ukoenig
Does someone have an idea?
Thanks a lot!

Udo


Quoting Udo [EMAIL PROTECTED]:

 Dear list,
 I have a problem with freq from prettyR.

 Please have a look at my syntax with a little example:


 library(prettyR)

 #Version 1
 test.df <- data.frame(q1=sample(1:4,8,TRUE), gender=sample(c("f","m"),8,TRUE))
 test.df
 freq(test.df) #No error message

 #Version 2
 test.df <- data.frame(gender=sample(c("f","m"),8,TRUE), q1=sample(1:4,8,TRUE))
 test.df
 freq(test.df)

 Error message: Error in vector(integer, length) : Vector size can´t be NA

 Can someone tell me, why an error message occurs in version two? I am
 helpless...

 Thanks in advance!

 Udo K ö n i g

 

 Clinic for Child and Adolescent Psychiatry
 Philipps University of Marburg / Germany




[R] Access violation when calling Front41

2008-06-13 Thread Siyi FENG
  Hello! When I tried to call Front41 in R, I ran into a problem. After I
entered system('front41.exe'), an error occurred:

jwe0019i-u The program was terminated abnormally with Exception Code
EXCEPTION_ACCESS_VIOLATION.
error summary (Fortran)
error number  error level  error count
  jwe0019i u   1
total error count = 1

FRONTIER - Version 4.1c
***

How can I deal with it?




-- 
Siyi FENG
Department of Agricultural Economics
Texas A&M University, 2124 TAMU
College Station, TX 77843-2124
[EMAIL PROTECTED]



[R] subsetting data-frame by vector of characters

2008-06-13 Thread james perkins

Hi,

I have a very simple problem but I can't think how to solve it without 
using a for loop and creating a large logical vector. However given the 
nature of the problem I am sure there is a 1-liner that could do the 
same thing much more efficiently.


Basically I have a dataframe with characters in, e.g.

names.and.numbers

(index)  Name    Fave.Number
1        John    7
2        Tony    12
3        Phil    14
4        Adam    22
5        Robert  23


Now, imagine I have a vector of names, i.e.:

names = c("John,Phil,Robert")

All I want to do is get the subset of the dataframe which corresponds to 
the names in the vector names, i.e.


(index)  Name    Fave.Number
1        John    7
2        Phil    14
3        Robert  23

Sorry, I know it's trivial but I'm new to R and it's hard to start 
thinking in R. As I say, I've written a complicated for loop using 
intersect and creating a logical table, but this is very long-winded!


Regards,

Jim



[R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Bluder Olivia
Hello,

 

I'm trying to calculate the Maximum likelihood estimators for a dataset
which contains censored data.

 

I started by using the function nlm, but isn't there a separate method
for doing this for e.g. the Weibull and the log-normal distribution?

 

Thanks,

Olivia

 




Re: [R] Package Installation produces linux/limits.h: No such file or directory error when installing the lpSolve package

2008-06-13 Thread Basileis

Hi,
   I too had this same problem but it got resolved by installing two
packages :

1. kernel-headers
2. kernel-devel

I hope this helps in your case.

Regards
Sharwan

Joe_K wrote:
 
 Dear Friends,
 
 I am trying to install a few packages in R and am receiving error
 messages.  Since the error messages are different, I am posting them
 separately.  The second error is with the installation of lpSolve.
 
 The core error message is:
 
 In file included from /usr/include/bits/posix1_lim.h:153,
  from /usr/include/limits.h:145,
  from
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122,
  from
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/syslimits.h:7,
  from
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:11,
  from colamd.c:677:
 /usr/include/bits/local_lim.h:36:26
  error: linux/limits.h: No such file or directory
 make: *** [colamd.o] Error 1
 ERROR: compilation failed for package 'lpSolve'
 
 
 The first things that I tried was to figure out where linux/limits.h was. 
 I discovered that there are seven versions of limits.h on the system and
 they are not identical.
 
 /usr/include/limits.h
 /usr/src/linux-2.6.22.13-0.3/Documentation/i2c/chips/limits.h
 /usr/include/c++/4.2.1/tr1/limits.h
 /usr/lib64/qt4/demos/qtdemo/xml/limits.h
 /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h
 /usr/src/linux-2.6.22.13-0.3/include/asm-arm/limits.h
 /usr/src/linux-2.6.22.13-0.3/include/asm-arm26/limits.h
 
 Only one has linux immediately preceding it in the path:
 /usr/src/linux-2.6.22.13-0.3/include/linux/limits.h
 
 I assume that /usr/include/bits/local_lim.h is trying to use a relative
 path.  The only line in local_lim.h with limits.h in it is:
 
 #include <linux/limits.h>
 
 So, I tried modifying the line to read:
 
 #include </usr/src/linux-2.6.22.13-0.3/include/linux/limits.h>
 
 That did not work, so I changed it back again.  I guess my theory about it
 looking for a relative path was wrong.
 
 Since then, I have been Googling the issue all weekend and have found
 similar errors, but not exactly the same.  Some are suggesting changing
 kernel headers and other files.  Since the context of these other posts
 are dissimilar, I figured it best not to mess with kernel headers or some
 of the other radical solutions offered.
 
 There was one suggestion in a post to install glibc-headers, however, I
 cannot seem to find that for Suse 10.3.  Is it something included in
 another package?  Is it something that is now obsolete?
 
 CAN ANYONE HELP ME DEBUG THIS?
 
 I am running R version 2.6.1 (2007-11-26) on Suse Linux 10.3 64-bit x86_64
 on a Boxx Technologies computer with a TYAN Thunder K8WE S2895 Motherboard
 with 4Gb Ram and 2 dual CPUs (total of 4 CPUs).  The CPUs are AMD Opteron. 
 Hard Disk Usage is 4 150 Gb SATA drives array with a Com3 9550SX
 Controller set at RAID 5.
 
 The full error message received from Rkward upon the package installation
 attempt was:
 
 R version 2.6.1 (2007-11-26)
 Copyright (C) 2007 The R Foundation for Statistical Computing
 ISBN 3-900051-07-0
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.
 
   Natural language support but running in an English locale
 
 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.

 options (repos=c (CRAN="http://lib.stat.cmu.edu/R/CRAN"))
 install.packages (pkgs=c ("lpSolve"),
 lib="/home/joe/R/x86_64-unknown-linux-gnu-library/2.6",
 destdir="/home/joe/.rkward/package_archive", dependencies=TRUE)
 trying URL
 'http://lib.stat.cmu.edu/R/CRAN/src/contrib/lpSolve_5.5.8.tar.gz'
 Content type 'application/x-gzip' length 449804 bytes (439 Kb)
 opened URL
 
 downloaded 439 Kb
 /home/joe/R/x86_64-unknown-linux-gnu-library/2.6
 * Installing *source* package 'lpSolve' ...
 ** libs
 gcc -std=gnu99 -I/usr/lib64/R/include -I/usr/lib64/R/include -I .
 -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE -DUSRDLL
 -DCLOCKTIME -DRoleIsExternalInvEngine -DINVERSE_ACTIVE=INVERSE_LUSOL
 -DINLINE=static -DParanoia -I/usr/local/include    -fpic  -g -O2 -c
 colamd.c -o colamd.o
 In file included from /usr/include/bits/posix1_lim.h:153,
  from /usr/include/limits.h:145,
  from
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/limits.h:122,
  from
 /usr/lib64/gcc/x86_64-suse-linux/4.2.1/include/syslimits.h:7,
  from
 

Re: [R] Problem with Freq function {prettyR}

2008-06-13 Thread James W. MacDonald
Since this is a contributed package, you should be contacting the 
maintainer (as mentioned in the posting guide).


Anyway, the problem occurs because in the second case you have a factor 
in the first column and numeric in the second. This part of the code 
will illustrate what I mean:


for (i in 1:nfreq) {
    if (display.na)
        nna <- sum(is.na(x[[i]]))
    else nna <- 0
    xt <- na.omit(x[[i]])
    if (is.null(levels))
        levels <- unique(xt)
    if (is.numeric(x[[i]]))
        xt <- factor(xt, levels = levels)

So the first time through this loop the levels variable is set to 
c("m","f"). On the second time levels is no longer NULL, so when the xt 
variable is created it is essentially this:


xt <- factor(xt, levels = c("m","f"))

and since xt contains only numbers you get

[1] NA NA NA NA NA NA NA NA
Levels: m f
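
The effect can be reproduced in base R without prettyR. The sketch below uses made-up values for q1 and only illustrates the mechanism described above; the last line is one possible way of avoiding the NAs and is an assumption, not the package's actual fix.

```r
q1 <- c(1, 4, 2, 3)                       # a numeric column (made-up values)

# Levels left over from a previous factor column are reused for q1:
xt <- factor(q1, levels = c("m", "f"))
print(xt)                                 # every entry is NA
all(is.na(xt))                            # TRUE

# Deriving the levels from the column itself avoids the problem:
xt_ok <- factor(q1, levels = sort(unique(q1)))
print(xt_ok)
```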

Best,

Jim





--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Chuck Cleland


  How about this:

subset(names.and.numbers, Name %in% mynames)

  where mynames is the vector of names you want?

?subset

?is.element




--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



[R] histogram

2008-06-13 Thread Paul Adams
Hello everyone,
I am trying to plot a histogram from the following code:
dat <- read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1)
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt")
x <- dat[2,23:46]
y=mean(x,trim=0,na.rm=T)
colMeans(dat[2,23:46])
boxplot(dat[2,23:46])
hist(dat[2,23:46])
The box plot is fine but the histogram keeps giving me the error that x
must be numeric. I am not sure what is wrong here with the instructions
for the histogram plot.
Any help would be appreciated
Paul


  


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Wacek Kusnierczyk
james perkins wrote:
 Hi,

 I have a very simple problem but I can't think how to solve it without
 using a for loop and creating a large logical vector. However given
 the nature of the problem I am sure there is a 1-liner that could do
 the same thing much more efficiently.

 Basically I have a dataframe with characters in, e.g.

 names.and.numbers

 (index)  Name    Fave.Number
 1        John    7
 2        Tony    12
 3        Phil    14
 4        Adam    22
 5        Robert  23


 Now, imagine I have a vector of names, i.e.:

 names = c("John,Phil,Robert")

this is a one-element vector of string(s) that are concatenated names
(strings with names).
or you mean:  names = c("John", "Phil", "Robert")



 All I want to do is get the subset of the dataframe which corresponds
 to the names in the vector names, i.e.

 (index)  Name    Fave.Number
 1        John    7
 2        Phil    14
 3        Robert  23

this should do:
names.and.numbers[names.and.numbers$Name %in% names,]

if names is as you say above, do
names.and.numbers[names.and.numbers$Name %in% strsplit(names,",")[[1]], ]

you do create a logical vector here (what does 'large' mean?), but no
loop is involved at the surface.
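
A self-contained sketch of the two cases discussed here, rebuilding the example table from the question. The object names wanted, one_string, result and result2 are made up for illustration.

```r
names.and.numbers <- data.frame(
  Name        = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)

wanted     <- c("John", "Phil", "Robert")   # three separate strings
one_string <- "John,Phil,Robert"            # a single comma-separated string

# Logical index: TRUE for rows whose Name appears in 'wanted'
keep   <- names.and.numbers$Name %in% wanted
result <- names.and.numbers[keep, ]

# strsplit() returns a list, so take its first element before matching
pieces  <- strsplit(one_string, ",")[[1]]
result2 <- names.and.numbers[names.and.numbers$Name %in% pieces, ]

print(result)
```

Both result and result2 contain the rows for John, Phil and Robert, in the order in which they appear in the original table.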

vQ



Re: [R] histogram

2008-06-13 Thread jim holtman
It is hard to respond without reproducible examples.  Do
str(dat[2,23:46]) and see what it reports.  My guess is that one of
the columns is not numeric.  Find out which one it is, fix it and then
try 'hist' again.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] histogram

2008-06-13 Thread Erik Iverson





Check what the class of your object is

class(dat[2, 23:46])

may be a data.frame. If so, you can try to convert accordingly (see 
?as.numeric)


Erik



Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread james perkins

Thanks a lot for that. It's the %in% I needed to work out mainly.

"Large" didn't mean anything in particular, just that it gets quite long 
with the real data.

I did mean: names = c("John", "Phil", "Robert")

The only problem with the method you suggest is that I lose the 
indexing, i.e. in the example, instead of:


(index)  Name    Fave.Number
1        John    7
2        Phil    14
3        Robert  23


I end up with


(index)  Name    Fave.Number
1        John    7
3        Phil    14
5        Robert  23

This isn't a problem at the moment but I guess it could be if I used the 
table later in loops. Is there an easy way to re-index the table?


Kind regards

Jim






Re: [R] histogram

2008-06-13 Thread Lars Fischer
Hi, 

please someone correct me, but
On 13/06/2008, 07:21, [EMAIL PROTECTED] wrote:
 dat <- read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\Yeast\\Yeast.txt",header=T,row.names=1)

Check mode and class of dat. read.table provided you with a dataframe
of, essentially, string data. You have to apply as.numeric where it
fits.


 x <- dat[2,23:46]
 ^
most probably here.

Regards
Lars

p.s. Your code is awful to read, please add some spaces where
appropriate.


-- 
Lars Fischer                    tel: +49 (0)6151 16-2889
Technische Universität Darmstadt
Fachbereich Informatik/ FG Sicherheit in der Informationstechnik
PGP FPR: A197 CBE1 91FC 0CE3 A71D  77F2 1094 CB6E CEE3 7111



Re: [R] histogram

2008-06-13 Thread Peter Dalgaard
jim holtman wrote:
 It is hard to respond without reproducible examples.  Do
 str(dat[2,23:46]) and see what it reports.  My guess is that one of
 the columns is not numeric.  Find out which one it is, fix it and then
 try 'hist' again.
   
No, this will be wrong whatever the data are. The problem is that
dat[2,23:46] is a one-row dataframe, i.e. a list, which is not a numeric
vector. Possibly
hist(unlist(dat[2,23:46])) is what is wanted. I don't think the boxplot
is fine either, except in the sense that it does not give an error
(try boxplot(airquality[2,])).
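
The point can be checked with the built-in airquality data mentioned above. This is only a sketch; the names row2, vals and h are made up, and plot = FALSE is used so the example computes the histogram without opening a graphics device.

```r
row2 <- airquality[2, ]          # a one-row data frame, i.e. a list of columns
class(row2)                      # "data.frame"
is.numeric(row2)                 # FALSE, which is why hist() complains

vals <- unlist(row2)             # named numeric vector holding the six values
is.numeric(vals)                 # TRUE

h <- hist(vals, plot = FALSE)    # drop plot = FALSE to draw the histogram
sum(h$counts)                    # 6: every value falls into some bin
```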


 



   


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Peter Dalgaard
james perkins wrote:
 Thanks a lot for that. It's the %in% I needed to work out mainly.

 "Large" didn't mean anything in particular, just that it gets quite long
 with the real data.
 I did mean: names = c("John", "Phil", "Robert")

 The only problem with the method you suggest is that I lose
 the indexing, i.e. in the example, instead of:

 (index)  Name    Fave.Number
 1        John    7
 2        Phil    14
 3        Robert  23


 I end up with


 (index)  Name    Fave.Number
 1        John    7
 3        Phil    14
 5        Robert  23

 This isn't a problem at the moment but I guess it could be if I used
 the table later in loops. Is there an easy way to re-index the table?

Notice that these are names, not numbers:  result[2,1] is "Phil" in both
cases. If it bothers you, just set rownames(result) <- NULL

BTW, are your names unique? In that case you could set them as rownames
and use them for indexing:

rownames(names.and.numbers) <- names.and.numbers$Name
names.and.numbers[names, ]
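
Both suggestions can be tried on the example table from the question. This is a sketch only; the names wanted, result and by_name are made up for illustration.

```r
names.and.numbers <- data.frame(
  Name        = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
wanted <- c("John", "Phil", "Robert")

result <- names.and.numbers[names.and.numbers$Name %in% wanted, ]
rownames(result)             # "1" "3" "5": labels carried over from the original rows
result[2, "Name"]            # "Phil": positional indexing ignores those labels

rownames(result) <- NULL     # relabel the rows as 1, 2, 3
rownames(result)

# If the names are unique, they can serve as row names and be used for indexing
by_name <- names.and.numbers
rownames(by_name) <- by_name$Name
by_name[wanted, ]
```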



-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



[R] package under unix

2008-06-13 Thread cgenolin

Hi the list,

I am writing a package for clustering longitudinal data using a 
nonparametric algorithm. I develop the package under Windows. To be as 
user friendly as possible, the package uses some graphical procedures to 
show the user the evolution of the cluster construction, and to 
export the graphs in a friendly way.


Here are some examples: http://christophe.genolini.free.fr/kml

Everything works fine... under Windows.

Unfortunately, it seems it does not work under Linux. I first used the 
instruction:



windows(5,5,xpos=0)


which seems to be incompatible. Then I used:


if(getOption("device")=="windows"){windows(5,5,xpos=0)}else{}


but it is not portable either.

I do not know Linux, so it will be very hard for me to test and change my code.
On the other hand, I spent a lot of time developing a graphical 
interface for exporting the results in an easy way, so it would be a pity 
to remove the code that deals with graphics.


Can someone help ?

Christophe


Re: [R] subsetting data-frame by vector of characters

2008-06-13 Thread Wacek Kusnierczyk
james perkins wrote:
 Thanks a lot for that. It's the %in% I needed to work out mainly

 large didn't mean anything in particular, just that it gets quite long
 with the real data.
 I did mean: names = c("John", "Phil", "Robert")

 The only problem with using the method you suggest is that I lose
 the indexing, ie in the example, instead of:

 (index) Name Fave.Number
 1 John 7
 2 Phil 14
 3 Robert 23


 I end up with


 (index) Name Fave.Number
 1 John 7
 3 Phil 14
 5 Robert 23

 This isn't a problem at the moment but I guess it could be if I used
 the table later in loops. Is there an easy way to re-index the table?
strange.  i run this simulated example, and it's ok:

d = data.frame(a=letters[rep(1:5,2)], b=letters[10:1])
d[d$a %in% letters[1:3], ]

you can always add an index column:

d = data.frame(index=1:dim(d)[[1]],d)
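
If the aim is only to renumber the rows of the subset, a minimal sketch (reusing the simulated data above; the name sub is just for illustration) is to reset the row names:

```r
d <- data.frame(a = letters[rep(1:5, 2)], b = letters[10:1])
sub <- d[d$a %in% letters[1:3], ]

rownames(sub)          # the original positions are kept as row names
rownames(sub) <- NULL  # reset to 1..n
rownames(sub)
```

Subsetting keeps the original row names on purpose, so that rows can still be traced back to the full table; resetting them is only needed when later code relies on consecutive numbering.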

vQ


[R] Wanted: your examples of logged axes with custom tick marks

2008-06-13 Thread hadley wickham
Dear all,

I'm trying to improve the default layout of tick marks for log scaled
axes in ggplot2.  To this end, it would be really useful to see what
people actually do in practice.  If you've ever made a log-log (or
semi-log) plot and customised the location of the ticks, I'd really
appreciate a copy of your graph (if it's publicly available) or a
statement of the range of the data, and the tick marks you used.

I'm not aware of any published research on this topic, but if I've
missed something, a pointer to relevant work would be greatly
appreciated.

Thanks!

Hadley

-- 
http://had.co.nz/


[R] CRAN package XML (omegahat)

2008-06-13 Thread David Keegan
Hi,

I'm having issues using this package to parse large XML files.
Where should bugs be reported? The omegahat website has several
broken links.

Regards
David Keegan.
--


[R] Rest of a division

2008-06-13 Thread Eric Ferreira
Dear useRs,

How do I ask for the rest of a division?

For instance, in C it is like:

4%2 = 0

Best regards,

-- 
Eric B Ferreira
Exact Sciences Department
Federal University of Lavras
Brasil


Re: [R] Rest of a division

2008-06-13 Thread Peter Dalgaard
Eric Ferreira wrote:
 Dear useRs,

 How do I ask for the rest of a division?

 For instance, in C it is like:

 4%2 = 0

 Best regards,

   
> 4%%2
[1] 0
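
A few more cases, as a small sketch of how %% and its companion %/% (integer division) behave. Note the difference from C for negative operands: C truncates towards zero, so -7 % 3 is -1 there, whereas in R the result takes the sign of the divisor.

```r
7 %% 3     # remainder: 1
7 %/% 3    # integer quotient: 2
-7 %% 3    # 2, the result takes the sign of the divisor
-7 %/% 3   # -3, so that (-3) * 3 + 2 == -7
5.5 %% 2   # 1.5, non-integer operands are allowed
```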


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


Re: [R] Rest of a division

2008-06-13 Thread Charilaos Skiadas

?%%

On Jun 13, 2008, at 11:23 AM, Eric Ferreira wrote:


Dear useRs,

How do I ask for the rest of a division?

For instance, in C it is like:

4%2 = 0

Best regards,

--
Eric B Ferreira
Exact Sciences Department
Federal University of Lavras
Brasil



Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


Re: [R] CRAN package XML (omegahat)

2008-06-13 Thread Martin Morgan

Bugs to the package maintainer, for this and all packages

> packageDescription('XML')[['Maintainer']]
[1] "Duncan Temple Lang <[EMAIL PROTECTED]>"

Best luck will come with the usual, sessionInfo(), easily reproducible 
and compact example, use of current software versions, etc.


Martin

David Keegan wrote:

Hi,

I'm having issues using this package to parse large XML files.
Where should bugs be reported? The omegahat website has several
broken links.

Regards
David Keegan.
--



--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


Re: [R] Sweave: looping over mixed R/LaTeX code

2008-06-13 Thread Jeffrey Horner

Stephan Kolassa wrote on 06/13/2008 03:22 AM:

Dear guRus,

I would like to loop over a medium amount of Sweave code, including both R and LaTeX 
chunks. Is there any way to do so? As an illustration, can I create a .tex file like this 
using a loop within a .Rnw file, where the 1,2,3 comes from some iteration 
variable in R?


\documentclass{article}
\usepackage{Sweave}
\begin{document}
Iteration 1
Iteration 2
Iteration 3
\end{document}



Another alternative would be to use the brew package from CRAN:

http://cran.r-project.org/web/packages/brew/index.html

While the disadvantage would be a change of syntax from Sweave to brew, 
you would gain the advantage of looping over code chunks.


brew also installs a collection of example files, one being a conversion 
of the Sweave test file to brew. Scope out the 'Examples' section from 
the brew help page.
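
The looping idea itself can also be sketched in base R without brew: a single chunk that writes LaTeX with cat(). This assumes the chunk is declared with the Sweave option results=tex, so that the printed text is passed straight into the .tex file; the helper name iteration_tex and the text it produces are invented for illustration.

```r
# Hypothetical helper: build the LaTeX lines for one iteration.
iteration_tex <- function(i) {
  c(sprintf("\\section*{Iteration %d}", i),
    sprintf("The square of %d is %d.", i, i * i))
}

# In an .Rnw file this loop would sit inside a chunk declared as
#   <<results=tex, echo=FALSE>>=
#   ...
#   @
for (i in 1:3) {
  cat(iteration_tex(i), sep = "\n")
  cat("\n")
}
```

The number of iterations can be decided at run time here, which the separate loop.Rnw approach does not allow.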


Best,

Jeff



Right now, I do have a working but painful solution. I put the loop contents in 
a separate loop.Rnw file, then:
1. run everything before the loop through R for initialization
2. Sweave loop.Rnw; shell(move loop.tex loop_1.tex)
   Sweave loop.Rnw; shell(move loop.tex loop_2.tex)
   ...
   Sweave loop.Rnw; shell(move loop.tex loop_n.tex)
3. \input all loop_i.tex files into master.Rnw and Sweave master.Rnw

This does what I need, however, it is a major pain code-wise, e.g., there 
appears to be no way to control the loop during execution (n must be known in 
advance), and I need to control all graphics using \includegraphics with the 
iteration counter paste()d into the filename.

An alternative may be not using Sweave and working with one giant sink() and 
lots of print()s, letting R just write the entire .tex file. This also appears 
inelegant to me.

Is there a better way to do this?

I have tried to do my homework, see below. Do I get partial credit ;-) ?

Thank you all for your time!
Stephan


#


I can't simply start a for loop within an R chunk and finish it in another one.

whiledo in the ifthen.sty package doesn't like Sweave at all. And of course, it 
would simply reuse the R chunks if it did work, without changing things between 
loops. For the same reason, I cannot define a \newcommand{\loopcontent}{...} 
with the entire loop contents and then simply write \loopcontent \loopcontent 
... or \input or \include the loop content from an external file.

Of course it would be possible to not use Sweave and just use the output from 
the R console, but there are a couple of figures I would really like to see 
close to the relevant portions of the calculations.

I also thought about putting the entire loop in *one* R chunk, but then I see no way to include LaTeX chunks *within* this R chunk. I can't just sink() to the .tex file in the middle of the R chunk (as the sink() gets appended to the .tex file only after Sweave is done with it). 


I have read the Sweave manual and FAQs and the R/R Windows FAQ, I did both RSiteSearches and RSeek searches for all 
combinations of Sweave and loop, for, while I could think of.

For what it's worth, here's my sessionInfo():

R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 


locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  tcltk methods   base 


other attached packages:
[1] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] svMisc_0.9-5

--



--
http://biostat.mc.vanderbilt.edu/JeffreyHorner


Re: [R] Rest of a division

2008-06-13 Thread Erik Iverson

?Arithmetic

Eric Ferreira wrote:

Dear useRs,

How do I ask for the rest of a division?

For instance, in C it is like:

4%2 = 0

Best regards,




[R] help with colsplit (reshape)

2008-06-13 Thread Ista Zahn

Dear list,

I'm trying to figure out how to use the reshape package to reshape  
data from a wide format to a long format. I have data like this


pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2,
                   out.1, out.2)


and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8),
idvar="pid", v.names=c("PredA", "PredB", "Out"),
timevar="measure.num", times=c(1,2), direction="long"))

    pid predA measure.num PredA PredB Out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package"
as a guide, I tried the following:


M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".",
names = c("treatment", "time")))


but this gave a warning and resulted in

head(M.Data2)
  pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html 
 which led me to try


M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
names = c("treatment", "time")))


which gave:

head(M.Data2)
  pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.

I would be grateful if someone will tell me (a) how to reshape the
data as described above using the reshape package, (b) what the
difference between split = "." and split = "\\." is, and (c) if more
information about the colsplit command is available anywhere.
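
On point (b), a small sketch with base R's strsplit() shows the difference. It assumes, without having checked the colsplit source, that colsplit treats split as a regular expression in the same way:

```r
x <- "predB.1"

any_char <- strsplit(x, ".")[[1]]                # "." matches ANY character
escaped  <- strsplit(x, "\\.")[[1]]              # "\\." matches a literal dot
literal  <- strsplit(x, ".", fixed = TRUE)[[1]]  # no regex at all
no_dot   <- strsplit("predA", "\\.")[[1]]        # nothing to split on

any_char   # only empty strings are left
escaped    # "predB" "1"
literal    # "predB" "1"
no_dot     # "predA"
```

The last case is also why predA cannot be split into a treatment and a time part: it contains no dot, so there is only one piece.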


Thank you very much in advance,
Ista


Re: [R] Looping, Control Flow Conditional Statements

2008-06-13 Thread Charles C. Berry



See

?rle

Start with this:


a1.runs <- rle( a1 )
a1.runs$lengths[ a1.runs$values > 0 ]

[1] 3 4
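
Carried through to the requested output format, a minimal sketch might look like this (the label scheme a1.1, a1.2 follows the question; writing to the console rather than to a file is a choice made here for illustration):

```r
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

runs <- rle(a1 > 0)                         # runs of TRUE/FALSE
event_lengths <- runs$lengths[runs$values]  # keep only the runs above 0

labels <- paste0("a1.", seq_along(event_lengths))
out <- paste(labels, ":", event_lengths)
writeLines(out)   # writeLines(out, "some/file.txt") would write a file instead
```

No explicit loop over the values is needed, and no intermediate vectors such as a1.1 have to be created.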




HTH,

Chuck

p.s.


library(fortunes)
fortune(106)


If the answer is parse() you should usually rethink the question.
   -- Thomas Lumley
  R-help (February 2005)
--

see

?get

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:


Dear R Group:



I have little experience using R and even less experience with control
flow type questions.



See the following code:



a1 = c(0, 1, 1, 1,
0, 0, 0, 0, 0,
0, 0, 1,
1, 1, 1, 0, 0)

for(i in 1:1){
   sx <- paste("a",i,sep="")
   s <- eval(parse(text = paste("a",i,sep="")))
{g = numeric(length(s))
k = numeric(length(s))
   {for (i in 1:length(s))
   {for (j in 1:length(s))
   ifelse(((j=i)1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
}}
h1 <- hist(g,freq=TRUE)
h <- h1$counts[4]
cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
}}





The output is:


g


[1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0


k


[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0






h


[1] 7



 a text file, which has:

   a1 : 7



k is a by-product of the ifelse statement and is of no interest; g and
h only go part-way to answering my question, which is:



For every time an object, i.e. a1 (which is actually a time series) - 0 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 - has a value over 0, how long do the
values stay above 0? So in this case a1 has two groups or events where
the value is above zero; the first event lasts for 3 'days' and the
second event lasts for 4 'days'. My code tells me that there
was a total of 7 'days' in an event, i.e. above 0, but what I need to know is
that there were two 'events' and the 1st lasted 3 'days' and the 2nd
lasted 4 'days'. Essentially I want a text file output to say:


a1.1 : 3


a1.2 : 4



My thinking is that I need to somehow get the code working through each
vector one value at a time, and when a value is found to meet the criterion
of > 0, R creates a new vector; to use the above example it would come
to the first value > 0 and then create the new vector a1.1 = (1,1,1), then
as the next value in the series is 0 it would close this new vector
'a1.1'. It would then continue until it reaches the next value > 0 and
then create the vector a1.2 = (1,1,1,1), then again as the next value in
the series is 0 it would close this new vector, and so on.



Then all I need to do is perform a count of '1's in these new vectors to
find how many days they met this criterion of being greater than 0.



I hope the above makes sense and I really hope there is someone willing
and able to help. I don't know how to proceed.



Thanks,

Garth

















Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: [R] alternative to matching/merge?

2008-06-13 Thread Lana Schaffer
Jim,
My code is this:
mergefunc <- function(x,seqFile){
# merge(seqFile,x)
cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)),
])
}
LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
Each matrix/data.frame takes 0.2 seconds and then to do this
1240 times takes ~4 minutes.
Thanks,
Lana

-Original Message-
From: jim holtman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 12, 2008 6:40 PM
To: Lana Schaffer
Cc: r-help@r-project.org
Subject: Re: [R] alternative to matching/merge?

It would be nice if you at least included the code that you are using
and a subset of the data.  Have you run Rprof to determine which of the
functions is consuming the time?

On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer [EMAIL PROTECTED]
wrote:

 Greetings,
 I am doing matching/merge for a table (40919x3) to data which is in 
 the form of a list of 1268 data.frames.  Using lapply this is taking 
 ~5 minutes.  I know that the match/merge functions are time consuming,

 so is there an alternative to this accomplish this goal?  is lapply 
 not efficient?

 Lana Schaffer
 Biostatistics/Informatics
 The Scripps Research Institute
 DNA Array Core Facility
 La Jolla, CA 92037
 (858) 784-2263
 (858) 784-2994
 [EMAIL PROTECTED]





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Problem with Freq function {prettyR}

2008-06-13 Thread ukoenig
Thanks a lot, Jim!

 Since this is a contributed package, you should be contacting the
 maintainer (as mentioned in the posting guide).
sorry



 Anyway, the problem occurs because in the second case you have a factor
 in the first column and numeric in the second. This part of the code
 will illustrate what I mean:

 for (i in 1:nfreq) {
  if (display.na)
  nna <- sum(is.na(x[[i]]))
  else nna <- 0
  xt <- na.omit(x[[i]])
  if (is.null(levels))
  levels <- unique(xt)
  if (is.numeric(x[[i]]))
  xt <- factor(xt, levels = levels)

 So the first time through this loop the levels variable is set to
 c("m","f"). On the second time levels is no longer NULL, so when the xt
 variable is created it is essentially this:

 xt <- factor(xt, levels = c("m","f"))

 and since xt contains only numbers you get

 [1] NA NA NA NA NA NA NA NA
 Levels: m f
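
 The effect can be reproduced outside prettyR with a few lines. This is only a sketch of the mechanism described above, with made-up variable names, not the package's actual code:

```r
q1 <- c(3, 1, 4, 2)
stale_levels <- c("m", "f")    # left over from the previous column

bad <- factor(q1, levels = stale_levels)
bad                            # every value becomes NA

good <- factor(q1, levels = sort(unique(q1)))  # levels recomputed per column
table(good)
```

 Values that are not among the supplied levels are silently turned into NA, which is why no warning appears before the later error.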

 Best,

 Jim



 [EMAIL PROTECTED] wrote:
  Does someone have an idea?
  Thanks a lot!
 
  Udo
 
 
  Quoting Udo [EMAIL PROTECTED]:
 
  Dear list,
  I have a problem with freq from prettyR.
 
  Please have a look at my syntax with a little example:
 
 
  library(prettyR)
 
  #Version 1
  test.df <- data.frame(q1=sample(1:4,8,TRUE),
 gender=sample(c("f","m"),8,TRUE))
  test.df
  freq(test.df) #No error message
 
  #Version 2
  test.df <- data.frame(gender=sample(c("f","m"),8,TRUE),
 q1=sample(1:4,8,TRUE))
  test.df
  freq(test.df)
 
  Error message: Error in vector("integer", length) : Vector size can't be
 NA
 
  Can someone tell me, why an error message occurs in version two? I am
  helpless...
 
  Thanks in advance!
 
  Udo K ö n i g
 
  
 
  Clinic for Child an Adolescent Psychiatry
  Philipps University of Marburg / Germany
 
 

 --
 James W. MacDonald, M.S.
 Biostatistician
 Affymetrix and cDNA Microarray Core
 University of Michigan Cancer Center
 1500 E. Medical Center Drive
 7410 CCGC
 Ann Arbor MI 48109
 734-647-5623



Re: [R] alternative to matching/merge?

2008-06-13 Thread jim holtman
What is the structure of 'd.frame' and 'seqFile'?  Run Rprof so that
we can see which of the functions it is spending its time in.  What
happens if x$index is not in seqFile$index?  Are the values in the
'index' unique in both structures?  Subsetting a data frame can be
expensive when compared to using a matrix.  Could you use a matrix
instead of a data frame; are all the columns the same mode?  Again
either a subset of data would be helpful or an 'str' on the data
objects being used so that we can understand what they are.
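
To make these questions concrete, here is a toy sketch of a match-based lookup that avoids subsetting a data frame inside the loop. The structure of the objects is assumed, since the real data were not posted, and the column names seq and val are invented:

```r
# Toy stand-ins for the real objects (structure assumed, not known).
seqFile <- data.frame(index = 1:5, seq = c("a", "b", "c", "d", "e"),
                      stringsAsFactors = FALSE)
frames <- list(
  data.frame(index = c(2, 4), val = c(10, 20)),
  data.frame(index = c(5, 1, 9), val = c(1, 2, 3))   # 9 has no match
)

# Pull the lookup columns out once as plain vectors.
ref_index <- seqFile$index
ref_seq <- seqFile$seq

add_seq <- function(x) {
  x$seq <- ref_seq[match(x$index, ref_index)]  # NA where index is absent
  x
}
res <- lapply(frames, add_seq)

# Timing a single pass shows whether further optimisation is worthwhile.
system.time(lapply(frames, add_seq))
```

Indexing a vector by the result of match() is usually much cheaper than indexing the rows of a data frame, which is the point behind the matrix question above.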

On Fri, Jun 13, 2008 at 12:03 PM, Lana Schaffer [EMAIL PROTECTED] wrote:
 Jim,
 My code is this:
  mergefunc <- function(x,seqFile){
 # merge(seqFile,x)
 cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)),
 ])
 }
 LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
 Each matrix/data.frame takes 0.2 seconds and then to do this
 1240 times takes ~4 minutes.
 Thanks,
 Lana

 -Original Message-
 From: jim holtman [mailto:[EMAIL PROTECTED]
 Sent: Thursday, June 12, 2008 6:40 PM
 To: Lana Schaffer
 Cc: r-help@r-project.org
 Subject: Re: [R] alternative to matching/merge?

 It would be nice if you at least included the code that you are using
 and a subset of the data.  Have you run Rprof to determine which of the
 functions is consuming the time?

 On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer [EMAIL PROTECTED]
 wrote:

 Greetings,
 I am doing matching/merge for a table (40919x3) to data which is in
 the form of a list of 1268 data.frames.  Using lapply this is taking
 ~5 minutes.  I know that the match/merge functions are time consuming,

 so is there an alternative to this accomplish this goal?  is lapply
 not efficient?

 Lana Schaffer
 Biostatistics/Informatics
 The Scripps Research Institute
 DNA Array Core Facility
 La Jolla, CA 92037
 (858) 784-2263
 (858) 784-2994
 [EMAIL PROTECTED]





 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] alternative to matching/merge?

2008-06-13 Thread Lana Schaffer
Jim,
d.frame[[i]] is a list of data.frames and seqFile is a
data.frame.  I have converted them to vectors/matrices and
the timing is the same as with data.frames.  'index' is unique
in both structures.  The list is subset into data.frame/matrix
structures.
Lana

-Original Message-
From: jim holtman [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 13, 2008 9:45 AM
To: Lana Schaffer
Cc: r-help@r-project.org
Subject: Re: [R] alternative to matching/merge?

What is the structure of 'd.frame' and 'seqFile'?  Run Rprof so that we
can see which of the functions it is spending its time in.  What happens
if x$index is not in seqFile$index?  Are the values in the 'index'
unique in both structures?  Subsetting a data frame can be expensive
when compared to using a matrix.  Could you use a matrix instead of a
data frame; are all the columns the same mode?  Again either a subset of
data would be helpful or an 'str' on the data objects being used so that
we can understand what they are.

On Fri, Jun 13, 2008 at 12:03 PM, Lana Schaffer [EMAIL PROTECTED]
wrote:
 Jim,
 My code is this:
  mergefunc <- function(x,seqFile){
 # merge(seqFile,x)
 cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)),
 ])
 }
 LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
 Each matrix/data.frame takes 0.2 seconds and then to do this 1240 times
 takes ~4 minutes.
 Thanks,
 Lana

 -Original Message-
 From: jim holtman [mailto:[EMAIL PROTECTED]
 Sent: Thursday, June 12, 2008 6:40 PM
 To: Lana Schaffer
 Cc: r-help@r-project.org
 Subject: Re: [R] alternative to matching/merge?

 It would be nice if you at least included the code that you are using 
 and a subset of the data.  Have you run Rprof to determine which of 
 the functions is consuming the time?

 On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer [EMAIL PROTECTED]
 wrote:

 Greetings,
 I am doing matching/merge for a table (40919x3) to data which is in 
 the form of a list of 1268 data.frames.  Using lapply this is taking
 ~5 minutes.  I know that the match/merge functions are time 
 consuming,

 so is there an alternative to this accomplish this goal?  is lapply 
 not efficient?

 Lana Schaffer
 Biostatistics/Informatics
 The Scripps Research Institute
 DNA Array Core Facility
 La Jolla, CA 92037
 (858) 784-2263
 (858) 784-2994
 [EMAIL PROTECTED]





 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


[R] Level Plot and Scale of Colorkey

2008-06-13 Thread emma hartnett
I am drawing level plots but I would like to specify the range of the colorkey, 
I am not having any success figuring this out so any help would be greatly 
appreciated!

Here is an example of what I am trying to do:

disp <- 1

x <- seq(1, 10, by=1)
y <- seq(1, 10, by=1)
g <- expand.grid(x = x, y = y)
g$z <- 1/exp((abs(g$x-5)+abs(g$y-5))*disp)
g$z <- g$z/sum(g$z)

levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate",
colorkey=TRUE, col.regions=(col=gray((0:32)/32)))

I would like to enforce the number of divisions on the colorkey scale and the 
size – so for example from 0 to 0.1 in increments of 0.02 (just as an example). 
 

I apologize if this is an obvious question but I have read the documentation 
and scoured the archives and cannot figure it out.






[R] cluster.stats

2008-06-13 Thread Laura Poggio
Dear list,
I just tried to use the function cluster.stats in the package fpc.
I just have a couple of questions about the syntax:

cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)

1) the distance object (d) is an object obtained by the function dist() on
my own original matrix?
2) clustering is the clusters vector as result of one of the many clustering
methods?
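
For reference, a minimal sketch of the two inputs as I understand them from the fpc help page: d may be a dist object (or a distance matrix) and clustering an integer vector with the clusters numbered from 1. The simulated matrix and the use of hclust() are only for illustration, and the call is guarded in case fpc is not installed:

```r
set.seed(42)
m <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

d <- dist(m)                            # "dist" object from the raw matrix
clustering <- cutree(hclust(d), k = 2)  # integer vector of cluster labels

if (requireNamespace("fpc", quietly = TRUE)) {
  stats <- fpc::cluster.stats(d, clustering)
  print(stats$avg.silwidth)
}
```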

Thank you very much in advance, and sorry for such a basic question, but I
could not work it out myself.

Laura


Re: [R] Level Plot and Scale of Colorkey

2008-06-13 Thread Toby Marthews
Try

colscaledivs=100    #colscaledivs=15 here is the R default
levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate",
colorkey=TRUE, at=seq(from=-0.01,to=0.25,length=colscaledivs),
col.regions=(col=gray((0:colscaledivs)/colscaledivs)))
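
To get evenly spaced, labelled divisions on the key, the break points can be given explicitly. The following is a sketch only; the break values are illustrative and were chosen to cover the range of z in this example (its maximum is a little above 0.2):

```r
library(lattice)

disp <- 1
g <- expand.grid(x = 1:10, y = 1:10)
g$z <- 1 / exp((abs(g$x - 5) + abs(g$y - 5)) * disp)
g$z <- g$z / sum(g$z)

breaks <- seq(0, 0.25, by = 0.05)   # illustrative break points
shades <- gray(seq(0.95, 0.1, length.out = length(breaks) - 1))

p <- levelplot(z ~ x * y, data = g,
               xlab = "x co-ordinate", ylab = "y co-ordinate",
               at = breaks, col.regions = shades,
               colorkey = list(at = breaks, labels = list(at = breaks)))
print(p)
```

One colour is supplied per interval, so the number of shades is one less than the number of breaks.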

Toby Marthews


On Fri, 13 June 2008 18:50, emma hartnett wrote:
 I am drawing level plots but I would like to specify the range of the
 colorkey, I am not having any success figuring this out so any help would
 be greatly appreciated!

 Here is an example of what I am trying to do:

 disp <- 1

 x <- seq(1, 10, by=1)
 y <- seq(1, 10, by=1)
 g <- expand.grid(x = x, y = y)
 g$z <- 1/exp((abs(g$x-5)+abs(g$y-5))*disp)
 g$z <- g$z/sum(g$z)

 levelplot(z ~ x * y, g, xlab="x co-ordinate", ylab="y co-ordinate",
 colorkey=TRUE, col.regions=(col=gray((0:32)/32)))

 I would like to enforce the number of divisions on the colorkey scale and
 the size – so for example from 0 to 0.1 in increments of 0.02 (just as an
 example).

 I apologize if this is an obvious question but I have read the
 documentation and scoured the archives and cannot figure it out.






[R] restricted coefficient and factor for linear regression.

2008-06-13 Thread Oh Dong-hyun

Hi,

my data set is data.frame(id, yr, y, l, e, k).

I would like to estimate Lee and Schmidt's (1993, OUP) model in R.

My colleague wrote SAS code as follows:
** procedures for creating dummy variables are omitted **
** di# and dt# are dummy variables for industry and time **
data a2; merge a1 a2 a; by id yr;
 proc sysnlin maxit=100 outest=beta2;
 endogenous y;
 exogenous  l e k di1-di12 dt2-dt10;
 parms a0 0.94 al -0.14 ae 1.8 ak -0.9
 b1 0 b2 0 b3 0 b4 0 b5 0 b6 0 b7 0 b8 0 b9 0 b10 0 b11 0
 b12 0 c2 0 c3 0 c4 0 c5 0 c6 0 c7 0 c8 0 c9 0 c10 0;
 y=a0+al*l+ae*e+ak*k
 +(b1*di1+b2*di2+b3*di3+b4*di4+b5*di5+b6*di6
 +b7*di7+b8*di8+b9*di9+b10*di10+b11*di11+b12*di12)*
 (1*dt1+c2*dt2+c3*dt3+c4*dt4+c5*dt5+c6*dt6+c7*dt7
 +c8*dt8+c9*dt9+c10*dt10);
 title '* lee/schmidt parameter estimates *';

My R code is as follows:
##
library(plm)
dt <- read.table("dt.dta", sep = "\t", header = T)
dt$id <- factor(dt$id)
dt$yr <- factor(dt$yr)
fit.model <- I(log(y)) ~ I(log(l)) + I(log(e)) + yr * id
re.fit.gls <- pggls(fit.model, data = dt)
#

I've got the following error message:
# Error message ###
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent
 End of Error message

I would like to figure out three things.
1. How can I restrict coefficient in model? As you can see in SAS  
code, coefficient of dt1 is restricted to 1.
2. If it is possible to restrict coefficients, is it possible to
restrict coefficients of factors? If so, how?
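
For a model that is linear in its parameters, one standard way to hold a coefficient at a fixed value is offset(). The sketch below uses simulated data and made-up variable names; it does not reproduce the SAS model above, whose product of industry and time effects is nonlinear in the parameters and would presumably need a nonlinear fitter such as nls() instead.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 0.5 * x1 + 1 * x2 + rnorm(n, sd = 0.1)

free <- lm(y ~ x1 + x2)                 # all coefficients estimated
restricted <- lm(y ~ x1 + offset(x2))   # coefficient of x2 held at 1

coef(free)
coef(restricted)                        # no entry for x2: it is not estimated
```

A value other than 1 can be imposed by scaling the term, for example offset(2 * x2).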


Thanks in advance.

Best,


=
Dong-hyun Oh
Center of Excellence for Science and Innovation Studies
Royal Institute of Technology, Sweden
e-mail: [EMAIL PROTECTED]
cel: +46 73 563 45 22


Re: [R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Ben Bolker
Bluder Olivia olivia.bluder at k-ai.at writes:

 
 Hello,
 
 I'm trying to calculate the Maximum likelihood estimators for a dataset
 which contains censored data.
 
 I started by using the function nlm, but isn't there a separate method
 for doing this for e.g. the weibull and the log-normal distribution?
 
 Thanks,
 
 Olivia

  This is not *quite* enough detail about what you
want to do.  Can you (as the posting guide suggests!)
give us a small example of what you want to do?  You may be able
to do this via the survreg() command in the survival
package, or you may want to do it yourself by constructing
a log-likelihood function with dweibull() for uncensored
data and pweibull() for censored data [or dlnorm/plnorm].
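
  As a rough sketch of that second route, the log-likelihood can be
written by hand and handed to optim(). The simulated data, the names
time/event and the log-scale parameterisation below are assumptions
made for illustration (nothing here comes from the original question),
and only right censoring is handled.

```r
# Hypothetical right-censored sample: 'time' is the observed value and
# 'event' is 1 for an exact observation, 0 for a right-censored one.
set.seed(1)
true_time <- rweibull(200, shape = 1.5, scale = 10)
cens_time <- runif(200, min = 0, max = 25)
time  <- pmin(true_time, cens_time)
event <- as.numeric(true_time <= cens_time)

# Negative log-likelihood. Parameters are on the log scale so that
# optim() can search without positivity constraints.
negloglik <- function(par, time, event) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  # density contribution for exact observations
  ll_obs  <- dweibull(time[event == 1], shape, scale, log = TRUE)
  # survival-function contribution for right-censored observations
  ll_cens <- pweibull(time[event == 0], shape, scale,
                      lower.tail = FALSE, log.p = TRUE)
  -(sum(ll_obs) + sum(ll_cens))
}

fit <- optim(c(0, log(mean(time))), negloglik, time = time, event = event)
mle <- exp(fit$par)
names(mle) <- c("shape", "scale")
mle
```

  Swapping dweibull/pweibull for dlnorm/plnorm (with meanlog and sdlog
as the parameters) should give the log-normal version.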

  Ben Bolker



[R] nls() vs lm() estimates

2008-06-13 Thread Héctor Villalobos
Hi,

I'm trying to understand why the coefficients a and b for the model
W = a*L^b, estimated via nls(), differ from those obtained for the
log-transformed model log(W) = log(a) + b*log(L), estimated via lm().
Also, if I didn't make a mistake, R-squared suggests a better fit for
the model using the coefficients estimated by lm(). Perhaps I'm doing
something wrong in nls()?

I hope the code below explains this better. Thanks in advance for any hints.

Héctor


L <-
c(8,8.1,8.5,9,9.4,9.4,9.5,9.5,9.5,9.6,9.8,10,10,10,10,10,10,10,10,10,10,10.2,10.3,10.4,10.4,
10.4,10.4,10.5,10.5,10.5,10.5,10.5,10.5,10.5,10.5,10.7,10.7,10.8,10.9,10.9,10.9,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11.1,11.1,11.2,11.2,11.2,11.3,11.3,11.3,11.3,11.3,
11.4,11.4,11.4,11.4,11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.5,11.6,11.6,11.6,11.6,11.6,11.6,11.6,
11.6,11.7,11.7,11.7,11.7,11.7,11.8,11.8,11.8,11.8,11.8,11.9,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,12,12.1,12.2,12.2,12.2,12.3,12.3,12.3,12.3,12.3,12.3,
12.3,12.3,12.3,12.4,12.4,12.4,12.4,12.4,12.4,12.5,12.5,12.5,12.5,12.5,12.5,12.5,12.5,12.6,
12.6,12.7,12.7,12.8,12.8,12.8,12.9,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13.2,13.2,13.3,13.5,13.5,13.5,13.5,13.5,13.5,14)

W <-
c(11,13,13.45,21.66,19.5,19.73,19.74,19.42,21.48,20.47,23.02,22.7,20.19,23.3,27.05,19.81,
20.01,26,24,25,20,25,26.29,31.26,23.08,29.85,24.27,27.49,25,26.03,24,26,28.21,24.62,21.69,
24.68,23.6,25.42,26.7,30.25,30.06,33.62,32,30,32.46,30,30,28.8,30.2,31.44,32.84,33.04,35,
28,29,33,34,28,28.51,35.67,33.72,33,28.53,34.85,34.5,37.44,37.74,31.36,30.12,36.03,33.4,
33.51,34,33,33.79,34.93,35,34.13,35.65,34,32.77,41.71,31.26,32.4,28.81,35.63,34.96,36.74,
32.38,38.14,34.12,40.26,40.27,36.96,38.35,42.36,40.33,31.59,34.44,38,42.63,40,36.28,37,34.4,
34,33.64,39.05,40.46,35.45,38.72,35,33,35,33,40,35,37,36,32,43,35,40,33.54,40.06,43.38,
40.3,44.81,43,46.32,37.45,37.71,45.9,36.1,44.78,43.12,45.5,41.62,38,37,43.08,43.82,47.25,
43,41.59,43.58,41,44,48,43,45.46,43.5,43.38,47.54,45,46.92,44.75,49.02,43.37,43.44,48,43,
46,42,48,45,48,43,45,46,43,40,42,40,43,43,50,44,50.65,42.11,50,51.44,53.1,52,56.2,45,49,
55)


## Using nls() to find a and b for model:  W = a*L^b
WL.nls <- nls((W ~ a * L^b), start = list(a = 0.02, b = 1),
  trace = TRUE, algorithm = "default", model = TRUE)
summary(WL.nls)

## Scatterplot with fitted model
plot(L, W)
lines(L, predict(WL.nls), col = "blue", lwd = 2)

## Finding log(a) and b for log transformed model: log(W) = log(a)+ b*log(L)
logWL.lm <- lm(log10(W) ~ log10(L))
summary(logWL.lm)

## Adding model to plot
lines(L, 10^coef(logWL.lm)[1]*L^coef(logWL.lm)[2], col = "red", lwd = 2)

## R-squared for W = a*L^b
Rsq.nls <- sum((predict(WL.nls) - mean(W))^2) / sum((W - mean(W))^2)

## R-squared for W = a*L^b with coefs from log(W) = log(a)+ b*log(L)
pred <- 10^coef(logWL.lm)[1]*L^coef(logWL.lm)[2]
Rsq.lm <- sum((pred - mean(W))^2) / sum((W - mean(W))^2)

text(c(9, 13), c(50, 20),
     paste("R-squared:", formatC(c(Rsq.nls, Rsq.lm), digits = 4)),
     col = c("blue", "red"))




Re: [R] cluster.stats

2008-06-13 Thread Christian Hennig

Dear Laura,


Dear list,
I just tried to use the function cluster.stats in the package fpc.
I just have a couple of questions about the syntax:

cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)

1) the distance object (d) is an object obtained by the function dist() on
my own original matrix?


d is allowed to be an object of class dist or a dissimilarity matrix.
The answer to your question depends on what your original matrix is. If 
it is something on which you can compute a distance by dist(), you're 
right, at least if dist() delivers the distance you are interested in.



2) clustering is the clusters vector as result of one of the many clustering
methods?


The help page tells you what clustering can be. So it could be the 
clustering/partition vector of a clustering method or it could be something 
else. Note that cluster.stats doesn't depend on any particular clustering 
method. It computes the statistics regardless of where the clustering 
vector comes from.


Best regards,
Christian



Thank you very much in advance, and sorry for such a basic question, but I
could not work this out on my own.

Laura




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche



Re: [R] help with colsplit (reshape)

2008-06-13 Thread hadley wickham
 M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
   names = c("treatment", "time")))

 which gave:

 head(M.Data2)
   pid variable value treatment  time
 1   1    predA    -1     predA predA
 2   2    predA    -2     predA predA
 3   3    predA    -1     predA predA
 4   4    predA    -2     predA predA
 5   5    predA    -1     predA predA
 6   6    predA    -2     predA predA

 Closer but no cigar.

Have a look at the whole thing - it's getting it right most of the
time.  Going back to the original variable names, I see that PredA
does not have a time associated with it.  What do you expect the time
to be?

 I would be grateful if someone will tell me (a) how to reshape the data as
 described above using the reshape package, (b) what the difference between
 split = "." and split = "\\." is,

The splitting argument is a regular expression, and in regular
expression speak . means to match any one character.  \\. escapes
the full stop, so it only matches full stops.
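
A small illustration of the difference, using strsplit() (which is
what colsplit calls underneath). The input strings are made up for the
example:

```r
x <- c("predA.1", "predB.2")  # hypothetical variable names

# Unescaped: "." matches every character, so each character is treated
# as a separator and only empty strings are left over.
any_char <- strsplit(x, ".")

# Escaped: only literal full stops are separators.
full_stop <- strsplit(x, "\\.")

# fixed = TRUE switches regular expressions off altogether.
no_regex <- strsplit(x, ".", fixed = TRUE)

any_char[[1]]
full_stop[[1]]
```

The escaped and fixed = TRUE forms should give the same pieces
("predA" and "1" for the first element).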

 and (c) if more information about the colsplit
 command is available anywhere.

Probably the best way is just to look at the code (it's pretty simple):

 colsplit.character
function (x, split = "", names)
{
    vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
    names(vars) <- names
    as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own
function following those lines.

Hadley

-- 
http://had.co.nz/



Re: [R] nls() vs lm() estimates

2008-06-13 Thread Janne Huttunen

Héctor Villalobos wrote:

Hi,

I'm trying to understand why the coefficients a and b for the model: W = 
a*L^b estimated
via nls() differs from those obtained for the log transformed model: log(W) = 
log(a) + b*log(L)
estimated via lm(). Also, if I didn't make a mistake, R-squared suggests a 
better adjustment
for the model using coefficients estimated by lm() . Perhaps I'm doing 
something wrong in
nls()?


I haven't tried your code, but in general these estimates are different: 
for the former estimate you minimize the norm of the difference W-a*L^b 
(W are ) and for the latter you minimize the norm of the difference 
log(W)-(log(a)+b*log(L)). The solutions of these problems are equal. 
Which approach you should choose depends on the errors: for an additive 
error model the former is the better choice.




--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall



Re: [R] Maximum likelihood estimation in R with censored Data

2008-06-13 Thread Vincent Goulet


Le ven. 13 juin à 13:55, Ben Bolker a écrit :


Bluder Olivia olivia.bluder at k-ai.at writes:



Hello,

I'm trying to calculate the Maximum likelihood estimators for a  
dataset

which contains censored data.

I started by using the function nlm, but isn't there a separate  
method
for doing this for e.g. the weibull and the log-normal  
distribution?


Thanks,

Olivia


 This is not *quite* enough detail about what you
want to do.  Can you (as the posting guide suggests!)
give us a small example of what you want to do?  You may be able
to do this via the survreg() command in the survival
package, or you may want to do it yourself by constructing
a log-likelihood function with dweibull() for uncensored
data and pweibull() for censored data [or dlnorm/plnorm].


If you want to go the second route, function coverage() in package  
actuar will build the censored density function for you. You can then  
feed this function to fitdistr() just like for usual ML estimation.


HTH  Vincent




 Ben Bolker



Re: [R] nls() vs lm() estimates

2008-06-13 Thread Janne Huttunen


Janne Huttunen wrote:

Héctor Villalobos wrote:

Hi,

I'm trying to understand why the coefficients a and b for the 
model: W = a*L^b estimated
via nls() differs from those obtained for the log transformed model: 
log(W) = log(a) + b*log(L)
estimated via lm(). Also, if I didn't make a mistake, R-squared 
suggests a better adjustment
for the model using coefficients estimated by lm() . Perhaps I'm doing 
something wrong in

nls()?


I haven't tried your code, but in general these estimates are different: 
for the former estimate you minimize the norm of the difference W-a*L^b 
(W are ) and for the latter you minimize the norm of the difference 
log(W)-(log(a)+b*log(L)). The solutions of these problems are equal. 
Which approach you should choose depends on the errors: for an additive 
error model the former is the better choice.


I should read what I have written before sending my message. I meant 
that the solutions of these problems are NOT equal (in general) and 
therefore estimates differ.
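
To make the point concrete, here is a small sketch on simulated data
(the data, the multiplicative error and the helper names sse_raw and
sse_log are assumptions for illustration, not the original poster's
data). Each fit minimizes its own sum of squares, so each does best on
its own scale:

```r
set.seed(42)
L <- runif(100, 8, 14)
W <- 0.02 * L^3 * exp(rnorm(100, sd = 0.1))  # multiplicative error (assumed)

fit_nls <- nls(W ~ a * L^b, start = list(a = 0.02, b = 3))
fit_lm  <- lm(log(W) ~ log(L))

coef_nls <- coef(fit_nls)
coef_lm  <- c(a = exp(coef(fit_lm)[[1]]), b = coef(fit_lm)[[2]])

# Sum of squared residuals on the original scale and on the log scale
sse_raw <- function(p) sum((W - p[["a"]] * L^p[["b"]])^2)
sse_log <- function(p) sum((log(W) - log(p[["a"]]) - p[["b"]] * log(L))^2)

rbind(nls = c(raw = sse_raw(coef_nls), log = sse_log(coef_nls)),
      lm  = c(raw = sse_raw(coef_lm),  log = sse_log(coef_lm)))
```

The nls() row should have the smaller "raw" value and the lm() row the
smaller "log" value, which is all the difference in estimates amounts
to.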



--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall Berkeley, CA 94720-3860
email: [EMAIL PROTECTED]
phone: +1-510-502-5205
office room: 449 Evans Hall



[R] Quartile regression question

2008-06-13 Thread Ranney, Steven
I have data that looks like

lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
...
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099


I can regress log weight on log length simply enough, but how would I model the 
third quartile of log weights?  In other words, rather than finding a 2nd 
quartile (or 50th percentile) regression line, 

e.g., mod=lm(logweight~loglength)

can R find a 75th percentile line?  Further, since my data is lake1, is there 
a way to run 3rd quartile regressions on each lake?  I would imagine that 
regressing each population would require some call of the subset function, but 
I cannot figure out how to call it.

Thanks in advance, 

SR 

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479




Re: [R] help with colsplit (reshape)

2008-06-13 Thread Ista Zahn

Thanks Hadley, with your help I'm getting things figured out.
On Jun 13, 2008, at 2:09 PM, hadley wickham wrote:

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
  names = c("treatment", "time")))

which gave:

head(M.Data2)
  pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.


Have a look at the whole thing - it's getting it right most of the
time.  Going back to the original variable names, I see that PredA
does not have a time associated with it.  What do you expect the time
to be?

Right, there is no time associated with this variable. So I tried
again, treating it as an id:


M.Data <- melt(Data, id = c("pid", "predA"))

From here I was able to achieve the desired result, as follows:

M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
  names = c("measure", "time")))

M.Data$variable <- M.Data$measure
M.Data <- M.Data[-5]
L.Data <- cast(M.Data, ... ~ variable)

This is perhaps a bit inelegant but it works! I'm interested in  
knowing if there is a better way to do it, but I'm happy that I've at  
least figured out this much. As always I'm humbled by the generosity  
of people who not only make their software available but also take the  
time to answer questions on this list. Thank you!


-Ista



I would be grateful if someone will tell me (a) how to reshape the
data as described above using the reshape package, (b) what the
difference between split = "." and split = "\\." is,


The splitting argument is a regular expression, and in regular
expression speak "." means to match any one character.  "\\." escapes
the full stop, so it only matches full stops.


and (c) if more information about the colsplit
command is available anywhere.


Probably the best way is just to look at the code (it's pretty
simple):

colsplit.character
function (x, split = "", names)
{
  vars <- as.data.frame(do.call(rbind, strsplit(x, split)))
  names(vars) <- names
  as.data.frame(lapply(vars, function(x) type.convert(as.character(x))))
}

If strsplit doesn't do what you want, you might need to write your own
function following those lines.

Hadley

--
http://had.co.nz/




Re: [R] Quartile regression question

2008-06-13 Thread Philippe Grosjean

Hello,

Look at package quantreg.
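
A minimal sketch of what that could look like, assuming quantreg is
installed from CRAN. The data frame fish below is a simulated stand-in
with the same column names as in the post, and split()/lapply() is
just one way to get a separate fit per lake:

```r
library(quantreg)  # assumed to be installed from CRAN

# Hypothetical stand-in for the poster's data (same column names)
set.seed(1)
fish <- data.frame(lake = rep(c(1, 102), each = 50),
                   loglength = runif(100, 2.3, 2.9))
fish$logweight <- -5.5 + 3.2 * fish$loglength + rnorm(100, sd = 0.05)

# 75th-percentile regression line for the pooled data
fit_q3 <- rq(logweight ~ loglength, tau = 0.75, data = fish)

# One third-quartile fit per lake: split the data frame by lake, then
# fit the same model to each piece
fits_by_lake <- lapply(split(fish, fish$lake), function(d)
  rq(logweight ~ loglength, tau = 0.75, data = d))

# One row of coefficients per lake
t(sapply(fits_by_lake, coef))
```

Roughly three quarters of the observations should fall on or below
each fitted line.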

Philippe Grosjean


Ranney, Steven wrote:

I have data that looks like

lake,loglength,logweight
1,2.369215857,1.929418926
1,2.426511261,2.230448921
1,2.434568904,2.298853076
1,2.437750563,2.298853076
1,2.442479769,2.230448921
1,2.445604203,2.356025857
...
102,2.722633923,3.310268367
102,2.781755375,3.502153893
102,2.836324116,3.683407299
102,2.802773725,3.583312152
102,2.790285164,3.546419267
102,2.806179974,3.599118565
102,2.716837723,3.316180099


I can regress log weight on log length simply enough, but how would I model the third quartile of log weights?  In other words, rather than finding a 2nd quartile (or 50th percentile) regression line, 


e.g., mod=lm(logweight~loglength)

can R find a 75th percentile line?  Further, since my data is lake1, is there 
a way to run 3rd quartile regressions on each lake?  I would imagine that 
regressing each population would require some call of the subset function, but I 
cannot figure out how to call it.

Thanks in advance, 

SR 


Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479




Re: [R] Quartile regression question

2008-06-13 Thread Ranney, Steven
Thanks for your help.  Worked great.

SR

Steven H. Ranney
Graduate Research Assistant (Ph.D)
USGS Montana Cooperative Fishery Research Unit
Montana State University
PO Box 173460
Bozeman, MT 59717-3460

phone: (406) 994-6643
fax:   (406) 994-7479






Re: [R] help with colsplit (reshape)

2008-06-13 Thread hadley wickham
 Right, there is no time associated with this variable. So I tried again,
 treating it as an id:

 M.Data <- melt(Data, id = c("pid", "predA"))

 From here I was able to achieve the desired result, as follows:

 M.Data <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",
 names=c("measure", "time")))
 M.Data$variable <- M.Data$measure
 M.Data <- M.Data[-5]
 L.Data <- cast(M.Data, ... ~ variable)

 This is perhaps a bit inelegant but it works! I'm interested in knowing if
 there is a better way to do it, but I'm happy that I've at least figured out
 this much. As always I'm humbled by the generosity of people who not only
 make their software available but also take the time to answer questions on
 this list. Thank you!

You're welcome.  And don't worry too much about data cleaning routines
being elegant - it's very very hard to write elegant code to clean up
something that's not at all elegant.

Hadley


-- 
http://had.co.nz/



Re: [R] alternative to matching/merge?

2008-06-13 Thread hadley wickham
On Fri, Jun 13, 2008 at 11:45 AM, jim holtman [EMAIL PROTECTED] wrote:
 What is the structure of 'd.frame' and 'seqFile'?  Run Rprof so that
 we can see which of the functions it is spending its time in.  What
 happens if x$index is not in seqFile$index?  Are the values in the
 'index' unique in both structures?  Subsetting a data frame can be
 expensive when compared to using a matrix.  Could you use a matrix
 instead of a data frame; are all the columns the same mode?  Again
 either a subset of data would be helpful or an 'str' on the data
 objects being used so that we can understand what they are.

A few other ideas to try:

 * try merging do.call(rbind, d.frame) and seqFile, and then
splitting the results back up

 * try giving seqFile rownames (rownames(seqFile) <-
seqFile$index) and then use character matching:  cbind(x,
seqFile[as.character(x$index), ])

 * if there is a one-to-one correspondence between index in seqFile and
all data.frames in d.frame, merge all of the d.frames together, order
both by index then just cbind
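
A rough sketch of the second idea (row-name lookup). The two small
data frames are made-up stand-ins, since the real structures of
d.frame and seqFile were not posted:

```r
# Hypothetical stand-ins for the objects discussed in the thread
seqFile <- data.frame(index = c(10, 20, 30), seq = c("AAC", "GGT", "TCA"))
x <- data.frame(index = c(30, 10, 10), score = c(0.5, 1.2, 0.7))

# Use the index as row names, then look rows up by character match
rownames(seqFile) <- seqFile$index
matched <- cbind(x, seqFile[as.character(x$index), "seq", drop = FALSE])
rownames(matched) <- NULL
matched
```

Note that an index missing from seqFile gives a row of NAs here rather
than being dropped, which differs from merge()'s default behaviour.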

Hadley


-- 
http://had.co.nz/



[R] stretching text vertically

2008-06-13 Thread Alex Reynolds
I'd like to stretch a plotted character vertically, to create a 
sequence logo.


Is there a parameter to allow stretching text() output vertically or 
squeeze horizontally?


I know about Oliver Bembom's seqLogo library, but this generates a 
sequence logo plot using a separate bitmap device. I want to recreate 
the sequence logo *inside* an existing plot.


Alternatively, is there a way to embed one plot inside another?

I could use imagemagick outside R to 'montage' separate bitmaps, but 
then the sequence logo is going to be very difficult to align (base for 
base) with the plot I'm trying to join it to.


Thanks for any tips,
Alex



[R] Importing data with different delimters

2008-06-13 Thread David Arnold

All,

I have a data file with 56 entries that looks like this:

City State  JanTemp Lat Long
Mobile, AL  44  31.2  88.5
Montgomery, AL  38  32.9  86.8
Phoenix, AZ 35  33.6  112.5
Little Rock, AR 31  35.4  92.8
Los Angeles, CA 47  34.3  118.7
San Francisco, CA   42  38.4  123.0

I would like to read this data into a dataframe. Is it possible to  
do without editing the datafile?


D.



[R] Help with stat.table in Epi package,

2008-06-13 Thread Troy S
R Fans--

I am having problems with the following code.  It worked under R 2.6.0 but not 
in 2.7.0.


 library(Epi)
 df <- read.table("c:/Documents and Settings/Troy S/My Documents/debug_chisq_080613b.txt")
 summary(df)
      cvd             agecat 
 Min.   :0.0000   (0,40] :1  
 1st Qu.:0.0000   (40,60]:2  
 Median :0.0000              
 Mean   :0.3333              
 3rd Qu.:0.5000              
 Max.   :1.0000              
 fa <- as.factor(df$cvd)
 fb <- as.factor(df$agecat)
 stat.table(index=list(a=fa, b=fb))
Error in eval(expr, envir, enclos) : could not find function "count"

The file contents are

cvd agecat
1 0 (0,40]
2 1 (40,60]
3 0 (40,60]

My sessionInfo is

R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Epi_1.0.8         coin_0.6-9        modeltools_0.2-15 mvtnorm_0.9-0    
[5] survival_2.34-1  

loaded via a namespace (and not attached):
[1] tools_2.7.0
 

Any help would be great!

Troy


Re: [R] Importing data with different delimters

2008-06-13 Thread jim holtman
Assuming that the only problem is the blank in the city names, here is
one way of doing it:

> inFile <- textConnection("City State  JanTemp Lat Long
+ Mobile, AL  44  31.2  88.5
+ Montgomery, AL  38  32.9  86.8
+ Phoenix, AZ 35  33.6  112.5
+ Little Rock, AR 31  35.4  92.8
+ Los Angeles, CA 47  34.3  118.7
+ San Francisco, CA   42  38.4  123.0")
> lines <- readLines(inFile)
> # get rid of blanks in city names
> newLines <- sub("(.*?) +(.*),", "\\1_\\2,", lines)
>
> x <- read.table(textConnection(newLines), header=TRUE)
> closeAllConnections()
> x
            City State JanTemp  Lat  Long
1        Mobile,    AL      44 31.2  88.5
2    Montgomery,    AL      38 32.9  86.8
3       Phoenix,    AZ      35 33.6 112.5
4   Little_Rock,    AR      31 35.4  92.8
5   Los_Angeles,    CA      47 34.3 118.7
6 San_Francisco,    CA      42 38.4 123.0



If you want, you can then go back and replace the _ with a blank in
the city name.
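
That last step might look like the following. The small data frame is
a hypothetical copy of two rows of the result above, and dropping the
trailing comma is an extra assumption about what is wanted:

```r
# Hypothetical copy of part of the data frame produced above
x <- data.frame(City = c("Mobile,", "Little_Rock,"),
                State = c("AL", "AR"),
                stringsAsFactors = FALSE)

# Drop the trailing comma, then turn the underscore back into a blank
x$City <- gsub("_", " ", sub(",$", "", x$City))
x
```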

On Fri, Jun 13, 2008 at 7:14 PM, David Arnold [EMAIL PROTECTED] wrote:
 All,

 I have a data file with 56 entries that looks like this:

 City State  JanTemp Lat Long
 Mobile, AL  44  31.2  88.5
 Montgomery, AL  38  32.9  86.8
 Phoenix, AZ 35  33.6  112.5
 Little Rock, AR 31  35.4  92.8
 Los Angeles, CA 47  34.3  118.7
 San Francisco, CA   42  38.4  123.0

 I would like to read this data into a dataframe. Is it possible to do
 without editing the datafile?

 D.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] rbind() problem

2008-06-13 Thread array chip
Hi, I would like to rbind 2 data frames. They share some column names, but
each also has some column names of its own. Is there a simple function that
rbinds these 2 data frames, filling in NAs for the columns that are unique to
one of them?

thanks



[R] Correcting the display of colnames and rownames

2008-06-13 Thread Steve Murray

Dear all,

I have a data frame of dimension 720 columns by 360 rows, to which I am trying 
to add numerical row and column labels to, using the 'sequence' command. The 
original data, which I read in using 'read.table', had no such labels at all.

I've got as far as successfully using the sequence command and getting the 
labels to display. However, I'm finding that for the minus numbers in 
particular, the values aren't displaying correctly. For the value '-179.75' for 
example, it displays as 'X.179.75'. Even for positive numbers, the 'X' prefix 
appears at the start of the label (but without the '.').

I have tried numerous attempts at addressing this. I'm currently as far as 
adopting the following approach; I'll show what I've done for just the column 
headings - I've adopted the same approach for row headings, with the same 
results/problem so far.

columnnames <- seq(from = -179.75, to = 179.75, length = 720)
as.numeric <- colnames(Jan)
colnames(Jan) <- make.names(columnnames)

N.B. 'Jan' (as in January) refers to the data frame in question.

So my thinking here is to assign the values to be used as column labels to 
'columnnames', and use 'make.names' to assign these values to the column names 
of the data frame. I've also tried changing 'colnames(Jan)' to be a numeric 
class, as I was previously having problems assigning the values to the labels - 
I think because by default 'colnames' is of class 'character vector'?

If anyone is able to suggest a way how I can solve the problem of the values 
not being displayed as I'd hoped (namely, removing the 'X' and displaying '-' 
for minus numbers), then I'd be very grateful.

Many thanks,

Steve



Re: [R] Using lm with a matrix?

2008-06-13 Thread jonboym

Many thanks, works great!


Charilaos Skiadas-3 wrote:
 
 Try this:
 
 lapply( 1:2, function(i) lm( y~x, data=list(x=xdat[,i], y=ydat[,i]) ) )
 
 Haris Skiadas
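
 For readers who want to try the one-liner, here is a sketch on
 made-up matrices. xdat and ydat are simulated here and are not the
 data from the original thread:

```r
set.seed(1)
xdat <- matrix(rnorm(40), ncol = 2)  # hypothetical predictors, one per column
ydat <- cbind(1 + 2 * xdat[, 1], 3 - xdat[, 2]) + rnorm(40, sd = 0.1)

# Regress column i of ydat on column i of xdat, for each i
fits <- lapply(1:2, function(i)
  lm(y ~ x, data = list(x = xdat[, i], y = ydat[, i])))

# One row of (intercept, slope) per column pair
t(sapply(fits, coef))
```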
 

-- 
View this message in context: 
http://www.nabble.com/Using-lm-with-a-matrix--tp17708207p17829661.html
Sent from the R help mailing list archive at Nabble.com.



[R] Weights and coxph

2008-06-13 Thread mah
I am confused by the results of the weights option for coxph.  I
replicated each row three times from the help page for coxph in the
data frame test_freq.  I had expected that the coefficients,
significance tests, and tests of non-proportionality would yield the
same results for the replicated and non-replicated data, but the
output below shows differences in all three metrics.  Is this the
result of a curved response variable?  This is likely more of a
conceptual question than a language question, but all help is
sincerely appreciated.

Mike

 test1
$time
[1] 4 3 1 1 2 2 3

$status
[1]  1 NA  1  0  1  1  0

$x
[1] 0 2 1 1 1 0 0

$sex
[1] 0 0 0 0 1 1 1

$wt
[1] 3 3 3 3 3 3 3

 test_freq
   time status x sex
1     4      1 0   0
2     4      1 0   0
3     4      1 0   0
4     3     NA 2   0
5     3     NA 2   0
6     3     NA 2   0
7     1      1 1   0
8     1      1 1   0
9     1      1 1   0
10    1      0 1   0
11    1      0 1   0
12    1      0 1   0
13    2      1 1   1
14    2      1 1   1
15    2      1 1   1
16    2      1 0   1
17    2      1 0   1
18    2      1 0   1
19    3      0 0   1
20    3      0 0   1
21    3      0 0   1
 t1 <- coxph( Surv(time, status) ~ x + strata(sex), data=test1, weights=wt)
 summary(t1)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1,
weights = wt)

  n=6 (1 observation deleted due to missingness)
  coef exp(coef) se(coef)    z    p
x 1.17      3.22    0.744 1.57 0.12

  exp(coef) exp(-coef) lower .95 upper .95
x      3.22      0.311     0.749      13.8

Rsquare= 0.353   (max possible= 0.999 )
Likelihood ratio test= 2.61  on 1 df,   p=0.106
Wald test= 2.47  on 1 df,   p=0.116
Score (logrank) test = 2.67  on 1 df,   p=0.102

 cox.zph(t1)
      rho   chisq     p
x -0.0716 0.00598 0.938
 t_freq <- coxph( Surv(time, status) ~ x + strata(sex), data=test_freq)
 summary(t_freq)
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data =
test_freq)

  n=18 (3 observations deleted due to missingness)
  coef exp(coef) se(coef)    z     p
x 1.41      4.09    0.756 1.86 0.063

  exp(coef) exp(-coef) lower .95 upper .95
x      4.09      0.245     0.929      18.0

Rsquare= 0.185   (max possible= 0.879 )
Likelihood ratio test= 3.69  on 1 df,   p=0.0549
Wald test= 3.47  on 1 df,   p=0.0626
Score (logrank) test = 3.84  on 1 df,   p=0.0499

 cox.zph(t_freq)
      rho  chisq     p
x -0.0697 0.0526 0.819



[R] overlaid transparent histograms

2008-06-13 Thread Austin Frank
Hello all--

I'm attempting to produce overlaid histograms with partially transparent
columns.  Whether this display will end up being useful, I can't say.
But I do want to get it right.

I've already got one solution (shown below), but I tried some other
versions and had questions about my results.  (Note:  I'm using a quartz
device, so transparency shows up correctly.  You might need to print to
a pdf device to get transparency, according to the docs I've read)

--8<---------------cut here---------------start------------->8---
## Working version:
data(lexdec, package="languageR")
attach(lexdec)

x <- log(c(BNCw, Frequency))
label <- c(rep("BNCw", length(BNCw)),
           rep("CELEX", length(Frequency)))
h <- data.frame(x, label)

g <- ggplot(h, aes(x=x, fill=label))
g +
  geom_bar(position="identity") +
  scale_fill_manual(values = c(
                      alpha("red", 0.5),
                      alpha("blue", 0.5)))
detach(lexdec)
--8<---------------cut here---------------end--------------->8---


Three questions:
1a)  Why does the following code not produce transparent bars?
1b)  How can I manually specify the elements of the legend for this
 version of the plot?

--8<---------------cut here---------------start------------->8---
## Non-working version
data(lexdec, package="languageR")

g <- ggplot(lexdec)
g +
  geom_histogram(aes(x=log(BNCw), fill = alpha("red", .5))) +
  geom_histogram(aes(x=log(BNCc), fill = alpha("blue", .5)))
--8<---------------cut here---------------end--------------->8---

2) Does anyone have a way to accomplish the same thing in lattice?  I
   saw the post at
   http://www.nabble.com/Overlay-plots-from-different-data-sets-using-the-Lattice-package-tp14824421p14824421.html,
   but couldn't figure out how to extend these suggestions to overlaid
   transparent histograms.

Thanks in advance for any help,
/au

> sessionInfo()
R version 2.7.0 (2008-04-22)
powerpc-apple-darwin8.10.1

locale:
C

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ggplot2_0.6        colorspace_0.95    RColorBrewer_1.0-2 MASS_7.2-42
 [5] proto_0.3-8        reshape_0.8.0      languageR_0.92     coda_0.13-2
 [9] lme4_0.999375-15   Matrix_0.999375-10 zipfR_0.6-0        lattice_0.17-8
[13] Design_2.1-1       survival_2.34-1    Hmisc_3.4-3

-- 
Austin Frank
http://aufrank.net
GPG Public Key (D7398C2F): http://aufrank.net/personal.asc




[R] False convergence in LME

2008-06-13 Thread Rebecca Sela
I tried to use LME (on a fairly large dataset, so I am not including it), and I 
got this error message:

Error in lme.formula(formula(paste(c(toString(TargetName), 
as.factor(nodeInd)),  : 
  nlminb problem, convergence error code = 1
  message = false convergence (8)

Is there any way to get more information or to get the potentially wrong 
estimates from LME?

(Also, the page in the NLMINB documentation,  
http://netlib.bell-labs.com/cm/cs/cstr/153.pdf, has errors in it, which makes 
it harder to check on what is happening.)

Thank you in advance!

Rebecca



[R] Subset by Factor by date

2008-06-13 Thread T.D.Rudolph

I have a dataframe, x, with over 60,000 rows that contains one Factor, id,
with 27 levels.  
The dataframe contains numerous continuous values (along column diff) per
day (column date) for every level of id.  I would like to select only one
row per animal per day, i.e. that containing the minimum value of diff,
along the full length of 1:nrow(x).  I am not yet able to conduct anything
beyond the simplest of functions and I was hoping someone could suggest an
effective way of producing this output.

e.g. given this input:

id  day diff
1  01-01-09  0.5
1  01-01-09  0.7
2  01-01-09  0.2
2  01-01-09  0.4
1  01-02-09  0.1
1  01-02-09  0.3
2  01-02-09  0.3
2  01-02-09  0.4

I would like to produce this output:
id day  diff
1  01-01-09  0.5
2  01-01-09  0.2
1  01-02-09  0.1
2  01-02-09  0.3

It doesn't seem extremely difficult but I'm sure there are easier ways than
how I am currently approaching it!
-- 
View this message in context: 
http://www.nabble.com/Subset-by-Factor-by-date-tp17835631p17835631.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Subset by Factor by date

2008-06-13 Thread Marc Schwartz

on 06/13/2008 11:10 PM T.D.Rudolph wrote:

I have a dataframe, x, with over 60,000 rows that contains one Factor, id,
with 27 levels.  
The dataframe contains numerous continuous values (along column diff) per
day (column date) for every level of id.  I would like to select only one
row per animal per day, i.e. that containing the minimum value of diff,
along the full length of 1:nrow(x).  I am not yet able to conduct anything
beyond the simplest of functions and I was hoping someone could suggest an
effective way of producing this output.

e.g. given this input:

id  day diff
1  01-01-09  0.5
1  01-01-09  0.7
2  01-01-09  0.2
2  01-01-09  0.4
1  01-02-09  0.1
1  01-02-09  0.3
2  01-02-09  0.3
2  01-02-09  0.4

I would like to produce this output:
id day  diff
1  01-01-09  0.5
2  01-01-09  0.2
1  01-02-09  0.1
2  01-02-09  0.3

It doesn't seem extremely difficult but I'm sure there are easier ways than
how I am currently approaching it!


See ?aggregate

> DF
  id      day diff
1  1 01-01-09  0.5
2  1 01-01-09  0.7
3  2 01-01-09  0.2
4  2 01-01-09  0.4
5  1 01-02-09  0.1
6  1 01-02-09  0.3
7  2 01-02-09  0.3
8  2 01-02-09  0.4


> aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
  id      day   x
1  1 01-01-09 0.5
2  2 01-01-09 0.2
3  1 01-02-09 0.1
4  2 01-02-09 0.3


Note that I have not converted the 'day' column to a 'date' class. You 
would need to do that to perform any other date related operations 
(including chronological sorting) on that column. See ?as.Date for more 
information. For example:


> DF$day <- as.Date(DF$day, format = "%m-%d-%y")
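
The two steps above can be combined into one self-contained run. This is only a sketch: the data frame is rebuilt by hand from the listing in this message, and the formula interface to aggregate() is used here as an assumption about what is convenient, not as the form used above.

```r
# Rebuild the example data from the listing above
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Convert first, so that days sort chronologically rather than as text
DF$day <- as.Date(DF$day, format = "%m-%d-%y")

# Minimum diff for every id/day combination
res <- aggregate(diff ~ id + day, data = DF, FUN = min)
res
```

Converting before aggregating means the result can be ordered by date directly, for example with res[order(res$day, res$id), ].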


HTH,

Marc Schwartz



Re: [R] Looping, Control Flow & Conditional Statements

2008-06-13 Thread Garth.Warren

Thanks Chuck, 'rle' was just what I needed.

G

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED]
Sent: Saturday, 14 June 2008 02:00
To: Warren, Garth (CSE, Gungahlin)
Cc: r-help@r-project.org
Subject: Re: [R] Looping, Control Flow & Conditional Statements



See

?rle

Start with this:

> a1.runs <- rle( a1 )
> a1.runs$lengths[ a1.runs$values>0 ]
[1] 3 4
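
A self-contained sketch of the same idea, extended to produce the "a1.1 : 3" style lines asked for in the question. The names runs, event.lengths and out are illustrative only. Note one deliberate difference: rle() is applied to a1 > 0 rather than to a1, so that a run of differing positive values would still count as a single event.

```r
# The series from the original question
a1 <- c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# Run-length encode the logical vector: TRUE runs are events above 0
runs <- rle(a1 > 0)
event.lengths <- runs$lengths[runs$values]

# One line per event: series name, event number, duration
out <- paste0("a1.", seq_along(event.lengths), " : ", event.lengths)
writeLines(out)
```

To write to a file instead of the console, a file path can be passed as the second argument of writeLines().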


HTH,

Chuck

p.s.

> library(fortunes)
> fortune(106)

If the answer is parse() you should usually rethink the question.
-- Thomas Lumley
   R-help (February 2005)
--

see

?get
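
As a rough illustration of the hint, get() looks an object up by name, so the eval(parse(text = ...)) construction in the quoted code is not needed. The second series a2 below is hypothetical and exists only to make the loop meaningful.

```r
a1 <- c(0, 1, 1, 1, 0)
a2 <- c(1, 1, 0, 0, 1)   # hypothetical second series

for (i in 1:2) {
  # Look the vector up by its constructed name
  s <- get(paste("a", i, sep = ""))
  cat("a", i, " has ", sum(s > 0), " values above 0\n", sep = "")
}

# Keeping the series in a named list avoids name lookups altogether
series <- list(a1 = a1, a2 = a2)
counts <- sapply(series, function(s) sum(s > 0))
counts
```

The list form is usually easier to extend, since sapply() or lapply() then replaces the explicit loop over constructed names.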

On Fri, 13 Jun 2008, [EMAIL PROTECTED] wrote:

 Dear R Group:



 I have little experience using R and even less experience with control
 flow type questions.



 See the following code:



 a1 = c(0, 1, 1, 1,

 0, 0, 0, 0, 0,

 0, 0, 1,

 1, 1, 1, 0, 0)



 for(i in 1:1){
 sx <- paste("a",i,sep="")
 s <- eval(parse(text = paste("a",i,sep="")))
 {g = numeric(length(s))
 k = numeric(length(s))
 {for (i in 1:length(s))
 {for (j in 1:length(s))
 ifelse(((j=i)1),(g[j] = s[j] + s[i]),(k[j] = s[j] + s[i]))
 }}
 h1 <- hist(g,freq=TRUE)
 h <- h1$counts[4]
 cat(sx,":", h,"\n",file = "C:/temp/test-beta.txt", append=TRUE)
 }}





 The output is:

 g

 [1] 0 2 2 2 0 0 0 0 0 0 0 2 2 2 2 0 0

 k

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0



 h

 [1] 7



 and a text file, which has:

 a1 : 7

 k is a by-product of the ifelse statement and is of no interest, and g
 and h only go part-way to answering my question, which is:



 For every time an object i.e. a1 (which is actually a time series) -
 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 - has a value over 0, how long do the
 values stay above 0? So in this case a1 has two groups or events where
 the value is above zero: the first event lasts for 3 'days' and the
 second event lasts for 4 'days'. My code tells me that there was a
 total of 7 'days' in event or above 0, but what I need to know is
 that there were two 'events' and the 1st lasted 3 'days' and the 2nd
 lasted 4 'days'. Essentially I want a text file output to say:

 a1.1 : 3

 a1.2 : 4

 My thinking is that I need to somehow get the code working through each
 vector one value at a time, and when a value is found to meet the
 criterion of > 0, R creates a new vector. To use the above example, it
 would come to the first value > 0 and then create the new vector
 a1.1 = (1,1,1); then, as the next value in the series is 0, it would
 close this new vector 'a1.1'. It would then continue until it reaches
 the next value > 0 and then create the vector a1.2 = (1,1,1,1); then
 again, as the next value in the series is 0, it would close this new
 vector, and so on.

 Then all I need to do is perform a count of '1's in these new vectors to
 find how many days they met this criterion of being greater than 0.

 I hope the above makes sense and I really hope there is someone willing
 and able to help. I don't know how to proceed.



 Thanks,

 Garth
















Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] overlaid transparent histograms

2008-06-13 Thread hadley wickham
 Three questions:
 1a)  Why does the following code not produce transparent bars?

Because you're setting the fill colour (not mapping it to a variable
in your dataset), the fill needs to be outside of aes()

g +
 geom_histogram(aes(x=log(BNCw)), fill = alpha("red", .5)) +
 geom_histogram(aes(x=log(BNCc)), fill = alpha("blue", .5))


 1b)  How can I manually specify the elements of the legend for this
 version of the plot?

Use the manual scale:

g +
geom_histogram(aes(x=log(BNCw), fill = "w")) +
geom_histogram(aes(x=log(BNCc), fill = "c")) +
scale_fill_manual("BNC type", values = alpha(c("red","blue"), 0.5))
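
For comparison only, the same overlay can be sketched in base graphics, where the colour is likewise set directly (with an alpha channel) rather than mapped from data. This is not the ggplot2 approach described above. The two vectors are simulated stand-ins for log(BNCw) and log(BNCc), and the bin width of 0.5 is an arbitrary assumption.

```r
set.seed(42)
w  <- rnorm(300, mean = 5)   # stand-in for log(BNCw)
cc <- rnorm(300, mean = 6)   # stand-in for log(BNCc)

# Shared breaks so the bars of both histograms line up
breaks <- seq(floor(min(w, cc)), ceiling(max(w, cc)), by = 0.5)

# Semi-transparent fills: the fourth argument of rgb() is alpha
cols <- c(w = rgb(1, 0, 0, 0.5), c = rgb(0, 0, 1, 0.5))

hw <- hist(w,  breaks = breaks, plot = FALSE)
hc <- hist(cc, breaks = breaks, plot = FALSE)

plot(hw, col = cols["w"], xlim = range(breaks),
     ylim = c(0, max(hw$counts, hc$counts)),
     main = "Overlaid histograms", xlab = "log frequency")
plot(hc, col = cols["c"], add = TRUE)
legend("topright", legend = names(cols), fill = cols, title = "BNC type")
```

As noted in the original question, transparency only shows up on devices that support it, such as pdf or quartz.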

Hadley

-- 
http://had.co.nz/



Re: [R] Subset by Factor by date

2008-06-13 Thread T.D.Rudolph

aggregate() is indeed a useful function in this case, but it only returns the
columns by which it was grouped.  Is there a way I can use this while
simultaneously retaining all the other column values in the dataframe? 

e.g. add superfluous (yet pertinent for later) column containing any
information at all and retain it in the final output



-- 
View this message in context: 
http://www.nabble.com/Subset-by-Factor-by-date-tp17835631p17836046.html
Sent from the R help mailing list archive at Nabble.com.



[R] strsplit, keeping delimiters

2008-06-13 Thread hadley wickham
Hi all,

Does anyone have a version of strsplit that keeps the string that is
split by?  e.g. from
x <- "A: 123 B: 456 C: 678"

I'd like to get

c("A:", "123 ", "B: ", "456 ", "C: ", "678")

but
strsplit(x, "[A-Z]+:")

gives me
c("", " 123 ", " 456 ", " 678")

Any ideas?

Thanks,

Hadley

-- 
http://had.co.nz/



Re: [R] rbind() problem

2008-06-13 Thread Tobias Verbeke

Hi array (?),


Hi, I would like to rbind 2 data frames. They both have some common column
names, but also some unique column names each. Is there any simple function
that rbinds these 2 data frames, filling in NAs for the columns with unique
names?


You can use the reshape package by Hadley Wickham for this:


df1 <- data.frame(V1 = rnorm(10),
                  V2 = rnorm(10),
                  V4 = rnorm(10))
df2 <- data.frame(V1 = rnorm(10),
                  V3 = rnorm(10),
                  V4 = rnorm(10))
library(reshape)
rbind.fill(df1, df2)
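
For readers who want to see what such a fill amounts to, here is a minimal base R sketch. It is not how rbind.fill is implemented; the helper name fill.missing and the small integer data frames are made up for illustration.

```r
df1 <- data.frame(V1 = 1:2, V2 = 3:4,  V4 = 5:6)
df2 <- data.frame(V1 = 7:8, V3 = 9:10, V4 = 11:12)

# Every column name that occurs in either data frame
all.cols <- union(names(df1), names(df2))

# Hypothetical helper: add the absent columns as NA, then put the
# columns in a common order so that rbind() can match them up
fill.missing <- function(df, cols) {
  for (nm in setdiff(cols, names(df))) df[[nm]] <- NA
  df[cols]
}

res <- rbind(fill.missing(df1, all.cols), fill.missing(df2, all.cols))
res
```

Rows from df1 get NA in V3 and rows from df2 get NA in V2, which is the behaviour asked for in the question.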

HTH,
Tobias



Re: [R] Subset by Factor by date

2008-06-13 Thread Charilaos Skiadas


On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:



aggregate() is indeed a useful function in this case, but it only returns
the columns by which it was grouped.  Is there a way I can use this while
simultaneously retaining all the other column values in the dataframe?

e.g. add superfluous (yet pertinent for later) column containing any
information at all and retain it in the final output


I had exactly this kind of need many times, and I have finally  
created a function for it, which I hope to include soon in an  
upcoming package. Here is a run of it (I added an extra A column  
containing just the numbers 1:8):


> DF
  id      day diff A
1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8
> byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
  diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09
3  0.1 5  1 01-02-09
4  0.3 7  2 01-02-09

Would that do what you want?

I've appended the function byDataFrame, and its prerequisite, a  
function parseIndexList. I'm not quite set on the names yet, but  
anyway. Hope this helps. I haven't really tested it on large sets, it  
might perform poorly. Any suggestions on speeding the code /  
corrections are welcome.


Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



parseIndexList <- function(indexList) {
  # browser()
  if (!is.list(indexList))
    indexList <- as.list(indexList)
  nI <- length(indexList)
  namelist <- vector("list", nI)
  names(namelist) <- names(indexList)
  extent <- integer(nI)
  nx <- length(indexList[[1]])
  one <- as.integer(1)
  group <- rep.int(one, nx)
  ngroup <- one
  for (i in seq.int(indexList)) {
    index <- as.factor(indexList[[i]])
    if (length(index) != nx)
      stop("arguments must have same length")
    namelist[[i]] <- sort(unique(indexList[[i]]))
    extent[i] <- length(namelist[[i]])
    group <- group + ngroup * (as.integer(index) - one)
    ngroup <- ngroup * nlevels(index)
  }
  nms <- do.call(expand.grid, namelist)
  ind <- unique(sort(group))
  res <- data.frame(index=ind, nms[ind, , drop=FALSE])
  return(list(cases=group, groups=res))
}

byDataFrame <- function (data, INDEX, FUN, newnames,
                         omit.index.cols=TRUE, ...) {
  # # Part of the code shamelessly stolen from tapply
  IND <- eval(substitute(INDEX), data)
  nms <- as.character(as.list(substitute(INDEX)))
  if (!is.list(IND)) {
    IND <- list(IND)
    names(IND) <- nms
  } else {
    names(IND) <- nms[-1]
  }
  funname <- paste(as.character(substitute(FUN)), collapse=".")
  indexInfo <- parseIndexList(IND)
  FUNx <- if (omit.index.cols) {
    omit.cols <- match(names(indexInfo$groups)[-1], names(data))
    function(x, ...) FUN(data[x, -omit.cols], ...)
  } else {
    function(x, ...) FUN(data[x, ], ...)
  }
  ans <- lapply(split(1:nrow(data), indexInfo$cases), FUNx, ...)
  index <- as.numeric(names(ans))
  if (!is.data.frame(ans[[1]])) {
    ans <- lapply(ans, function(x) {
      dframe <- as.data.frame(t(x))
      if (is.null(names(x)))
        names(dframe) <- funname
      dframe
    })
  }
  lengths <- sapply(ans, nrow)
  ans <- do.call(rbind, ans)
  if (!missing(newnames))
    names(ans) <- newnames
  nms <- indexInfo$groups[rep(index, lengths), -1, drop=FALSE]
  res <- cbind(ans, nms)
  res
}
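
For the specific case in this thread (keep the whole row holding the minimum diff per id and day), a shorter base R sketch is possible. It is an alternative to, not a replacement for, the general byDataFrame function above, and it assumes that keeping the first row on ties is acceptable.

```r
DF <- data.frame(
  id   = c(1, 1, 2, 2, 1, 1, 2, 2),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  A    = 1:8,
  stringsAsFactors = FALSE
)

# Sort so that the smallest diff comes first within each id/day group
ord <- DF[order(DF$id, DF$day, DF$diff), ]

# Keep the first row of every id/day combination; all columns survive
res <- ord[!duplicated(ord[c("id", "day")]), ]
res
```

Because only sorting and duplicated() are involved, this should scale to the 60,000 rows mentioned in the original question without an explicit loop over groups.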



Re: [R] strsplit, keeping delimiters

2008-06-13 Thread Gabor Grothendieck
Try this:

> library(gsubfn)
> x <- "A: 123 B: 456 C: 678"
> strapply(x, "[^ :]+[ :]|[^ :]+$")
[[1]]
[1] "A:"   "123 " "B:"   "456 " "C:"   "678"

and check out the gsubfn home page at:

http://gsubfn.googlecode.com
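
A similar result can be sketched without extra packages by extracting the matches instead of splitting, using gregexpr() with regmatches() (the latter only exists in later versions of R than the one used in this thread). The pattern below is an assumption about what counts as a delimiter: one or more capital letters followed by a colon, with everything else returned as the pieces in between.

```r
x <- "A: 123 B: 456 C: 678"

# Match either a delimiter ("A:", "B:", ...) or a run of anything that
# is neither a capital letter nor a colon
pattern <- "[A-Z]+:|[^A-Z:]+"
parts <- regmatches(x, gregexpr(pattern, x))[[1]]
parts
```

Since every character falls into one of the two alternatives, pasting the pieces back together with paste(parts, collapse = "") recovers the original string; note that the whitespace lands differently from the strapply result above.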



