Re: [R] Questions on pareto
Hi, thanks a lot. I think this is what I want. The actuar package has more distributions. Zero

David Winsemius wrote:

On Sep 15, 2009, at 1:09 PM, TsubasaZero wrote:

Hi all, I have generated 1000 random variates (u,v), and I would like to find the corresponding (x,y) for a bivariate Pareto distribution, where x = inverse Pareto of u and y = inverse Pareto of v. What code should I use to find (x,y)?

Perhaps:

??Pareto

On my machine it offers a choice of two packages (actuar and VGAM) that provide Pareto functions. But ?? only searches installed packages, so this would be more general:

library(sos)
???Pareto
retrieving page 1: found 187 matches; retrieving 10 pages 2 3 4 5 6 7 8 9 10

Or the old-fashioned way with r-search... hint: put it on your browser toolbar:

http://search.r-project.org/cgi-bin/namazu.cgi?query=pareto&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=views

-- David Winsemius, MD Heritage Laboratories West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
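For the original question (turning uniform variates into Pareto variates by the inverse-CDF method), a minimal sketch using actuar's qpareto() might look like the following; the shape and scale values are arbitrary illustrative assumptions, not from the thread:

```r
## Inverse-transform sampling: qpareto() maps Uniform(0,1) draws to Pareto draws.
## shape = 2, scale = 1 are made-up values for illustration only.
library(actuar)
set.seed(1)
u <- runif(1000)
v <- runif(1000)
x <- qpareto(u, shape = 2, scale = 1)
y <- qpareto(v, shape = 2, scale = 1)
```

Note that this treats the two margins independently; a genuinely bivariate Pareto would need a joint structure (e.g. a copula) linking u and v.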
[R] Rounding to the nearest 5
Hi all, I want to round my values to the nearest 5. For example, if I have a value of 11, I want R to round it to 10. I have been searching for this command on the web and in the help file, but I couldn't find anything useful. Any help would be greatly appreciated. Many thanks, Chris
[R] best method to format output of frequency table
I have some security alert log data that I'm parsing and doing some stats on. One of the fields is Classtype, which is the enumerated value of the type of alert found.

classtypes <- factor(alerts$Classtype)
fclass_types <- table(classtypes)

fclass_types gives me a frequency table of the intrusion types:

> fclass_types
classtypes
attempted-admin attempted-recon
18 93 35
attempted-user misc-activity misc-attack
21 30 12
protocol-command-decode unsuccessful-user web-application-activity
22 287

I would like to be able to somehow output this table in the form:

attempted-admin 93
attempted-recon 35
attempted-user 21
misc-activity 30

and so on. Is there a function I'm missing or some option that will let me dump fclass_types out in a two-column format? Thanks for any tips, Landon
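A sketch of one common idiom for this (not taken from the thread itself): converting the table to a data frame yields exactly the two-column name/count layout.

```r
## as.data.frame() on a 1-D table gives one column of level names (Var1)
## and one column of counts (Freq). Sample data is made up.
fclass_types <- table(c("attempted-admin", "attempted-admin", "misc-activity"))
out <- as.data.frame(fclass_types)
print(out, row.names = FALSE)
```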
Re: [R] cluster-lite
The R.batch package does most of this. It's been several years, but I used it to run batch jobs on multiple (30-40) machines with different OSes but sharing the same file system. No communication needed between hosts. It worked like a charm and I could process 3-4 CPU years in a few weeks for some bootstrap simulations. It has exception handling and much more. See also the r-help thread 'R.batch (Was: Re: [R] Calling R from R and specifying wait until script is finished)' on May 22, 2005: https://www.stat.math.ethz.ch/pipermail/r-help/2005-May/071981.html The installation is no longer as documented there; instead you can grab it from R-Forge: http://r-forge.r-project.org/R/?group_id=428 This very moment the R-Forge server seems to be down, so you can also install a copy via:

source("http://www.braju.com/R/hbLite.R")
hbLite("R.batch")

Then try this:

library(R.batch)
example(JobBatch)

See the FileProgressBar class in the R.utils package for reporting progress via the file size (0-100 bytes), which allows you to use 'ls -l' (or ftp remotely) to check the progress of your jobs. Feel free to do whatever you want with it. /Henrik

On Tue, Sep 15, 2009 at 5:01 PM, ivo welch ivo_we...@brown.edu wrote: I am about to write a cluster-lite R solution for myself. I wanted to know whether it already exists. If not, I will probably write up how I do this, and I will make the code available. Background: we have various Linux and OS X systems, which are networked, but not set up as a cluster. I have no one here to set up a cluster, so I need a hack that facilitates parallel programming on standard networked machines. I have accounts on all the machines, ssh access (of course password-less), and networked file directory access.
What I am ultimately trying to accomplish is built around a simple function that my master program would invoke:

master.R:

multisystem(c("R slv.R 1 20 file1.out",
              "R slv.R 21 40 file2.out",
              "ssh anotherhost R slv.R 41 80 file3.out"),
            announce = 300)

multisystem() should submit all jobs simultaneously and continue only after all are completed. It should also tell me every 300 seconds what jobs it is still waiting for, and which have completed. With basically no logic in the cluster, my master and slv programs have to make up for it. master.R must have the smarts to know where it can spawn jobs and how big each job should be. slv.R must have the smarts to place its outputs into the marked files on the networked file directory. master.R needs the smarts to combine the outputs of all jobs, and to resubmit jobs that did not complete successfully. Again, the main reason for doing all of this is to avoid setting up a cluster across OS X and Linux systems, and still to make parallel processing across Linux/OS X as easy as possible. I don't think it gets much simpler than this. Now, I know how to write multisystem() in Perl, but not in R. So if I roll it myself, I will probably rely on a mixed R/Perl system here. This is not desirable, but it is the only way I know how to do this. If something like multisystem() already exists in native R, please let me know and save me from reinventing the wheel. If it does not, some Perl/R combo for this soon will. Regards, /iaw

-- Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
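For what it's worth, a rough R-only sketch of the multisystem() idea (hypothetical code, not an existing function): each command is launched in the background via the shell and signals completion by touching a flag file, which the master polls.

```r
## Hypothetical sketch of multisystem(): run shell commands in parallel,
## report every `announce` seconds, return once all have finished.
## Assumes a POSIX shell; no error handling or resubmission.
multisystem <- function(cmds, announce = 300) {
  flags <- tempfile(rep("job", length(cmds)))   # one flag file per job
  for (i in seq_along(cmds)) {
    # wrap each command so a flag file appears when it exits
    system(sprintf("(%s; touch %s) &", cmds[i], flags[i]), wait = FALSE)
  }
  repeat {
    done <- file.exists(flags)
    if (all(done)) break
    message("still waiting for job(s): ", paste(which(!done), collapse = ", "))
    Sys.sleep(announce)
  }
  invisible(NULL)
}
```

A real version would also record exit codes (e.g. by writing them into the flag files) so the master can resubmit failed jobs.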
[R] How to calculate min of a dataset that contains empty column
Hi all, I have got a dataset like the following:

a b c
1   5
2
3   7

I am taking the minimum of each column and use it as the minimum of the y-axis of my graphs. My scripts (simplified version) are like the following:

f <- array()
f[1] <- "a"
f[2] <- "b"
f[3] <- "c"
for (i in 1:3) {
  name <- f[i]
  ymin <- min(dataset[[name]])
  plot(x, y, ylim = c(ymin, 100))
}

The script stops at b, because the min function returns an Inf value. What can I do to overcome it? Many thanks, Chris
[R] VAR analysis
[[alternative HTML version deleted]]
Re: [R] Rounding to the nearest 5
x5 <- 5 * round(x / 5)

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Chris Li [chri...@austwaterenv.com.au] Sent: 16 September 2009 09:15 To: r-help@r-project.org Subject: [R] Rounding to the nearest 5
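A quick check of that one-liner (illustrative values only; note that round() in R rounds half to even, so exact .5 multiples such as 12.5 snap to the even multiple):

```r
## Round to the nearest multiple of 5 by scaling, rounding, and scaling back.
x <- c(11, 13, 18)
x5 <- 5 * round(x / 5)
x5   # 10 15 20
```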
Re: [R] How to calculate min of a dataset that contains empty column
Let's say your data.frame is called xx. Then try:

apply(xx, 2, function(x) { min(na.omit(x)) })

Does it work?

-- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il
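One caveat with that suggestion: na.omit() on an all-NA column leaves a zero-length vector, and min() of a zero-length vector still returns Inf (with a warning). A sketch that guards against this, using made-up data shaped like the original question:

```r
## Column b is entirely NA; return NA for it instead of Inf.
xx <- data.frame(a = c(1, 2, 3), b = c(NA, NA, NA), c = c(5, NA, 7))
mins <- sapply(xx, function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE))
mins   # a = 1, b = NA, c = 5
```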
Re: [R] how to load only lines that start with a particular symbol
On Tue, 15-Sep-2009 at 02:44PM -0700, William Dunlap wrote:
| perl can do more complicated processing and filtering than grep.

I once used Perl to filter the useful 4 or 5 MB out of a file that was over 250 MB. It took about 3 lines of Perl code and about 40 seconds to run. Perl's not exactly my strong point, but I could look it up next time I'm on that machine if anyone is interested.

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}  Great minds discuss ideas
 _( Y )_  Average minds discuss events
(:_~*~_:) Small minds discuss people
 (_)-(_)  ... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
[R] Registration
Hi, I have a question to post, but I have a problem finding the appropriate forum. Some of the FAQs on pastecs are answered by the r-h...@stat.math.ethz.ch mailing list. I searched to see if this mailing list requires registration or not, but there was no information about that. I am a registered member of the R-help general mailing list, but I am aware it is better to post my question to a specific group in some cases. If your mailing list requires registration, please help me with the web page where I could do that. Please also advise if your forum attends to questions on use of the pastecs package or if there is a better forum for that. Thank you. Ogbos
[R] how can we check file empty or not
Hi All, I need to check whether a file is empty or not. Please help me in this regard. Best regards, Deepak.M.R
Re: [R] how can we check file empty or not
deepak m r wrote: Hi All, I need to check whether a file is empty or not. Please help me in this regard.

Hi,

file.info(your_file)$size == 0

cheers, Paul

-- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
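A quick illustration of that check (the file is a throwaway temp file, not from the thread):

```r
## file.info()$size is the file size in bytes; 0 means empty.
f <- tempfile()
file.create(f)                  # creates a zero-byte file
file.info(f)$size == 0          # TRUE
cat("hello\n", file = f)
file.info(f)$size == 0          # FALSE
```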
[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment
Hello everybody, it may be better to have sample data. I have provided data with fewer levels of gene and day and only ca. 400 data points per condition. Sample code:

small <- as.data.frame(read.csv("small.csv"))
small$species <- factor(small$species)
small$gene <- factor(small$gene)
small$day <- factor(small$day)
small$repl <- factor(small$repl)

library(lme4)
model1 <- lmer(APC ~ (FITC+species+gene)^3 + (1|day) + (1|repl), REML=FALSE, data=small)
model2 <- lmer(APC ~ (FITC+species+gene)^2 + (1|day) + (1|repl), REML=FALSE, data=small)
anova(model1, model2)

gives me a significant difference, but in fact there are far too many residual df, and this is much worse in the original data. I suppose I cannot trust this difference.

model3 <- lmer(APC ~ species*gene + (1|day/repl/FITC), data=small)

Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) : numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) : numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error? And finally

model4 <- lmer(APC ~ gene*species + (1|day) + (1|repl) + (1+(gene:species)|FITC), data=small)
model5 <- lmer(APC ~ gene+species + (1|day) + (1|repl) + (1|FITC), data=small)
anova(model4, model5)

works with this reduced data set, but fails to converge for the original, much larger data set. I am unsure if this is the right kind of analysis for the data and there is only a problem of convergence, or if it is the wrong formula. Can anybody give me some advice on this problem? Or should I just stick to reducing the data from each condition to a single parameter, as explained in my first email below? Thank you again!

Matthias Gralle wrote: Hello everybody, I have been trying for some weeks to state the correct design of my experiment as a GLM formula, and have not been able to find something appropriate in Pinheiro & Bates, so I am posting it here and hope somebody can help me.
In each experimental condition, described by 1) gene (10 levels, fixed, because of high interest to me), 2) species (2 levels, fixed, because of high interest), 3) day (2 levels, random), and 4) replicate (2 levels per day, random), I have several thousand data points consisting of two variables: 5) FITC (level of transfection of a cell) and 6) APC (antibody binding to the cell). Because of intrinsic and uncontrollable cell-to-cell variation, FITC varies quite uniformly over a wide range, and APC correlates rather well with FITC. In some cases, I pasted day and replicate together as day_repl. My question is the following: Is there any gene (in my set of 10 genes) where the species makes a difference in the relation between FITC and APC? If yes, in which gene does species have an effect? And what is the effect of the species difference?

My attempts are the following:

1. Fit the data points of each experimental condition to a linear equation APC = Intercept + Slope*FITC and analyse the slopes:

lm(Slope ~ species*gene*day_repl)

This analysis shows clear differences between the genes, but no effect of species and no interaction gene:species. The linear fit to the cells is reasonably good, but of course does not represent the data set completely, so I wanted to incorporate the complete data set.

2a. lmer(APC ~ FITC*species*gene + (1|day) + (1|repl))

This gives extremely significant values for any interaction and variable because there are 200 000 df. Of course, it cannot be true, because the cells are not really independent. I have done many variations of the above, e.g.

2b. lmer(APC ~ FITC*species*gene + (1|day) + (1+FITC|day_repl)),

but they all suffer from the excess of df.

3. lmer(APC ~ species*gene + (1|day/repl/FITC))

gives several warning messages like this one: In repl:day : numerical expression has 275591 elements: only the first used

4. lmer(APC ~ gene*species + (1|day_repl) + (1+gene:species|FITC))

ran several days, but failed to converge...
Can somebody give me any hint, or do you think the only possible analysis is a simplification as in my model 1? By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 with a Linux 2.6.24-24-generic kernel on different Intel systems. I am using the lme4 that came with R 2.8.0. Thank you very much for your time!

-- Matthias Gralle, PhD Dept. Evolutionary Genetics Max Planck Institute for Evolutionary Anthropology Deutscher Platz 6 04103 Leipzig, Germany Tel +49 341 3550 519 Fax +49 341 3550 555
[R] Error with JGR
Hi, I'm trying to run the JGR package (on Windows). I have followed all the steps on the web page http://www.rosuda.org/JGR but when I run jgr.exe it returns the error message: "Cannot find R.exe - please make sure R version 2.3.0 or higher is installed." I have R version 2.9.0 on my computer, so I don't know what the problem is... Thanks in advance for your help, Srpd
Re: [R] How to calculate min of a dataset that contains empty column
Presumably there is no reason to attempt plotting an empty column, so the best approach is probably to remove the columns that contain no values. Try

f.temp <- f[, -which(apply(f, 2, function(x) all(is.na(x))))]

and then run your script for f.temp instead of the original f. Also, you may find you have to use min(x, na.rm=TRUE) instead of just min(x) for any other columns that contain NA values as well as real values.
[R] extracting one element from all list elements
Hi, I have a list which consists of data frames of the same structure. Now I want to extract one column (let's say column A) from all data frames and have them in a matrix (or data frame). At the moment I am doing:

d <- data.frame(A=runif(100), B=rnorm(100))
theList <- list(e1=d, e2=d, e3=d, e4=d)
f <- sapply(theList, function(l){l$A})

But I am sure there is a more elegant way? Rainer

-- Rainer M. Krug, Centre of Excellence for Invasion Biology, Stellenbosch University, South Africa
Re: [R] extracting one element from all list elements
Try this:

sapply(theList, '[[', 'A')

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
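Both forms give the same result; a small check with made-up data:

```r
## `[[` indexed by name is equivalent to l$A for each list element;
## sapply() then binds the extracted columns into a matrix.
d <- data.frame(A = runif(5), B = rnorm(5))
theList <- list(e1 = d, e2 = d)
f1 <- sapply(theList, function(l) l$A)
f2 <- sapply(theList, `[[`, "A")
identical(f1, f2)   # TRUE: a 5 x 2 matrix with columns e1 and e2
```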
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 07:11:46AM +0200, Schalk Heunis wrote: This might be of help; first apply the formatting: print(xtable(prettyNum(d, decimal.mark=",")))

Thanks! Your solution works for the example that I gave. However, I failed to provide an example that really represents my problem, because I'm passing an lm object to xtable. Currently, I'm using the following function, which also puts the table header in bold font:

tabprint <- function(x, ...) {
  colsani <- function(x) { paste("{\\bf ", x, "}", sep = "") }
  p <- capture.output(print(x, caption.placement = "top",
                            sanitize.colnames.function = colsani, ...))
  writeLines(p, "/tmp/xtableOutPut")
  system("sed -i -e 's/\\([0-9]\\)\\.\\([0-9]\\)/\\1,\\2/g' /tmp/xtableOutPut")
  p <- readLines("/tmp/xtableOutPut")
  cat(p, sep = "\n")
}
tabprint(xtable(lm.model))

I could call gsub() if I knew the regular expression equivalent to the sed one that I'm using.

On Tue, Sep 15, 2009 at 5:06 PM, Jakson A. Aquino jaksonaqu...@gmail.com wrote: Hello, How can I make xtable print a comma as decimal separator? Setting the option OutDec isn't enough for xtable:

library(xtable)
options(OutDec = ",")
x <- c(1.1, 1.2, 1.3)
y <- c(2.3, 2.2, 2.1)
d <- data.frame(x, y)
d
print(xtable(d))

Thanks! Jakson Aquino
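For what it's worth, the gsub() equivalent of that sed substitution is straightforward, which would avoid the /tmp round-trip entirely (sample string made up):

```r
## Same pattern as the sed command: a digit, a literal dot, a digit,
## with backreferences keeping the digits and swapping "." for ",".
p <- "x & 1.23 & 45.6 \\\\"
gsub("([0-9])\\.([0-9])", "\\1,\\2", p)   # "x & 1,23 & 45,6 \\\\"
```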
Re: [R] error: could not find function sd
sselamat wrote: Hi, every time I try to get the standard deviation of a vector and enter the command sd(x), I get the reply "Error: could not find function sd". Can someone help me please? Thanks.

Hmmm. Please provide more information; it's hard to imagine how this could happen. Results of sessionInfo()??? Here's the only way I could get this to happen:

> find("sd")
[1] "package:stats"
> detach("package:stats")
> sd(1)
Error: could not find function "sd"
Re: [R] comma as decimal separator in xtable
Perhaps with the dcolumn LaTeX package?

http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf

David

2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com
Re: [R] installation problem
cls59 chuck at sharpsteen.net writes: wesley mathew wrote: Hello All, I have some problem installing XML_2.6-0.tar. I am working on Windows and my R version is R-2.9.1.

Unfortunately, I think there are some problems with CRAN being able to build the XML package for Windows; at least the page http://cran.r-project.org/web/packages/XML/index.html lists it as unavailable. However, I know I have installed it on a Windows machine at the university using install.packages() -- R may be set up to pull from some additional repository there. I'll check the next time I'm in the lab. -Charlie

As Charlie notes, the CRAN page lists the Windows version as unavailable, but links to a README file which details: "The packages BRugs, ncdf, RCurl, RDieHarder, RNetCDF, udunits, and XML do not build out of the box or are special in other circumstances. Nevertheless these are available at http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/ kindly provided by Professor Brian D. Ripley." Thus you can download the Windows zip file from there, and install it using the "Install package(s) from local zip files..." menu option. Michael Bibo, Queensland Health
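Alternatively (an untested sketch; the repository path is the one quoted from the README above), install.packages() can be pointed at that contrib directory directly via its contriburl argument, avoiding the manual zip download:

```r
## contriburl bypasses the usual repos + contrib-path construction and
## fetches the package list straight from the given directory.
install.packages("XML",
    contriburl = "http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/")
```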
Re: [R] comma as decimal separator in xtable
Try this also:

format(coef(summary(lm.D9)), decimal.mark = ',')

or using gsub:

apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")

using the lm.D9 object from the ?lm example.
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
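The two suggestions in the thread can be tried without xtable at all. A minimal sketch (using a plain matrix rather than the lm.D9 coefficients, so it is self-contained):

```r
# Two ways to render numbers with a comma as decimal separator.
m <- matrix(c(1.1, 2.3, 1.2, 2.2), nrow = 2)

# 1. format() understands decimal.mark directly:
f1 <- format(m, decimal.mark = ",")

# 2. gsub() over the character representation, escaping the literal dot:
f2 <- apply(format(m), 2, gsub, pattern = "\\.", replacement = ",")

f1[1, 1]  # "1,1"
```

Either character matrix can then be handed to xtable(), which prints character cells verbatim.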
Re: [R] extracting one element from all list elements
On Wed, Sep 16, 2009 at 1:35 PM, Henrique Dallazuanna www...@gmail.com wrote: Try this: sapply(theList, '[[', 'A') Thanks Henrique - that is much more elegant. Rainer On Wed, Sep 16, 2009 at 8:34 AM, Rainer M Krug r.m.k...@gmail.com wrote: Hi I have a list which consists of data frames of the same structure. Now I want to extract one column (let's say column A) from all data frames and have them in a matrix (or data frame). At the moment I am doing:

d <- data.frame(A = runif(100), B = rnorm(100))
theList <- list(e1 = d, e2 = d, e3 = d, e4 = d)
f <- sapply(theList, function(l) l$A)

But I am sure there is a more elegant way? Rainer -- Rainer M. Krug, Centre of Excellence for Invasion Biology, Stellenbosch University, South Africa
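The '[[' idiom works because sapply passes each list element as the first argument to the extraction function `[[`, with 'A' as the second. A small sketch (with tiny made-up data frames for illustration):

```r
d <- data.frame(A = 1:3, B = 4:6)
theList <- list(e1 = d, e2 = d)

# sapply(l, '[[', 'A') calls l[[i]][['A']] for each element and
# column-binds the results into a matrix.
f <- sapply(theList, "[[", "A")
dim(f)       # 3 rows (observations) x 2 columns (list elements)
f[, "e1"]    # 1 2 3
```

The result is a matrix with one column per list element, exactly like the anonymous-function version.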
Re: [R] best method to format output of frequency table
Try this: as.matrix(table(x)) On Tue, Sep 15, 2009 at 6:50 PM, esawdust lan...@360vl.com wrote: I have some security alert log data that I'm parsing and doing some stats on. One of the fields is the Classtype, which is the enumerated value of the type of alert found.

classtypes <- factor(alerts$Classtype)
fclass_types <- table(classtypes)

fclass_types gives me a frequency table of the intrusion types: fclass_types classtypes attempted-admin attempted-recon 18 93 35 attempted-user misc-activity misc-attack 21 30 12 protocol-command-decode unsuccessful-user web-application-activity 2 2 287. I would like to be able to somehow output this table in the form:

attempted-admin 93
attempted-recon 35
attempted-user 21
misc-activity 30

and so on. Is there a function I'm missing, or some option that will let me dump fclass_types out in a two-column format? Thanks for any tips, Landon -- Henrique Dallazuanna
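Besides as.matrix(table(x)), as.data.frame() on a table gives an explicit two-column (label, count) layout, which may be closer to what is asked for. A minimal sketch with toy data:

```r
x <- c("a", "b", "a", "c", "a")
tab <- table(x)

# as.data.frame() turns the named counts into a two-column data frame:
df <- as.data.frame(tab)
df
#   x Freq
# 1 a    3
# 2 b    1
# 3 c    1

# cbind(tab) is another compact one-liner for a labelled column of counts.
```

Both forms write out cleanly with write.table() if the goal is a dump to file.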
[R] How can I replicate Excel's 'growth trend' series interpolation?
Good day all, I'm trying to replicate Excel's growth trend series interpolation, available by highlighting a number of cells and then under Edit--Fill--Series--Type=growth and select 'Trend'. For example, if my starting point is 1.4 and end point is 30 with the sequence length being 11, the resulting series generated by the above is as follows: 1.4 1.902073774 2.584203316 3.510960968 4.770076272 6.480740698 8.804890658 11.96253686 16.25259117 22.08116245 30 Any idea how this can be replicated in R would be much appreciated. Thanks in advance, George.
Re: [R] Error...
On Tue, Sep 15, 2009 at 9:05 AM, Inez Campbell i...@st-andrews.ac.uk wrote: Hi, I am trying to find in FAQ, in the manual, everywhere, the reason and most importantly, how to solve: Error: protect(): protection stack overflow It appeared after requesting a plot. That error message comes from deep inside the compiled code for the R interpreter. It may indicate that calls are nested too deeply or an infinite loop or ... We can't really tell without a reproducible example. (When an R object is created inside compiled code it must be protected from garbage collection, and this is done by putting its address onto a structure called the protection stack. When the object goes out of scope it is unprotected. Like most such structures, the protection stack has a finite capacity.)
Re: [R] How can I replicate Excel's 'growth trend' series interpolation?
exp(seq(log(1.4), log(30), length = 11))
 [1]  1.400000  1.902074  2.584203  3.510961  4.770076  6.480741  8.804891
 [8] 11.962537 16.252591 22.081162 30.000000
Re: [R] How can I replicate Excel's 'growth trend' series interpolation?
Wow :). Thanks a ton. On Wed, Sep 16, 2009 at 2:31 PM, Richard M. Heiberger r...@temple.edu wrote: exp(seq(log(1.4), log(30), length=11)) [...]
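The one-liner works because Excel's growth trend is a geometric series: each step multiplies by a constant ratio r = (end/start)^(1/(n-1)), which is exactly linear interpolation on the log scale. A sketch verifying the equivalence:

```r
start <- 1.4; end <- 30; n <- 11

# Constant ratio between consecutive terms of the geometric series:
r <- (end / start)^(1 / (n - 1))
series <- start * r^(0:(n - 1))

# Identical (up to floating-point rounding) to log-scale interpolation:
all.equal(series, exp(seq(log(start), log(end), length.out = n)))  # TRUE
```

series[2] reproduces Excel's 1.902073774 from the original post.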
Re: [R] coefficients of aov results has less number of elements?
On Tue, Sep 15, 2009 at 10:34 AM, Peng Yu pengyu...@gmail.com wrote: Hi, I run the following commands. 'A' has 3 levels and 'B' has 4 levels. Should there be totally 3+4 = 7 coefficients (A1, A2, A3, B1, B2, B3, B4)? If you were to define a model matrix for those coefficients it would be rank deficient (the sum of the first three columns is 1 and the sum of the last four columns is 1). You only have 6 degrees of freedom in total, not 7. Because of the -1 in the formula the first factor is expanded to indicator columns, but the second is represented with a set of 3 contrasts, which are the contr.treatment version by default.

a <- 3; b <- 4; n <- 1000
A <- rep(sapply(1:a, function(x) rep(x, n)), b)
B <- as.vector(sapply(sapply(1:b, function(x) rep(x, n)), function(x) rep(x, a)))
Y <- A + B + rnorm(a*b*n)
fr <- data.frame(Y = Y, A = as.factor(A), B = as.factor(B))
afit <- aov(Y ~ A + B - 1, fr)
afit$coefficients
      A1       A2       A3       B2       B3       B4
1.993067 2.969252 3.965888 1.005445 2.020717 3.023940

Regards, Peng
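The rank deficiency is easy to see directly: with full indicator coding for both factors, the columns are linearly dependent. A small sketch (with a scaled-down design, one observation per cell, just to show the ranks):

```r
A <- factor(rep(1:3, 4))
B <- factor(rep(1:4, each = 3))

# Full indicator coding for both factors gives 3 + 4 = 7 columns ...
X <- cbind(model.matrix(~ A - 1), model.matrix(~ B - 1))
ncol(X)       # 7
qr(X)$rank    # 6 -- the A columns sum to 1 and so do the B columns

# ... so R keeps indicators for the first factor and drops one level
# of the second (contr.treatment), leaving 6 estimable coefficients.
```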
Re: [R] Questions on pareto
The other package that I can think of that you might want to investigate, if you are attempting to construct bivariate distributions, is the copula package. -- David. On Sep 15, 2009, at 10:27 PM, TsubasaZero wrote: Hi, Thanks a lot. I think this is what I want. The actuar package has more distributions. Zero~ [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] error: could not find function sd
Ben Bolker wrote: sselamat wrote: Hi, Every time I try to get the standard deviation of a vector and enter the command sd(x), I get a reply Error: could not find function "sd". Can someone help me please? Thanks. Hmmm. Please provide more information; it's hard to imagine how this could happen. Results of sessionInfo()? Here's the only way I could get this to happen:

> find("sd")
[1] "package:stats"
> detach("package:stats")
> sd(1)
Error: could not find function "sd"

Or if you start R by specifying that stats is not among the default packages. Anyway, I guess it is a broken installation and re-installing will fix it. Uwe Ligges
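Uwe's diagnosis can be reproduced and repaired in one session; a sketch, assuming the only cause is a detached stats package:

```r
# sd() lives in package:stats, which is attached by default.
exists("sd")            # TRUE in a normal session
detach("package:stats") # simulate the broken state
exists("sd")            # FALSE -- "could not find function" territory
library(stats)          # re-attaching the package fixes it
exists("sd")            # TRUE again
```

In a genuinely broken installation, re-installing R (or calling stats::sd directly) is the remedy.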
Re: [R] installation problem
Michael Bibo wrote: cls59 chuck at sharpsteen.net writes: wesley mathew wrote: Hello All I have some problem installing XML_2.6-0.tar. I am working in Windows and my R version is R-2.9.1. Unfortunately, I think there are some problems with CRAN being able to build the XML package for Windows; at least the page http://cran.r-project.org/web/packages/XML/index.html lists it as unavailable. However, I know I have installed it on a Windows machine at the university using install.packages() -- R may be set up to pull from some additional repository there. I'll check the next time I'm in the lab. -Charlie As Charlie notes, the CRAN page lists the Windows version as unavailable, but links to a README file which details: The packages BRugs, ncdf, RCurl, RDieHarder, RNetCDF, udunits, and XML do not build out of the box or are special in other circumstances. Nevertheless these are available at http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/ kindly provided by Professor Brian D. Ripley. Thus you can download the Windows zip file from there and install it using the install package(s) from local zip files... menu option, or just say install.packages("XML"), as that CRAN extras repository is already a default under Windows. Best, Uwe Ligges Michael Bibo Queensland Health
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 01:59:29PM +0200, David Hajage wrote: Perhaps with the dcolumn package? http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf Thanks for the suggestion, but the problem isn't one of alignment. When I have commas as decimal separators, the values are aligned, because all values of each column were formatted with the same number of digits.
Re: [R] Registration
ogbos okike wrote: Hi, I have a question to post, but I have a problem finding the appropriate forum. Some of the FAQs on pastecs are answered by the r-h...@stat.math.ethz.ch mailing list. I searched to see if this mailing list requires registration or not; there was no information about that. I am a registered member of the R-help general mailing list ... which is the same as you found out now. So it is time to ask the question. Uwe Ligges ... but I am aware it is better to post my question to a specific group in some cases. If your mailing list requires registration, please help me with the web page where I could do that. Please also advise whether your forum attends to questions on the use of the pastecs package, or if there is a better forum for that. Thank you. Ogbos
[R] cycles in a graphical model
Hi, Is there any R package or existing code in R to detect cycles in a graphical model or DAG (Directed Acyclic Graph)? Thanks.
Re: [R] comma as decimal separator in xtable
But it seems that dcolumn can change the decimal separator too (see the table on the first page of the pdf document). For example, in a Sweave document:

\documentclass{article}
\usepackage{dcolumn}
\newcolumntype{d}[1]{D{.}{,}{#1}}
\begin{document}
<<results=tex>>=
x <- matrix(rnorm(4), 2, 2)
library(xtable)
xtable(x, align = c("l", "|", "d{2}", "|", "c", "|"))
@
\end{document}

2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com: [...]
[R] A matrix problem
Hi all, I have the following character matrix:

mat
     [,1]               [,2]                [,3]              [,4]
[1,] NA                 0.0671746073115122  1.15281464953731  0.822277316923348
[2,] 0.113184828073156  -0.0133150789005112 0.640912657072027 -0.0667317787225847
[3,] -1.40593584523871  1.10755549414758    0.493828059815449 -1.09233516877992
[4,] 0.577643850085066  1.10279525071026    1.16725625310315  0.367724768195794
[5,] 0.746100264271392  -0.335556133578362  NA                0.328559028366446

Now I want to convert it to a numeric matrix, so I used the following code:

as.numeric(mat)
 [1]         NA 0.11318483 -1.40593585 0.57764385 0.74610026 0.06717461 -0.01331508 1.10755549 1.10279525 -0.33555613
[11] 1.15281465 0.64091266 0.49382806 1.16725625         NA 0.82227732 -0.06673178 -1.09233517 0.36772477 0.32855903
Warning message: NAs introduced by coercion

What I noticed is that: 1. the original matrix was converted to a vector; 2. a warning message appears. I do not want such things to happen. Is there any direct way to convert my original character matrix to a numeric matrix? Thanks
Re: [R] cycles in a graphical model
On 9/16/2009 9:26 AM, shuva gupta wrote: Hi, Is there any R package or existing code in R to detect cycles in a graphical model or DAG (Directed Acyclic Graph)? Thanks. You might try this to search R packages: library(sos); findFn('Directed Acyclic Graph') -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
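For a small graph, a cycle check needs no special package: Kahn's algorithm repeatedly deletes nodes with no incoming edges, and if nodes remain at the end the graph has a cycle. A base-R sketch over an adjacency matrix (function name and encoding are my own, for illustration):

```r
# TRUE if the directed graph given by adjacency matrix m (m[i, j] = 1
# for an edge i -> j) contains a cycle.
has_cycle <- function(m) {
  while (nrow(m) > 0) {
    # nodes with no incoming edges can never lie on a cycle
    roots <- which(colSums(m) == 0)
    if (length(roots) == 0) return(TRUE)  # every remaining node has a parent
    m <- m[-roots, -roots, drop = FALSE]  # delete them and repeat
  }
  FALSE
}

acyclic <- matrix(c(0, 0, 1, 0), 2, 2)  # edge 1 -> 2 only
cyclic  <- matrix(c(0, 1, 1, 0), 2, 2)  # edges 1 -> 2 and 2 -> 1
has_cycle(acyclic)  # FALSE
has_cycle(cyclic)   # TRUE
```

For large or sparse graphs, the packages that findFn() turns up will scale far better than this dense-matrix sketch.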
[R] How to extract a specific substring from a string (regular expressions) ? See details inside
Hi all, I have thousands of strings like these ones: 1159_1; YP_177963; PPE FAMILY PROTEIN 1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575 1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE and various others... I'm interested in extracting the code for the protein (in this example: YP_177963, CAA15575, CAA17111). I found only one common criterion to identify the protein codes in ALL my strings: I need a sequence of characters selected in this way: start: the first alphabetic capital letter followed after three characters by a digit; end: the last following digit before a non-digit character, or nothing. Tricky, isn't it? Well, I'm not an expert, and I played a lot with regular expressions and the sub() command with no big results. Also with substring.location in the Hmisc package (but there I don't know how to use regular expressions). Maybe there are other more useful functions, or maybe it is just a matter of using regular expressions in a better way... Can anybody help me? Thanks a lot in advance...
Re: [R] A matrix problem
On Sep 16, 2009, at 9:40 AM, Bogaso wrote: [...] Is there any direct way to convert my original character matrix to a numeric matrix? I'm not sure if this is a recommended method, but it is compact:

M <- matrix(as.character(c(NA, NA, rnorm(14))), ncol = 4)
M
     [,1]                [,2]                 [,3]                [,4]
[1,] NA                  -0.601520920256251   -1.00727470025995   -0.510937067339167
[2,] NA                  -0.643289410284616   -0.560094940433377  1.65539295869595
[3,] -1.58527121117001   -0.303929580683863   0.656380566243141   0.863929178838393
[4,] 0.466350502132621   1.85702073433996     0.349130873645143   -0.821650081436582

I put in both a real NA and a character NA just to be sure.

mode(M) <- "numeric"
Warning message: In eval(expr, envir, enclos) : NAs introduced by coercion
M
        [,1]    [,2]    [,3]    [,4]
[1,]      NA -0.6015 -1.0073 -0.5109
[2,]      NA -0.6433 -0.5601  1.6554
[3,] -1.5853 -0.3039  0.6564  0.8639
[4,]  0.4664  1.8570  0.3491 -0.8217

2. A warning message is there. You can suppress warnings: options(warn = -1), and then restore the warning level to the default of 0.
Thanks -- David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] list of symbols to substitution
Ah, whoops (and thank you). I forgot to mention an additional constraint: I'm trying to use exactly one eval statement. Some background information on this constraint: some of the expressions being evaluated may return an error. At the moment, I'm wrapping each of the individual expressions inside a try-catch block and an eval statement. I was hoping to find a solution that uses a single try-catch block and a single eval statement. Today I realized that a partial solution is to use a single try-catch block and multiple eval statements, as has been suggested. Here is the code snippet, slightly updated:

symnames <- c('foo', 'bar', 'baz')
foo <- expression(1 + 2)
bar <- expression(3 + 4)
baz <- expression(abs('c'))
expr <- substitute(expr, list(expr = lapply(symnames, as.symbol)))
eval(expr)

Anybody know of a solution that uses a single eval expression? --Michael On Wed, Sep 16, 2009 at 1:31 AM, Schalk Heunis schalk.heu...@enerweb.co.za wrote: Instead of eval(expr), use lapply(expr, eval) HTH Schalk Heunis On Wed, Sep 16, 2009 at 5:50 AM, Michael Spiegel michael.m.spie...@gmail.com wrote: Hi, I'm trying to use a list of symbols as one of the values to be substituted in a substitute expression, but I can't figure out how to write the correct expression for my problem. Let me illustrate a simple example of what I'm trying to do. The following code snippet will evaluate to 5:

symname <- 'foo'
foo <- 5
expr <- substitute(c(expr), list(expr = as.symbol(symname)))
eval(expr)

I would like the next, similar example to evaluate to the list (5, 6, 7), but instead it evaluates to the list (foo, bar, baz), where foo, bar, and baz are all of type symbol. For the purposes of finding a solution, assume that the length and contents of symnames are unknown ahead of time (symname is not a constant; assume it is some parameter to a function).
symnames <- c('foo', 'bar', 'baz')
foo <- 5
bar <- 6
baz <- 7
expr <- substitute(expr, list(expr = lapply(symnames, as.symbol)))
eval(expr)

Thanks! --Michael
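One way to keep a single eval is to build one unevaluated call that combines all the symbols, for example a call to list(). A sketch of that idea (not from the thread, just one possible answer to Michael's question):

```r
symnames <- c("foo", "bar", "baz")
foo <- 5; bar <- 6; baz <- 7

# Build the unevaluated call list(foo, bar, baz): the first symbol is the
# function, the rest become its arguments.
expr <- as.call(lapply(c("list", symnames), as.symbol))

# A single eval, so a single tryCatch() around it suffices.
vals <- eval(expr)
unlist(vals)  # 5 6 7
```

If any symbol's value errors on lookup or evaluation, the one surrounding tryCatch(eval(expr), error = ...) catches it.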
[R] a sequence that wraps around
I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit, so that the sequence goes up to the limit and then starts again from 1. # works now seq(from=2, length.out=3) [1] 2 3 4 # what I want seq(from=2, length.out=3, rlimit=3) [1] 2 3 1 # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=2, length.out=4, rlimit=4) [1] 2 3 4 1 seq(from=2, length.out=3, rlimit=2) [1] 2 1 2 I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions.
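Modular arithmetic gives a one-liner for the wrap-around: shift to a 0-based sequence, take the remainder, and shift back. A sketch (the function name rlimit/wrap_seq is my own):

```r
# Like seq(from, length.out = ...), but values wrap back to 1 after rlimit.
wrap_seq <- function(from, length.out, rlimit) {
  (seq(from = from - 1, length.out = length.out) %% rlimit) + 1
}

wrap_seq(2, 3, 3)  # 2 3 1
wrap_seq(2, 4, 3)  # 2 3 1 2
wrap_seq(2, 4, 4)  # 2 3 4 1
wrap_seq(2, 3, 2)  # 2 1 2
```

This reproduces all four desired outputs from the post.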
[R] Is there a way to round numbers up or down to the nearest natural looking number?
Hi all, Given the following series of numbers: t [1] 21.85000 30.90410 43.71000 61.82234 87.43999 123.67296 [7] 174.91997 247.40249 349.91996 494.91815 700.0 What's the simplest way of formatting them into the nearest natural looking (rounded to nearest multiple of 10) numbers? So: t[1] becomes 20, t[2] becomes 30, t[3] becomes 40, t[4] becomes 60, t[5] becomes 90, t[6] becomes 120, etc? Thus far I've tried signif(t,2), but that's not working great. Any help would be much appreciated. Thanks in advance, George.
Re: [R] A matrix problem
On 16-Sep-09 13:40:03, Bogaso wrote: [...] Is there any direct way to convert my original character matrix to a numeric matrix? Thanks

matn <- as.numeric(mat); dim(matn) <- dim(mat)

This does the job (without requiring you to use a specific indication of dimension, e.g. ncol=4), but the warning message will still be there unless, as David Winsemius said, you also use options(warn = -1). However, a warning message such as this does not mean that anything has gone wrong -- it is simply a notification that something which was not the character representation of a number was converted into NA. So you can ignore it. (It would lead to exactly the same result if, instead of NA in mat[1,1] and mat[5,3], you had some other string such as tiger -- try it! You would still get the warning, and this time there might even be some point to it ...). Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 16-Sep-09 Time: 15:13:39 -- XFMail --
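The same idea as Ted's two-statement fix can be written as a single expression that rebuilds the matrix, keeps any dimnames, and silences only the expected coercion warning. A sketch with a small made-up character matrix:

```r
mat <- matrix(c("1.5", "NA", "2.25", "3"), nrow = 2)

# Rebuild with the original shape; suppressWarnings() silences the
# expected "NAs introduced by coercion" from the non-numeric "NA" string.
matn <- suppressWarnings(
  matrix(as.numeric(mat), nrow = nrow(mat), dimnames = dimnames(mat))
)
matn
#      [,1] [,2]
# [1,] 1.50 2.25
# [2,]   NA 3.00
```

suppressWarnings() is scoped to this one call, so it avoids globally disabling warnings the way options(warn = -1) does.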
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
Try this:

library(gsubfn)
strapply(x, "[A-Z]{3}[0-9]+")

On Wed, Sep 16, 2009 at 10:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] -- Henrique Dallazuanna
Re: [R] Is there a way to round numbers up or down to the nearest natural looking number?
round(t, -1) Uwe Ligges Jorgy Porgee wrote: [...]
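round(x, -1) handles multiples of ten via its digits argument; for an arbitrary step, such as the nearest 5 asked about in the earlier "Rounding to the nearest 5" thread, divide by the step, round, and multiply back. A sketch (the helper name round_to is my own):

```r
# Round to the nearest multiple of an arbitrary step.
round_to <- function(x, step) round(x / step) * step

round_to(11, 5)                          # 10
round_to(c(21.85, 43.71, 123.67), 10)    # 20 40 120
round(21.85, -1)                         # 20 -- digits = -1 for powers of ten
```

Note that round() uses round-half-to-even, so e.g. round_to(12.5, 5) gives 10, not 15.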
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
This should do it for you:

pat <- ".*(\\b[A-Z]..[0-9]+).*"
grep(pat, x)
[1] 1 3 5
sub(pat, '\\1', x)
[1] "YP_177963" "CAA15575"  "CAA17111"

On Wed, Sep 16, 2009 at 9:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] Memory in R on windows running 64bit XP
Hi, I'm running 32-bit R on Windows XP 64-bit and the machine has 16Gb of RAM. The help for memory.limit states: If 32-bit R is run on some 64-bit versions of Windows the maximum value of obtainable memory is just under 4GB. So, using the help, which states the size parameter can go up to 4095: memory.limit(size=4095) When I run mclust I get: Error: cannot allocate vector of size 1.3 Gb So, now I set the max-mem-size flag: --max-mem-size=3500M Repeat, and I still get the same error: Error: cannot allocate vector of size 1.3 Gb So, can anyone help? 32-bit R is reported to be able to use up to 4Gb of RAM on 64-bit Windows, yet mine croaks at 1.3Gb. sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] mclust_3.3.1 Thanks, Mick
Re: [R] a sequence that wraps around
Try rep: rep(2:4, length.out = 3, times = 10) On Wed, Sep 16, 2009 at 11:08 AM, Jack Tanner i...@hotmail.com wrote: I'd like to have something like seq() where I can pass in the length of the desired sequence and a right limit, so that the sequence goes up to the limit and then starts again from 1. # works now seq(from=2, length.out=3) [1] 2 3 4 # what I want seq(from=2, length.out=3, rlimit=3) [1] 2 3 1 # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=2, length.out=4, rlimit=4) [1] 2 3 4 1 seq(from=2, length.out=3, rlimit=2) [1] 2 1 2 I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
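rep() recycles the values you give it, but it does not wrap back through 1 after the limit the way the examples above ask for. A small modular-arithmetic sketch (seq_wrap is a made-up helper name) that reproduces all four requested outputs:

```r
# wrap a sequence starting at `from` back around to 1 after `rlimit`:
# shift to 0-based, take the remainder, shift back to 1-based
seq_wrap <- function(from, length.out, rlimit) {
  (seq(from - 1, length.out = length.out) %% rlimit) + 1
}

seq_wrap(2, 3, 3)  # 2 3 1
seq_wrap(2, 4, 3)  # 2 3 1 2
seq_wrap(2, 4, 4)  # 2 3 4 1
seq_wrap(2, 3, 2)  # 2 1 2
```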
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
I was guessing that the .1 was part of the protein code in the third example, and looking at: http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0003840.s002 I see quite a few protein codes of that form. I am a complete ignoramus with regex strings, but I am guessing that the OP will need a . added to the termination pattern. Experimentation shows that simply adding a period after the 9 works for this example: pat <- ".*(\\b[A-Z]..[0-9.]+).*" -- David On Sep 16, 2009, at 10:15 AM, jim holtman wrote: This should do it for you: pat <- ".*(\\b[A-Z]..[0-9]+).*" grep(pat, x) [1] 1 3 5 sub(pat, '\\1', x) [1] "YP_177963" "CAA15575" "CAA17111" [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Memory in R on windows running 64bit XP
On 9/16/2009 10:16 AM, michael watson (IAH-C) wrote: [question quoted in full above] You are misreading things in a couple of places. The most important one is the error message: it says allocation of a 1.3 Gb object failed. You may have 4 Gb available in total, but have used so much of it already that Windows won't allocate a 1.3 Gb piece to R. Using Rprofmem() might show how the allocations are happening. Another possibility is that your particular version of Windows doesn't support giving 4 Gb to R. Run memory.size(NA) to find out the limit, and memory.size() to find out the current amount in use. And even if the limit is close to 4 Gb and you have more than 1.3 Gb free, the allocation may fail because there is no sufficiently large contiguous block available. Once memory is allocated, that address is in use and unavailable. Since the 32-bit address space only covers 4 Gb, you can fairly easily end up with allocations that are spread out across it, leaving no large blocks available. That's the advantage of switching to 64-bit R: even if you still only had 4 Gb of memory available, the address space is so much bigger that fragmentation probably won't be a problem.
Duncan Murdoch
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
Assuming the rule is an upper-case alphabetic character followed by two other characters followed by a string of digits, then try this: library(gsubfn) strapply(x, "[A-Z][^ ][^ ][0-9]+") [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" If you prefer the output as one long vector of strings, try this: strapply(x, "[A-Z][^ ][^ ][0-9]+", simplify = c) [1] "YP_177963" "CAA15575" "CAA17111" If the string that denotes a protein can be part of a word which itself does not denote a protein, then we will need something like this: strapply(x, "\\b[A-Z][^ ][^ ][0-9]+\\b", perl = TRUE) [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" However, I would expect this second solution using perl's \b to be much slower, because the first one uses tcl code underneath whereas the second uses R code. See http://gsubfn.googlecode.com for more. On Wed, Sep 16, 2009 at 9:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...]
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
That did not work on all three: strapply(x, "[A-Z]{3}[0-9]+") [[1]] NULL [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" But adding a _ to the initiation pattern and a period to the termination pattern makes it complete: library(gsubfn) strapply(x, "[A-Z_]{3}[0-9.]+") [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111.1" Maybe between the two of you and Jim Holtman, I can eventually learn how to use regular expressions. -- David. On Sep 16, 2009, at 10:14 AM, Henrique Dallazuanna wrote: Try this: library(gsubfn) strapply(x, "[A-Z]{3}[0-9]+") On Wed, Sep 16, 2009 at 10:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] David Winsemius, MD Heritage Laboratories West Hartford, CT
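The final pattern also works without the gsubfn package, using base R's regexpr()/regmatches() (a sketch on the three example strings; like David's version it keeps the trailing .1 of the third code):

```r
x <- c("1159_1; YP_177963; PPE FAMILY PROTEIN",
       "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575",
       "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE")

# three capitals/underscores followed by digits (and possibly a version suffix like .1)
m <- regexpr("[A-Z_]{3}[0-9.]+", x)
regmatches(x, m)
# "YP_177963" "CAA15575" "CAA17111.1"
```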
Re: [R] wilcox.test p-value = 0
That's right; if the test is exact, it is not possible to get a p-value of zero. wilcox.test does not provide an exact p-value in the presence of ties, so if there are any ties in your data you are getting a normal approximation. Incidentally, if there are any ties in your data set I would strongly recommend computing the *exact* p-value, because using the normal approximation on tied data sets will either inflate the type I error rate or reduce power, depending on how the ties are distributed. Depending on the pattern of ties this can result in gross under- or over-estimation of the p-value. I guess this is all by way of saying that you should always compute the exact p-value if possible. The package exactRankTests uses the algorithm by Mehta, Patel and Tsiatis (1984). If your sample sizes are larger, there is a freely available .exe by Cheung and Klotz (1995) that will do exact p-values for sample sizes larger than 100 in each group! You can find it at http://pages.cs.wisc.edu/~klotz/ Bryan Hi Murat, I am not an expert in either statistics or R, but I can imagine that since the default is exact=TRUE, it numerically computes the probability, and it may indeed be 0. If you use wilcox.test(x, y, exact=FALSE) it will give you a normal approximation, which will most likely be different from zero. No, the exact p-value can't be zero for a discrete distribution. The smallest possible value in this case would, I think, be 1/choose(length(x)+length(y), length(x)), or perhaps twice that. More generally, the approach used by format.pval() is to display very small p-values as "< 2.2e-16", where 2.2e-16 is machine epsilon. I wouldn't want to claim optimality for this choice, but it seems a reasonable way to represent very small. -thomas Hope this helps. Keo. Murat Tasan escribió: hi, folks, how have you gone about reporting a p-value from a test when the returned value from a test (in this case a rank-sum test) is numerically equal to 0 according to the machine?
the next lowest value greater than zero that is distinct from zero on the machine is likely algorithm-dependent (the algorithm of the test itself), but without knowing the explicit steps of the algorithm implementation, it is difficult to provide any non-zero value. i initially thought to look at .Machine$double.xmin, but i'm not comfortable with reporting p < .Machine$double.xmin, since without knowing the specifics of the implementation, this may not be true! to be clear, if i have data x, and i run the following line, the returned value is TRUE. wilcox.test(x)$p.value == 0 thanks for any help on this! Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle -- - Bryan Keller, Doctoral Student/Project Assistant Educational Psychology - Quantitative Methods The University of Wisconsin - Madison
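Thomas's point about the smallest attainable exact p-value can be checked directly. A sketch with two made-up groups of 10: with no ties and complete separation, the exact two-sided rank-sum p-value hits the floor of 2/choose(m + n, m) rather than zero:

```r
# two small, untied, completely separated samples -> wilcox.test uses the exact test
m <- 10; n <- 10
p <- wilcox.test(1:10, 101:110)$p.value

p > 0                    # TRUE: the exact p-value is never exactly zero
floor_p <- 2 / choose(m + n, m)   # 2/184756, the attainable floor for these sizes
```

Here p equals floor_p exactly, since the observed rank pattern is the most extreme one possible.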
Re: [R] Memory in R on windows running 64bit XP
Hi Duncan, Many thanks for the reply. Here are the results of the commands: Rprofmem() Error in Rprofmem() : memory profiling is not available on this system memory.size(NA) [1] 3500 memory.size() [1] 16.19 Clearly Rprofmem() is a dead end, though if there is some way of enabling it, I will be happy to. I am only actually using 16.19Mb of memory, as all I do is open R and read in some data (a 2.7Mb text file). I am not sure how to guard against the memory fragmentation problem. The machine has 16Gb of RAM and I am only running R and Internet Explorer, but I don't know if that helps. Is there a way to deliver contiguous memory to R? Is 64-bit R available for Windows? Sorry to bang on about Windows; I'm waiting for my IT department to build Linux machines, and this is what I have in the meantime. Thanks, Mick From: Duncan Murdoch [murd...@stats.uwo.ca] Sent: 16 September 2009 15:43 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP [...]
[R] Can someone please explain why the last tick mark on this chart is not showing?
Hi all, I'm trying to log chart but with natural looking tick marks. My specifications are very specific -- it must indicate the lowest number's tick as well as the maximum. I've attached sample code and data for a particular case (and there are a few more like this) where the bottom tickmarks on the chart are not set to where I want them to be and yet they fit in the ylim range. Please, can anyone help? I'm tearing my hair out at this point.. Thanking you in advance, George. # Code below sample-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: 
Generate logs for the data sampleLog <- log(sample, 10) # Step 2: Determine pretty min/max for the data purtee <- pretty(c(sample, pxTarget)) dataMin <- min(sample, na.rm=TRUE)*0.95 dataMax <- max(purtee) # Step 3: Create exponential series t <- exp(seq(log(dataMin), log(dataMax), length=11)) t2 <- round(t, -1) t2[t2==0] <- 1 # natural-looking labels lab <- signif(round(t, -1), 2) lab[1] <- signif(lab[1], 1) ylimits <- range(log(t2, 10)) plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE) axis(2, las=2, at=log(lab, 10), labels=lab)
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
On Sep 16, 2009, at 11:13 AM, Jorgy Porgee wrote: [code and data quoted in full above] purtee <- pretty(c(sample, pxTarget)) Error in pretty(c(sample, pxTarget)) : object 'pxTarget' not found ...which then creates a cascade of errors [...] David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
Hi David, Sorry, forgot to define pxTarget. It can be anything you choose so say pxTarget-1. I've also renamed sample to sampleData. The following cut and paste should work: The last tick mark (5000) is not showing for some reason... :-/ Regards, George sampleData-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) pxTarget-1 # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: Generate logs for the data sampleLog-log(sampleData,10) # Step 2: Determine pretty min/max for the data purtee-pretty(c(sampleData,pxTarget)) dataMin-min(sampleData,na.rm=T)*0.95 dataMax-max(purtee) # Step 3: Create exponential 
series t <- exp(seq(log(dataMin), log(dataMax), length=11)) t2 <- round(t, -1) t2[t2==0] <- 1 # natural-looking labels lab <- signif(round(t, -1), 2) lab[1] <- signif(lab[1], 1) ylimits <- range(log(t2, 10)) plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE) axis(2, las=2, at=log(lab, 10), labels=lab) On Wed, Sep 16, 2009 at 5:20 PM, David Winsemius dwinsem...@comcast.net wrote: [...]
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
It shows if you lower your minimum a bit by applying a factor of 0.90 to your generation of dataMin. -- David. On Sep 16, 2009, at 11:27 AM, Jorgy Porgee wrote: [...]
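The arithmetic behind David's fix can be checked without drawing anything. With the posted data, min(sampleData) is 5525 and the bottom label George's signif/round chain produces is 5000; whether its tick lands inside ylim depends only on the scaling factor (a sketch; 0.95 and 0.90 are the two factors discussed in this thread):

```r
lowest <- 5525   # min(sampleData, na.rm = TRUE) from the posted series

# the bottom axis label produced by lab <- signif(round(t, -1), 2); lab[1] <- signif(lab[1], 1)
lab1 <- signif(signif(round(lowest * 0.95, -1), 2), 1)
lab1   # 5000

# with the 0.95 factor the axis floor (log10 of the rounded minimum, 5250)
# sits above log10(5000), so the 5000 tick is outside ylim and gets clipped
log10(5000) < log10(round(lowest * 0.95, -1))   # TRUE -> tick not drawn

# with 0.90 the floor drops to log10(4970), below the 5000 tick, so it shows
log10(5000) >= log10(round(lowest * 0.90, -1))  # TRUE -> tick drawn
```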
Determine pretty min/max for the data
purtee <- pretty(c(sampleData, pxTarget))
dataMin <- min(sampleData, na.rm=TRUE) * 0.95
dataMax <- max(purtee)
# Step 3: Create exponential series
t <- exp(seq(log(dataMin), log(dataMax), length=11))
t2 <- round(t, -1)
t2[t2 == 0] <- 1
# natural looking labels
lab <- signif(round(t, -1), 2)
lab[1] <- signif(lab[1], 1)
ylimits <- range(log(t2, 10))
plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE)
axis(2, las=2, at=log(lab, 10), labels=lab)
On Wed, Sep 16, 2009 at 5:20 PM, David Winsemius dwinsem...@comcast.net wrote: On Sep 16, 2009, at 11:13 AM, Jorgy Porgee wrote: Hi all, I'm trying to make a log chart, but with natural-looking tick marks. My specifications are very specific -- it must indicate the lowest number's tick as well as the maximum. I've attached sample code and data for a particular case (and there are a few more like this) where the bottom tick marks on the chart are not set to where I want them to be, and yet they fit in the ylim range. Please, can anyone help? I'm tearing my hair out at this point.. Thanking you in advance, George.
# Code below sample-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: Generate logs for the data sampleLog-log(sample,10) # Step 2: Determine pretty min/max for the data purtee-pretty(c(sample,pxTarget)) Error in pretty(c(sample, pxTarget)) : object 'pxTarget' not found ...which then creates a cascade of errors dataMin-min(sample,na.rm=T)*0.95 dataMax-max(purtee) # Step 3: Create exponential series
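The tick-generating idea in George's code can be distilled into a small self-contained helper. This is a sketch only: `logTicks` is a name invented here, not part of the original post. It takes n points evenly spaced in log space between the data minimum and maximum, then rounds them to natural-looking values, exactly as Steps 2-3 above do.

```r
# Hypothetical helper (not from the original post) distilling George's approach:
# n ticks evenly spaced in log space, rounded to "natural" values.
logTicks <- function(dataMin, dataMax, n = 11) {
  t  <- exp(seq(log(dataMin), log(dataMax), length.out = n))
  t2 <- round(t, -1)    # round to the nearest 10 for natural-looking labels
  t2[t2 == 0] <- 1      # guard against small ticks rounding to zero
  unique(t2)            # rounding can create duplicates; keep one of each
}

logTicks(100, 10000)    # endpoints land exactly on the data min and max
```

On a log-scaled axis the result would be placed with `axis(2, at = log(ticks, 10), labels = ticks)`, as in the thread.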
[R] lattice: How to display no box but only a y-axis on the left + Thicker lines
Hi, I have two somewhat embarrassing questions about the lattice-related plot functions: 1.) How do I make lattice (e.g. barchart) draw not a box but only a y-axis on the left-hand side, so that the plot looks like barplot with default settings? So that the following two code snippets look more alike: barplot(VADeaths); library(reshape); vad <- melt(data.frame(VADeaths, Age=rownames(VADeaths)), id="Age"); barchart(value ~ variable, groups=Age, horizontal=FALSE, stack=TRUE, data=vad) 2.) What is the proper way to make lines in xyplots thicker? When I set lwd=2, I get thicker lines, but in conjunction with lty=1:n they also look somewhat ugly, which makes me wonder if I'm doing something wrong. I guess both questions are newbie questions and the information should be easily available somewhere. Unfortunately I wasn't able to find the answers to these questions myself. Regards, Tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
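On Tom's second question, one common approach (a sketch only; the theme values and toy data below are invented for illustration, and this does not address question 1) is to set line width and line types once through the superpose.line settings, so the grouped lines in the panel and an auto.key legend stay consistent with each other:

```r
library(lattice)

# Invented toy data: three groups of lines
d <- data.frame(x = rep(1:10, 3),
                y = c(sin(1:10), cos(1:10), (1:10) / 10),
                g = rep(letters[1:3], each = 10))

# Theme-based line settings, picked up by both the panel and the legend
thick <- list(superpose.line = list(lwd = 2, lty = 1:3))

p <- xyplot(y ~ x, groups = g, data = d, type = "l",
            par.settings = thick,
            auto.key = list(lines = TRUE, points = FALSE))
p   # printing the trellis object renders the plot
```

Passing lwd/lty per call also works, but a legend drawn via auto.key only reflects the theme settings, which is why par.settings tends to look less "ugly" with mixed lty values.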
Re: [R] best method to format output of frequency table
Perfect, that works well. Thank you for the suggestion. Henrique Dallazuanna wrote: Try this: as.matrix(table(x))
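For anyone finding this later: the reason as.matrix() helps is that a one-dimensional table prints its counts across the page, while the matrix form prints one level per row with the counts in a single column, which is usually the wanted layout for a report. A small illustration (the vector x here is invented):

```r
# Invented example data
x <- c("a", "a", "b", "c", "c", "c")

table(x)              # levels print across the page, counts underneath
as.matrix(table(x))   # one row per level, counts in a single column
```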
Re: [R] a sequence that wraps around
Henrique Dallazuanna wwwhsd at gmail.com writes: Try rep: rep(2:4, length.out = 3, times = 10) That's close, but it doesn't wrap to start at 1.
Re: [R] a sequence that wraps around
On 2009.09.16. 16:08, Jack Tanner wrote: I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1. Disclaimer: total R beginner here (started to learn 1 day ago), who just came to this list to learn. You could use the modulo operator.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=1, length.out=4) %% 3 + 1
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=1, length.out=4) %% 4 + 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
seq(from=1, length.out=3) %% 2 + 1
(the 'from' needed to be decreased by one, otherwise it'd start from 0) I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions.
Re: [R] Memory in R on windows running 64bit XP
On 9/16/2009 10:54 AM, michael watson (IAH-C) wrote: Hi Duncan Many thanks for the reply. Here are the results of the commands: Rprofmem() Error in Rprofmem() : memory profiling is not available on this system memory.size(NA) [1] 3500 memory.size() [1] 16.19 Clearly Rprofmem() is a dead end, though if there is some way of enabling it, I will be happy to. I forgot that you need to recompile R to enable it. Not impossible, but probably not worth the trouble just for this problem. Using gcinfo(TRUE) may give you enough information: it will report whenever there's a garbage collection. I am only actually using 16.19Mb of memory, as all I do is open R and read in some data (a 2.7Mb text file). I would guess that mclust does a lot of other allocations before it fails. I am not sure how to guard against the memory fragmentation problem. The machine has 16Gb of RAM and I am only running R and Internet Explorer, but I don't know if that helps. Is there a way to deliver contiguous memory to R? R doesn't give you any real control over that. I would guess that R is getting 3.5 Gb in a contiguous chunk based on your memory.size report, but then it is fragmenting it. Is 64bit R available for Windows? Sorry to bang on about Windows, I'm waiting for my IT department to build Linux machines, and this is what I have in the meantime. Revolution Computing sells one. There's no (reliable) 64 bit version of gcc for Windows, so we don't distribute one. Duncan Murdoch Thanks Mick From: Duncan Murdoch [murd...@stats.uwo.ca] Sent: 16 September 2009 15:43 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP On 9/16/2009 10:16 AM, michael watson (IAH-C) wrote: Hi I'm running 32-bit R on Windows XP 64bit and the machine has 16Gb of RAM. The help for memory.limit states: If 32-bit R is run on some 64-bit versions of Windows the maximum value of obtainable memory is just under 4GB. 
So, using the help which states the size parameter can go up to 4095: memory.limit(size=4095) When I run mclust I get: Error: cannot allocate vector of size 1.3 Gb So, now I set the max-mem-size flag: --max-mem-size=3500M Repeat and I still get the same error: Error: cannot allocate vector of size 1.3 Gb So, can anyone help? 32 bit R is reported to be able to use up to 4Gb of RAM on 64 bit windows, yet mine croaks at 1.3Gb. You are misreading things in a couple of places. The most important one is the error message: it says allocation of a 1.3 Gb object failed. You may have 4 Gb available in total, but have used so much of it already that Windows won't allocate a 1.3 Gb piece to R. Using Rprofmem() might show how the allocations are happening. Another possibility is that your particular version of Windows doesn't support giving 4 Gb to R. Run memory.size(NA) to find out the limit, memory.size() to find out the current amount in use. And even if the limit is close to 4 Gb and you have more than 1.3 Gb free, the allocation may fail because there is no sufficiently large contiguous block available. Once memory is allocated, that address is in use and is unavailable. Since the 32 bit address space only covers 4 Gb, you can fairly easily end up with allocations that are spread out across it leaving no large blocks available. That's the advantage of switching to 64 bit R: even if you still only had 4 Gb of memory available, the address space is so much bigger that fragmentation probably won't be a problem. 
Duncan Murdoch sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] mclust_3.3.1 Thanks Mick
Re: [R] a sequence that wraps around
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jack Tanner Sent: Wednesday, September 16, 2009 7:08 AM To: r-h...@stat.math.ethz.ch Subject: [R] a sequence that wraps around I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. The remainder (modulus) operator, x %% n, wraps to the range 0..n-1, but you can write a function that wraps to 1..n as mod1 <- function(x, n) (x-1L) %% n + 1L Then pass the output of seq through that: mod1(-2:10, 3) [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com
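Bill's mod1 composes naturally with seq() into exactly the interface the OP asked for. A sketch: `seq_wrap` is a name invented here, not a base R function, and it simply pipes seq() output through mod1.

```r
mod1 <- function(x, n) (x - 1L) %% n + 1L   # wrap values to 1..n rather than 0..(n-1)

# Hypothetical wrapper mimicking the requested seq(from, length.out, rlimit):
seq_wrap <- function(from, length.out, rlimit)
  mod1(seq(from = from, length.out = length.out), rlimit)

seq_wrap(2, 3, 3)   # 2 3 1
seq_wrap(2, 4, 3)   # 2 3 1 2
seq_wrap(2, 4, 4)   # 2 3 4 1
seq_wrap(2, 3, 2)   # 2 1 2
```

These reproduce all the example outputs from the original question.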
Re: [R] Memory in R on windows running 64bit XP
Fair enough, thanks for your time. I will wait for my Linux PCs! -Original Message- From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] Sent: 16 September 2009 16:42 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP
Re: [R] a sequence that wraps around
On 9/16/2009 10:08 AM, Jack Tanner wrote: I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. Here's a start. You probably want some sanity checks on the arguments; e.g. from=1 doesn't work because of the old 1:0 infelicity. seq2 <- function(from, length.out, rlimit) rep(1:rlimit, length.out=length.out+from-1)[-(1:(from-1))] Duncan Murdoch
Re: [R] wilcox.test p-value = 0
Once one gets past the issue of the p-value being extremely small, irrespective of the test being used, the OP has asked the question of how to report it. Most communities will have standards for how to report p-values, covering things like how many significant digits and a minimum p-value threshold to report. For example, in medicine, it is common to report 'small' p-values as 'p < 0.001' or 'p < 0.0001'. Thus, below those numbers, the precision is largely irrelevant and one need not report the actual p-value. I just wanted to be sure that we don't lose sight of the forest for the trees... :-) The OP should consult a relevant guidance document or an experienced author in the domain of interest. HTH, Marc Schwartz On Sep 16, 2009, at 9:54 AM, Bryan Keller wrote: That's right, if the test is exact it is not possible to get a p-value of zero. wilcox.test does not provide an exact p-value in the presence of ties, so if there are any ties in your data you are getting a normal approximation. Incidentally, if there are any ties in your data set I would strongly recommend computing the *exact* p-value, because using the normal approximation on tied data sets will either inflate the type I error rate or reduce power, depending on how the ties are distributed. Depending on the pattern of ties this can result in gross under- or over-estimation of the p-value. I guess this is all by way of saying that you should always compute the exact p-value if possible. The package exactRankTests uses the algorithm by Mehta, Patel and Tsiatis (1984). If your sample sizes are larger, there is a freely available .exe by Cheung and Klotz (1995) that will do exact p-values for sample sizes larger than 100 in each group! You can find it at http://pages.cs.wisc.edu/~klotz/ Bryan Hi Murat, I am not an expert in either statistics or R, but I can imagine that since the default is exact=TRUE, it numerically computes the probability, and it may indeed be 0.
if you use wilcox.test(x, y, exact=FALSE) it will give you a normal approximation, which will most likely be different from zero. No, the exact p-value can't be zero for a discrete distribution. The smallest possible value in this case would, I think, be 1/choose(length(x)+length(y),length(x)), or perhaps twice that. More generally, the approach used by format.pval() is to display very small p-values as '< 2e-16', where 2e-16 is machine epsilon. I wouldn't want to claim optimality for this choice, but it seems a reasonable way to represent 'very small'. -thomas Hope this helps. Keo. Murat Tasan escribió: hi, folks, how have you gone about reporting a p-value from a test when the returned value from a test (in this case a rank-sum test) is numerically equal to 0 according to the machine? the next lowest value greater than zero that is distinct from zero on the machine is likely algorithm-dependent (the algorithm of the test itself), but without knowing the explicit steps of the algorithm implementation, it is difficult to provide any non-zero value. i initially thought to look at .Machine$double.xmin, but i'm not comfortable with reporting p > .Machine$double.xmin, since without knowing the specifics of the implementation, this may not be true! to be clear, if i have data x, and i run the following line, the returned value is TRUE. wilcox.test(x)$p.value == 0 thanks for any help on this!
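Thomas's lower bound is easy to check empirically. A sketch (the sample values below are invented): for two groups of sizes m and n with no ties, the exact two-sided p-value can get no smaller than 2/choose(m+n, n), reached when the two samples are completely separated.

```r
# Smallest attainable exact two-sided p-value for a rank-sum test:
# complete separation of the two samples.
x <- 1:5
y <- 6:10
p <- wilcox.test(x, y)$p.value   # exact by default: small samples, no ties
p_floor <- 2 / choose(length(x) + length(y), length(x))
all.equal(p, p_floor)            # TRUE: both are 2/252, about 0.0079
```

So even in the most extreme arrangement the reported p-value stays well away from zero; a returned value of exactly 0 signals the normal approximation, not the exact test.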
[R] expression
Dear all, I am very thankful if you could tell me the right way to write: mtext(paste(expression(R^2),round(marco2[1,i],digits=3), "N° of proteins:",marco3[i]),side=4,cex=.6). In this case the output is: R^2. I tried also in this way: mtext(paste(expression(paste(R^2)),round(marco2[1,i],digits=3), "N° of proteins:",marco3[i]),side=4,cex=.6), and the output is: paste(R^2). Thank you in advance, Marco
Re: [R] a sequence that wraps around
Szabolcs Horvát szhorvat at gmail.com writes: You could use the modulo operator. # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=1, length.out=4) %% 3 + 1 Ah, that's so slick. You're off to a great start! Huge thanks to you and everyone else who responded.
Re: [R] expression
Try this: marco2 = matrix(rnorm(4), nrow=2, ncol=2); marco3 = 2.12; i = 1; plot.new(); mtext(bquote(R^2*.(round(marco2[1,i],digits=3))*" N° of proteins:"*.(marco3[i])),side=4,cex=.6) HTH, baptiste 2009/9/16 Marco Chiapello marpe...@gmail.com -- _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK http://newton.ex.ac.uk/research/emag
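The general pattern behind baptiste's answer: expression()/bquote() build unevaluated plotmath, and .() inside bquote() splices in computed values, whereas paste() on an expression merely deparses it to a string, which is why Marco saw the literal text "R^2". A hedged sketch with placeholder values standing in for marco2[1,i] and marco3[i]:

```r
# Placeholder values (invented for illustration)
r2 <- 0.8375
n_prot <- 42

# bquote() keeps R^2 as plotmath markup and splices the computed
# values in wherever .() appears; paste() would only deparse it.
lab <- bquote(R^2 == .(round(r2, 3)) ~ "N° of proteins:" ~ .(n_prot))

plot.new()
mtext(lab, side = 4, cex = 0.6)
```

The `~` in plotmath juxtaposes pieces with a space, so the numbers and label end up on one line in the margin.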
Re: [R] How to extract partial predictions, package mgcv
Thanks Simon, I wasn't aware that predict.gam could be used in this situation. I followed your advice and managed to extract the data I needed. For people interested, I add the code with comments I used here:
# Full code for extracting partial predictions
# Start the package mgcv
library(mgcv)
# Simulate some data (this is from Simon Wood's example in the package mgcv manual)
n <- 200
sig <- 2
dat <- gamSim(1, n=n, scale=sig)
# Check the data
dat ## It looks alright :-)
# Run the GAM
b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data=dat)
# Plot the partial predictions (you may need to rescale your window to see the non-linearity)
par(mfrow=c(1,3))
plot(b, scale=0)
### Clearly, the variables x0 and x2 are non-linear!
# I would like to extract the solid line in the graphs and the
# associated standard errors from the plots. Thus, I use predict
# and change to a data.frame:
c <- data.frame(predict(b, type="terms", se.fit=TRUE)$fit)
names(c) <- c("pp_s.x0.", "pp_s.I.x1.2..", "pp_s.x2.")
d <- data.frame(predict(b, type="terms", se.fit=TRUE)$se.fit)
names(d) <- c("se_s.x0.", "se_s.I.x1.2..", "se_s.x2.")
# Merge the three datasets:
f = cbind(dat, c, d)
# Restrict the number of variables to the ones I am interested in:
f <- f[, c("x0", "pp_s.x0.", "se_s.x0.")]
names(f)
### This data, i.e. f, can now be exported or used for further calculations within R
# plot the data
par(mfrow=c(1,1))
plot(pp_s.x0. ~ x0, type="p", pch=2, cex=0.50)
sequence = order(x0)
lines(x0[sequence], pp_s.x0.[sequence])
lines
rug(jitter(x0))
## Thanks, Staffan -- Staffan Roos, PhD Research Ecologist BTO Scotland Staffan, Take a look at ?predict.gam. You need to use type="terms" with se=TRUE to get the quantities that you need. Simon ps. with binary data, method="REML" or method="ML" for the gam fit are often somewhat more reliable than the default. On Monday 14 September 2009 19:30, Staffan Roos wrote: Dear package mgcv users, I am using package mgcv to describe presence of a migratory bird species as a function of several variables, including year, day number (i.e.
day-of-the-year), duration of survey, latitude and longitude. Thus, the global model is: global_model <- gam(present ~ as.factor(year) + s(dayno, k=5) + s(duration, k=5) + s(x, k=5) + s(y, k=5), family = binomial(link = logit), data = presence, gamma = 1.4) The model works fine, suggesting that all the variables have strong, non-linear influences on the probability of detecting the species. My main interest is in the effect dayno has on presence, given the inclusion of the other explanatory variables. Thus, I would like to extract the values of the partial prediction of dayno and its associated 2 standard errors above and below the main effect, as shown by the code plot(global_model). I have tried to extract the values by using fitted.values(global_model, dayno), but when plotted, the figure doesn't look like the partial prediction for dayno. Instead, it gives me a very scattered figure (shotgun effect). Has anyone tried to extract the partial predictions? If so, please could you advise me how to do this? Thanks, Staffan P.S. For comparison, please have a look at Simon Wood's paper in R News, 1(2):20-25, June 2001, especially the figures showing the partial predictions of Mackerel egg densities. Using those graphs as an example, I would like to extract the partial predictions for e.g. s(b.depth), given the inclusion of the other variables. -- Staffan Roos, PhD Research Ecologist BTO Scotland
Re: [R] How to extract partial predictions, package mgcv
It is nice to see worked examples, but there were some errors that relate to not including dataframe references. I've pasted in code that runs without error on my machine: On Sep 16, 2009, at 12:39 PM, Staffan Roos wrote: Thanks Simon, I wasn't aware that predict.gam could be used in this situation. I followed your advice and managed to extract the data I needed. For people interested, I add the code with comments I used here:
# Full code for extracting partial predictions
# Start the package mgcv
library(mgcv)
# Simulate some data (this is from Simon Wood's example in the package mgcv manual)
n <- 200
sig <- 2
dat <- gamSim(1, n=n, scale=sig)
# Check the data
dat ## It looks alright :-)
# Run the GAM
b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data=dat)
# Plot the partial predictions (you may need to rescale your window to see the non-linearity)
par(mfrow=c(1,3))
plot(b, scale=0)
### Clearly, the variables x0 and x2 are non-linear!
# I would like to extract the solid line in the graphs and the
# associated standard errors from the plots. Thus, I use predict
# and change to a data.frame:
c <- data.frame(predict(b, type="terms", se.fit=TRUE)$fit)
names(c) <- c("pp_s.x0.", "pp_s.I.x1.2..", "pp_s.x2.")
d <- data.frame(predict(b, type="terms", se.fit=TRUE)$se.fit)
names(d) <- c("se_s.x0.", "se_s.I.x1.2..", "se_s.x2.")
# Merge the three datasets:
f = cbind(dat, c, d)
# Restrict the number of variables to the ones I am interested in:
f <- f[, c("x0", "pp_s.x0.", "se_s.x0.")]
names(f)
### This data, i.e. f, can now be exported or used for further calculations within R
# plot the data
par(mfrow=c(1,1))
plot(pp_s.x0. ~ x0, type="p", pch=2, cex=0.50, data=f)
sequence = order(f$x0)
lines(f$x0[sequence], f$pp_s.x0.[sequence])
lines
rug(jitter(f$x0))
## Staffan, Take a look at ?predict.gam. You need to use type="terms" with se=TRUE to get the quantities that you need. Simon ps. with binary data, method="REML" or method="ML" for the gam fit are often somewhat more reliable than the default.
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] How to extract partial predictions, package mgcv
But removing the extraneous second-to-last line that says just: lines ... would leave the console looking less puzzling. -- David On Sep 16, 2009, at 1:02 PM, David Winsemius wrote: It is nice to see worked examples, but there were some errors that relate to not including dataframe references. I've pasted in code that runs without error on my machine: [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
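The type = "terms" mechanism discussed above is not specific to mgcv: predict() for plain lm objects accepts the same arguments, which allows a minimal sketch of the idea that runs without any extra packages (using the built-in mtcars data; the model and variable names here are illustrative, not from the thread):

```r
# One centered partial contribution per model term, plus matching
# standard errors -- the same structure predict.gam returns.
fit <- lm(mpg ~ wt + hp, data = mtcars)
pt  <- predict(fit, type = "terms", se.fit = TRUE)

head(data.frame(pt$fit))      # columns "wt" and "hp": partial effects
head(data.frame(pt$se.fit))   # their standard errors

# Sanity check: term contributions plus the stored constant
# reconstruct the ordinary fitted values.
recon <- rowSums(pt$fit) + attr(pt$fit, "constant")
all.equal(unname(recon), unname(fitted(fit)))   # TRUE
```

The se.fit columns are what plot() draws as the dashed confidence bands, two standard errors either side of the solid line.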
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 09:14:39AM -0300, Henrique Dallazuanna wrote: Try this also:
format(coef(summary(lm.D9)), decimal.mark = ',')
or using gsub:
apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")
using the lm.D9 object from the ?lm example. Thanks for your suggestion! The problem with the above approach is that all values get the same formatting. For example:
x <- c(1.2, 1.2e10)
format(x, format = 'g', decimal.mark = ',')
[1] "1,2e+00" "1,2e+10"
What I would like to get is:
[1] "1,2" "1,2e+10"
I already solved my problem, but I'm using an external program to replace "." with ",". Perhaps I should ask my question again, with a new subject, since what I need now is the perl regular expression equivalent to the sed command: s/\([0-9]\)a/a\1/ That is, 1a becomes a1; 3b becomes b3, etc.
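For the sed question at the end, base R's gsub() already supports backreferences in the replacement, so no external program is needed. A small sketch (generalizing the literal "a" in the sed pattern to any lower-case letter is my reading of the "3b becomes b3" example):

```r
# sed s/\([0-9]\)\([a-z]\)/\2\1/ maps onto gsub() with backreferences:
swap <- function(x) gsub("([0-9])([a-z])", "\\2\\1", x)
swap(c("1a", "3b"))        # "a1" "b3"

# And the decimal-comma problem, touching only a "." that sits between
# digits, so exponents like e+10 are left alone:
dec <- function(x) gsub("([0-9])\\.([0-9])", "\\1,\\2", x)
dec(c("1.2", "1.2e+10"))   # "1,2" "1,2e+10"
```

sub() instead of gsub() would replace only the first match in each element.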
[R] Fourier Transform fft help
I wrote a script in which I anticipated seeing a spike at 10 Hz with the function 10*sin(2*pi*10*t). I can't figure out why my plots do not show spikes at the frequencies I expect. Am I doing something wrong, or are my expectations wrong?
require(stats)
layout(matrix(c(1, 2, 3), 3, 1, byrow = TRUE))
# SETUP
n <- 256           # nextn(1001) gives next power of 2
F <- 100           # Hz: -50 to 50 Hz
dt <- 1/F
T <- n*dt
df <- 1/T
t <- seq(0, T, by = dt)   # also try ts function
t <- t[1:length(t) - 1]
freq <- 5          # Hz
# SIGNAL FUNCTION
y <- 10*sin(2*pi*10*t)    # 10*sin(2*pi*freq*t)
# FREQ ARRAY
# f <- seq(-F/2, F/2, by = df)
f <- F/2*seq(-1, 1, length.out = n + 1)
f <- f[1:length(f) - 1]
# FOURIER WORK
Y <- fft(y)
mag <- sqrt(Re(Y)^2 + Im(Y)^2)
phase <- atan(Im(Y)/Re(Y))
Yr <- Re(Y)
Yi <- Im(Y)
# PLOT SIGNALS
plot(t, y, type = "l", xlim = c(0, T))
plot(f, Re(Y), type = "l")
plot(f, Im(Y), type = "l")
-- View this message in context: http://www.nabble.com/Fourier-Transform-fft-help-tp25475063p25475063.html Sent from the R help mailing list archive at Nabble.com.
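Two things conspire here: with n = 256 samples at 100 Hz, 10 Hz does not fall exactly on a DFT bin (the resolution is 100/256 ≈ 0.39 Hz, so the energy leaks across neighbouring bins), and the raw fft() magnitude of a sine of amplitude A is n*A/2, not A. A minimal sketch with parameters chosen (my choice, not the poster's) so that 10 Hz lands exactly on a bin:

```r
# 1 s sampled at 100 Hz: frequency resolution 1/(n*dt) = 1 Hz,
# so 10 Hz lands exactly on bin 10 (the 11th element in R).
n  <- 100
Fs <- 100
t  <- (0:(n - 1)) / Fs          # n points, no duplicated endpoint
y  <- 10 * sin(2 * pi * 10 * t)

Y    <- fft(y)
half <- 1:(n / 2)               # frequencies 0 .. Nyquist - 1 bin
f    <- (half - 1) * Fs / n     # 0, 1, ..., 49 Hz
amp  <- 2 * Mod(Y)[half] / n    # the 2/n scaling recovers amplitude 10

f[which.max(amp)]               # 10
max(amp)                        # 10 (up to rounding)
```

With the original script's t of n + 1 points (duplicated endpoint), the bins shift slightly, which further smears the peak.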
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 03:29:44PM +0200, David Hajage wrote: But it seems that dcolumn can change the decimal separator too (see the table on the first page of the pdf document). For example:
\documentclass{article}
\usepackage{dcolumn}
\newcolumntype{d}[1]{D{.}{,}{#1}}
\begin{document}
<<results=tex>>=
x <- matrix(rnorm(4), 2, 2)
library(xtable)
xtable(x, align = c("l", "|", "d{2}", "|", "c", "|"))
@
\end{document}
It works. Thanks! 2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com On Wed, Sep 16, 2009 at 01:59:29PM +0200, David Hajage wrote: Perhaps with the dcolumn package? http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf
Re: [R] How to extract partial predictions, package mgcv
Hi David, Yes, the strange code lines were clearly a mistake on my side. Sorry. What dataframe references did you add in your code? /Staffan David Winsemius wrote: But removing the extraneous second-to-last line that says just: lines ... would leave the console looking less puzzling. -- David [...] -- View this message in context: http://www.nabble.com/How-to-extract-partial-predictions%2C-package-mgcv-tp25441149p25477056.html Sent from the R help mailing list
Re: [R] Fourier Transform fft help
Why not try spectrum (power spectrum):
spectrum(y, log = "no")
The major spike is at 0.1, and 1/0.1 = 10 Hz. On Wed, Sep 16, 2009 at 12:13 PM, delic kran...@optonline.net wrote: [...] -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
[R] Turnpoints
Good evening to everybody. I have a data set of four columns - three of the columns represent the date while the fourth is counts. Setting 4000 as my minimum point/value, I wish to find out the number of counts that are less than or equal to 4000 and the dates on which they occur. I have installed pastecs and have been trying to use the turnpoints function to do this. I have not had much luck. I am still learning. I have added part of my data here, where y stands for year, m for month, d for day, and lastly the count. I would be grateful to anyone who could show me the way to do this. Best regards Ogbos
y  m  d  count
93 02 07 3974.6
93 02 08 3976.7
93 02 09 3955.2
93 02 10 3955.0
93 02 11 3971.8
93 02 12 3972.8
93 02 13 3961.0
93 02 14 3972.8
93 02 15 4008.0
93 02 16 4004.2
93 02 17 3981.2
93 02 18 3996.8
93 02 19 4028.2
93 02 20 4029.5
93 02 21 3953.4
93 02 22 3857.3
93 02 23 3848.3
93 02 24 3869.8
93 02 25 3898.1
93 02 26 3920.5
93 02 27 3936.7
93 02 28 3931.9
Re: [R] How to extract partial predictions, package mgcv
Your code minus the extraneous lines was:
#plot the data
plot(pp_s.x0. ~ x0, type = "p", pch = 2, cex = 0.50)
sequence <- order(x0)
lines(x0[sequence], pp_s.x0.[sequence])
rug(jitter(x0))
My edition was:
plot(pp_s.x0. ~ x0, type = "p", pch = 2, cex = 0.50, data = f)
sequence <- order(f$x0)
lines(f$x0[sequence], f$pp_s.x0.[sequence])
rug(jitter(f$x0))
So almost every one of the last set of plotting calls needed one or more additions of f$ or data=f in order to run without error. Perhaps you attached f earlier and didn't include that code? I personally have given up using attach() for just that reason. -- David. On Sep 16, 2009, at 1:31 PM, Staffan Roos wrote: Hi David, Yes, the strange code lines were clearly a mistake on my side. Sorry. What dataframe references did you add in your code? /Staffan [...]
Re: [R] Fourier Transform fft help
After extensive research I found some information hinting that my frequency array is wrong. I thought that the data spanned -F/2 to F/2. It seems that I should only plot the first half of the fft vector? Can anyone confirm this or shed some light on it? One other thing: I thought the magnitude should be 10 for 10*sin(2*pi*10*t), but my magnitudes are much higher. delic wrote: [...] -- View this message in context: http://www.nabble.com/Fourier-Transform-fft-help-tp25475063p25477281.html Sent from the R help mailing list archive at Nabble.com.
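Confirming the first-half intuition: for real-valued input the fft() output is conjugate symmetric, so the bins above the Nyquist frequency carry no new information - their magnitudes mirror the first half. A small sketch (the test signal is arbitrary, chosen only for illustration):

```r
# Conjugate symmetry of the DFT of a real signal:
# Y[k+1] == Conj(Y[n-k+1]) for k = 1..n-1 (1-based R indexing).
n <- 16
k <- 0:(n - 1)
y <- sin(2 * pi * k / n) + 0.5 * cos(2 * pi * 3 * k / n)
Y <- fft(y)

first  <- Mod(Y)[2:(n / 2)]            # bins 1 .. n/2 - 1
mirror <- rev(Mod(Y)[(n / 2 + 2):n])   # bins n/2 + 1 .. n - 1, reversed
max(abs(first - mirror))               # ~0: the second half is redundant
```

That redundancy is also why a single-sided amplitude plot needs the factor 2 (except at the DC and Nyquist bins), which explains the "much higher" magnitudes: the raw Mod(Y) at the peak is n*A/2.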
[R] re-code missing value
Dear all, I have a partial data set with four columns. The first column is site, with three factors (i.e., A, B, and C). The second to fourth columns (v1 ~ v3) are my observations. In the observations of the data set, "." indicates a missing value. I replaced "." with NA. To do this I used two steps. First, I replaced "." with NA, and then changed each variable from factor to numeric using df1$v1 <- as.numeric(df1$v1). The second step was OK when I had a low number of variables; however, it is painful when I have a lot of variables. My question is: Is there a much more efficient way to convert this kind of large-scale data? In short, I am looking for an alternative to STEP 2, or to the whole procedure if there is one. Any comment will be highly appreciated. Thank you in advance!! Steve P.S.: Below is an example of what I did.
STEP 1
> df1
   site   v1   v2 v3
1     A   10    5  .
2     A   22   54  .
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1  .
8     B    .    4  5
9     B    .    .  4
10    B    .    4  1
11    B   51    .  2
12    B    5    .  .
13    C    1  0.4  .
14    C    0    4  .
15    C    1    1  4
16    C   40    .  7
17    C    4    .  7
18    C   10    .  1
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> df1[df1 == "."] <- NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
> df1
   site   v1   v2 v3
1     A   10    5 NA
2     A   22   54 NA
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1 NA
8     B   NA    4  5
9     B   NA   NA  4
10    B   NA    4  1
11    B   51   NA  2
12    B    5   NA NA
13    C    1  0.4 NA
14    C    0    4 NA
15    C    1    1  4
16    C   40   NA  7
17    C    4   NA  7
18    C   10   NA  1
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...
STEP 2
> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)
> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num NA NA 3 4 2 4 NA 5 4 2 ...
Re: [R] How to extract partial predictions, package mgcv
Yes, you're correct, David; I used attach(f) earlier. Thanks for clarifying! I should change my own code accordingly. /Staffan David Winsemius wrote: [...]
Re: [R] Turnpoints
Hi, On Sep 16, 2009, at 1:42 PM, ogbos okike wrote: [...] Assume your data is stored in a variable called df. Look at what this gives you:
R> less4000 <- df$count <= 4000
1. You can sum that vector to tell you how many rows have a count less than or equal to 4000.
2. You can use it to select rows out of the data frame:
R> df[less4000, ]
You can also use the subset function:
R> subset(df, count <= 4000)
-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
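Putting that together with a few rows of the posted data (the rows are taken from the table in the question; wrapping them in a data frame is my addition):

```r
df <- data.frame(
  y = 93, m = 2, d = 19:24,
  count = c(4028.2, 4029.5, 3953.4, 3857.3, 3848.3, 3869.8)
)

low <- df$count <= 4000    # logical vector, TRUE at or below 4000
sum(low)                   # 4 such days in this slice
df[low, ]                  # those rows, dates included

subset(df, count <= 4000)  # same result
```

The logical-vector form is handy when you also want the count (sum) and the rows from one intermediate object.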
Re: [R] re-code missing value
Hi Steve, Here is a suggestion using your original df1:

# Create a copy -- you can avoid this
newdf1 <- df1
# Process
newdf1[,2:4] <- apply(newdf1[,2:4], 2, function(x) as.numeric(x))
# Remove df1
rm(df1)
# Result
newdf1
# str()
str(newdf1)
# 'data.frame': 18 obs. of 4 variables:
#  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
#  $ v1  : num 10 22 44 521 5 ...
#  $ v2  : num 5 54 214 14 73 0.4 1 4 NA 4 ...
#  $ v3  : num NA NA 2 4 1 4 NA 5 4 1 ...

HTH, Jorge

On Wed, Sep 16, 2009 at 1:50 PM, Steve Hong wrote:

Dear all, I have a partial data set with four columns. The first column is site, with three factor levels (i.e., A, B, and C). The second to fourth columns (v1 ~ v3) are my observations. In the observations, "." indicates a missing value. I replaced "." with NA in two steps. First, I replaced "." with NA, and then changed each variable from factor to numeric using df1$v1 <- as.numeric(df1$v1). The second step was OK when I had a low number of variables; however, it is painful when I have a lot of variables. My question is: Is there a more efficient way to convert this kind of large-scale data? In short, I am looking for an alternative to STEP 2, or to the whole procedure if there is one. Any comment will be highly appreciated. Thank you in advance!! Steve

P.S.: Below is an example of what I did.

STEP 1

> df1
   site   v1   v2 v3
1     A   10    5  .
2     A   22   54  .
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1  .
8     B    .    4  5
9     B    .    .  4
10    B    .    4  1
11    B   51    .  2
12    B    5    .  .
13    C    1  0.4  .
14    C    0    4  .
15    C    1    1  4
16    C   40    .  7
17    C    4    .  7
18    C   10    .  1

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...

> df1[df1 == "."] <- NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated

> df1
   site   v1   v2 v3
1     A   10    5 NA
2     A   22   54 NA
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1 NA
8     B   NA    4  5
9     B   NA   NA  4
10    B   NA    4  1
11    B   51   NA  2
12    B    5   NA NA
13    C    1  0.4 NA
14    C    0    4 NA
15    C    1    1  4
16    C   40   NA  7
17    C    4   NA  7
18    C   10   NA  1

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...

STEP 2.

> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)

> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num NA NA 3 4 2 4 NA 5 4 2 ...
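A note on why STEP 2 misbehaves, plus two common alternatives (a sketch; the file name "df1.txt" is hypothetical): as.numeric() applied directly to a factor returns the internal level codes, which is exactly what the `v1: 4 7 10 13 ...` output above shows. The conversion has to go through character, or the "." values can be declared NA at read time.

```r
# Fix 1: declare "." as NA when reading, so columns come in numeric directly
# (hypothetical file name)
# df1 <- read.table("df1.txt", header = TRUE, na.strings = ".")

# Fix 2: convert existing factor columns via character --
# as.numeric(factor) alone would return the level codes instead
df1 <- data.frame(site = c("A", "A", "B"),
                  v1   = factor(c("10", "22", ".")),
                  v2   = factor(c("5", ".", "4")))
df1[2:3] <- lapply(df1[2:3], function(x) as.numeric(as.character(x)))
str(df1)  # v1 and v2 are now num, with NA where "." was
```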
[R] Getting best three dimensions of data after dimensionality reduction (using kpca)
Hi, I am using the kernlab package and the kpca function for dimensionality reduction of my data. Can anybody tell me how I can get the best three dimensions from the output of the kpca function, which includes the principal component vectors, eigenvalues, rotated vectors, and the original matrix? I want to get back my original matrix with reduced dimensions (the best three dimensions). Thanks Abbas
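A hedged sketch of one way to do this with kernlab (using the built-in iris data as a stand-in for the original matrix): the eigenvalues returned by kpca are sorted in decreasing order, so the "best" three dimensions are simply the first three components, and `rotated()` returns the data projected onto them.

```r
library(kernlab)

X <- as.matrix(iris[, 1:4])  # example data matrix

# Keep only the first three kernel principal components
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1),
            features = 3)

eig(kpc)                 # eigenvalues of the three retained components
scores <- rotated(kpc)   # original data projected onto those components
dim(scores)              # nrow(X) x 3: the reduced-dimension representation
```

Note that kernel PCA is not a rotation of the original columns, so you do not get back the original matrix with three of its columns dropped; `scores` is the three-dimensional embedding of each row.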
Re: [R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment
There were some wrong NA values in the provided data set; this is now corrected. The data can be read in as

small = read.csv("small.csv", colClasses = c("character", rep("integer", 2), rep("factor", 5)))

The high number of residual df can be seen using the nlme package (can it be seen in the lme4 package, too?):

library(nlme)
model1 = lme(APC ~ (FITC + species + gene)^3, random = ~1 | day_repl, method = "ML", data = small)
anova(model1)
                  numDF denDF   F-value p-value
(Intercept)           1   789 1365.7400  <.0001
FITC                  1   789 3040.8168  <.0001
species               1   789   24.0521  <.0001
gene                  1   789   10.4035  0.0013
FITC:species          1   789    5.0690  0.0246
FITC:gene             1   789   10.7925  0.0011
species:gene          1   789   72.5823  <.0001
FITC:species:gene     1   789    4.6742  0.0309

In the original data set, denDF is > 200 000. Once again, how do I correctly account for pseudoreplication, avoiding the artificially high number of df? Thank you, Matthias

Matthias Gralle wrote: Hello everybody, it may be better to have sample data. I have provided data with fewer levels of gene and day and only ca. 400 data points per condition. Sample code:

small = as.data.frame(read.csv("small.csv"))
small$species = factor(small$species)
small$gene = factor(small$gene)
small$day = factor(small$day)
small$repl = factor(small$repl)
library(lme4)
model1 = lmer(APC ~ (FITC + species + gene)^3 + (1 | day) + (1 | repl), REML = F, data = small)
model2 = lmer(APC ~ (FITC + species + gene)^2 + (1 | day) + (1 | repl), REML = F, data = small)
anova(model1, model2)

gives me a significant difference, but in fact there are far too many residual df, and this is much worse in the original data. I suppose I cannot trust this difference.

model3 = lmer(APC ~ species * gene + (1 | day/repl/FITC), data = small)
Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) : numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) : numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error?
And finally

model4 = lmer(APC ~ gene * species + (1 | day) + (1 | repl) + (1 + (gene:species) | FITC), data = small)
model5 = lmer(APC ~ gene + species + (1 | day) + (1 | repl) + (1 | FITC), data = small)
anova(model4, model5)

works with this reduced data set, but fails to converge for the original, much larger data set. I am unsure whether this is the right kind of analysis for the data and there is only a problem of convergence, or whether it is the wrong formula. Can anybody give me some advice on this problem? Or should I just stick to reducing the data from each condition to a single parameter, as explained in my first email below? Thank you again!

Matthias Gralle wrote: Hello everybody, I have been trying for some weeks to state the correct design of my experiment as a GLM formula, and have not been able to find something appropriate in Pinheiro & Bates, so I am posting it here and hope somebody can help me. In each experimental condition, described by 1) gene (10 levels, fixed, because of high interest to me), 2) species (2 levels, fixed, because of high interest), 3) day (2 levels, random), and 4) replicate (2 levels per day, random), I have several thousand data points consisting of two variables: 5) FITC (level of transfection of a cell) and 6) APC (antibody binding to the cell). Because of intrinsic and uncontrollable cell-to-cell variation, FITC varies quite uniformly over a wide range, and APC correlates rather well with FITC. In some cases, I pasted day and replicate together as day_repl. My question is the following: Is there any gene (in my set of 10 genes) where the species makes a difference in the relation between FITC and APC? If yes, in which gene does species have an effect? And what is the effect of the species difference? My attempts are the following: 1.
Fit the data points of each experimental condition to a linear equation APC = Intercept + Slope*FITC and analyse the slopes:

lm(Slope ~ species * gene * day_repl)

This analysis shows clear differences between the genes, but no effect of species and no interaction gene:species. The linear fit to the cells is reasonably good, but of course does not represent the data set completely, so I wanted to incorporate the complete data set.

2a. lmer(APC ~ FITC * species * gene + (1 | day) + (1 | repl))

This gives extremely significant values for every interaction and variable because there are > 200 000 df. Of course, it cannot be true, because the cells are not really independent. I have done many variations of the above, e.g.

2b. lmer(APC ~ FITC * species * gene + (1 | day) + (1 + FITC | day_repl)),

but they all suffer from the excess of df.

3. lmer(APC ~ species * gene + (1 | day/repl/FITC))

gives several warning messages like this one:

In repl:day : numerical expression has 275591 elements: only the first used

4. lmer(APC ~ gene * species + (1 | day_repl) + (1 + gene:species | FITC))

ran several days, but failed to converge... Can somebody give me
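The slope-per-condition route in attempt 1 can be written compactly in base R. A sketch on synthetic data (the column names follow the description above; the sizes and effect are made up):

```r
set.seed(1)
# Synthetic stand-in: 2 genes x 2 species x 4 day_repl, 25 cells each
d <- data.frame(
  gene     = gl(2, 200),
  species  = gl(2, 100, 400),
  day_repl = gl(4, 25, 400),
  FITC     = runif(400)
)
d$APC <- 2 * d$FITC + rnorm(400, sd = 0.1)  # true slope 2 everywhere

# One fitted slope per experimental condition
conds  <- interaction(d$gene, d$species, d$day_repl, drop = TRUE)
slopes <- sapply(split(d, conds),
                 function(cond) coef(lm(APC ~ FITC, data = cond))[["FITC"]])

length(slopes)  # 16 conditions, each now contributing a single number
# These slopes can then be analysed with lm(Slope ~ species * gene * day_repl)
# without the inflated per-cell residual df.
```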
Re: [R] comma as decimal separator in xtable
Try this:

sapply(x, format, decimal.mark = ',')

On Wed, Sep 16, 2009 at 2:06 PM, Jakson A. Aquino jaksonaqu...@gmail.com wrote: On Wed, Sep 16, 2009 at 09:14:39AM -0300, Henrique Dallazuanna wrote: Try this also:

format(coef(summary(lm.D9)), decimal.mark = ',')

or using gsub:

apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")

using the lm.D9 object from the ?lm example.

Thanks for your suggestion! The problem with the above approach is that all values get the same formatting. For example,

x <- c(1.2, 1.2e10)
format(x, format = 'g', decimal.mark = ',')
[1] "1,2e+00" "1,2e+10"

What I would like to get was:

[1] "1,2" "1,2e+10"

I already solved my problem, but I'm using an external program to replace "." with ",". Perhaps I should ask my question again, with a new subject, since what I need now is the perl regular expression equivalent to the sed command:

s/\([0-9]\)a/a\1/

That is, 1a becomes a1; 3b becomes b3, etc.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
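For the sed command at the end, the base-R equivalent is gsub() with a backreference (here generalized from "a" to any lowercase letter, since the examples use both "1a" and "3b"):

```r
x <- c("1a", "3b")
# "([0-9])([a-z])" captures the digit and the letter; "\\2\\1" swaps them,
# mirroring sed's s/\([0-9]\)a/a\1/
gsub("([0-9])([a-z])", "\\2\\1", x)
# For Perl-flavoured regular expressions, add perl = TRUE to the gsub call
```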
Re: [R] Turnpoints
On Sep 16, 2009, at 1:42 PM, ogbos okike wrote:

Good evening to everybody. I have data in four columns - three of the columns represent the date while the fourth is counts. Setting 4000 as my minimum point/value, I wish to find out the number of counts that are less than or equal to 4000

You already have two good worked solutions from Steve Lianoglou for the second part. I find table(logical expression) to be a useful construct.

> ct <- read.table(textConnection("y m d count
+ 93 02 07 3974.6
+ 93 02 08 3976.7
+ 93 02 09 3955.2
+ 93 02 10 3955.0
+ 93 02 11 3971.8
+ 93 02 12 3972.8
+ 93 02 13 3961.0
+ 93 02 14 3972.8
+ 93 02 15 4008.0
+ 93 02 16 4004.2
+ 93 02 17 3981.2
+ 93 02 18 3996.8
+ 93 02 19 4028.2
+ 93 02 20 4029.5
+ 93 02 21 3953.4
+ 93 02 22 3857.3
+ 93 02 23 3848.3
+ 93 02 24 3869.8
+ 93 02 25 3898.1
+ 93 02 26 3920.5
+ 93 02 27 3936.7
+ 93 02 28 3931.9"), header=T)

> table(ct$count <= 4000)

FALSE  TRUE
    4    18

and the dates they occur. I have installed pastecs and have been trying to use the turnpoints function to do this. I have not been much lucky. I am still learning. I have added part of my data here where y stands for year, m month, d day and lastly the count. I would be grateful to anyone who could show me the way to do this. Best regards Ogbos

-- David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 03:16:12PM -0300, Henrique Dallazuanna wrote:

Try this:

sapply(x, format, decimal.mark = ',')

Yes, this works as I need, regarding the replacement of dots with commas. Thanks!

---snip---
[R] T-test to check equality, unable to interpret the results.
Hi, I have the precision values of a system on two different data sets. Snippets of these results are as shown:

sample1: (total 194 samples)
0.600238
0.800119
0.600238
0.200030
0.600238
...
...

sample2: (total 188 samples)
0.8001
0.2000
0.8001
0.
0.8001
0.4001
...
...

I want to check whether these results are statistically significant. Intuitively, the similarity in the two results means the results are statistically significant. I am using the t-test, t.test(sample1, sample2), to check for similarity between the two results. I get the following output:

---
Welch Two Sample t-test
data: s1p5 and s2p5
t = 0.9778, df = 374.904, p-value = 0.3288
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.03170059 0.09441172
sample estimates:
mean of x mean of y
0.5138298 0.4824742

I believe the t-test checks for difference between the two sets, and a p-value < 0.05 means both the sets are statistically different. Here, while checking for dissimilarity, the p-value is 0.3288; does it mean that the higher the p-value (while t.test checks for dissimilarity), the more similar the results are (which is the case above, as the means of the results are very close!)? Please help me interpret the results.. thanks in advance!

-- Rob Hall Masters Student ANU
Re: [R] T-test to check equality, unable to interpret the results.
Robert, We unfortunately do not have enough information to help you interpret the results, and this is not really an R question at all, but a general statistics question. You will probably have much better understanding of and confidence in your results by consulting a local statistical consultant at your university. The values shown for sample 1 and sample 2 lead me to believe that these are not drawn from a homogeneous population. Whoever ends up helping you is going to need to know how these measurements were obtained in much greater detail than you've given here. Best Regards, Erik Iverson

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Hall Sent: Wednesday, September 16, 2009 1:55 PM To: r-help Subject: [R] T-test to check equality, unable to interpret the results.

---snip---
Re: [R] T-test to check equality, unable to interpret the results.
Hi, I was just going to send this when I saw Erik's post. He's right -- we can't say anything about your data, but we can say something about using a t-test. I'm not a real statistician, so this answer isn't very rigorous, but it might be helpful.

On Sep 16, 2009, at 2:55 PM, Robert Hall wrote:

I believe the t-test checks for difference between the two sets, and a p-value < 0.05 means both the sets are statistically different.

A t-test is used to check if the difference in the means of two samples is statistically significant. The null hypothesis is that the two means are not different. If you reject the null, it means you have reason to believe that the means of the two samples are different. See the "uses" section here: http://en.wikipedia.org/wiki/Student's_t-test

Here, while checking for dissimilarity, the p-value is 0.3288; does it mean that the higher the p-value (while t.test checks for dissimilarity), the more similar the results are (which is the case above, as the means of the results are very close!)? Please help me interpret the results..

Your intuition is essentially correct. In general, the higher the p-value (in any statistical test), the less confident you should be that rejecting the null hypothesis is a good idea.

-steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
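To make the interpretation concrete, a small simulation on synthetic data (not the poster's): two samples drawn from the same population typically give a large p-value, while a sample from a shifted population typically gives a small one.

```r
set.seed(42)
a <- rnorm(100, mean = 0.5, sd = 0.2)   # sample 1
b <- rnorm(100, mean = 0.5, sd = 0.2)   # sample 2, SAME population mean
t.test(a, b)$p.value                    # usually large: no detected difference

c2 <- rnorm(100, mean = 0.6, sd = 0.2)  # population mean shifted by 0.1
t.test(a, c2)$p.value                   # usually small: difference detected
```

Note that a large p-value is not evidence that the population means are equal, only that no difference was detected at this sample size; demonstrating similarity would call for an equivalence test rather than an ordinary t-test.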
[R] apply function across two variables by mult factors
Greetings, I am attempting to run a function, which produces a vector and requires two input variables, across two nested factor levels. I can do this using by(X, list(factor1, factor2), function), however I haven't found a simple way to extract the list output into an organized vector form. I can do this using nested loops but it isn't exactly an optimal approach. Thank you for any and all suggestions. Jon

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

# example function [trivial]
testFun <- function(x){
  X <- abs(x[, 1])
  Y <- abs(x[, 2])
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

# apply by factor levels but hard to extract values
by(testDF[, 1:2], list(testDF$f1, testDF$f2), testFun)

# Loop works, but not efficient for large datasets
testDF$value <- NA
for(i in levels(testDF$f1)){
  for(j in levels(testDF$f2)){
    testDF[testDF$f1 == i & testDF$f2 == j, ]$value <-
      testFun(testDF[testDF$f1 == i & testDF$f2 == j, 1:2])
  }
}
testDF

sessionInfo()
#R version 2.9.1 Patched (2009-08-07 r49093)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base

Jon Loehrke Graduate Research Assistant Department of Fisheries Oceanography School for Marine Science and Technology University of Massachusetts 200 Mill Road, Suite 325 Fairhaven, MA 02719 jloeh...@umassd.edu T 508-910-6393 F 508-910-6396
Re: [R] apply function across two variables by mult factors
Hello,

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jon Loehrke Sent: Wednesday, September 16, 2009 2:23 PM To: r-help@r-project.org Subject: [R] apply function across two variables by mult factors

Greetings, I am attempting to run a function, which produces a vector and requires two input variables, across two nested factor levels. I can do this using by(X, list(factor1, factor2), function), however I haven't found a simple way to extract the list output into an organized vector form. I can do this using nested loops but it isn't exactly an optimal approach. Thank you for any and all suggestions. Jon

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

Try this using lapply, split, mapply? Maybe it is in a nicer output object for you?

testFun2 <- function(x, y) {
  X <- abs(x)
  Y <- abs(y)
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[1], x[2]))

---snip---
Re: [R] apply function across two variables by mult factors
One correction below,

---snip---

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

Try this using lapply, split, mapply? Maybe it is in a nicer output object for you?

testFun2 <- function(x, y) {
  X <- abs(x)
  Y <- abs(y)
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[1], x[2]))

Or use list indexing in the mapply call to get a vector, in this case at least...

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[[1]], x[[2]]))

---snip---
Re: [R] apply function across two variables by mult factors
Hi Jon, Here is a suggestion:

foo <- function(x) as.numeric(paste(abs(round(x)), collapse = "."))
testDF$value <- apply(testDF[, 1:2], 1, foo)
testDF

HTH, Jorge

On Wed, Sep 16, 2009 at 3:22 PM, Jon Loehrke jloeh...@umassd.edu wrote:

---snip---
Re: [R] apply function across two variables by mult factors
Try this:

transform(testDF, value = as.numeric(paste(round(abs(x)), round(abs(y)), sep = ".")))

On Wed, Sep 16, 2009 at 3:22 PM, Jon Loehrke jloeh...@umassd.edu wrote:

---snip---
Re: [R] T-test to check equality, unable to interpret the results.
I am loathe to expound basic statistics here ... but, at the considerable risk of pedantry, I must note that Steve's reply below contains fundamental errors, which I feel should not be left on this list unremarked: t-tests do **not** test for differences in **sample** means; they test for differences in **population** means. The sample means are different. Period. Furthermore, the null can be other than equality -- e.g. that the mean of the first population is less than that of the second. Finally, "statistically different" is a meaningless phrase. P < .05 means that, assuming the underlying assumptions at least approximately hold (and what "approximately hold" means operationally is a technical discussion unto itself), were this calculation to be repeated over and over again with samples of data from populations for which the null is, in fact, true, the expected (long run) proportion of times the null would be rejected is .05 (the standard frequentist interpretation). For any **particular** pair of samples, the probability of falsely rejecting when the null holds is either 1 or 0 -- either you rejected or you did not. I would not bother with this were it not for the fact that Steve's apparent confusion -- or at least his imprecise statements -- is widespread among scientists, in my experience, and leads to frequent misapplications and misinterpretations of significance testing. The woes of Stat 101 training ... But that's another diatribe...

Bert Gunter Genentech Nonclinical Biostatistics

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steve Lianoglou Sent: Wednesday, September 16, 2009 12:19 PM To: Robert Hall Cc: r-help Subject: Re: [R] T-test to check equality, unable to interpret the results.

---snip---