date:20100625

Re: [R] Assigning variable value as name to cbind column

2010-06-25 Thread Bill.Venables

Why does the naming have to be done inside the cbind()?

How about

> dataTest <- data.frame(col1 = c(1,2,3))
> new.data <- c(1,2)
> name <- "test"

> length(new.data) <- nrow(dataTest)
> newDataTest <- cbind(dataTest, new.data)
> names(newDataTest)[[ncol(newDataTest)]] <- name
> newDataTest
  col1 test
1    1    1
2    2    2
3    3   NA

? 
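
Since the original question mentions that this runs inside a function many times, here is a minimal sketch of how the same idea could be packaged. The helper name `add_named_column` is hypothetical, and it assigns the column with `[[<-` instead of `cbind()` followed by `names<-`; the padding step is the same `length<-` trick used above.

```r
# Hypothetical helper (not from the thread): append `values` to `df`
# as a new column whose name is taken from the string `col_name`.
add_named_column <- function(df, values, col_name) {
  length(values) <- nrow(df)   # pad with NA (or truncate) to match the rows
  df[[col_name]] <- values     # the column name comes from the variable's value
  df
}

dataTest <- data.frame(col1 = c(1, 2, 3))
name <- "test"
result <- add_named_column(dataTest, c(1, 2), name)
print(result)
```

Because the name is passed as an ordinary character string, the same function can be called in a loop with a different name each time.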

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ralf B
Sent: Friday, 25 June 2010 3:48 PM
To: r-help@r-project.org
Subject: [R] Assigning variable value as name to cbind column

Hi all,

I have this (non-working) script:

dataTest <- data.frame(col1=c(1,2,3))
new.data <- c(1,2)
name <- "test"
n.row <- dim(dataTest)[1]
length(new.data) <- n.row
names(new.data) <- name
cbind(dataTest, name=new.data)
print(dataTest)

and would like to bind the new column 'new.data' to 'dataTest' by
using the value of the variable 'name' as the column name.

The end result should look like this:

  col1 test
1    1    1
2    2    2
3    3   NA


The best I got was that 'name' became the column name, but never the
actual value of 'name'. How can I do that?

(This is actually in a function that runs many times -- this means a
manual workaround is not feasible.)

Ralf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Atte Tenkanen

The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class segments', and
these segments are compared with one reference set using a distance function.
Only some distance values are possible. These distance values can be averaged
over musical bars, which produces a smoother distribution, and the resulting
'comparison curve', which traces the distances from the reference set through a
musical piece, is more readable (see e.g.
http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original
values.

Then I want to pick only some regions from the piece and test whether the
values in those regions are higher than the mean of all values.

Atte

 On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
 
  Is there anything for me?
 
  There is a lot of data, n=2418, but there are also a lot of ties.
  My sample n≈250-300
 
 
 I do not understand why there should be so many ties. You have not
 described the measurement process or units. ( ... although you offer a
 glimpse without much background later.)
 
  i would like to test, whether the mean of the sample differ  
  significantly from the population mean.
 
 Why? What is the purpose of this investigation? Why should the mean of
 a sample be that important?
 
 
  The histogram of the population looks like in attached histogram,  
  what test should I use? No choices?
 
  This distribution comes from a musical piece and the values are  
  'tonal distances'.
 
  http://users.utu.fi/attenka/Hist.png
 
 That picture does not offer much insight into the features of that
 measurement. It appears to have much more structure than I would  
 expect for a sample from a smooth unimodal underlying population.
 
 -- 
 David.
 
 
  Atte
 
  On 06/24/2010 12:40 PM, David Winsemius wrote:
 
  On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
 
  Thanks. What I have had to ask is that
 
  how do you test that the data is symmetric enough?
  If it is not, is it ok to use some data transformation?
 
  when it is said:
 
  The Wilcoxon signed rank test does not assume that the data are
  sampled from a Gaussian distribution. However it does assume that the
  data are distributed symmetrically around the median. If the
  distribution is asymmetrical, the P value will not tell you much about
  whether the median is different than the hypothetical value.
 
  You are being misled. Simply finding a statement on a statistics
  software website, even one as reputable as Graphpad (???), does not mean
  that it is necessarily true. My understanding (confirmed by reviewing
  "Nonparametric statistical methods for complete and censored data" by
  M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test
  does not require that the underlying distributions be symmetric. The
  above quotation is highly inaccurate.
 
 
  To add to what David and others have said, look at the kernel that the
  U-statistic associated with the WSR test uses: the indicator (0/1) of
  xi + xj > 0.  So WSR tests H0: p = 0.5 where p = the probability that the
  average of a randomly chosen pair of values is positive.  [If there are
  ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0],
  i neq j.]
 
  Frank
 
  -- 
  Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
   Department of Biostatistics   Vanderbilt  
  University



[R] installing multicore package

2010-06-25 Thread suman dhara

Sir,
I want to apply mclapply() function for my analysis. So, I have to install
multicore package. But I can not install the package.

install.packages("multicore")
 It reports that package 'multicore' is not available.

Can you help me?


Regards,
Suman Dhara


[R] best way to plot a evolution in time

2010-06-25 Thread nana


Hi everyone,
I have the following question:
given three objects let's say:
a <- c(2, 5, 15, 16)
b <- c(1, 1, 8, 8)
c <- c(10, 10, 11, 11)
m <- matrix(c(a,b,c),byrow=T,nrow=3)
rownames(m) <- c('gene a', 'gene b', 'gene c')
m
gene.dist <- dist(m,method='euclidean')
gene.dist
Which is the best way to plot their evolution in time? Should I use a
levelplot or just a normal plot? If I use a normal plot, how do I plot
evolution in time?
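
One possible way to draw this with a "normal" plot, as a minimal sketch: it assumes that each column of the matrix is a consecutive, equally spaced time point (the post does not say so), and it uses base R's `matplot()`, which draws one line per column, so the matrix is transposed first. The names `gene_a`, `gene_b`, `gene_c` and `time_points` are illustrative only.

```r
# Rebuild the example matrix: one row per gene, one column per time point
# (the time-point interpretation is an assumption).
gene_a <- c(2, 5, 15, 16)
gene_b <- c(1, 1, 8, 8)
gene_c <- c(10, 10, 11, 11)
m <- matrix(c(gene_a, gene_b, gene_c), byrow = TRUE, nrow = 3)
rownames(m) <- c("gene a", "gene b", "gene c")

time_points <- seq_len(ncol(m))

# matplot() plots the columns of its second argument, hence t(m).
matplot(time_points, t(m), type = "b", pch = 1:3, lty = 1, col = 1:3,
        xlab = "Time point", ylab = "Value")
legend("topleft", legend = rownames(m), col = 1:3, pch = 1:3, lty = 1)
```

A levelplot would show the same matrix as a heat map; the line plot tends to be easier to read for only three series.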

-- 
View this message in context: 
http://r.789695.n4.nabble.com/best-way-to-plot-a-evolution-in-time-tp2267993p2267993.html
Sent from the R help mailing list archive at Nabble.com.


[R] i want create script

2010-06-25 Thread vijaysheegi


Hi R community,
I want to create a script which takes a .csv table as input, does
some prediction, and writes the output to a file. The input is an Excel
sheet containing some tables of data; the output should be a table of
predicted data. Will someone help me in this regard?
Thanks in advance.

I am using R on Windows. Please advise on the procedure to create an Rscript.


Regards
-
Vijay
Research student
Bangalore
India
-- 
View this message in context: 
http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread gokhanocakoglu


In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate-free
approach to the analysis of form using landmark data, which was
developed by Subhash Lele and Joan Richtsmeier. They also developed a
computer program (http://www.getahead.psu.edu/comment/edma.asp) that can
perform several techniques including EDMA I-II, but I wonder whether there is a
package or code available in R to perform EDMA...

thanks 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Dario Solari

I add some scientific references for Google Insights for Search:

* Google Predicting the Present
http://www.google.com/googleblogs/pdfs/google_predicting_the_present.pdf

* Google Econometrics and Unemployment Forecasting
http://ftp.iza.org/dp4201.pdf

* Query Indices and a 2008 Downturn: Israeli Data
http://www.bankisrael.gov.il/deptdata/mehkar/papers/dp0906e.pdf

If it is considered a useful tool, and I think it is, it is important to
reflect on which keywords to use. Second, it is interesting to see these trends
in comparison with other indicators such as the number of users/posts on
mailing lists, Google Scholar citations, etc.


[R] Optimizing given two vectors of data

2010-06-25 Thread confusedSoul


I am trying to estimate an Arrhenius-exponential model in R.  I have one
vector of data containing failure times, and another containing
corresponding temperatures.  I am trying to optimize a maximum likelihood
function given BOTH these vectors.  However, the optim command takes only
one such vector parameter.

How can I pass both vectors into the function?
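
A minimal sketch of one way this is commonly done: `optim()` forwards any extra named arguments (its `...`) to the objective function, so only the parameter vector has to come first. Everything model-specific below is an assumption made for illustration, not taken from the post: the parameterisation log(mean life) = b0 + b1 * 1000 / temperature, the simulated data, the starting values, and the names `neg_log_lik`, `times` and `temps`.

```r
# Negative log-likelihood for exponential failure times whose mean depends
# on temperature through an assumed Arrhenius-type relationship.
# par[1] = b0, par[2] = b1; `times` and `temps` are the two data vectors.
neg_log_lik <- function(par, times, temps) {
  log_mean <- par[1] + par[2] * 1000 / temps
  sum(log_mean + times / exp(log_mean))
}

# Simulated data, for illustration only.
set.seed(42)
temps <- rep(c(300, 350, 400), each = 30)
times <- rexp(length(temps), rate = 1 / exp(-3 + 2 * 1000 / temps))

# Both data vectors are passed by name through optim()'s `...`.
fit <- optim(par = c(0, 1), fn = neg_log_lik, times = times, temps = temps)
fit$par
```

An alternative with the same effect is to define the objective as a closure that captures the two vectors from the enclosing environment.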
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Atte Tenkanen


BTW. If there is no test, even a weak one, that would be suitable for my
purpose (because of the ties and the shape of the data), could I proceed this way:

It is also worth comparing different samples taken from the data. Since the
mean and sd of the data are available, could I approximate p-values using a z- or
t-test, just to compare several different samples?

Atte


[R] Confused: Looping in dataframes

2010-06-25 Thread phani kishan

Hey,
I have a data frame x which consists of say 10 vectors. I essentially want
to find out the best fit exponential smoothing for each of the vectors.

The problem: while I get results when I say
 lapply(x,ets)

I get an error when I say
 myprint
function(x)
{
for(i in 1:length(x))
{
ets(x[i],model="AZZ",opt.crit=c("amse"))
}
}

The error message is: Error in ets(x[i], model = "AZZ", opt.crit =
c("amse")) :
  y should be a univariate time series

Could someone please explain why this is happening? I also want to be able
to extract data like coef's, errors (MAPE,MSE etc.)

Thanks and regards,
Phani
-- 
A. Phani Kishan
3rd Year B.Tech
Dept. of Computer Science  Engineering
IIT MADRAS
Ph: +919962363545


Re: [R] Confused: Looping in dataframes

2010-06-25 Thread Paul Hiemstra


On 06/25/2010 10:02 AM, phani kishan wrote:

 Hey,
 I have a data frame x which consists of say 10 vectors. I essentially want
 to find out the best fit exponential smoothing for each of the vectors.

 The problem while I'm getting results when i say

 lapply(x,ets)

 I am getting an error when I say

 myprint
 function(x)
 {
 for(i in 1:length(x))
 {
 ets(x[i],model="AZZ",opt.crit=c("amse"))

Hi,

Please provide a reproducible example, as stated in the posting guide. 
My guess is that replacing x[i] with x[[i]] would solve the problem.
Double brackets return a vector instead of a data.frame that has just
column i.
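
The difference between the two kinds of brackets can be seen without the forecasting package at all. The following is a minimal sketch on a made-up two-column data frame; the names `one_col_df` and `col_vector` are illustrative only.

```r
# A small made-up data frame standing in for the poster's data.
x <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

one_col_df <- x[1]     # single brackets: still a data.frame with one column
col_vector <- x[[1]]   # double brackets: the column itself, a numeric vector

class(one_col_df)
class(col_vector)

# Looping over columns as vectors, which is what a function expecting a
# univariate series needs:
for (i in seq_along(x)) {
  print(mean(x[[i]]))
}
```

This also explains why `lapply(x, ets)` works: `lapply()` hands each column to the function as a vector, just as `x[[i]]` does.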


cheers,
Paul

 }
 }

 The error message is that* Error in ets(x[i], model = "AZZ", opt.crit =
 c("amse")) :
    y should be a univariate time series*

 Could someone please explain why this is happening? I also want to be able
 to extract data like coef's, errors (MAPE,MSE etc.)

 Thanks and regards,
 Phani



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 253 5773
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


[R] Sweave: The opposite of tangle

2010-06-25 Thread stefan.d...@gmail.com

Hi,
I am using Sweave to write an article. If I want to convert the *.rnw
to a *.tex file I have to run Sweave, which might take a long time. Is
there a way to get a tex file as the result without evaluating the
R chunks, i.e. the opposite of tangle (which just gives the R chunks)?
Thanks,
Stefan


Re: [R] BBH2 and FrF2 packages

2010-06-25 Thread Dennis Murphy

Hi:

MEPlot, IAPlot and cubePlot come from the FrF2 package; the DanielPlot
function is in both the BsMD and FrF2 packages. Try

library(FrF2)

and then run your code again; it worked for me...

If you check the list of functions in BHH2 under HTML help, you'll find that
none of the plot functions you used below are found in that package, but
they are all found under FrF2.

HTH,
Dennis

On Thu, Jun 24, 2010 at 4:17 PM, Andrea Bernasconi DG 
andrea.bernasconi...@gmail.com wrote:

 Hi R HELP,

 I consider the 2^3 factorial experiment described at page 177 of
 the book Statistics for Experimenters: Design, Innovation, and Discovery
 by George E. P. Box, J. Stuart Hunter, William G. Hunter (BHH2).

 This example uses the following data in file BHH2-Data/tab0502.dat
 at ftp://ftp.wiley.com/
 in /sci_tech_med/statistics_experimenters/BHH2-Data.zip

  run  T  C  K  y
 1   1 -1 -1 -1 60
 2   2  1 -1 -1 72
 3   3 -1  1 -1 54
 4   4  1  1 -1 68
 5   5 -1 -1  1 52
 6   6  1 -1  1 83
 7   7 -1  1  1 45
 8   8  1  1  1 80

 Using these data and the R BHH2 package, I was not able to reproduce the
 very simple results in the BHH2 book.
 In particular, the following solution will have no meaning since K is
 categorical:

 ( plan <- lm(y ~ (T+C+K)^2, data = DATA) )
 MEPlot(plan) # Main Effects
 IAPlot(plan) # Interactions Effects
 DanielPlot(plan)
 cubePlot(plan, "T", "C", "K")

 I decided to rebuild the data using:

 plan <- FrF2(8, 3, factor.names=c("T","C","K"), default.level=c("-","+"),
 randomize = FALSE)
 ( plan <- add.response(plan, y) )

 giving:

  T C K  y
 1 - - - 60
 2 + - - 72
 3 - + - 54
 4 + + - 68
 5 - - + 52
 6 + - + 83
 7 - + + 45
 8 + + + 80
 class=design, type= full factorial

 Unfortunately the following plot commands do not work:

 MEPlot(plan)
 IAPlot(plan)
 DanielPlot(plan)

 The error is:
 Error in MEPlot.design(plan) :
  The design obj must be of a type containing FrF2 or pb.

 Why?

 If I add a fake factor to the plan the plot commands work, but the solution
 will have no meaning:

 plan <- FrF2(8, 4, factor.names=c("T","C","K","Q"),
 default.level=c("-","+"), randomize = FALSE)
 ( plan <- add.response(plan, y) )
 MEPlot(plan)
 IAPlot(plan)
 DanielPlot(plan)

 Sincerely, Andrea B.








[R] HEGY.test, error Mypi not found

2010-06-25 Thread Jessica Oettel

Hi,

I'd like to use the HEGY test from the uroot package (s. attachment) and get
the following error message:

error in dimnames(Mypi)[[2]] <- paste("Ypi", 1:s, sep = "") :
  Object 'Mypi' not found

For the air passenger example on
http://127.0.0.1:11997/library/uroot/html/HEGY.test.html it works, but for
my time series it doesn't (giving names to the columns and rows did not help
either...).

Thanks for your help.

Jessica

Re: [R] Simple qqplot question

2010-06-25 Thread Joris Meys

Sorry, missed the two variable thing. Go with the lm solution then,
and you can tweak the plot yourself (the confidence intervals are
easily obtained via predict(lm.object, interval="prediction") ). The
function qq.plot uses robust regression, but in your case normal
regression will do.

Regarding the shapes: this just indicates both tails are shorter than
expected, so you have a kurtosis less than 3 (or negative,
depending whether you do the correction or not)
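
To see how tail weight shows up in a normal Q-Q plot, here is a minimal sketch using simulated uniform data as a stand-in for a distribution with shorter tails than the normal (the real data from the thread are not available). The helper `kurtosis` is defined here for illustration; it is the plain fourth standardised moment, to be compared with the value 3 of a normal distribution.

```r
set.seed(1)

# Uniform draws: bounded, so both tails are shorter than a normal's.
light_tailed <- runif(10000)

# The Q-Q plot flattens at both ends, giving an S-shaped curve.
qqnorm(light_tailed, main = "Q-Q normality plot, light-tailed sample")
qqline(light_tailed)

# Sample kurtosis as the fourth standardised moment (no bias correction).
kurtosis <- function(v) {
  centred <- v - mean(v)
  mean(centred^4) / mean(centred^2)^2
}

kurtosis(light_tailed)   # compare with 3, the value for a normal distribution
kurtosis(rnorm(10000))   # a normal sample for reference
```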

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote:
 Short rep: I have two distributions, data and data2; each build from
 about 3 million data points; they appear similar when looking at
 densities and histograms. I plotted qqplots for further eye-balling:

 qqplot(data, data2, xlab = 1, ylab = 2)

 and get an almost perfect diagonal line which means they are in fact
 very alike. Now I tried to check normality using qqnorm -- and I think
 I am doing something wrong here:

 qqnorm(data, main = "Q-Q normality plot for 1")
 qqnorm(data2, main = "Q-Q normality plot for 2")

 I am getting perfect S-shaped curves (??) for both distributions. Am I
 missing something here?

 |
 |                               *  *   *  *
 |                           *
 |                        *
 |                    *
 |               *
 |            *
 |         *
 | * * *
 |-

 Thanks, Ralf




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys

As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:

x <- rnorm(250)
hist(x,freq=F)
lines(density(x),col="red")

See also this ppt, a very nice and short introduction to graphics in R :
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300

You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
mean is almost certainly normally distributed.
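
A minimal sketch of that suggestion, on simulated data because the real tonal-distance values are not available here. The population of 2418 heavily tied values, the sample of 250, and the names `population` and `region` are all made up for illustration.

```r
set.seed(123)

# Made-up population with many ties: only five distinct values occur.
population <- sample(c(0, 1, 2, 3, 4), size = 2418, replace = TRUE,
                     prob = c(0.1, 0.3, 0.3, 0.2, 0.1))

# A sample of the size mentioned in the thread.
region <- sample(population, size = 250)

# One-sample t-test of the sample mean against the known population mean.
result <- t.test(region, mu = mean(population))
result$p.value
```

With a few hundred observations the ties matter little for the t-test, since it relies on the approximate normality of the sample mean rather than of the individual values.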

 i would like to test, whether the mean of the sample differ significantly 
 from the population mean.

According to probability theory, this will be the case in 5% of the samples
if you repeat your sampling infinitely often. But as David asked: why on earth
do you want to test that?

cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[R] Finding sets

2010-06-25 Thread Muhammad Rahiz


Hi all,

I'd like to find how many sets of 1s there are in the following example;

x <- rep(c(1,2,1,3,5), each=5)

I know that there are two sets of 1s, visually. Any function in R that 
allows me to automate the process?


Thanks.


Muhammad


Re: [R] Finding sets

2010-06-25 Thread Dennis Murphy

Hi:

Here's one approach:

> x <- rep(c(1,2,1,3,5), each=5)
> rle(x)
Run Length Encoding
  lengths: int [1:5] 5 5 5 5 5
  values : num [1:5] 1 2 1 3 5
> table(rle(x)$values)

1 2 3 5
2 1 1 1
> unname(table(rle(x)$values))[1]
[1] 2
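
The same idea can be wrapped so that it does not depend on the position of the value in the table. This is a minimal sketch; the function name `count_runs` is hypothetical.

```r
# Count how many separate runs of `value` occur in the vector `x`.
count_runs <- function(x, value) {
  runs <- rle(x)              # collapse consecutive repeats into runs
  sum(runs$values == value)   # number of runs whose value matches
}

x <- rep(c(1, 2, 1, 3, 5), each = 5)
count_runs(x, 1)   # 2
count_runs(x, 4)   # 0, since 4 never occurs
```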

HTH,
Dennis

On Fri, Jun 25, 2010 at 2:30 AM, Muhammad Rahiz 
muhammad.ra...@ouce.ox.ac.uk wrote:

 Hi all,

 I'd like to find how many sets of 1s there are in the following example;

 x <- rep(c(1,2,1,3,5), each=5)

 I know that there are two sets of 1s, visually. Any function in R that
 allows me to automate the process?

 Thanks.


 Muhammad


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys

I've been looking around myself, but I couldn't find any. Maybe
somebody will chime in to direct you to the correct places. I also
checked the papers, and it seems not too hard to implement. If I find
some time, I'll take a look at it next week.

For the other two gentlemen, check:

http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf
http://www.getahead.psu.edu/PDF/no.1.pdf

Cheers
Joris

On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:

 In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate
 free approach to the analysis of form using landmark data which was
 developed by  Subhash Lele and Joan Richstmeier. They also developed a
 computer program (http://www.getahead.psu.edu/comment/edma.asp) that allow
 to perform several techniques including EDMA I-II but I wonder is there a
 package or code available in R to perform EDMA...

 thanks
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] Correctly plotting bar and scatter chart on 2-y axis plot with par(new=T)

2010-06-25 Thread Jim Lemon


On 06/25/2010 05:47 AM, dan.weavesham wrote:


Hello,

Thanks for the advice so far -- still struggling with it, I must admit.

Here is some sample data, which I hope helps:

# y axis #1 -- data for the bar chart
-30353.382 -21693.519   -7049.923  -72968.722  -10267.584 -269432.795
-19847.670 -686283.171 -376231.754 -597800.080 -274637.587 -112663.167
-39550.445 -133916.431

# x axis -- in specific order, so cannot be tampered with !! ;-)
1  7 13  2  8 14  3  9  4 10  5 11  6 12

# y axis #2 -- scatter chart
50  25   5  25   5 100   5 100 100  75  75  50  50  50

Does this help explain what I'm looking to do? If not, is there a way I can
get plot() to not change the order of the x axis data points -- so instead
of plotting 1,2,3,n as per my original post, it plots 1,7,13,n? (I've tried
coercing the data into character format with no luck)


Hi Dan,
Does this do what you want?

# y axis #1 -- data for the bar chart
y1 <- c(-30353.382,-21693.519,-7049.923,-72968.722,-10267.584,-269432.795,
 -19847.670,-686283.171,-376231.754,-597800.080,-274637.587,-112663.167,
 -39550.445,-133916.431)
# x axis -- in specific order, so cannot be tampered with !! ;-)
# these are really the x tick labels
x <- c(1,7,13,2,8,14,3,9,4,10,5,11,6,12)
# y axis #2 -- scatter chart
y2 <- c(50,25,5,25,5,100,5,100,100,75,75,50,50,50)
library(plotrix)
twoord.plot(1:14,y1,1:14,y2,type=c("bar","p"),lcol=2,rcol=4,
 lylim=c(-7,7),rylim=c(-100,100),xtickpos=1:14,xticklab=x)

Jim


Re: [R] Sweave: The opposite of tangle

2010-06-25 Thread Kevin E. Thorpe


stefan.d...@gmail.com wrote:

Hi,
I am using Sweave to write an article. If I want to convert the *.rnw
to a *.tex file I have to run Sweave which might take a long time. Is
there away to get a tex-file as result without (evaluating) the
R-chunks, i.e. the opposite of tangle (that just gives R-chunk).
Thanks,
Stefan



This is untested, but does Sweave("file.rnw", eval=FASLE) do what you want?

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


Re: [R] installing multicore package

2010-06-25 Thread Uwe Ligges




On 25.06.2010 06:39, suman dhara wrote:

Sir,
I want to apply mclapply() function for my analysis. So, I have to install
multicore package. But I can not install the package.


install.packages("multicore")

  It gives that package multicore is not available.

Can you help me?


If this is Windows (unstated) we cannot help, since multicore is not 
available for that platform.


Uwe Ligges




Regards,
Suman Dhara


Re: [R] Sweave: The opposite of tangle

2010-06-25 Thread Kevin E. Thorpe


Kevin E. Thorpe wrote:

stefan.d...@gmail.com wrote:

Hi,
I am using Sweave to write an article. If I want to convert the *.rnw
to a *.tex file I have to run Sweave which might take a long time. Is
there away to get a tex-file as result without (evaluating) the
R-chunks, i.e. the opposite of tangle (that just gives R-chunk).
Thanks,
Stefan



This is untested, but does Sweave("file.rnw", eval=FASLE) do what you want?



That should be FALSE above.  Don't post before coffee.
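
A small self-contained way to try the suggestion, as a sketch: it writes a throwaway .Rnw file into a temporary directory and weaves it with `eval = FALSE`, relying on `Sweave()` passing extra arguments on to the default driver as chunk option defaults. The file name `article.Rnw`, the chunk label and the chunk contents are made up; the chunk deliberately calls `stop()`, so weaving would fail if the chunk were evaluated.

```r
# Create a throwaway working directory and a tiny Sweave source file.
workdir <- tempfile("sweave-demo")
dir.create(workdir)
rnw_file <- file.path(workdir, "article.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<expensive-chunk>>=",
  "stop('this chunk should not be evaluated')",
  "@",
  "\\end{document}"
), rnw_file)

# Sweave writes its output into the current working directory.
old_wd <- setwd(workdir)
Sweave(rnw_file, eval = FALSE)   # chunks are echoed but not run
setwd(old_wd)

tex_file <- file.path(workdir, "article.tex")
file.exists(tex_file)
```

Individual chunks can still be switched back on in the source with the chunk option `eval=TRUE` if only some of them are slow.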
h

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
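
For reference, a minimal sketch of the suggestion above, with the typo corrected. The file name demo.Rnw and its contents are made up for illustration, and the sketch assumes the default RweaveLatex driver, which takes eval as an option; with eval = FALSE the chunk should be copied into the .tex file without being run.

```r
# Work in a temporary directory so the sketch leaves no files behind.
old_wd <- setwd(tempdir())

# A tiny, hypothetical Sweave document with one code chunk.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<expensive-chunk>>=",
  "sweave_demo_value <- sum(1:10)  # imagine a slow computation here",
  "@",
  "Some text after the chunk.",
  "\\end{document}"
)
writeLines(rnw, "demo.Rnw")

# eval = FALSE is passed on to the driver: chunks are not evaluated.
Sweave("demo.Rnw", eval = FALSE)

tex_exists <- file.exists("demo.tex")
setwd(old_wd)
```

If the chunk had been evaluated, sweave_demo_value would exist in the workspace afterwards; with eval = FALSE it should not.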


Re: [R] Confused: Looping in dataframes

2010-06-25 Thread phani kishan

On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:

 On 06/25/2010 10:02 AM, phani kishan wrote:

 Hey,
 I have a data frame x which consists of say 10 vectors. I essentially want
 to find out the best fit exponential smoothing for each of the vectors.

 The problem: while I get results when I say


 lapply(x,ets)


 I am getting an error when I say


 myprint


 function(x)
 {
 for(i in 1:length(x))
 {
 ets(x[i],model="AZZ",opt.crit=c("amse"))


 Hi,

 Please provide a reproducible example, as stated in the posting guide. My
 guess is that replacing x[i] by x[[i]] would solve the problem. Double
 brackets return a vector instead of a data.frame that has just column i.
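
A small sketch of the difference between single and double brackets on a data frame, using a made-up two-column data frame; no forecasting package is needed to see it.

```r
# Hypothetical data frame standing in for the poster's 'x'.
sdata <- data.frame(SKU1 = c(583.8, 441.7, 454.2),
                    SKU2 = c(574.6, 552.8, 555.7))

one_col_df  <- sdata[1]    # single brackets: still a data.frame with one column
one_col_vec <- sdata[[1]]  # double brackets: the numeric vector itself

class(one_col_df)
class(one_col_vec)
```

A function that expects a univariate series will generally accept the second form but may reject the first.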

Hey Paul,
As requested.
My example data frame

sdata:
     SKU1  SKU2   SKU3  SKU4
1   583.8 574.6  1106.9   648.1
2   441.7 552.8  1021.3   353.6
3   454.2 555.7   998.3   306.4
4   569.7 507.6   811.1   360.7
5   512.3 620.0  1046.3   713.9
6   580.8 668.2   732.0   490.9
7   648.5 766.9   653.4   422.1
8   617.4 657.1   602.1   190.8
9   826.8 767.3   640.5   324.1
10 1163.0 657.6   429.6   181.1
11  643.5 788.9   569.1   331.9
12  846.9 568.6   425.1   224.6
13  580.7 582.9   434.2   226.9

now when I apply
lapply(sdata,ets)
I get a result as:
$SKU1
ETS(A,N,N)

Call:
 ets(y = x, model = "AZZ")

  Smoothing parameters:
alpha = 0.3845

  Initial states:
l = 533.3698

  sigma:  181.7615

 AIC AICc  BIC
172.6144 173.8144 173.7443

$SKU2
ETS(A,N,N)

Call:
 ets(y = x, model = "AZZ")

  Smoothing parameters:
alpha = 0.5026

  Initial states:
l = 567.821

  sigma:  86.7074

 AIC AICc  BIC
153.3704 154.5704 154.5003

$SKU3
ETS(A,A,N)

Call:
 ets(y = x, model = "AZZ")

  Smoothing parameters:
alpha = 1e-04
beta  = 1e-04

  Initial states:
l = 1189.2221
b = -64.3776

  sigma:  85.4153

 AIC AICc  BIC
156.9800 161.9800 159.2398

$SKU4
ETS(A,A,N)

Call:
 ets(y = x, model = "AZZ")

  Smoothing parameters:
alpha = 1e-04
beta  = 1e-04

  Initial states:
l = 566.9001
b = -27.8818

  sigma:  127.2654

 AIC AICc  BIC
167.3475 172.3475 169.6073

Now when I run the same using:
myfun <- function(x)
{
for(i in 1:length(x))
{
ets(x[i])

 }
}
I got the error as mentioned before. Now on modifying it to
myfun <- function(x)
{
for(i in 1:length(x))
{
return(ets(x[[i]]))
}
}
I only got the output as
ETS(A,N,N)

Call:
 ets(y = x[[i]], model = "AZZ", opt.crit = c("amse"))

  Smoothing parameters:
alpha = 0.3983

  Initial states:
l = 516.188

  sigma:  181.8688

 AIC AICc  BIC
172.6298 173.8298 173.7597

I think it's considering the whole data frame as a series.
As said, my objective is essentially to come up with a best exponential model
for each of the SKUs in the data frame. However, I want to be able to extract
information like MSE, MAPE etc. later. So kindly suggest.

Thanks in advance,
Phani



 cheers,
 Paul

  }
 }

 The error message is: Error in ets(x[i], model = "AZZ", opt.crit =
 c("amse")) :
   y should be a univariate time series

 Could someone please explain why this is happening? I also want to be able
 to extract data like coef's, errors (MAPE,MSE etc.)

 Thanks and regards,
 Phani




 --
 Drs. Paul Hiemstra
 Department of Physical Geography
 Faculty of Geosciences
 University of Utrecht
 Heidelberglaan 2
 P.O. Box 80.115
 3508 TC Utrecht
 Phone:  +3130 253 5773
 http://intamap.geo.uu.nl/~paul http://intamap.geo.uu.nl/%7Epaul
 http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770




-- 
A. Phani Kishan
3rd Year B.Tech
Dept. of Computer Science  Engineering
IIT MADRAS
Ph: +919962363545


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Liviu Andronic

On Sun, Jun 20, 2010 at 2:31 PM, Muenchen, Robert A (Bob)
muenc...@utk.edu wrote:
 come up with so far at http://r4stats.com/popularity . I'm sure people
 will have plenty of ideas on how to improve this, so please let me know
 what you think.

This is not much of a metric, probably not even a ballpark, but I have
a habit of measuring the popularity of a software by the number of
unread messages in my mail account, sent to one of its main mailing
lists. For example, I subscribed to Gentoo, Xfce and LyX MLs much
earlier than to that of R, but R quickly surpassed all in number
of unread messages. At the moment I have the following: R ( 37k), LyX
(10k), Debian (7k), Xfce (3k), Geany (.5k). I dare say that R might
be more popular than Debian, but again, any such estimation seems
farfetched.

Regards
Liviu


Re: [R] i want create script

2010-06-25 Thread John Kane

I'd suggest having a look at the manuals on the R site
(http://www.r-project.org), especially the Introduction
to R and R Data Import/Export.

Some helpful tutorials may be found at 
http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html and 
http://www.sph.umich.edu/csg/abecasis/class/815.05.pdf


--- On Fri, 6/25/10, vijaysheegi vijay.she...@gmail.com wrote:

 From: vijaysheegi vijay.she...@gmail.com
 Subject: [R] i want create script
 To: r-help@r-project.org
 Received: Friday, June 25, 2010, 2:26 AM
 
 Hi R community,
 I want to create a script which will take the .csv table as
 input and do
 some prediction and output should be returned to some
 file.Inputs is exel
 sheet containing some tables of data.out should be table of
 predicted
 data.Will some one help me in this regards...
 Thanks in advance.
 
 I am using Windows R.Please advise proccedure to create
 Rscript.
 
 
 Regards
 -
 Vijay
 Research student
 Bangalore
 India
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] ask a question about list in R

2010-06-25 Thread David Winsemius



On Jun 25, 2010, at 1:00 AM, song song wrote:


my list al is as below:
al=list(c(2,3),5,7)

al

[[1]]
[1] 2 3

[[2]]
[1] 5

[[3]]
[1] 7

and  I check the second component, its element is 5, then I remove  
this, now

my al is:

al[[2]][al[[2]]!=5]->al[[2]]

al

[[1]]
[1] 2 3

[[2]]
numeric(0)

[[3]]
[1] 7

The Question is, how I can get the new list without the second  
component,

that is :


alwanted

[[1]]
[1] 2 3

[[2]]
[1] 7


Another way:

 al=list(c(2,3),5,7)
 al[-2]
[[1]]
[1] 2 3

[[2]]
[1] 7

 alwanted <- al[-2]

Negative indexing with the [ operator.
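
A short sketch of two ways to arrive at the wanted list. The first is the negative indexing shown above; the second, which drops whatever components have become empty, is an addition for illustration and not part of the original reply.

```r
al <- list(c(2, 3), 5, 7)

# 1. Drop the second component by position (negative indexing).
al_by_position <- al[-2]

# 2. Empty the second component as in the question, then keep only
#    the components that still have at least one element.
al2 <- al
al2[[2]] <- al2[[2]][al2[[2]] != 5]      # now numeric(0)
al_nonempty <- al2[lengths(al2) > 0]
```

The second form is handy when the position of the emptied component is not known in advance.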


--
David.


Re: [R] Handouts / Reports or just simply printing text to PDF?

2010-06-25 Thread Matt Shotwell

Check out the brew package, by Jeff Horner.

Ralf B wrote:
 I assume R won't easily generate nice reports (unless one starts using
 Sweave and LaTeX) but perhaps somebody here knows a package that can
 create report like output for special cases? How can I simply plot
 output into PDF? 
 Perhaps you know a package I should check out? What
 do you guys do to create handouts (before actually publishing)?
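
As an aside to the question of simply getting text into a PDF: a rough base-R sketch using the pdf() graphics device. This is not the brew approach recommended above, just a quick alternative; the output file name and the report text are made up for illustration.

```r
# Hypothetical output location.
out_file <- file.path(tempdir(), "handout.pdf")

# Some text to print: a heading plus the printed form of a summary.
report_lines <- c("Summary of x",
                  capture.output(summary(c(1, 2, 3, 10))))

pdf(out_file, width = 8.27, height = 11.69)  # roughly A4, in inches
plot.new()                                   # empty page to draw on
text(0, 1, paste(report_lines, collapse = "\n"),
     adj = c(0, 1), family = "mono")         # top-left, fixed-width font
dev.off()
```

For anything beyond a page of plain text, a templating or literate-programming tool is likely the better route.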

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com


Re: [R] Finding sets

2010-06-25 Thread David Winsemius



On Jun 25, 2010, at 5:43 AM, Dennis Murphy wrote:


Hi:

Here's one approach:


x <- rep(c(1,2,1,3,5), each=5)
rle(x)

Run Length Encoding
 lengths: int [1:5] 5 5 5 5 5
 values : num [1:5] 1 2 1 3 5

table(rle(x)$values)


1 2 3 5
2 1 1 1

unname(table(rle(x)$values))[1]

[1] 2



This method does not require visual inspection of the intermediate  
result:


 sum(rle(x)$values==1)
[1] 2
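
Putting the two replies together, a compact sketch that counts runs for one value and for every value at once:

```r
x <- rep(c(1, 2, 1, 3, 5), each = 5)

runs <- rle(x)                          # one entry per run of equal values
n_runs_of_1 <- sum(runs$values == 1)    # number of separate runs of 1s
runs_per_value <- table(runs$values)    # number of runs for every value
```

Note that this counts runs (sets of consecutive equal values), not individual occurrences.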

--
David.


HTH,
Dennis

On Fri, Jun 25, 2010 at 2:30 AM, Muhammad Rahiz 
muhammad.ra...@ouce.ox.ac.uk wrote:


Hi all,

I'd like to find how many sets of 1s there are in the following  
example;


x <- rep(c(1,2,1,3,5), each=5)

I know that there are two sets of 1s, visually. Any function in R  
that

allows me to automate the process?

Thanks.


Muhammad


Re: [R] Optimizing given two vectors of data

2010-06-25 Thread Joris Meys

Optim uses vectors of _parameters_, not of data. You add a
(likelihood) function, give initial values of the parameters, and get
the optimized parameters back. See ?optim and the examples therein. It
contains an example for optimization using multiple data columns.

Cheers
Joris

On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul ruchir_402...@infosys.com wrote:

I am trying to estimate an Arrhenius-exponential model in R. I have one
vector of data containing failure times, and another containing
corresponding temperatures. I am trying to optimize a maximum likelihood
function given BOTH these vectors. However, the optim command takes only
one such vector parameter.

How can I pass both vectors into the function?
--
View this message in context:
http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
Sent from the R help mailing list archive at Nabble.com.


--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread gokhanocakoglu


Thanks for your interest, Joris.



Gokhan OCAKOGLU
Uludag University
Faculty of Medicine
Department of Biostatistics
http://www20.uludag.edu.tr/~biostat/ocakoglui.htm

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
Sent from the R help mailing list archive at Nabble.com.


[R] (no subject)

2010-06-25 Thread ricardo . sousa2001


Hello,
I'm new to R, but from what I have read it is an excellent tool.
   I would be grateful for some help: I am trying to create an array by  
reading a text file.


   The idea is to read the file and transform the data into a binary  
format. For example, the contents of the file have this format:



A,B,C,D,G
A,C,E,O
F,G


Convert it to this:

   a b c d e f g o
1  1 1 1 1 0 0 1 0
2  1 0 1 0 1 0 0 1
3  0 0 0 0 0 1 0 0

 and display it on screen.

  Thanks for the help





Re: [R] Confused: Looping in dataframes

2010-06-25 Thread David Winsemius



On Jun 25, 2010, at 7:09 AM, phani kishan wrote:

On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra  
p.hiems...@geo.uu.nl wrote:



On 06/25/2010 10:02 AM, phani kishan wrote:


Hey,
I have a data frame x which consists of say 10 vectors. I  
essentially want
to find out the best fit exponential smoothing for each of the  
vectors.


The problem: while I get results when I say



lapply(x,ets)



I am getting an error when I say



myprint




function(x)

{
for(i in 1:length(x))
{
ets(x[i],model="AZZ",opt.crit=c("amse"))



Hi,

Please provide a reproducible example, as stated in the posting guide. My
guess is that replacing x[i] by x[[i]] would solve the problem. Double
brackets return a vector instead of a data.frame that has just column i.



Hey Paul,
As requested.
My example data frame

sdata:
     SKU1  SKU2   SKU3  SKU4
1   583.8 574.6 1106.9 648.1
2   441.7 552.8 1021.3 353.6
3   454.2 555.7  998.3 306.4
4   569.7 507.6  811.1 360.7
5   512.3 620.0 1046.3 713.9
6   580.8 668.2  732.0 490.9
7   648.5 766.9  653.4 422.1
8   617.4 657.1  602.1 190.8
9   826.8 767.3  640.5 324.1
10 1163.0 657.6  429.6 181.1
11  643.5 788.9  569.1 331.9
12  846.9 568.6  425.1 224.6
13  580.7 582.9  434.2 226.9


now when I apply
lapply(sdata,ets)
I get a result as:
$SKU1
ETS(A,N,N)

Call:
ets(y = x, model = "AZZ")

 Smoothing parameters:
   alpha = 0.3845

 Initial states:
   l = 533.3698

 sigma:  181.7615

AIC AICc  BIC
172.6144 173.8144 173.7443

$SKU2
ETS(A,N,N)

Call:
ets(y = x, model = "AZZ")

 Smoothing parameters:
   alpha = 0.5026

 Initial states:
   l = 567.821

 sigma:  86.7074

AIC AICc  BIC
153.3704 154.5704 154.5003

$SKU3
ETS(A,A,N)

Call:
ets(y = x, model = "AZZ")

 Smoothing parameters:
   alpha = 1e-04
   beta  = 1e-04

 Initial states:
   l = 1189.2221
   b = -64.3776

 sigma:  85.4153

AIC AICc  BIC
156.9800 161.9800 159.2398

$SKU4
ETS(A,A,N)

Call:
ets(y = x, model = "AZZ")

 Smoothing parameters:
   alpha = 1e-04
   beta  = 1e-04

 Initial states:
   l = 566.9001
   b = -27.8818

 sigma:  127.2654

AIC AICc  BIC
167.3475 172.3475 169.6073

Now when I run the same using:
myfun <- function(x)
{
for(i in 1:length(x))
{
ets(x[i])


}

}
I got the error as mentioned before. Now on modifying it to
myfun <- function(x)
{
for(i in 1:length(x))
{
return(ets(x[[i]]))
}
}
I only got the output as
ETS(A,N,N)

Call:
ets(y = x[[i]], model = "AZZ", opt.crit = c("amse"))

 Smoothing parameters:
   alpha = 0.3983

 Initial states:
   l = 516.188

 sigma:  181.8688

AIC AICc  BIC
172.6298 173.8298 173.7597

I think it's considering the whole data frame as a series.


Doubtful. It is quietly calculating all of the requested models but  
you did not do anything with them inside the loop (which is a  
function). You could have assigned them to something permanent or  
printed them (or both):


ets_x <- list()

for(i in 1:length(x))
{
fit <- ets(x[[i]])
print(fit); ets_x <- c(ets_x, list(fit))  # list() keeps each model as one element
}


ets_x

As said, my objective is essentially to come up with a best exponential model
for each of the SKUs in the data frame. However, I want to be able to extract
information like MSE, MAPE etc. later. So kindly suggest.

Thanks in advance,
Phani





Re: [R] Confused: Looping in dataframes

2010-06-25 Thread phani kishan

Hey,
I only got the output once because I was returning from the function at the
end of the first loop iteration.
I have fixed that and printed the values.

function being used by me now is:
function(x)
{
for(i in 1:length(x))
{
print(names(x[i]))
print(myets(x[[i]]))
}
}

where myets is my customized exponential smoothing model. The problem is
that if I run my myets function individually on each of the SKUs,
I get values of MAPE, MSE etc., but by running the above loop I don't get
those values. How do I store the values so that I can look at them later?

There are minor changes (not significant) in the values of parameters from
applying the above function as opposed to lapply. Why could it be so??
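
One way to keep the fitted models for later inspection is to store them in a named list rather than only printing them. The sketch below is an illustration under stated assumptions: fit_all() and ses() are made-up helper names, only two of the SKU columns are reproduced, and base R's HoltWinters() (simple exponential smoothing) is used as a stand-in for ets() or myets() so that the example runs without the forecast package.

```r
# Two of the SKU columns from the example data.
sdata <- data.frame(
  SKU1 = c(583.8, 441.7, 454.2, 569.7, 512.3, 580.8, 648.5,
           617.4, 826.8, 1163.0, 643.5, 846.9, 580.7),
  SKU2 = c(574.6, 552.8, 555.7, 507.6, 620.0, 668.2, 766.9,
           657.1, 767.3, 657.6, 788.9, 568.6, 582.9)
)

# Fit one model per column and return all fits as a named list.
fit_all <- function(df, fit_fun) {
  fits <- vector("list", length(df))
  names(fits) <- names(df)
  for (i in seq_along(df)) {
    fits[[i]] <- fit_fun(df[[i]])   # [[i]] gives the column as a vector
  }
  fits
}

# Stand-in model: simple exponential smoothing from the stats package.
ses <- function(y) HoltWinters(ts(y), gamma = FALSE, beta = FALSE)

fits <- fit_all(sdata, ses)

# Pull quantities out of the stored fits afterwards.
alpha <- sapply(fits, function(f) unname(f$alpha))
mse   <- sapply(fits, function(f) f$SSE / nrow(f$fitted))
```

With ets() in place of the stand-in, the same pattern applies: keep the list, then extract whatever the fitted objects provide, e.g. fits$SKU1 for a single model.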

Phani



-- 
A. Phani Kishan
3rd Year B.Tech
Dept. of Computer Science  Engineering
IIT MADRAS
Ph: +919962363545


Re: [R] Assigning variable value as name to cbind column

2010-06-25 Thread Henrique Dallazuanna

Try this also:

cbind(dataTest, `colnames<-`(cbind(new.data[1:nrow(dataTest)]), name))
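
Two further variants, sketched here for comparison and not taken from the replies in this thread: assigning with [[ so that the value of name is used as the column name, and wrapping the new column with setNames() before cbind().

```r
dataTest <- data.frame(col1 = c(1, 2, 3))
new.data <- c(1, 2)
name <- "test"

# Pad the shorter vector with NA so it has one value per row.
length(new.data) <- nrow(dataTest)

# Variant 1: [[<- uses the *value* of 'name' as the column name.
newDataTest <- dataTest
newDataTest[[name]] <- new.data

# Variant 2: name the column first, then bind.
viaSetNames <- cbind(dataTest, setNames(data.frame(new.data), name))
```

Both avoid the original problem, where name=new.data inside cbind() uses the literal word "name".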

On Fri, Jun 25, 2010 at 2:47 AM, Ralf B ralf.bie...@gmail.com wrote:

 Hi all,

 I have this (non-working) script:

 dataTest <- data.frame(col1=c(1,2,3))
 new.data <- c(1,2)
 name <- "test"
 n.row <- dim(dataTest)[1]
 length(new.data) <- n.row
 names(new.data) <- name
 cbind(dataTest, name=new.data)
 print(dataTest)

 and would like to bind the new column 'new.data' to 'dataTest' by
 using the value of the variable 'name' as the column name.

 The end result should look like this:

  col1 test
 1  1  1
 2  2  2
 3  3  NA


 The best I got was that 'name' became the column name but never the
 actual value of 'name'. How can i do that?

 (This is actually a function that runs many time -- this means a
 manual workaround is not feasible).

 Ralf





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Muenchen, Robert A (Bob)

-Original Message-
From: Liviu Andronic [mailto:landronim...@gmail.com]
Sent: Friday, June 25, 2010 7:15 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

On Sun, Jun 20, 2010 at 2:31 PM, Muenchen, Robert A (Bob)
muenc...@utk.edu wrote:
 come up with so far at http://r4stats.com/popularity . I'm sure people
 will have plenty of ideas on how to improve this, so please let me
know
 what you think.

This is not much of a metric, probably not even a ballpark, but I have
a habit of measuring the popularity of a software by the number of
unread messages in my mail account, sent to one of its main mailing
lists. For example, I subscribed to Gentoo, Xfce and LyX MLs much
earlier than to that of R, but R quickly and surpassed all in number
of unread messages. At the moment I have the following: R ( 37k), LyX
(10k), Debian (7k), Xfce (3k), Geany (.5k). I dare say that R might
be more popular than Debian, but again, any such estimation seems
farfetched.

Regards
Liviu

Hi Liviu,

E-mail was the thing that got me back to this paper. I had been working on 
variations of measures for several years and was frustrated mostly by how many 
problems I ran into regarding search logic (SAS stands for about 15 
scientific topics and of course R is far worse). I have all my listserv email 
routed to a set of folders which I always empty at the same time. I noticed 
that recently R-Help had really taken off and that Statalist had surpassed 
SAS-L. So I got the latest monthly data from the listservs and switched the 
program from doing yearly counts to means of the monthly figures so I could add 
2010 to it. Figure 1 at http://r4stats.com/popularity is indeed the number of 
emails sent by each of the listservs. All these measures have their own 
limitations, but I find that graph the most interesting since it includes the 
trends across time.

Cheers,
Bob

Re: [R] i want create script

2010-06-25 Thread Joris Meys

Please read the posting guide :  http://www.R-project.org/posting-guide.html

Your question is very vague. One could assume you're completely new to
R and want the commands to read a csv file (see ?read.csv), and to
write away a table (eg ?write.table to write your predicted data in a
text format).

My guess is you want to run this script in the shell without having to
open R, similar to a perl script. For this, take a look at:

http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R
http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing
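
To make the read.csv / write.table advice concrete, here is a minimal sketch of such a script. Everything specific in it is an assumption for illustration: the file names, the columns x and y, and the use of a simple linear model as the "prediction" step. The input file is generated inside the sketch so that it runs on its own.

```r
# Hypothetical input and output files.
in_file  <- file.path(tempdir(), "input.csv")
out_file <- file.path(tempdir(), "predicted.csv")

# Create a small example input (in practice this file already exists,
# e.g. saved from a spreadsheet as CSV).
write.csv(data.frame(x = 1:10, y = 2 * (1:10) + 1), in_file,
          row.names = FALSE)

# 1. Read the table.
dat <- read.csv(in_file)

# 2. Fit a model and predict for new inputs.
fit <- lm(y ~ x, data = dat)
new_x <- data.frame(x = 11:13)
new_x$y_pred <- predict(fit, newdata = new_x)

# 3. Write the predictions to a file.
write.csv(new_x, out_file, row.names = FALSE)

result <- read.csv(out_file)
```

Saved as, say, predict.R, a script like this can be run from the command line with Rscript predict.R, on Windows as well, provided the R bin directory is on the PATH.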

Cheers
Joris

On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi vijay.she...@gmail.com wrote:

 Hi R community,
 I want to create a script which will take the .csv table as input and do
 some prediction and output should be returned to some file.Inputs is exel
 sheet containing some tables of data.out should be table of predicted
 data.Will some one help me in this regards...
 Thanks in advance.

 I am using Windows R.Please advise proccedure to create Rscript.


 Regards
 -
 Vijay
 Research student
 Bangalore
 India
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] create group markers in original data frame ie. countinued... ? to calculate sth for groups defined between points in one variable (string), /separating/ spliting variable into groups by i.e.

2010-06-25 Thread Eugeniusz Kaluza


Dear useRs,

First of all, Joris Meys, thank you for explaining how to calculate results for 
groups delimited by string markers in one variable of a data frame, as in the 
example below (between START and STOP). I would like to follow up by asking how 
to mark each observation in the original data set with the group it belongs to.
 

# so firstly, below 
# START...working example of solution proposed by: Joris Meys 
[jorism...@gmail.com] 
# Same trick :
  c0 <- rbind( 1,  2 , 3, 4,  5, 6, 7, 8, 9,10,11,
  12,13,14,15,16,17 )
  c0 
  c1 <- rbind(10, 20 ,30,40, 50,10,60,20,30,40,50,  30,10,
  0,NA,20,10.3444)
  c1
  c2 <- rbind(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
  c2
  C.df <- data.frame(c0,c1,c2)  # data frame used below

  pos <- which(!is.na(C.df$c2))
  idx <- sapply(2:length(pos),function(i) pos[i-1]:(pos[i]-1))
  names(idx) <- sapply(2:length(pos),
  function(i) paste(C.df$c2[pos[i-1]],"-",C.df$c2[pos[i]]))
  out <- lapply(idx,function(i) summary(C.df[i,1:2]))
  out
#STOP ... below from: Sent: Thu 2010-06-24 18:02:  Joris Meys 
[jorism...@gmail.com]


#Thank you, it is done and works very well

# - - - - - - - -- - - - - - -- - -
# Now, to complete my question: I would like to add grouping symbols to the
# whole set, so that each observation is marked with the name of the interval
# in which it lies. This tells the reader that an observation is between,
# e.g., A and B, and enables sorting and simple access using match.
in_sub_starting_from <- rbind(NA,"A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C")
in_sub_finished_by <- rbind(NA,"B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D")
in_sub_limited_by <- rbind(NA,"A-B","A-B","A-B","B-C","B-C","B-C","B-C","B-C","B-C","B-C","C-D","C-D","C-D","C-D","C-D","C-D")
C.df <- data.frame(c0,c1,c2,in_sub_starting_from,in_sub_finished_by,in_sub_limited_by)
C.df
#

# Therefore, one more question:
# How can these vectors be created automatically, given C.df$c2 (and of
# course also C.df$c0 and C.df$c1)?
C.df$in_sub_starting_from
C.df$in_sub_finished_by
C.df$in_sub_limited_by
# They tell the reader that an observation is between, e.g., A and B, and
# enable sorting and simple access using match.


#For example, to make this kind of access to the data possible:
#take the 7th observation from any column of the data frame,
C.df$c0[7]
C.df$c1[c0==7]
#and then
#find in the same row, in in_sub_starting_from, which marker precedes that observation
C.df$in_sub_starting_from[c0==7]
#find in the same row, in in_sub_finished_by, which marker follows that observation
C.df$in_sub_finished_by[c0==7]
#find in the same row, in in_sub_limited_by, which markers that observation lies between
C.df$in_sub_limited_by[c0==7]
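
One possible way to build the three marker vectors automatically is sketched below. It assumes, as in the hand-written vectors above, that rows before the first marker get NA and that the last marker row itself belongs to the final interval. The names grp, start_marker, end_marker and between are made up for this sketch.

```r
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")

pos     <- which(!is.na(c2))   # rows that carry a marker
markers <- c2[pos]             # "A" "B" "C" "D"

# Interval number of every row: 0 before the first marker, then 1, 2, ...
# pmin() keeps the last marker row inside the final interval.
grp <- pmin(findInterval(seq_along(c2), pos), length(pos) - 1)

# Look up the marker that opens and the marker that closes each interval.
start_marker <- c(NA, markers)[grp + 1]
end_marker   <- c(NA, markers[-1])[grp + 1]
between      <- ifelse(is.na(start_marker), NA_character_,
                       paste(start_marker, end_marker, sep = "-"))

C.df2 <- data.frame(c0 = seq_along(c2), c2,
                    in_sub_starting_from = start_marker,
                    in_sub_finished_by   = end_marker,
                    in_sub_limited_by    = between)
```

With these columns in place, C.df2$in_sub_limited_by[C.df2$c0 == 7] returns the interval label for the 7th observation.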
#

?





#Thanks for the advice, and perhaps for an answer to this question too;
#I am waiting impatiently for my next chance to get internet access...

#

Sincerely,
Kaluza


and the beginning of this story:







-Original Message-
From: Eugeniusz Kaluza
Sent: Thu 2010-06-24 17:12
To: r-help@r-project.org
Subject: PD: [R] ?to calculate sth for groups defined between points in one 
variable (string), / value separating/ spliting variable into groups by i.e. 
between start, NA, NA, stop1, start2, NA, stop2

Dear useRs,

Thanks to Joris Meys for the advice.
I will now try to work out how to make it work for a less specific case,
to make the problem more general.
The result should then be displayed for every group between non-empty strings in 
c2, i.e. not only the result for:
 #mean:
       c1     c3     c4            c5
       20 Start1  Stop1  Start1-Stop1
 25.48585 Start2  Stop2  Start2-Stop2

but also for every group formed by the gap between two neighbouring strings in 
c2, which otherwise contains only series of NA values, separated from time to 
time by one string, i.e.:
 #mean:
       c1     c3     c4            c5
       20 Start1  Stop1  Start1-Stop1
       ..  Stop1 Start2  Stop1-Start2
 25.48585 Start2  Stop2  Start2-Stop2

In other words, for a simpler version of the markers, where c2 is
A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA, NA, NA, D, NA, NA
the result would be:
 #mean:
       c1 c3 c4  c5
       20  A  B A-B
       ..  B  C B-C
 25.48585  C  D C-D
...


I am looking for a more general method (function) that groups between these 
letters in c2, and will now study the solution proposed by Joris Meys.
Thanks for the immediate answer,
Kaluza
Kaluza




-Original Message-
From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thu 2010-06-24 15:14
To: Eugeniusz Kaluza
Cc: r-help@r-project.org
Subject: Re: [R] ?to calculate sth for groups defined between points in one 
variable (string), / value separating/ spliting variable

Re: [R] Optimizing given two vectors of data (confusedSoul)

2010-06-25 Thread Prof. John C Nash


I am trying to estimate an Arrhenius-exponential model in R.  I have one
vector of data containing failure times, and another containing
corresponding temperatures.  I am trying to optimize a maximum likelihood
function given BOTH these vectors.  However, the optim command takes only
one such vector parameter.

How can I pass both vectors into the function?


You need to combine your vectors

  params <- c(vecone, vectwo)

Inside your objective function, you will need to split them out again.

However, I have some suspicions that you are referring to the DATA for the function rather 
than the parameters that are being optimized. The data goes into the '...' arguments to 
optim and other optimization tools.
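
A sketch of the second reading, in which the two vectors are data passed through optim's '...' argument. The model is an assumption made for illustration, not taken from the original question: exponential failure times whose mean follows exp(logC + B / temp). The data are simulated and the names mean_life, negloglik, logC and B are made up.

```r
set.seed(1)

# Simulated data: temperatures (in Kelvin) and matching failure times.
temp <- rep(c(300, 350, 400), each = 50)
mean_life <- function(par, temp) exp(par[1] + par[2] / temp)  # par = c(logC, B)
times <- rexp(length(temp), rate = 1 / mean_life(c(-5, 3000), temp))

# Negative log-likelihood: 'par' is what optim varies,
# 'times' and 'temp' are fixed data supplied via '...'.
negloglik <- function(par, times, temp) {
  mu <- mean_life(par, temp)
  sum(log(mu) + times / mu)
}

start <- c(0, 1000)
fit <- optim(par = start, fn = negloglik, times = times, temp = temp)

fit$par     # estimated logC and B
fit$value   # negative log-likelihood at the estimate
```

Only par is optimized; any extra named arguments given to optim are handed unchanged to the objective function, so both data vectors arrive there intact.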


JN


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Frank E Harrell Jr

The central limit theorem doesn't help.  It just addresses type I error,
not power.

Frank

On 06/25/2010 04:29 AM, Joris Meys wrote:
 As a remark on your histogram : use fewer breaks! This histogram tells
 you nothing. An interesting function is ?density , eg :
 
 x <- rnorm(250)
 hist(x,freq=F)
 lines(density(x),col="red")
 
 See also this ppt, a very nice and short introduction to graphics in R :
 http://csg.sph.umich.edu/docs/R/graphics-1.pdf
 
 2010/6/25 Atte Tenkanen atte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300
 
 You should think about the central limit theorem. Actually, you can
 just use a t-test to compare means, as with those sample sizes the
 mean is almost certainly normally distributed.

 i would like to test, whether the mean of the sample differ significantly 
 from the population mean.

 According to probability theory, this will be in 5% of the cases if
 you repeat your sampling infinitely. But as David asked: why on earth
 do you want to test that?
 
 cheers
 Joris
 


-- 
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University


Re: [R] (no subject)

2010-06-25 Thread Allan Engelhardt


Maybe something like:

y <- readLines("foo")
z <- strsplit(y, ",")
cols <- sort(unique(unlist(z)))  #  Assuming this is what you want for column names
m <- matrix(0, nrow=length(z), ncol=length(cols), 
            dimnames=list(as.character(1:length(z)), cols))

for (i in 1:length(z)) {
    m[i, z[[i]]] <- 1
}
print(m)
#   A B C D E F G O
# 1 1 1 1 1 0 0 1 0
# 2 1 0 1 0 1 0 0 1
# 3 0 0 0 0 0 1 1 0

Hope this helps you a little.
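
A self-contained variant of the same idea, for anyone who wants to try it without an input file at hand. The temporary file and the names items and row_items are made up for this sketch, and the loop is replaced by a membership test per line.

```r
# Write the example input to a temporary file.
input <- tempfile(fileext = ".txt")
writeLines(c("A,B,C,D,G", "A,C,E,O", "F,G"), input)

# One character vector of items per line of the file.
items <- strsplit(readLines(input), ",", fixed = TRUE)
cols  <- sort(unique(unlist(items)))

# For each line, mark which of the known items it contains (1) or not (0).
m <- t(sapply(items, function(row_items) as.integer(cols %in% row_items)))
dimnames(m) <- list(seq_along(items), cols)

print(m)
```

Each row of m corresponds to one line of the file and each column to one distinct item.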

Allan

On 25/06/10 13:00, ricardo.sousa2...@portugalmail.pt wrote:

Hello,
I'm new to R, but from what I have read it is an excellent tool.
   I would be grateful for some help: I am trying to create an array by 
reading a text file.


   The idea is to read the file and transform the data into a binary 
format. For example, the contents of the file have this format:



A,B,C,D,G
A,C,E,O
F,G


This should be turned into:

   a b c d e f g o
1  1 1 1 1 0 0 1 0
2  1 0 1 0 1 0 0 1
3  0 0 0 0 0 1 0 0

 and displayed on screen.

  Thanks for the help




[R] gsub with regular expression

2010-06-25 Thread Sebastian Kruk

If I have a text with 7 words per line and I would like to put first
and second word joined in a vector and the rest of words one per
column in a matrix how can I do it?

First 2 lines of my text file:
2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca

Results:

Vector:
2008/12/31 12:23:31
2010/02/01 02:35:31

Matrix
numero 343.233.233 Rodeo   Vaca   Ruido
palabra 111.111.222 abejorro Rodeo Vaca

Thks,

Sebastian.


Re: [R] gsub with regular expression

2010-06-25 Thread Gabor Grothendieck

On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk
residuo.so...@gmail.com wrote:
 If I have a text with 7 words per line and I would like to put first
 and second word joined in a vector and the rest of words one per
 column in a matrix how can I do it?

 First 2 lines of my text file:
 2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
 2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca

 Results:

 Vector:
 2008/12/31 12:23:31
 2010/02/01 02:35:31

 Matrix
 numero 343.233.233 Rodeo   Vaca   Ruido
 palabra 111.111.222 abejorro Rodeo Vaca


Here are two solutions.  Both solutions are three statements long
(read in the data, display the vector, display the matrix).  Replace
textConnection(Lines) with "myfile.dat", say, in each.

1. Here is a sub solution:

L <- readLines(textConnection(Lines))
sub("(\\S+ \\S+) .*", "\\1", L)
sub("\\S+ \\S+ ", "", L)


2. Here is a solution using zoo:

Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

library(zoo)

z <- read.zoo(textConnection(Lines), index = 1:2,
   FUN = function(x) paste(x[,1], x[,2]))

time(z) # the vector
coredata(z) # the matrix


Another possibility would be to convert to chron or POSIXct at the
same time as reading it in:

# chron
library(chron)
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.chron(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))

# POSIXct
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.POSIXct(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))


Re: [R] gsub with regular expression

2010-06-25 Thread Allan Engelhardt


help(strsplit) is your friend, for example:

t <- c("2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido",
       "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca")
m <- do.call(rbind, strsplit(t, "[[:space:]]+"))  #  Matrix of all the data
v <- paste(m[, 1], m[, 2])  #  The vector
m <- m[,-c(1,2)]  #  The matrix

Hope this helps a little.

Allan


[R] Fortune?

2010-06-25 Thread Bert Gunter

On average, any data manipulation that can be described in a sentence or
two of English can be programmed in one line in R. If you find yourself
writing a long 'for' loop to do something that sounds simple, take a step
back and research if an existing combination of functions can easily handle
your request.

-- Erik Iverson


I nominate this for a Fortune. (email thread in which it appeared below)

-- Bert


Bert Gunter
Genentech Nonclinical Biostatistics
 
 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Erik Iverson
Sent: Thursday, June 24, 2010 4:14 PM
To: john polo
Cc: r-help@r-project.org
Subject: Re: [R] write a loop for tallies

On average, any data manipulation that can be described in a sentence or
two of English can be programmed in one line in R. If you find yourself
writing a long 'for' loop to do something that sounds simple, take a step
back and research if an existing combination of functions can easily handle
your request.
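
For the tally question quoted in this thread, the one-line approach might
look like the following. This is only a sketch; the vector n is copied from
the question and the name tally is illustrative:

```r
# The values from the question
n <- c(3000, 4000, 5000, 3000, 5000, 6000, 4000, 5000, 7000, 5000, 6000, 7000)

# table() counts how often each distinct value occurs
tally <- table(n)
print(tally)
```

The result is a named table, so a single count can be looked up by value,
for example tally[["5000"]].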

john polo wrote:
 Dear R users,
 
 I have a list of numbers such as
 
   n
 [1] 3000 4000 5000 3000 5000 6000 4000 5000 7000 5000 6000 7000
 
 and i'd like to set up a loop that will keep track of the number of 
 occurences of each of the values that occur in the list, e.g.
 
 3000: 2
 4000: 2
 5000: 4
 
 I came up with the following:
 
 a <- for (i in 1:length(n)) {
 r <- 0
 s <- 0
 t <- 0
 u <- 0
 v <- 0
 ifelse(n[i] == 3000, r <- r+1,
 ifelse(n[i] == 4000, s <- r+1,
 ifelse(n[i] == 5000, t <- r+1,
 ifelse(n[i] == 6000, u <- r+1,
 ifelse(n[i] == 7000, v <- r+1, NA)))))
 r <- sum(r)
 s <- sum(s)
 t <- sum(t)
 u <- sum(u)
 v <- sum(v)
 cat("r = ", r, "\n")
 cat("s = ", s, "\n")
 cat("t = ", t, "\n")
 cat("u = ", u, "\n")
 cat("v = ", v, "\n")
 }
 
 However, this is the output:
 
 r =  1
 s =  0
 t =  0
 u =  0
 v =  0
 r =  0
 s =  1
 t =  0
 u =  0
 v =  0
 r =  0
 s =  0
 t =  1
 u =  0
 v =  0
 r =  1
 s =  0
 t =  0
 u =  0
 v =  0
 r =  0
 s =  0
 t =  1
 u =  0
 v =  0
 r =  0
 s =  0
 t =  0
 u =  1
 v =  0
 r =  0
 s =  1
 t =  0
 u =  0
 v =  0
 r =  0
 s =  0
 t =  1
 u =  0
 v =  0
 r =  0
 s =  0
 t =  0
 u =  0
 v =  1
 r =  0
 s =  0
 t =  1
 u =  0
 v =  0
 r =  0
 s =  0
 t =  0
 u =  1
 v =  0
 r =  0
 s =  0
 t =  0
 u =  0
 v =  1
 
 How should i write this loop, please? I've tried variations with if 
 instead of ifelse and receive errors about unexpected { or 
 unexpected ).
 
 regards,
 john
 

Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys

2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:
 The central limit theorem doesn't help.  It just addresses type I error,
 not power.

 Frank

I don't think I stated otherwise. I am aware of the fact that the
wilcoxon has an Asymptotic Relative Efficiency greater than 1 compared
to the t-test in case of skewed distributions. Apologies if I caused
more confusion.

The problem with the wilcoxon is twofold, as far as I understand this
data correctly:
- there are quite a few ties
- the wilcoxon assumes under the null that the distributions are the
same, not only the location. The influence of unequal variances and/or
shapes of the distribution is enhanced in the case of unequal sample
sizes.

The central limit theorem ensures that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.
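
A rough sketch of the two-sample comparison described above, using simulated
scores with many ties rather than the data from this thread (the values, the
seed and the sample sizes are assumptions for illustration only). In R,
t.test() applies the Welch correction for unequal variances by default:

```r
set.seed(42)
# Simulated discrete scores with many ties; not the poster's data
s1 <- sample(0:10, 250, replace = TRUE)
s2 <- sample(0:10, 300, replace = TRUE)

# Welch two-sample t-test (var.equal = FALSE is the default)
tt <- t.test(s1, s2)
print(tt)

# Wilcoxon rank sum test for comparison; with ties the p-value
# relies on a normal approximation
wt <- wilcox.test(s1, s2, exact = FALSE)
print(wt)
```

Which of the two tests is preferable here is exactly what this thread
debates; the sketch only shows how each would be called.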

Cheers
Joris



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[R] Fast and simple tool for re-sampling of asynchronous time series ?

2010-06-25 Thread bruno Piguet

Hi all,

   I'm looking for a function which could do some fast and simple
re-sampling of asynchronous time series.

   Below is a MCE of the kind of algorithm I need. As you can see, it's
quite crude, but it's enough for my current needs.  The only problem is that
it is quite slow on real use case.
   I've got a C version which is much faster, but I'd like to have a pure-R
program.

   Any pointer to the relevant part of the doc of one of the time-series
packages? Any suggestion or advice?

   Thanks in advance,

B. Piguet.

Here is the example:
Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
X <- sin(Tx/10.0) +  sin(Tx/5.0) + rnorm(length(Tx), 0, 0.1)
Ty <- seq(1, 50, 0.)
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)

w <- 0.25

Y_sync <- rep(NA, length(Tx))
for (i in 1:length(Tx))
{
   T_min <- Tx[i] - w
   T_max <- Tx[i] + w
   Y_sync[i] <- mean(Y[Ty >= T_min & Ty <= T_max])
}

diff = X - Y_sync
print(summary(diff))

print(summary(lm(Y_sync~X)))
plot(diff~Tx, type="l")


Re: [R] Fortune?

2010-06-25 Thread Liviu Andronic

On Fri, Jun 25, 2010 at 4:17 PM, Bert Gunter gunter.ber...@gene.com wrote:
 On average, any data manipulation that can be described in a sentence or
 two of English can be programmed in one line in R. If you find yourself
 writing a long 'for' loop to do something that sounds simple, take a step
 back and research if an existing combination of functions can easily handle
 your request.

I've already fallen into this trap. A couple of hours of reading Rnews on
the apply() family would have saved me a month or so of for()
programming.
Liviu


Re: [R] gsub with regular expression

2010-06-25 Thread Gabor Grothendieck

On Fri, Jun 25, 2010 at 11:11 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk
 residuo.so...@gmail.com wrote:
 If I have a text with 7 words per line and I would like to put first
 and second word joined in a vector and the rest of words one per
 column in a matrix how can I do it?

 First 2 lines of my text file:
 2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
 2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca

 Results:

 Vector:
 2008/12/31 12:23:31
 2010/02/01 02:35:31

 Matrix
 numero 343.233.233 Rodeo   Vaca   Ruido
 palabra 111.111.222 abejorro Rodeo Vaca


 Here are two solutions.  Both solutions are three statements long
 (read in the data, display the vector, display the matrix).  Replace
 textConnection(Lines) with "myfile.dat", say, in each.

 1. Here is a sub solution:

 L <- readLines(textConnection(Lines))
 sub("(\\S+ \\S+) .*", "\\1", L)
 sub("\\S+ \\S+ ", "", L)

The last line should be:

as.matrix(read.table(textConnection(sub("\\S+ \\S+ ", "", L)), as.is = TRUE))

3. And a third solution which perhaps is the most obvious:

DF <- read.table(textConnection(Lines), as.is = TRUE)
paste(DF[, 1], DF[, 2]) # vector
as.matrix(DF[-(1:2)]) # matrix


Re: [R] Simple qqplot question

2010-06-25 Thread Bert Gunter

To add to/modify what Joris (and I) previously said:

1. qqplots are not cumulative distribution plots. Hence, as Joris said, the
S-shape indicates short tails/bimodality  compared to the normal. Why you
continue to insist on carrying out normality tests that with so many points
obviously will reject is beyond me! The bimodality is what's important. Why
is it there? What is it telling you about your data (perhaps some sort of
measurement shift...)?

2. My prior suggestion for plotting a reference line -- and Joris's
confidence interval recommendations -- are in some sense wrong. The reason
is that they give the conditional expectation and confidence intervals
thereof of the quantiles of the y distribution conditioned on those of the
x . What you probably want is the correlation line. One simple robust
estimate of this -- and quick to calculate -- is just to mimic qqline() and
calculate the 1st and 3rd quartiles of both distributions and use the line
joining the corresponding quartile pairs ((1st,1st) and (3rd,3rd)) . I leave
the trivial algebra to you -- quantile() gets the quartiles. 
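
The quartile-based line described in point 2 could be sketched roughly as
follows. The samples x and y are simulated stand-ins (an assumption, not the
poster's data), and the names qx, qy, slope and intercept are illustrative:

```r
set.seed(1)
# Simulated samples, for illustration only
x <- rnorm(1000)
y <- 2 * x + 3

# First and third quartiles of both samples
qx <- quantile(x, c(0.25, 0.75), names = FALSE)
qy <- quantile(y, c(0.25, 0.75), names = FALSE)

# Line through the quartile pairs (Q1x, Q1y) and (Q3x, Q3y)
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

qqplot(x, y)
abline(intercept, slope, col = "red")
```

Because only the quartiles enter the calculation, a handful of extreme
points in either tail does not move the line, which is the same idea
qqline() uses for a normal Q-Q plot.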

Of course, there's a literature on this if you want to do something
authoritative -- and perhaps R functions somewhere based on it. Perhaps some
kind (and wiser than I) soul will provide references. 

(However, I doubt that the line so obtained will differ appreciably from my
earlier incorrect recommendation, which was probably good enough for
eyeballing in most cases.)

Finally, risking hubris again, I would suggest that if the two distributions
with so many points really are essentially identical, then this is
scientifically uninteresting -- that is, the identity is a logical (and
trivial) consequence of the systematic way in which the data were obtained,
some sort of software (data collection?) issue, or the like -- i.e. not
indicative of a scientifically interesting phenomenon. It might even
indicate a problem with the data/measurements. My reasoning: real
variability prohibits such identity. The identical bimodality may be a clue
here. Again, note that I know nothing about what you are doing, and you are
therefore justified in publicly chastising me for such ignorant speculation
if I am wrong. 

I would welcome comments and criticisms from others on such speculation
also.

HTH,

-- Bert


Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Joris Meys
Sent: Friday, June 25, 2010 2:15 AM
To: Ralf B
Cc: R mailing list
Subject: Re: [R] Simple qqplot question

Sorry, missed the two variable thing. Go with the lm solution then,
and you can tweak the plot yourself (the confidence intervals are
easily obtained via predict(lm.object, interval="prediction") ). The
function qq.plot uses robust regression, but in your case normal
regression will do.

Regarding the shapes: this just indicates both tails are shorter than
expected, so you have a kurtosis less than 3 (or negative,
depending on whether you do the correction or not).

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote:
 Short rep: I have two distributions, data and data2; each built from
 about 3 million data points; they appear similar when looking at
 densities and histograms. I plotted qqplots for further eye-balling:

 qqplot(data, data2, xlab = "1", ylab = "2")

 and get an almost perfect diagonal line which means they are in fact
 very alike. Now I tried to check normality using qqnorm -- and I think
 I am doing something wrong here:

 qqnorm(data, main = "Q-Q normality plot for 1")
 qqnorm(data2, main = "Q-Q normality plot for 2")

 I am getting perfect S-shaped curves (??) for both distributions. Am I
 missing something here?

 |
 |                               *  *   *  *
 |                           *
 |                        *
 |                    *
 |               *
 |            *
 |         *
 | * * *
 |-

 Thanks, Ralf




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] Fortune?

2010-06-25 Thread Achim Zeileis


Bert,

thanks for the pointer, added to the devel version of fortunes on 
R-Forge.


thx,
Z


Re: [R] Fast and simple tool for re-sampling of asynchronous time series ?

2010-06-25 Thread Charles C. Berry


On Fri, 25 Jun 2010, bruno Piguet wrote:


Hi all,

  I'm looking for a function which could do some fast and simple
re-sampling of asynchronous time series.

  Below is a MCE of the kind of algorithm I need. As you can see, it's
quite crude, but it's enough for my current needs.  The only problem is that
it is quite slow on real use case.
  I've got a C version which is much faster, but I'd like to have a pure-R
program.

  Any pointer to the relevant part of the doc one one of the time-series
packages ? Any suggestion or advice ?

  Thanks in advance,

B. Piguet.

Here is the example:
Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)
X <- sin(Tx/10.0) +  sin(Tx/5.0) + rnorm(length(Tx), 0, 0.1)
Ty <- seq(1, 50, 0.)
Ty <- Ty + rnorm(length(Ty), 0, 0.02)
Y <- sin(Ty/10.0) + sin(Ty/5.0) + rnorm(length(Ty), 0, 0.1)

w - 0.25



Personally, I'd incline towards leaving the next lines to C, perhaps using 
the inline package.


But if you want a purely R solution, the Bioconductor IRanges package 
should help. I think the viewMeans() function will handle this loop.


See

http://comments.gmane.org/gmane.comp.lang.r.sequencing/1296

for some discussion.

HTH,

Chuck



Y_sync <- rep(NA, length(Tx))
for (i in 1:length(Tx))
{
  T_min <- Tx[i] - w
  T_max <- Tx[i] + w
  Y_sync[i] <- mean(Y[Ty >= T_min & Ty <= T_max])
}

diff = X - Y_sync
print(summary(diff))

print(summary(lm(Y_sync~X)))
plot(diff~Tx, type="l")




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: [R] best way to plot a evolution in time

2010-06-25 Thread Tal Galili

Hi Nana,

The question is not fully clear to me.

Are you looking to plot the (let's call it) family tree of the genes ?

(if so, then using
plot(hclust(gene.dist))
Might be a direction for you)
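
A self-contained sketch of that suggestion, using the three gene profiles
from the original post (the name hc is illustrative, and whether a
dendrogram is what the poster needs remains an open question):

```r
# Data from the original post, one row per gene
m <- matrix(c( 2,  5, 15, 16,
               1,  1,  8,  8,
              10, 10, 11, 11),
            byrow = TRUE, nrow = 3,
            dimnames = list(c("gene a", "gene b", "gene c"), NULL))

gene.dist <- dist(m, method = "euclidean")  # pairwise distances between genes
hc <- hclust(gene.dist)                     # complete linkage by default
plot(hc)                                    # draw the dendrogram
```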

Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, Jun 25, 2010 at 8:56 AM, nana adriana_f...@yahoo.co.uk wrote:

 a <- c(2, 5, 15, 16)
 b <- c(1, 1, 8, 8)
 c <- c(10, 10, 11, 11)
 m <- matrix(c(a,b,c), byrow=TRUE, nrow=3)
 rownames(m) <- c("gene a", "gene b", "gene c")
 m
 gene.dist <- dist(m, method="euclidean")
 gene.dist



[R] 3D generalization of correlogram?

2010-06-25 Thread Dieter Menne

We have 3-Dimensional MRI density recordings of tumor tissue and would like
to have a measure of patchiness, reflecting cluster size in the tissue.

For 2-D slices, correlogram from MASS works well. Does someone know of a
package that provides a 3-D generalization of this measure? Or any
alternatives? I assume this is quite a common question in climate research.

Dieter


Re: [R] Fast and simple tool for re-sampling of asynchronous time series ?

2010-06-25 Thread Gabor Grothendieck

This isn't substantially different from what you have but does replace
the explicit loop and associated indexing with an implicit loop:

   sapply(Tx, function(tx) mean(Y[Ty >= tx-w & Ty <= tx+w]))
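
If the sapply() call above is still too slow, one possible pure-R
alternative is to sort the Y times once and take window sums as differences
of a cumulative sum. This is only a sketch under stated assumptions: the
0.1 step for Ty is made up (the step in the original post is truncated),
the names cs, lo, hi, Y_fast and Y_loop are illustrative, and
findInterval(..., left.open = TRUE) needs R 3.3.0 or later:

```r
set.seed(123)
w  <- 0.25
Tx <- seq(1, 50, 0.5)
Tx <- Tx + rnorm(length(Tx), 0, 0.1)

Ty <- seq(1, 50, 0.1)                        # step 0.1 is an assumption
Ty <- sort(Ty + rnorm(length(Ty), 0, 0.02))  # findInterval() needs sorted times
Y  <- sin(Ty / 10) + sin(Ty / 5) + rnorm(length(Ty), 0, 0.1)

cs <- c(0, cumsum(Y))                              # cs[k + 1] = sum of the first k Y values
lo <- findInterval(Tx - w, Ty, left.open = TRUE)   # number of Ty strictly below Tx - w
hi <- findInterval(Tx + w, Ty)                     # number of Ty at or below Tx + w
Y_fast <- (cs[hi + 1] - cs[lo + 1]) / (hi - lo)    # window mean; NaN for an empty window

# Reference result from the implicit-loop version, for comparison
Y_loop <- sapply(Tx, function(tx) mean(Y[Ty >= tx - w & Ty <= tx + w]))
stopifnot(isTRUE(all.equal(Y_fast, Y_loop)))
```

With real data where Ty is not already sorted, Y would have to be reordered
together with Ty, for example via o <- order(Ty); Ty <- Ty[o]; Y <- Y[o].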


Re: [R] (no subject)

2010-06-25 Thread Berend Hasselman



You posted the exact same question several days ago (June 17) under a
different name.
You got two perfectly good and adequate answers.

/Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/no-subject-tp2268375p2268685.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Frank E Harrell Jr

You still are stating the effect of the central limit theorem
incorrectly.  Please see my previous note.

Frank


-- 
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University


Re: [R] Handouts / Reports or just simply printing text to PDF?

2010-06-25 Thread Tal Galili

Hi Ralf,


?pdf and ?png are good places to start.
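
As a minimal sketch of the pdf() route: open the device, draw, and close it
again. The output path and the built-in cars data set are placeholders
chosen for illustration, not something from the original question:

```r
out_file <- file.path(tempdir(), "handout.pdf")  # hypothetical output path

pdf(out_file, width = 7, height = 5)   # open the PDF device
plot(cars, main = "Speed and stopping distance")

# A second page holding plain text, drawn on an empty plot
plot.new()
text(0.5, 0.5, paste("Mean speed:", round(mean(cars$speed), 1)))

dev.off()                              # close the device so the file is written
```

Each new high-level plot starts a new page in the PDF, so a simple handout
can be assembled from a sequence of plots and text pages.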

There is also R2wd:
http://cran.r-project.org/web/packages/R2wd/index.html
For exporting R output to word.  I wrote a short tutorial session for it
here:
http://www.r-statistics.com/2010/05/exporting-r-output-to-ms-word-with-r2wd-an-example-session/

Best,
Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, Jun 25, 2010 at 5:35 AM, Ralf B ralf.bie...@gmail.com wrote:

 I assume R won't easily generate nice reports (unless one starts using
 Sweave and LaTeX) but perhaps somebody here knows a package that can
 create report like output for special cases? How can I simply plot
 output into PDF? Perhaps you know a package I should check out? What
 do you guys do to create handouts (before actually publishing)?

 Thanks in advance,
 Ralf




Re: [R] ggplot2: deterministic position_jitter geom_line with position_jitter

2010-06-25 Thread Hadley Wickham

 I'm having the same problem as Stephan (see below), but what I'm trying to
 jitter is not a numeric vector, but a factor. How do I proceed? (Naively
 jittering a factor makes it numeric, no longer factor, so I don't get the
 custom ordering which conveniently comes with using a factor. I'm not sure
 how I would simulate that custom ordering with the jittered vector ... I
 couldn't find anything online about jittering factors, but maybe I just
 wasn't searching cleverly enough.)

You'll probably need to reorder the factor, then jitter it, and then
add custom labels with scale_continuous().  I think I see how to
resolve this problem in general (two displays of the same jittered
data), but it requires basically a complete rewrite of ggplot2, so
it's unlikely to appear before ggplot3.
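
A rough sketch of the suggestion above, assuming a hypothetical data frame df with a factor grp and a response y: the factor is put in the wanted order, converted to its integer codes, jittered once and stored, and the level names are put back as axis labels. The ggplot2 part only runs if the package is installed.

```r
set.seed(1)

# Hypothetical data; the factor levels carry the custom ordering
df <- data.frame(
  grp = factor(c("low", "mid", "high", "low", "mid", "high"),
               levels = c("low", "mid", "high")),
  y   = c(1.0, 2.1, 2.9, 1.2, 1.8, 3.3)
)
lev <- levels(df$grp)

# Jitter the integer codes once and keep the result, so that every
# layer of the plot uses exactly the same positions
df$x_jit <- as.numeric(df$grp) + runif(nrow(df), -0.2, 0.2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x_jit, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_line(ggplot2::aes(group = grp)) +
    # put the factor labels back on the now-continuous axis
    ggplot2::scale_x_continuous(breaks = seq_along(lev), labels = lev)
}
```

Because the jittered column is computed up front, points and lines agree on where each observation sits.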

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


[R] variograms and kriging

2010-06-25 Thread Steve_Friedman


Hello

Trying to develop variograms and kriged surfaces from a point file. Here is
what I've done so far.


library(gstat)  # also loads library(sp)
library(lattice)

 soilpts$x <- soilpts$UTM_X
 soilpts$y <- soilpts$UTM_Y
 soil.dat <- subset(soilpts, select=c(x, y, Area, BulkDensity, LOI, TP, TN,
 TC, Total_Mg))

dim(soil.dat)
[1] 12927

 coordinates(soil.dat) <- ~ x+y

 gridded(soil.dat) <- TRUE

Warning messages:
1: In points2grid(points, tolerance, round, fuzz.tol) :
  grid has empty column/rows in dimension 1
2: In points2grid(points, tolerance, round, fuzz.tol) :
  grid has empty column/rows in dimension 2

 class(soil.dat)
[1] "SpatialPixelsDataFrame"
attr(,"package")
[1] "sp"
 bbox(soil.dat)
  min max
x  476819  575981
y 2785749 2948128

 soil.dat[1:3,]

suggested tolerance minimum: 0.165318957771788
Error in points2grid(points, tolerance, round, fuzz.tol) :
  dimension 1 : coordinate intervals are not constant


The last error message and the warnings returned above lead me to think
that the spatial sampling locations must be regularly and equally spaced.
My data, though, are not.

I have spent the morning trying to figure this out - going back and forth
among many spatial packages that can do variograms and kriging.  Without a
good road map to follow, however, I've had to do a number of about-faces.
Not sure which way to turn now.

Can anyone provide guidance?

Using Windows WP and R 2.11.1 packages updated today.

Thanks
Steve


Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
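
One possible way around the gridded() error, sketched here on the meuse example data that ships with sp rather than on the soil data above (the variable zinc and the starting variogram values are taken from that example and are assumptions for any other data set): the irregular sample points stay a points object, and only a separate, regular prediction grid is marked as gridded.

```r
# Runs only if sp and gstat are installed; otherwise nothing happens.
if (requireNamespace("sp", quietly = TRUE) &&
    requireNamespace("gstat", quietly = TRUE)) {
  library(sp)
  library(gstat)

  data(meuse)                      # irregularly spaced sample points
  coordinates(meuse) <- ~ x + y    # points object; no gridded() call here

  # Empirical variogram and a fitted spherical model
  # (starting values psill = 1, range = 900, nugget = 1 are guesses)
  v <- variogram(log(zinc) ~ 1, meuse)
  m <- fit.variogram(v, vgm(1, "Sph", 900, 1))

  data(meuse.grid)                 # separate regular grid for prediction
  coordinates(meuse.grid) <- ~ x + y
  gridded(meuse.grid) <- TRUE      # only the prediction grid is gridded

  # Ordinary kriging of the points onto the grid
  k <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = m)
}
```

For the soil data the same pattern would presumably apply with soil.dat left as points and a prediction grid built separately.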


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Kjetil Halvorsen

There is a freely downloadable and very relevant (and readable) book at
https://ccrma.stanford.edu/~dattorro/mybook.html
"Convex Optimization and Euclidean Distance Geometry", and it indeed names EDMA
as a form of multidimensional scaling (or maybe the other way around).
You should have a look
at the codes for multidimensional scaling in R.
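
As a starting point for looking at the multidimensional scaling code, here is a small sketch using cmdscale() from the stats package. The landmark coordinates are simulated and purely hypothetical; the point is only that classical scaling recovers a configuration from a Euclidean distance matrix.

```r
set.seed(10)

# Ten hypothetical landmarks in two dimensions
pts <- matrix(rnorm(20), ncol = 2)

# Euclidean distance matrix between the landmarks
d <- dist(pts)

# Classical (metric) multidimensional scaling into 2 dimensions
fit <- cmdscale(d, k = 2)

# The recovered configuration reproduces the original distances
# (up to rotation/reflection of the coordinates and rounding error)
max_err <- max(abs(dist(fit) - d))
```

Other MDS variants, such as isoMDS() in MASS, follow the same distance-matrix-in, coordinates-out pattern.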

Kjetil

On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:

 Thanks for your interest, Joris.



 Gokhan OCAKOGLU
 Uludag University
 Faculty of Medicine
 Department of Biostatistics
 http://www20.uludag.edu.tr/~biostat/ocakoglui.htm

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
 Sent from the R help mailing list archive at Nabble.com.




Re: [R] variograms and kriging

2010-06-25 Thread Steve_Friedman

Please disregard.  I've posted to the wrong site.



Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


   
From: steve_fried...@nps.gov
Sent by: r-help-boun...@r-project.org
Date: 06/25/2010 01:38 PM
To: r-help@r-project.org
Subject: [R] variograms and kriging






Re: [R] Sweave: The opposite of tangle

2010-06-25 Thread stefan.d...@gmail.com

Thanks! That was exactly what I was looking for.
Best,
Stefan


On Fri, Jun 25, 2010 at 12:37 PM, Kevin E. Thorpe
kevin.tho...@utoronto.ca wrote:
 Kevin E. Thorpe wrote:

 stefan.d...@gmail.com wrote:

 Hi,
 I am using Sweave to write an article. If I want to convert the *.rnw
 to a *.tex file I have to run Sweave which might take a long time. Is
 there a way to get a tex-file as result without (evaluating) the
 R-chunks, i.e. the opposite of tangle (that just gives R-chunk).
 Thanks,
 Stefan


 This is untested, but does Sweave("file.rnw", eval=FASLE) do what you
 want?


 That should be FALSE above.  Don't post before coffee.
 h

 --
 Kevin E. Thorpe
 Biostatistician/Trialist, Knowledge Translation Program
 Assistant Professor, Dalla Lana School of Public Health
 University of Toronto
 email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
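
A small self-contained sketch of the eval=FALSE suggestion; the file demo.Rnw and its contents are made up for illustration. The chunk contains a long Sys.sleep() call, which is never run because evaluation is switched off, yet a .tex file is still produced.

```r
# Write a tiny, hypothetical Sweave source file into a temporary directory
rnw <- file.path(tempdir(), "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Some text before the chunk.",
  "<<slow>>=",
  "Sys.sleep(600)  # would take ten minutes if evaluated",
  "x <- 1",
  "@",
  "\\end{document}"
), rnw)

# Sweave writes its output into the working directory, so switch there
old <- setwd(tempdir())
Sweave(rnw, eval = FALSE)   # weave without evaluating any chunk
setwd(old)

tex <- file.path(tempdir(), "demo.tex")
```

Setting eval=FALSE in the call changes the default for all chunks; individual chunks can still override it in their headers.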



[R] Forcing scalar multiplication.

2010-06-25 Thread rkevinburton

I am trying to check the results from an Eigen decomposition and I need to 
force a scalar multiplication. The fundamental equation is: Ax = lx. Where 'l' 
is the eigen value and x is the eigen vector corresponding to the eigenvalue. 
'R' returns the eigenvalues as a vector (e <- eigen(A); e$values). So in order 
to 'check' the result I would multiply the eigenvalues ('l') by the 
eigenvectors. But unless I do it one by one (say e$values[1] * e$vectors[,1]) 
'R' tries a matrix multiplication and that is not what I want.  I would like a 
matrix that is formed by the SCALAR multiplication of each of the values by the 
corresponding eigenvector. How can I force such a multiplication?

Thank you.

Kevin
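
One way to express the check described above, sketched on a small made-up symmetric matrix: sweep() scales column j of the eigenvector matrix by eigenvalue j, and the result can be compared with A %*% x for all eigenvectors at once. Multiplying by diag(e$values) is an equivalent alternative.

```r
# A small symmetric example matrix (hypothetical, for illustration only)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
e <- eigen(A)

# Left-hand side: A x for every eigenvector (one per column)
lhs <- A %*% e$vectors

# Right-hand side: each eigenvector (column) times its own eigenvalue
rhs  <- sweep(e$vectors, 2, e$values, "*")
rhs2 <- e$vectors %*% diag(e$values)   # same result via a diagonal matrix

# Compare with a numerical tolerance rather than exact equality
ok <- isTRUE(all.equal(lhs, rhs)) && isTRUE(all.equal(rhs, rhs2))
```

Note that a plain e$values * e$vectors recycles the eigenvalues down the rows, not across the columns, which is why sweep() with MARGIN = 2 is used here.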



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Daniel Malter


Atte, I would not wonder if you got lost and confused by the certainly
interesting methodological discussion that has been going on in this thread.

Since the helpers do not seem to converge/agree, I propose to you to use a
different nonparametric approach: The bootstrap. The important thing about
the bootstrap is that you do not have to be concerned with the questions
that have been discussed in this thread.

In the bootstrap you draw repeatedly samples with replacement from your data
and compute the statistic you are interested in (for you this is the mean).
The beauty of this approach is i) that the bootstrap distribution of the mean
is approximately normal for samples of this size and ii) that you can directly
compare the quantiles/confidence intervals of the bootstrap distributions.

Let's say you have x and y, which both come from Poisson distributions with
relatively low means. Note that this resembles your data in that the
distributions are asymmetric, but contain a considerable number of ties.

#set seed for random number generation
set.seed(123)

#simulate x and y (these would be your data)
x=rpois(100,3)
y=rpois(100,4)

#plot histograms for x and y
par(mfcol=c(1,2))
hist(x,breaks=length(unique(x)))
hist(y,breaks=length(unique(y))) 


Now we sample with replacement from x and y (i.e., we draw one observation
from x and one from y, and afterwards we put the drawn observation back into
x and y, respectively). For each bootstrap of x and y, respectively, we
sample exactly as many observations as there are in x and y, respectively
(here 100). We then compute the statistic of interest of this bootstrap
(here the mean). We repeat this process many times (here 1000).


n=1000 #number of bootstraps to draw
x.boot1=numeric(n)
y.boot1=numeric(n)
for(i in 1:n){
  #resample with replacement and store the mean of each resample
  x.boot1[i]=mean(sample(x,length(x),replace=TRUE))
  y.boot1[i]=mean(sample(y,length(y),replace=TRUE))
}

Doing this, we draw the bootstrap distribution of the mean of x and y,
respectively. Note that the bootstrap distribution is normally distributed
and unbiased (the latter automatically because we bootstrap the mean):

par(mfcol=c(1,2))
hist(x.boot1)
hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether
their confidence intervals overlap or not. You get the 95-percent confidence
intervals by

quantile(x.boot1,probs=c(0.025,0.975))
quantile(y.boot1,probs=c(0.025,0.975))

If they do not overlap, you would conclude that they are significantly
different. In the one-sample case, you would just compare whether value of
interest is within or outside the confidence interval.
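
For the one-sample case just mentioned, a minimal sketch could look as follows. The sample x is simulated and the reference value mu0 is a made-up stand-in for the population mean; both are assumptions for illustration.

```r
set.seed(123)

x   <- rpois(100, 3)   # simulated sample (stand-in for the real data)
mu0 <- 4               # hypothetical value to compare the sample mean with

# Bootstrap distribution of the mean: resample x with replacement
boot_means <- replicate(1000, mean(sample(x, length(x), replace = TRUE)))

# Percentile 95% confidence interval for the mean
ci <- quantile(boot_means, probs = c(0.025, 0.975))

# TRUE if the reference value lies outside the interval
outside <- mu0 < ci[1] || mu0 > ci[2]
```

If outside is TRUE, the sample mean would be judged different from mu0 at roughly the 5% level under this percentile approach.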

Finally, note that the little loop that we have programmed to draw the
bootstraps is already implemented in an R package. Using the bootstrap
package, you could draw the bootstraps analogously by:

library(bootstrap)
x.boot2=bootstrap(x,nboot=1000,mean)
y.boot2=bootstrap(y,nboot=1000,mean)

The bootstrapped means are then stored in x.boot2$thetastar and
y.boot2$thetastar.

Hope that helps,
Daniel

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2268801.html
Sent from the R help mailing list archive at Nabble.com.


[R] Trying to tile wireframe plots (using lattice package)

2010-06-25 Thread Magnus Torfason


Hi all,

I'm trying to print a number of wireframe plots (generated using the 
lattice package), and I want them to appear in a two-by-two matrix along 
with some other (standard) plots. In other words, I am trying to create a 
subplot or tiled plot that works for wireframes.


I've tried the methods discussed in:
http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21238.html
but while they work for hist(), they don't work for wireframe().

I've also tried split.screen() and layout() - see below:

## Example of what I'm trying to do
library(lattice)
layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))
# Top-left, as expected
plot(rnorm(100),rnorm(100))
# Top-right, as expected
plot(rnorm(100),rnorm(100))
# But the volcano fills the whole device ...
wireframe(volcano)
## End of example

All has been to no avail up until now. I'd be grateful for any 
suggestions you may have.


Best,
Magnus

ps. If there is a way to do this using intermediate files (saving each 
plot as a PS file, and then tiling multiple PS files within the same 
device), that would be a totally acceptable solution for me as well.



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Greg Snow

Let me see if I understand.  You actually have the data for the whole 
population (the entire piece) but you have some pre-defined sections that you 
want to see if they differ from the population, or more meaningfully they are 
different from a randomly selected set of measures.  Is that correct?

If so, since you have the entire population of interest you can create the 
actual sampling distribution (or a good approximation of it).  Just take random 
samples from the population of the given size (matching the subset you are 
interested in) and calculate the means (or other value of interest), probably 
10,000 to 1,000,000 samples.  Now compare the value from your predefined subset 
to the set of random values you generated to see if it is in the tail or not.
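
The procedure above might be sketched like this. The population values and the position of the predefined region are simulated placeholders (only the sizes 2418 and 250 echo the numbers mentioned in the thread).

```r
set.seed(1)

# Stand-in for the full set of distance values in the piece
population <- rpois(2418, 3)

# Hypothetical predefined region of interest (250 consecutive values)
region   <- 101:350
obs_mean <- mean(population[region])

# Sampling distribution of the mean for random subsets of the same size,
# drawn without replacement from the whole population
sim_means <- replicate(10000, mean(sample(population, length(region))))

# Proportion of random subsets whose mean is at least as large as observed
p_upper <- mean(sim_means >= obs_mean)
```

A small p_upper would indicate that the region's mean sits in the upper tail of what random subsets of that size produce.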

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Atte Tenkanen
 Sent: Thursday, June 24, 2010 11:04 PM
 To: David Winsemius
 Cc: R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements
 
 The values come from this kind of process:
 The musical composition is segmented into so-called 'pitch-class
 segments' and these segments are compared with one reference set with a
 distance function. Only some distance values are possible. These
 distance values can be averaged over music bars which produces smoother
 distribution and the 'comparison curve' that illustrates the distances
 according to the reference set through a musical piece result in more
 readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I
 would prefer to use original values.
 
 then, I want to pick only some regions from the piece and compare those
 values of those regions, whether they are higher than the mean of all
 values.
 
 Atte
 
  On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
 
   Is there anything for me?
  
   There is a lot of data, n=2418, but there are also a lot of ties.
   My sample n≈250-300
  
 
  I do not understand why there should be so many ties. You have not
  described the measurement process or units. ( ... although you offer a
  glimpse without much background later.)
 
   i would like to test, whether the mean of the sample differ
   significantly from the population mean.
 
  Why? What is the purpose of this investigation? Why should the mean of
  a sample be that important?
 
  
   The histogram of the population looks like in attached histogram,
   what test should I use? No choices?
  
   This distribution comes from a musical piece and the values are
   'tonal distances'.
  
   http://users.utu.fi/attenka/Hist.png
 
  That picture does not offer much insight into the features of that
  measurement. It appears to have much more structure than I would
  expect for a sample from a smooth unimodal underlying population.
 
  --
  David.
 
  
   Atte
  
   On 06/24/2010 12:40 PM, David Winsemius wrote:
  
   On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
  
   Thanks. What I have had to ask is that
  
   how do you test that the data is symmetric enough?
   If it is not, is it ok to use some data transformation?
  
   when it is said:
  
   The Wilcoxon signed rank test does not assume that the data are
   sampled from a Gaussian distribution. However it does assume that
   the data are distributed symmetrically around the median. If the
   distribution is asymmetrical, the P value will not tell you much
   about whether the median is different than the hypothetical value.
  
   You are being misled. Simply finding a statement on a statistics
   software website, even one as reputable as Graphpad (???), does not
   mean that it is necessarily true. My understanding (confirmed
   reviewing "Nonparametric statistical methods for complete and
   censored data" by M. M. Desu, Damaraju Raghavarao) is that the
   Wilcoxon signed-rank test does not require that the underlying
   distributions be symmetric. The above quotation is highly inaccurate.
  
  
   To add to what David and others have said, look at the kernel that
   the U-statistic associated with the WSR test uses: the indicator (0/1)
   of xi + xj > 0.  So WSR tests H0: p = 0.5, where p = the probability
   that the average of a randomly chosen pair of values is positive.
   [If there are ties this probably needs to be worded as
   P[xi + xj > 0] = P[xi + xj < 0], i neq j.]
  
   Frank
  
   --
   Frank E Harrell Jr   Professor and Chairman   School of Medicine
                        Department of Biostatistics   Vanderbilt University
 
 

Re: [R] Trying to tile wireframe plots (using lattice package)

2010-06-25 Thread Greg Snow

The layout function is base graphics, wireframe from lattice is grid based and 
they don't play well together without extra effort.  The simplest option will 
probably be to look at the help page for print.trellis, specifically the split 
and more arguments.  Then look at the examples to see if this works for you in 
place of layout.
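
A sketch of the split/more approach, under the assumption that the base plot() calls can be replaced by lattice's xyplot() so that all four panels are trellis objects; the output file name is a placeholder.

```r
library(lattice)

# Build the four plots as trellis objects (nothing is drawn yet)
p1 <- xyplot(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
p2 <- xyplot(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
w1 <- wireframe(volcano)
w2 <- wireframe(volcano, shade = TRUE)

out <- file.path(tempdir(), "tiles.pdf")   # illustrative file name
pdf(out)

# split = c(column, row, number of columns, number of rows);
# more = TRUE keeps the page open for the next plot
print(p1, split = c(1, 1, 2, 2), more = TRUE)
print(p2, split = c(2, 1, 2, 2), more = TRUE)
print(w1, split = c(1, 2, 2, 2), more = TRUE)
print(w2, split = c(2, 2, 2, 2), more = FALSE)

dev.off()
```

The last print() uses more = FALSE so that the page is finished after the fourth tile.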

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Magnus Torfason
 Sent: Friday, June 25, 2010 12:50 PM
 To: r-help@r-project.org
 Subject: [R] Trying to tile wireframe plots (using lattice package)
 
 Hi all,
 
 I'm trying to print a number of wireframe plots (generated using the
 lattice package), and I want them to appear in a two-by two matrix
 along
 with some other (standard) plots. In other words I am trying to create
 a
 subplot or tiled plot that works for wireframes.
 
 I've tried the methods discussed in:
 http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21238.html
 but while they work for hist(), they don't work for wireframe().
 
 I've also tried split.screen() and layout() - see below:
 
 ## Example of what I'm trying to do
 library(lattice)
 layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))
 # Top-left, as expected
 plot(rnorm(100),rnorm(100))
 # Top-right, as expected
 plot(rnorm(100),rnorm(100))
 # But the volcano fills the whole the device ...
 wireframe(volcano)
 ## End of example
 
 All has been to no avail up until now. I'd be grateful for any
 suggestions you may have.
 
 Best,
 Magnus
 
 ps. If there is a way to do this using intermediate files (saving each
 plot as a PS file, and then tiling multiple PS files within the same
 device), that would be a totally acceptable solution for me as well.
 

[R] Delete rows in the data frame by limiting values in two columns

2010-06-25 Thread Yi

Hi, folks,

Finally Friday~~  Here comes the question:

x=c('germany','poor italy','usa','england','poor italy','japan')
y=c('Spain','germany','usa','brazil','england','chile')
s=1:6
z=3:8
test=data.frame(x,y,s,z)

#Now I am only concerned with the countries ('germany','england','brazil').
#I would like to keep the rows where these three countries
#are involved either in test$x OR test$y. So the result should look as
#follows (I did this manually):

           x       y s z
1    germany   Spain 1 3
2 poor italy germany 2 4
3    england  Brazil 4 6
4 poor italy england 5 7

Does any code work for this?

Thanks a lot in advance.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Thomas Lumley


On Thu, 24 Jun 2010, Atte Tenkanen wrote:


On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:


Thanks. What I have had to ask is that

how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are
sampled from a Gaussian distribution. However it does assume that
the data are distributed symmetrically around the median. If the
distribution is asymmetrical, the P value will not tell you much
about whether the median is different than the hypothetical value.


You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not
mean that it is necessarily true. My understanding (confirmed
reviewing "Nonparametric statistical methods for complete and censored
data" by M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon
signed-rank test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.

--
David.


Thanks. Unfortunately, I can't follow the reference at all, but I read this to 
mean that I can be carefree as far as the underlying distribution is 
concerned?

Is there any other authoritative reference where it is stated plainly that the 
test "does not require that the underlying distributions be symmetric" or normal?



The statement from GraphPad is correct, but for a different question.  Let me 
expound.

First let us consider means:

If you have paired samples X1.. Xn and Y1..Yn you could ask if the mean of X is 
equal to the mean of Y, or if the mean of (X-Y) is zero.   These are equivalent 
questions, because of the way the mean is defined.   So the paired t-test, 
which answers the first question, and the one-sample t-test, which answers the 
second question, are equivalent.  They have no assumptions (other than 
sufficient sample size for the means to be Normally distributed).


Now, let us consider medians.
If you have paired samples X1.. Xn and Y1..Yn you could ask if the median of X 
is equal to the median of Y, or if the median of (X-Y) is zero.  The first 
question cannot be answered by any standard test (though there are ways to do it). 
 The second is answered by the sign test.  They are not at all equivalent: it 
is possible for the median of X to be larger than the median of Y but the 
median of (X-Y) to be negative.   The non-equivalence is true for essentially 
all statistics except for the mean.
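
A tiny made-up example of this non-equivalence (the three pairs below are invented purely to show that it can happen):

```r
# Three hypothetical paired observations
X <- c(1, 5, 6)
Y <- c(4, 2, 9)

med_X    <- median(X)       # 5
med_Y    <- median(Y)       # 4
med_diff <- median(X - Y)   # differences are -3, 3, -3, so the median is -3

# For means the two questions always agree
mean_gap  <- mean(X) - mean(Y)
mean_diff <- mean(X - Y)
```

Here the median of X exceeds the median of Y, yet the median of the paired differences is negative, while the difference of means and the mean of differences coincide.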

Now, let us consider the Wilcoxon signed-rank test.
This can be characterized precisely as a test of the null hypothesis that the 
median pairwise mean of  X-Y is zero. That is, take all n(n-1)/2 pairs of 
(X-Y)s.  Take the mean of each pair to get n(n-1)/2 pairwise means. Take the 
median of these numbers.  The p-value will be 0.5 one-sided or 1.0 two-sided 
when this median pairwise mean is exactly zero.  The median pairwise mean is 
also sometimes known as the Hodges-Lehmann estimator (though this is strictly 
speaking a more general term).
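
The median pairwise mean described above can be computed directly; this sketch uses simulated paired data as a stand-in. It follows the n(n-1)/2 pairs in the description; some definitions also include each difference paired with itself, so values from other software may differ slightly.

```r
set.seed(7)

# Simulated paired samples (placeholders for real data)
x <- rnorm(25, mean = 0.3)
y <- rnorm(25)
d <- x - y                      # the n paired differences

# All pairwise means (d_i + d_j) / 2 in an n x n matrix
pm <- outer(d, d, "+") / 2

# Keep each pair i < j once: n(n-1)/2 values
pairwise_means <- pm[upper.tri(pm)]

# Median pairwise mean of the differences
hl <- median(pairwise_means)
```

With n = 25 this gives 300 pairwise means, and hl is the quantity whose being zero corresponds to the null hypothesis described above.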

As David correctly points out, no assumptions are needed for the Wilcoxon signed-rank 
test to be a test of *this* null hypothesis.   The problem is that this may not be the 
null hypothesis you care about.  As GraphPad correctly points out, the P value will 
not tell you much about whether the *median* is different than the hypothetical 
value because the median is not the same as the median pairwise mean.  It is 
entirely possible for the median difference to be positive and the median pairwise mean 
difference to be zero or negative.

If you assume that the distribution of differences X-Y is symmetric, then the 
Wilcoxon signed-rank test also tests the null hypothesis that the median of X-Y 
is zero (and that the mean of X-Y is zero), because these null hypotheses are 
equivalent for a symmetric distribution.  That's what GraphPad is saying.

You could also assume that the distributions X and Y are stochastically 
ordered.  This basically implies that the direction of difference is the same 
no matter what location statistic you use to measure it. If X was before some 
intervention and Y was afterwards you would basically be assuming that the 
intervention is either beneficial for everyone or harmful for everyone (up to 
measurement error). Under this assumption, the signed rank test also tells you 
reliably about differences in medians.

To some extent this is a philosophical issue.  My preference is to know exactly 
what a test is doing and to make these distinctions.   Many other people, 
including reputable experts like Frank Harrell, believe (I think) that 
simplifying assumptions such as stochastic ordering are a pretty good 
approximation in a lot of situations, so it isn't necessary to always make 
these distinctions.


 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle

Re: [R] Delete rows in the data frame by limiting values in two columns

2010-06-25 Thread Henrique Dallazuanna

Try this:
test[rowSums(mapply('%in%', test[c('x', 'y')],
list(c('germany','england','brazil')))) > 0,]

On Fri, Jun 25, 2010 at 4:00 PM, Yi liuyi.fe...@gmail.com wrote:

 Hi, folks,

 Finally Friday~~  Here comes the question:

 x=c('germany','poor italy','usa','england','poor italy','japan')
 y=c('Spain','germany','usa','brazil','england','chile')
 s=1:6
 z=3:8
 test=data.frame(x,y,s,z)

 #Now I only concern the countries ('germany','england','brazil'). I would
 like to keep the rows where these three countries
 #are involved either in test$x OR test$y. So the result should be like as
 follows (I did this manually  ):

            x       y s z
 1    germany   Spain 1 3
 2 poor italy germany 2 4
 3    england  Brazil 4 6
 4 poor italy england 5 7

 Any codes work for this?

 Thanks great in advance.





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] Delete rows in the data frame by limiting values in two columns

2010-06-25 Thread Erik Iverson




x=c('germany','poor italy','usa','england','poor italy','japan')
y=c('Spain','germany','usa','brazil','england','chile')
s=1:6
z=3:8
test=data.frame(x,y,s,z)

#Now I only concern the countries ('germany','england','brazil'). I would
like to keep the rows where these three countries
#are involved either in test$x OR test$y. So the result should be like as
follows (I did this manually  ):

           x       y s z
1    germany   Spain 1 3
2 poor italy germany 2 4
3    england  Brazil 4 6
4 poor italy england 5 7

Any codes work for this?


ss <- c("germany", "england", "brazil")
subset(test, x %in% ss | y %in% ss)
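
Put together with the data from the question, the subset() approach can be run as a complete sketch; the name kept for the result is introduced here only for illustration.

```r
# Data from the original question
x <- c('germany', 'poor italy', 'usa', 'england', 'poor italy', 'japan')
y <- c('Spain', 'germany', 'usa', 'brazil', 'england', 'chile')
s <- 1:6
z <- 3:8
test <- data.frame(x, y, s, z)

# Countries of interest
ss <- c("germany", "england", "brazil")

# Keep rows where either column matches one of the countries
kept <- subset(test, x %in% ss | y %in% ss)
```

This keeps original rows 1, 2, 4 and 5; the row names are not renumbered, and matching is case-sensitive, so 'brazil' matches but 'Brazil' would not.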


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Muenchen, Robert A (Bob)

I had taken the opposite tack with Google Trends by subtracting keywords
like:
SAS -shoes -airlines -sonar... 
but never got as good results as that beautiful "X code for" search.
When you see the end-of-semester panic bumps in traffic, you know you're
nailing it! 

I see that there's a car, the R Code Mustang, that adding "for" gets rid
of. 

Thanks for getting me back on a topic that I had given up on!

Bob

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Joris Meys
Sent: Thursday, June 24, 2010 7:56 PM
To: Dario Solari
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

Nice idea, but quite sensitive to search terms, if you compare your
result on "... code" with "... code for":
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com
wrote:
 First: excuse my English

 My opinion: a useful source for measuring popularity can be Google
 Insights for Search - http://www.google.com/insights/search/#

 Every person using software like R, SAS, SPSS needs first to learn
 it. So probably they make a web search for a manual, a tutorial, a
 guide. One can measure the share of this kind of search query.
 These kinds of results can be useful to determine trends in
 popularity.

 Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
 SPSS tutorial/manual/guide

http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q

 Example 2: R software, SAS software, SPSS software

http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q

 Example 3: R code, SAS code, SPSS code

http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q

 Example 4: R graph, SAS graph, SPSS graph

http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q

 Example 5: R regression, SAS regression, SPSS regression

http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q

 Some examples are cross-software (learning needs - Example 1), others can
 be biased by the traditional use of that software (in SPSS you usually
 don't manipulate graphs, I think)





--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[R] lattice legend

2010-06-25 Thread sethandmelva



The solution Felix suggested worked: 

It was indeed helpful to include the line 

par.settings=list(superpose.symbol=sup.sym)

while using auto.key with a customized symbol list in lattice.



Thanks Felix! 



Seth 
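
For readers who did not follow the earlier thread, a minimal sketch of what this looks like in a full call is below. The contents of sup.sym and the use of the iris data are assumptions made for illustration; only the par.settings line comes from the message above.

```r
library(lattice)

# Hypothetical symbol list; the thread does not show what sup.sym held
sup.sym <- list(pch = c(15, 17, 19),
                col = c("black", "red", "blue"),
                cex = 1.2)

# Passing the list through par.settings lets both the panel and the
# auto.key legend pick up the same symbols
p <- xyplot(Sepal.Length ~ Sepal.Width, data = iris, groups = Species,
            par.settings = list(superpose.symbol = sup.sym),
            auto.key = TRUE)
print(p)
```

Setting pch and col directly in the xyplot() call would change the points but, as far as I know, not the auto.key legend, which is why the settings go through par.settings.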


[R] Modelling Crystal Growth

2010-06-25 Thread mplar1


Dear all,

I would like to hear from anyone who has experience using R to simulate and
visualise the formation and growth of crystals.

Thank you.

mpl
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Modelling-Crystal-Growth-tp2268746p2268746.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Forcing scalar multiplication.

2010-06-25 Thread Kjetil Halvorsen

?sweep
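
A minimal sketch of how sweep() could be used for the check described in the question below. The matrix A is made up for the example.

```r
# Small symmetric example matrix (hypothetical values)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
e <- eigen(A)

# Multiply column j of the eigenvector matrix by eigenvalue j
scaled <- sweep(e$vectors, 2, e$values, FUN = "*")

# A %*% V should equal V with each column scaled by its eigenvalue
ok <- isTRUE(all.equal(A %*% e$vectors, scaled))
print(ok)
```

The same matrix can also be obtained as e$vectors %*% diag(e$values); sweep() simply avoids building the diagonal matrix.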

On Fri, Jun 25, 2010 at 2:43 PM,  rkevinbur...@charter.net wrote:
 I am trying to check the results from an eigendecomposition and I need to
 force a scalar multiplication. The fundamental equation is Ax = lx, where
 'l' is the eigenvalue and x is the eigenvector corresponding to that
 eigenvalue. 'R' returns the eigenvalues as a vector (e <- eigen(A);
 e$values). So in order to 'check' the result I would multiply the eigenvalues
 ('l') by the eigenvectors. But unless I do it one by one (say e$values[1] *
 e$vectors[,1]) 'R' tries a matrix multiplication and that is not what I want.
 I would like a matrix that is formed by the SCALAR multiplication of each of
 the values by the corresponding eigenvector. How can I force such a
 multiplication?

 Thank you.

 Kevin


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Muenchen, Robert A (Bob)

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Muenchen, Robert A (Bob)
Sent: Friday, June 25, 2010 3:08 PM
To: Joris Meys; Dario Solari
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

I had taken the opposite tack with Google Trends by subtracting
keywords
like:
SAS -shoes -airlines -sonar...
but never got as good results as that beautiful X code for search.
When you see the end-of-semester panic bumps in traffic, you know
you're
nailing it!

I have to eat those words already. The "R code for" search that showed a
peak every December did not have quotes around it, so it was searching
for those three words, not the complete phrase. When you add the quotes,
the peaks vanish.

Once you go the phrase route, you gain precision but end up with zero
counts on various phrases. I avoided that by combining them with + to
get enough to plot. The resulting graph shows SAS dominant until
mid-2006 when SPSS takes the top position, followed by R, SAS, Stata in
order:

http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22&cmpt=q

This might be a good one to add to http://r4stats.com/popularity 

Bob


Re: [R] exists() and functions

2010-06-25 Thread Jonathan Greenberg

Always nice to answer my own question 3 minutes later.  The missing()
function does what I want.  Still, why DOES this exists() statement
fail?  Do functions auto create the variables once they are called,
regardless of whether or not they are assigned?

--j
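
A small sketch of the behaviour in question. As far as I understand it, a formal argument gets a binding in the function's own frame as soon as the function is called, whether or not the caller supplied a value, so exists() reports TRUE for it; missing() is the function that tells the two cases apart. The function and argument names below are made up.

```r
f <- function(arg) {
  c(exists_arg   = exists("arg", inherits = FALSE),             # formal argument
    exists_other = exists("not_an_argument", inherits = FALSE), # never defined here
    missing_arg  = missing(arg))                                # was a value supplied?
}

res_unset <- f()    # called without a value
res_set   <- f(1)   # called with a value
print(res_unset)
print(res_set)
```

In both calls exists("arg") is TRUE; only missing(arg) changes between them.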

On Fri, Jun 25, 2010 at 1:05 PM, Jonathan Greenberg
greenb...@ucdavis.edu wrote:
 I'm a bit confused about how exists() works within a function -- I want
 to test for unassigned variables, but I'm doing tests in the main
 environment to figure out the function, so the variables DO exist in
 the parent environment of a function call.

 Why does:
 myfunction <- function(variable_outside_function)
 {
   print(exists("variable_outside_function", inherits=FALSE))
   print(exists("another_variable_outside_function", inherits=FALSE))
 }

 myfunction()

 Return:
 [1] TRUE
 [1] FALSE

 I didn't assign anything to variable_outside_function, so I'm unclear
 why it thinks it exists...

 --j



[R] exists() and functions

2010-06-25 Thread Jonathan Greenberg

I'm a bit confused about how exists() works within a function -- I want
to test for unassigned variables, but I'm doing tests in the main
environment to figure out the function, so the variables DO exist in
the parent environment of a function call.

Why does:
myfunction <- function(variable_outside_function)
{
  print(exists("variable_outside_function", inherits=FALSE))
  print(exists("another_variable_outside_function", inherits=FALSE))
}

myfunction()

Return:
[1] TRUE
[1] FALSE

I didn't assign anything to variable_outside_function, so I'm unclear
why it thinks it exists...

--j


[R] Export Results

2010-06-25 Thread Pedro Mota Veiga


Hi R users,
How can I automatically export results and graphs to a file?
Thanks in advance

Pedro Mota Veiga

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-Results-tp2268622p2268622.html
Sent from the R help mailing list archive at Nabble.com.


[R] Lattice plotting question

2010-06-25 Thread David Warren

Hi all,

 I'm working on some plots using lattice (R 2.10.1), and have entered
the polish phase.  I've produced a satisfactory pair of xyplots (
http://imgur.com/EyXGi.png), but would like to align the y-axes of the top
and bottom plots.  I assume that I need to adjust axis padding or something,
but I can't figure this one out.  Thanks for any help!

Dave

-- 
Post-doctoral Fellow
Neurology Department
University of Iowa Hospitals and Clinics
davideugenewar...@gmail.com


[R] Diebold Mariano

2010-06-25 Thread Brajkovic J.

Hello, 
I am trying to calculate the Diebold-Mariano (DM) test statistic using the
dm.test function. I also tried to do the same thing in Stata and I get vastly
different results (4.5 vs 25). Does someone have experience with this function?

I tried to calculate the DM statistic manually. If by "d" I define the
difference of squared forecast errors for two models, then
DM=mean(d)/sqrt(long_run_var(d)). To calculate the long-run variance of d I use
Newey-West standard errors. What I don't understand in the Newey-West command is
the meaning of "prewhite". Any help?

Thanks!
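
A minimal base-R sketch of the manual calculation described above, for comparison purposes only. The function name dm_stat, the Bartlett (Newey-West style) weights, and the simulated errors are all assumptions made here; this is not taken from dm.test or from Stata, either of which may weight the autocovariances differently.

```r
# Hypothetical helper: DM statistic under squared-error loss
dm_stat <- function(e1, e2, lag = 0) {
  d  <- e1^2 - e2^2          # loss differential
  n  <- length(d)
  dc <- d - mean(d)
  lrv <- sum(dc^2) / n       # lag-0 autocovariance
  for (k in seq_len(lag)) {  # skipped entirely when lag = 0
    gamma_k <- sum(dc[(k + 1):n] * dc[1:(n - k)]) / n
    lrv <- lrv + 2 * (1 - k / (lag + 1)) * gamma_k  # Bartlett weight
  }
  # The long-run variance of d is divided by n to get the variance of mean(d)
  mean(d) / sqrt(lrv / n)
}

set.seed(1)
e1 <- rnorm(200)             # simulated forecast errors, model 1
e2 <- rnorm(200, sd = 1.2)   # simulated forecast errors, model 2
stat <- dm_stat(e1, e2, lag = 2)
print(stat)
```

Note the division by n inside the square root. If one implementation divides the long-run variance by n and another does not, the two statistics differ by a factor of sqrt(n), which may be worth ruling out when two programs disagree this much.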


Re: [R] Export Results

2010-06-25 Thread Henrique Dallazuanna

See ?Sweave
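
Besides Sweave, a more basic route is to write printed output to a text file and to draw plots on a file device. A minimal sketch follows; the file names and the lm() fit on the built-in cars data are made up for the example.

```r
fit <- lm(dist ~ speed, data = cars)   # any analysis; cars ships with R

# Hypothetical output locations
out_txt <- file.path(tempdir(), "results.txt")
out_pdf <- file.path(tempdir(), "plots.pdf")

# Text results: capture what print() would show and write it to a file
writeLines(capture.output(summary(fit)), out_txt)

# Graphs: open a file device, draw, then close the device
pdf(out_pdf)
plot(dist ~ speed, data = cars)
abline(fit)
invisible(dev.off())
```

sink() is another option for text output, and png() or jpeg() can replace pdf() when bitmap files are wanted.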

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] Trying to tile wireframe plots (using lattice package)

2010-06-25 Thread Magnus Torfason

Thanks, that was the pointer I needed. I'd tried the split parameter but 
didn't realize that it doesn't work well within wireframe() itself, 
rather, I had to call print.trellis() directly using the trellis object 
that wireframe() returns if one assigns it to something.


After that, it was pretty straightforward. One issue I found surprising 
was that you must pass more=TRUE to the call _before_ you want to add 
more, rather than adding it to the call that is actually supposed to 
draw onto a pre-existing canvas. But that was a quick fix. Here is code 
that worked for me.


## Example begins
top.left     = wireframe(volcano)
top.right    = wireframe(volcano, shade = TRUE)
bottom.left  = wireframe(volcano, shade = TRUE,
                         aspect = c(61/87, 0.4))
bottom.right = wireframe(volcano, shade = TRUE,
                         aspect = c(61/87, 0.4), light.source = c(10,0,10))
print(top.left     , split=c(1,1,2,2) , more=TRUE)
print(top.right    , split=c(2,1,2,2) , more=TRUE)
print(bottom.left  , split=c(1,2,2,2) , more=TRUE)
print(bottom.right , split=c(2,2,2,2))
## Example ends

Thanks again!

Magnus

On 6/25/2010 2:59 PM, Greg Snow wrote:

The layout function is base graphics, wireframe from lattice is

 grid based and they don't play well together without extra effort.


[R] fatal error: unable to restore saved data

2010-06-25 Thread Albert Lee, Ph.D.

I just installed R 2.11.1 on my computer and encountered a fatal
error, "Unable to restore saved data in .RData", which kicks me out of R right
away.  I can still run 2.10.2.  There is no package called rattle

I checked various posts regarding this error.  I still can't get it to work.  I
removed two files that had the .rdata extension and it still does not work.  Any
suggestions?

Please advise.

How do you check your current working directory?

Albert





Re: [R] fatal error: unable to restore saved data

2010-06-25 Thread Phil Spector


Albert -
   The message refers to a file specifically called .RData. 
Files that merely have a .rdata extension are not related.

   You can see your current working directory by typing

getwd()

at the R prompt.
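
A short sketch of how one might then look for the saved workspace in that directory. It assumes the file in question sits in the current working directory, which may not be the case on every setup.

```r
wd <- getwd()
rdata_path <- file.path(wd, ".RData")

# TRUE if a saved workspace exists in the working directory
has_workspace <- file.exists(rdata_path)
cat("Working directory:", wd, "\n")
cat(".RData present:  ", has_workspace, "\n")

# list.files() hides dot-files unless all.files = TRUE
dot_files <- list.files(wd, all.files = TRUE, pattern = "^\\.RData$")
print(dot_files)
```

If the file is there, renaming it or starting R with the --no-restore option should, as far as I know, skip the restore step.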

   I'm not sure where rattle enters into the picture.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



[R] Average 2 Columns when possible, or return available value

2010-06-25 Thread emorway


Forum, 

Using the following data:

DF <- read.table(textConnection("A B
22.60 NA
 NA NA
 NA NA
 NA NA
 NA NA
 NA NA
 NA NA
 NA NA
102.00 NA
 19.20 NA
 19.20 NA
 NA NA
 NA NA
 NA NA
 11.80 NA
 7.62 NA
 NA NA
 NA NA
 NA NA
 NA NA
 NA NA
 75.00 NA
 NA NA
 18.30 18.2
 NA NA
 NA NA
 8.44 NA
 18.00 NA
 NA NA
 12.90 NA"),header=TRUE)
closeAllConnections()

The second column is a duplicate reading of the first column, and when two
values are available, I would like to average column 1 and 2 (example code
below).  But if there is only one reading, I would like to retain it, but I
haven't found a good way to exclude NA's using the following code:

t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))

Currently, row 24 is the only row with a returned value.  I'd like the 
result to return column A if it is the only available value, and average
where possible.  Of course, if both columns are NA, NA is the only possible
result.

The result I'm after would look like this (row 24 is an avg):

 22.60 
NA
NA
NA
NA
NA
NA
NA
102.00
 19.20
 19.20
NA
NA
NA
 11.80
  7.62
NA
NA
NA
NA
NA
 75.00
NA
 18.25
NA
NA
  8.44
 18.00
NA
 12.90

This is a small example from a much larger data frame, so if you're
wondering what the deal is with list(), that will come into play for the
larger problem I'm trying to solve.

Respectfully,
Eric
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Phil Spector


Eric -
  What you're describing is taking the mean of each row while
ignoring missing values:


apply(DF,1,mean,na.rm=TRUE)

 [1]  22.60    NaN    NaN    NaN    NaN    NaN    NaN    NaN 102.00  19.20
[11]  19.20    NaN    NaN    NaN  11.80   7.62    NaN    NaN    NaN    NaN
[21]    NaN  75.00    NaN  18.25    NaN    NaN   8.44  18.00    NaN  12.90

  If this isn't suitable for your larger problem, please describe that
problem in greater detail.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joshua Wiley

Hello Eric,

I am not sure how your need to use list() will fit in with this, but
for your sample data, this will do the trick.

matrix(rowMeans(DF, na.rm=TRUE), ncol=1)
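
If the larger problem is a wide data frame in which each reading is followed by its duplicate column, the same idea might be applied pair by pair. The sketch below is a guess at that layout; the data frame "wide" and its column names are made up.

```r
# Made-up wide data: columns come in adjacent (reading, duplicate) pairs
wide <- data.frame(A1 = c(22.6, NA, 18.3), A2 = c(NA, NA, 18.2),
                   B1 = c(NA, 5, 7),       B2 = c(1, NA, 9))

# Group index for the columns: 1 1 2 2 ...
pair_id <- rep(seq_len(ncol(wide) / 2), each = 2)

# For each pair, average across its two columns, ignoring NAs
avg <- sapply(split(seq_along(wide), pair_id),
              function(cols) rowMeans(wide[cols], na.rm = TRUE))

# Rows with no reading at all come back as NaN; turn them into NA
avg[is.nan(avg)] <- NA
print(avg)
```

The result has one column per pair and one row per original row.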

HTH,

Josh





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joshua Wiley

btw, if you just wanted your exact code to work:

t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
na.rm=TRUE)[,-1]))

You will get NaNs rather than NAs where you are missing from both
rows, but that should not be a real issue.
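
If plain NAs are preferred over NaNs, a small follow-up step can convert them. A minimal sketch on three rows in the style of the posted data:

```r
# Three rows in the style of the thread's data
DF <- data.frame(A = c(22.6, NA, 18.3), B = c(NA, NA, 18.2))

m <- rowMeans(DF, na.rm = TRUE)  # row 2 has no values, so its mean is NaN
m[is.nan(m)] <- NA               # replace NaN with NA
print(m)
```

is.nan() is TRUE only for NaN, not for NA, so existing NAs elsewhere are left alone.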


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys

Thanks for the link, very interesting book. Yet, I couldn't find the
part about EDMA. It would have surprised me anyway, as the input of
multidimensional scaling is one matrix with euclidean distances
between your observations, whereas in EDMA the data consist of a
number of distance matrices.

Quite a different thing if you ask me. Neither cmdscale nor isoMDS nor
its derived functions (e.g. metaMDS in the vegan package) are going to
be of any help.

Now I come to think of it, vegan has a procrustes function, but I'm
not sure if it is generalized to be of use in EDMA.

Cheers
Joris

On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen
kjetilbrinchmannhalvor...@gmail.com wrote:
 There is a freely downloadable and very relevant (and readable) book at
 https://ccrma.stanford.edu/~dattorro/mybook.html
 Convex Optimization and Euclidean Distance Geometry, and it indeed names
 EDMA
 as a form of multidimensional scaling (or maybe the other way round).
 You should have a look
 at the codes for multidimensional scaling in R.

 Kjetil

 On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr 
 wrote:

 thanks for your interests Joris



 Gokhan OCAKOGLU
 Uludag University
 Faculty of Medicine
 Department of Biostatistics
 http://www20.uludag.edu.tr/~biostat/ocakoglui.htm

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] Lattice plotting question

2010-06-25 Thread Felix Andrews

ylim = extendrange(c(0,100)) ?
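
A minimal sketch of that suggestion with made-up data: both plots get the same ylim, so their y-axes cover the same range. Whether this fully aligns the axes in the original figure is not certain, since the plot itself was not posted as code.

```r
library(lattice)

set.seed(42)
# Hypothetical data standing in for the two series in the figure
d <- data.frame(x = 1:20,
                top = runif(20, 10, 90),
                bottom = runif(20, 40, 60))

# extendrange() pads c(0, 100) a little on both sides
yl <- extendrange(c(0, 100))

p_top    <- xyplot(top ~ x,    data = d, ylim = yl)
p_bottom <- xyplot(bottom ~ x, data = d, ylim = yl)

# Stack the two plots: one column, two rows
print(p_top,    split = c(1, 1, 1, 2), more = TRUE)
print(p_bottom, split = c(1, 2, 1, 2))
```

With identical limits the tick labels have the same width in both plots, which is usually what makes the axis lines line up.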




-- 
Felix Andrews / 安福立
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/


[R] use a data frame whose name is stored as a string variable?

2010-06-25 Thread Seth


Hi,
Let's say I have a data frame (called example) with numeric values stored
(columns V1 and V2).  I also have a string variable storing this name

x1 <- "example"

Is there a way to use the variable x so that R knows that I want the
specified action to occur on the data frame?  For example, summary (x) would
return a summary of the data frame?

I am considering this because I need to compare many data frames within 2
nested for loops.  In the first iteration of the loop I could concatenate x
and 1 and then use it to represent the data frame.  I'm open to a better
solution.  Thanks, Seth Myers 

[R] Label Values in levelplot

2010-06-25 Thread Ben Wilkinson

I am trying to add labels equal to the value in a levelplot. I believe that
panel may be the way to go but cannot understand the examples.

In the following example:

X,Y,Z
A,M,100
A,M,200
B,N,150
B,N,225

I would like to label each of the rectangles 100,200,150 and 225 and colour
according to the value

The colouring is achieved by

levelplot(z ~ x *y , data) but then I get stuck with the labels

Thanks very much for your help

Ben


Re: [R] use a data frame whose name is stored as a string variable?

2010-06-25 Thread Joshua Wiley

On Fri, Jun 25, 2010 at 5:10 PM, Seth sjmy...@syr.edu wrote:

> Hi,
> Let's say I have a data frame (called example) with numeric values stored
> (columns V1 and V2). I also have a string variable storing this name
>
> x1 <- "example"
>
> Is there a way to use the variable x so that R knows that I want the
> specified action to occur on the data frame? For example, summary (x) would
> return a summary of the data frame?

?get

For example:

get(x) # one object
mget(x, envir=.GlobalEnv) # for multiple objects
## just change the environment if that is not where they are located

> I am considering this because I need to compare many data frames within 2
> nested for loops. In the first iteration of the loop I could concatenate x
> and 1 and then use it to represent the data frame. I'm open to a better
> solution. Thanks, Seth Myers

It is hard to give a better solution without the rest of your code,
but there are often cleaner ways than for loops. One solution that
avoids the character vector is to put the data frames together in
a list.
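
A minimal sketch of both suggestions, for illustration only. The data frame contents, the name stored in `x`, and the list `frames` are assumptions made up here, not taken from the original code.

```r
# Hypothetical data frame standing in for the poster's `example`
example <- data.frame(V1 = c(1, 2, 3), V2 = c(4, 5, 6))
x <- "example"   # the object's name stored as a character string

# get() looks the object up by name, so this summarises the data frame
s <- summary(get(x))

# Alternative: keep the data frames in a named list and index by name,
# which avoids building object names inside the loops
frames <- list(example1 = example, example2 = example * 2)
for (i in 1:2) {
  nm <- paste0("example", i)   # name built from a prefix and the loop index
  print(summary(frames[[nm]]))
}
```

With the list, `lapply(frames, summary)` would give all the summaries in one call.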

Best regards,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joris Meys

Just want to add that if you want to clean out the NA rows in a matrix
or data frame, take a look at ?complete.cases. It can be handy with
big datasets. I got curious, so I ran the code given here on a big
dataset, before and after removing NA rows. Honestly, this is a clear
illustration of the power of rowMeans. I'm amazed myself.

DF <- data.frame(
  A=rep(DF$A,1),
  B=rep(DF$B,1)
)

> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46

> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03

> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )

Timing stopped at: 227.84 1.03 249.31  <-- I got impatient and pressed escape

> DF <- DF[complete.cases(DF),]

> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39

> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02

> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )
   user  system elapsed
  10.01    0.07   13.40
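
For reference, a small sketch of the rowMeans approach applied to the original question. The four-row data frame is a made-up stand-in for the poster's data, and the NaN-to-NA step is an addition here, since rowMeans with na.rm=TRUE returns NaN for rows where every value is NA.

```r
# Small stand-in for the poster's data: B is a duplicate reading of A
DF <- data.frame(A = c(22.6, NA, 18.3, 12.9),
                 B = c(NA,   NA, 18.2, NA))

# Average the available readings in each row
avg <- rowMeans(DF, na.rm = TRUE)

# Rows with no readings at all come back as NaN; turn those into NA
avg[is.nan(avg)] <- NA

# complete.cases() keeps only the rows without any NA
complete <- DF[complete.cases(DF), ]
```

Note that complete.cases() drops every row containing at least one NA, so rows with a single reading are removed as well, not only the rows where both columns are NA.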

Cheers
Joris


On Sat, Jun 26, 2010 at 1:08 AM, emorway emor...@engr.colostate.edu wrote:

 Forum,

 Using the following data:

 DF<-read.table(textConnection("A B
 22.60 NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
 102.00 NA
  19.20 NA
  19.20 NA
  NA NA
  NA NA
  NA NA
  11.80 NA
  7.62 NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  75.00 NA
  NA NA
  18.30 18.2
  NA NA
  NA NA
  8.44 NA
  18.00 NA
  NA NA
  12.90 NA"),header=T)
 closeAllConnections()

 The second column is a duplicate reading of the first column, and when two
 values are available, I would like to average column 1 and 2 (example code
 below).  But if there is only one reading, I would like to retain it, but I
 haven't found a good way to exclude NA's using the following code:

 t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))

 Currently, row 24 is the only row with a returned value.  I'd like the
 result to return column A if it is the only available value, and average
 where possible.  Of course, if both columns are NA, NA is the only possible
 result.

 The result I'm after would look like this (row 24 is an avg):

  22.60
    NA
    NA
    NA
    NA
    NA
    NA
    NA
 102.00
  19.20
  19.20
    NA
    NA
    NA
  11.80
  7.62
    NA
    NA
    NA
    NA
    NA
  75.00
    NA
  18.25
    NA
    NA
  8.44
  18.00
    NA
  12.90

 This is a small example from a much larger data frame, so if you're
 wondering what the deal is with list(), that will come into play for the
 larger problem I'm trying to solve.

 Respectfully,
 Eric




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[R] All a column to a data frame with a specific condition

2010-06-25 Thread Yi

Hi, folks,

Please first look at the code:

plan_a=c('apple','orange','apple','apple','pear','bread')
plan_b=c('bread','bread','orange','bread','bread','yogurt')
value=1:6
data=data.frame(plan_a,plan_b,value)
library(plyr)
library(reshape)
mm=melt(data, id=c('plan_a','plan_b'))
sum_plan_a=cast(mm,plan_a~variable,sum)

### I would like to add a new column to the data.frame named 'data', with
### the same sum of value for the same type of plan_a
### The result should come up like this:

   plan_a  plan_b  value  sum_plan_a
1  apple   bread       1           8
2  orange  bread       2           2
3  apple   orange      3           8
4  apple   bread       4           8
5  pear    bread       5           5
6  bread   yogurt      6           6

Any tips?
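
One possible approach, sketched here for illustration and not taken from the thread: base R's ave() computes a statistic within each group and returns a vector aligned to the original rows, so no melt/cast step is needed. The data are the ones from the post above.

```r
plan_a <- c('apple', 'orange', 'apple', 'apple', 'pear', 'bread')
plan_b <- c('bread', 'bread', 'orange', 'bread', 'bread', 'yogurt')
value <- 1:6
data <- data.frame(plan_a, plan_b, value)

# Sum `value` within each level of plan_a and repeat the group total
# on every row that belongs to that group
data$sum_plan_a <- ave(data$value, data$plan_a, FUN = sum)
```

The three 'apple' rows each get 8 (1 + 3 + 4), matching the table above.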

Thank you.


[R] predict newdata question

2010-06-25 Thread Felipe Carrillo

Hi:
I am using a subset of the dataset below to predict PRED_SUIT for
the whole dataset, but I am having trouble with 'newdata'. The model
was created with 153 records and I want to predict for 208 records.

wolf2 <- structure(list(gridcell = c(367L, 444L, 533L, 587L, 598L, 609L,
620L, 629L, 641L, 651L, 662L, 674L, 684L, 695L, 738L, 748L, 804L, 
805L, 872L, 919L, 929L, 938L, 950L, 958L, 966L, 975L, 976L, 985L, 
994L, 1006L, 1015L, 1019L, 1022L, 1025L, 1027L, 1028L, 1029L, 
1032L, 1040L, 1043L, 1050L, 1053L, 1061L, 1070L, 1074L, 1078L, 
1080L, 1082L, 1083L, 1084L, 1090L, 1095L, 1096L, 1099L, 1106L, 
1116L, 1124L, 1125L, 1130L, 1133L, 1134L, 1137L, 1138L, 1139L, 
1145L, 1150L, 1151L, 1154L, 1161L, 1162L, 1163L, 1171L, 1175L, 
1179L, 1181L, 1184L, 1188L, 1189L, 1193L, 1194L, 1199L, 1204L, 
1207L, 1214L, 1222L, 1231L, 1232L, 1241L, 1250L, 1256L, 1275L, 
1279L, 378L, 421L, 432L, 480L, 492L, 501L, 511L, 522L, 545L, 
555L, 566L, 575L, 705L, 716L, 728L, 760L, 774L, 785L, 794L, 816L, 
831L, 841L, 850L, 860L, 861L, 873L, 889L, 899L, 908L, 917L, 931L, 
933L, 942L, 944L, 954L, 963L, 971L, 986L, 988L, 996L, 997L, 1007L, 
1009L, 1014L, 1041L, 1052L, 1062L, 1064L, 1069L, 1107L, 1108L, 
1117L, 1120L, 1172L, 1216L, 1225L, 1239L, 1245L, 1265L, 1287L, 
1293L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 
27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 
40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 
53L, 54L, 55L), MAJOR_LC = c(42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 
51L, 51L, 42L, 42L, 42L, 71L, 51L, 51L, 51L, 71L, 71L, 51L, 42L, 
71L, 42L, 51L, 51L, 42L, 51L, 42L, 51L, 42L, 51L, 51L, 51L, 42L, 
51L, 42L, 51L, 71L, 42L, 51L, 42L, 42L, 51L, 51L, 42L, 51L, 42L, 
42L, 51L, 51L, 51L, 71L, 51L, 42L, 51L, 42L, 51L, 71L, 42L, 51L, 
42L, 42L, 51L, 51L, 42L, 51L, 51L, 71L, 82L, 51L, 42L, 51L, 51L, 
42L, 82L, 83L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 71L, 
51L, 51L, 51L, 31L, 81L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 81L, 81L, 42L, 
42L, 42L, 51L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 51L, 42L, 31L, 42L, 81L, 43L, 41L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L), RD_DENSITY = c(1.046, 1.626, 2.356, 
1.912, 0.203, 0.049, 0.055, 1.96, 1.515, 0.361, 0.183, 0.022, 
1.702, 0.8, 1.356, 0.216, 0.509, 0.915, 0.689, 0.817, 0.93, 0.808, 
0.121, 0.026, 0.283, 1.256, 0.56, 0.881, 0.649, 1.074, 0.851, 
0.758, 0.375, 0.554, 1.111, 0.783, 1.113, 0.619, 0.587, 0.975, 
0.892, 0.162, 0.714, 1.582, 0.408, 0.227, 1.816, 1.586, 0.888, 
1.247, 2.016, 0.457, 0.816, 0.933, 0.894, 2.101, 0.091, 2.265, 
0.389, 0.343, 1.718, 0.738, 0.597, 1.098, 1.865, 1.082, 0.654, 
1.104, 0.43, 0.418, 0.164, 1.068, 0.708, 0.011, 1.61, 1.143, 
0.124, 2.039, 0.547, 0.794, 1.694, 0.526, 1.505, 0.861, 0.771, 
0.216, 1.018, 2.88, 0.892, 0.741, 0.437, 1.16, 0.966, 0.961, 
0.591, 2.052, 0.82, 0.638, 2.107, 3.082, 0.387, 0.716, 1.065, 
1.602, 0.93, 0.234, 0.257, 0.186, 0, 0.408, 0.914, 0.281, 0.019, 
0.13, 0.704, 0.305, 1.132, 0.347, 0, 0.252, 0.733, 0.925, 0.276, 
0.368, 0.596, 0.284, 0.158, 0.627, 0.719, 0.472, 0.264, 0.251, 
0.525, 0.231, 0.568, 0.204, 0.44, 0.466, 0.19, 0.134, 0.001, 
0.422, 0.2, 0.073, 0.528, 0, 0.42, 0.626, 0.121, 0.181, 1.324, 
1.265, 0.827, 11.611, 3.443, 5.382, 2.269, 3.677, 1.1, 4.876, 
0.003, 2.86, 2.375, 1.885, 0.044, 0.728, 1.314, 3.042, 0.469, 
0.248, 0.675, 1.91, 0.228, 4.058, 3.563, 0.801, 3.421, 0.515, 
1.945, 1.235, 1.999, 2.495, 1.193, 1.896, 1.689, 1.144, 1.028, 
0.858, 1.703, 4.009, 0.096, 1.85, 0.081, 0, 1.759, 5.549, 4.99, 
4.267, 1.792, 0.204, 2.144, 0.212, 9.263, 1.615, 3.502, 1.927, 
1.665, 2.17), WOLVES_99 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), WOLVES_01 = c(0L, 1L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L,

Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joshua Wiley

On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys jorism...@gmail.com wrote:
 Just want to add that if you want to clean out the NA rows in a matrix
 or data frame, take a look at ?complete.cases. Can be handy to use
 with big datasets. I got curious, so I just ran the codes given here
 on a big dataset, before and after removing NA rows. I have to be
 honest, this is surely an illustration of the power of rowMeans. I'm
 amazed myself.

I was too...the documentation (?rowMeans) wasn't joking:

These functions are equivalent to use of 'apply' with 'FUN = mean' or
'FUN = sum' with appropriate margins, but are a lot faster.


 DF <- data.frame(
  A=rep(DF$A,1),
  B=rep(DF$B,1)
 )

 system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46

 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03

 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
 + na.rm=TRUE)[,-1]))
 + )

 Timing stopped at: 227.84 1.03 249.31  -- I got impatient and pressed the 
 escape

 DF <- DF[complete.cases(DF),]

 system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39

 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02

 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
 + na.rm=TRUE)[,-1]))
 + )
   user  system elapsed
  10.01    0.07   13.40

 Cheers
 Joris


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
