from:"David L Carlson"

Re: [R] Mantel test

2015-05-06 Thread David L Carlson

Please keep the discussion on the list. It is hard to answer your question 
without context. You give us a warning message from a function that is not in 
base R. Perhaps it is in package ade4? You also don’t include anything about 
the data or the commands that produced the warning message. A dist object does 
not contain a diagonal, so your comment suggests that you did not convert the 
matrix to a dist object.
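
As a rough illustration of the point about the diagonal, here is a minimal 
sketch with a made-up 3 x 3 matrix (the names m, d and zero_pairs are 
hypothetical and are not from Nick's data). It assumes the warning is triggered 
by zero values among the pairwise distances stored in the dist object, which I 
have not checked against the package source; the check itself uses only base R.

```r
# Made-up symmetric distance matrix; B and C are identical (distance 0).
m <- matrix(c(0.0, 0.1, 0.1,
              0.1, 0.0, 0.0,
              0.1, 0.0, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# as.dist() keeps only the lower triangle, so the diagonal is dropped.
d <- as.dist(m)
length(d)                 # 3 pairwise distances for 3 species

# Any zeros left in d are zero distances between different species.
any(as.vector(d) == 0)

# Locate the offending pairs (row/column indices in the full matrix).
zero_pairs <- which(as.matrix(d) == 0 & upper.tri(m), arr.ind = TRUE)
zero_pairs
```

If zero_pairs is empty, the diagonal was probably still present when the test 
was run, which would point back to a missing as.dist() step.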

David C

From: Nick Jeffery [mailto:nick.w.jeffe...@gmail.com]
Sent: Wednesday, May 6, 2015 9:34 AM
To: David L Carlson
Subject: Re: [R] Mantel test

Hi,
Thanks for the help. I get these warnings when I run the Mantel test however - 
is this because the diagonal of the matrix is all 0s? Both are symmetrical 
matrices about the diagonal line of zeroes.

Warning messages:

1: In is.euclid(m1) : Zero distance(s)

2: In is.euclid(m2) : Zero distance(s)

Thanks for your time,

Nick



On Mon, May 4, 2015 at 3:48 PM, David L Carlson <dcarl...@tamu.edu> wrote:
Assuming the 'matrix' format is a symmetrical distance 'matrix' stored as a 
data frame (which read.csv creates) rather than a rectangular data 'matrix,' 
you can convert it to a dist object with as.dist().

?dist

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Nick Jeffery
Sent: Monday, May 4, 2015 10:49 AM
To: r-help@r-project.org
Subject: [R] Mantel test

Dear R users,

I'm having trouble getting my data into R in the correct format to run a
Mantel test.

I'm testing genome size differences by genetic distances of the 28S gene
for ~30 species. I'm able to get my genome size data (as a single column of
data) into matrix and dist formats in R but the genetic distances output by
MEGA are already in 'matrix' format so I don't know how to load this CSV
file into R without it calculating new genetic distances when I convert it
to the dist form required by the test.

Thanks in advance,
Nick

--
Nick Jeffery, PhD Candidate
Integrative Biology
SCIE 1453
University of Guelph
Guelph, Ontario, Canada
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Nick Jeffery, PhD Candidate
Integrative Biology
SCIE 1453
University of Guelph
Guelph, Ontario, Canada


Re: [R] Mantel test

2015-05-04 Thread David L Carlson

Assuming the 'matrix' format is a symmetrical distance 'matrix' stored as a 
data frame (which read.csv creates) rather than a rectangular data 'matrix,' 
you can convert it to a dist object with as.dist(). 

?dist
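
A minimal sketch of that conversion, assuming the CSV holds a square, symmetric 
table with species names in the first column. The three-species table, the 
genome sizes and all object names here are made up for illustration; the real 
file would be read with read.csv("yourfile.csv", row.names = 1) instead of the 
text= argument.

```r
# Hypothetical stand-in for the CSV written by MEGA.
csv_text <- "species,sp1,sp2,sp3
sp1,0,0.12,0.30
sp2,0.12,0,0.25
sp3,0.30,0.25,0"

gen      <- read.csv(text = csv_text, row.names = 1)  # a data frame
gen_mat  <- as.matrix(gen)                            # numeric matrix, same values
gen_dist <- as.dist(gen_mat)                          # stored as given, nothing recomputed
gen_dist

# By contrast, dist() computes distances from raw data, which is what
# the genome sizes (a single column) need. These sizes are invented.
size      <- c(sp1 = 1.2, sp2 = 2.0, sp3 = 3.5)
size_dist <- dist(size)
size_dist
```

The design point is simply that as.dist() reinterprets an existing distance 
matrix, while dist() calculates new distances.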

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Nick Jeffery
Sent: Monday, May 4, 2015 10:49 AM
To: r-help@r-project.org
Subject: [R] Mantel test

Dear R users,

I'm having trouble getting my data into R in the correct format to run a
Mantel test.

I'm testing genome size differences by genetic distances of the 28S gene
for ~30 species. I'm able to get my genome size data (as a single column of
data) into matrix and dist formats in R but the genetic distances output by
MEGA are already in 'matrix' format so I don't know how to load this CSV
file into R without it calculating new genetic distances when I convert it
to the dist form required by the test.

Thanks in advance,
Nick

-- 
Nick Jeffery, PhD Candidate
Integrative Biology
SCIE 1453
University of Guelph
Guelph, Ontario, Canada


Re: [R] Results Differ in Ternary Plot Matrix of Compositional Response Variables

2015-05-01 Thread David L Carlson

Add plotMissings=FALSE to the second plot and see the plot.acomp manual page 
description of this argument:

plot(JerrittY, pch=as.numeric(JerrittX4), col=c("black","red", "dark green",
 "dark blue","dark goldenrod","dark orange","dark grey")[JerrittX4], 
      plotMissings=FALSE)


-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard
Sent: Thursday, April 30, 2015 3:57 PM
To: r-help@r-project.org
Subject: [R] Results Differ in Ternary Plot Matrix of Compositional Response 
Variables

   After hours of looking for the reason why one data set plots correctly and
another one does not I am still not seeing the reason. The only differences
I see between the two data sets is the number of discrete variables (one has
6 years, the other 7 years) and one contains zeros. I wonder if the number
of discrete variables is the issue.

   I'm sure that more experienced eyes will see the reason for the different
results and will point it out to me.

   The following data and code produce a matrix of ternary plots with the
other continuous variables represented by a dot above the top point of the
triangle:


"Year","NO3","SO4","pH","Fi","Ga","Gr","Pr","Sh"
"2005",0.60,816,7.87,0.0556,0.5370,0.1667,0.1667,0.0741
"2006",0.40,224,7.59,0.0435,0.6739,0.0870,0.1522,0.0435
"2010",0.10,571,7.81,0.0735,0.4706,0.1029,0.1912,0.1618
"2011",0.52,130,7.42,0.0462,0.5692,0.0769,0.2462,0.0615
"2012",0.42,363,7.79,0.0548,0.5205,0.0548,0.2466,0.1233
"2013",0.42,363,7.79,0.0484,0.5323,0.1129,0.2419,0.0645


# Create matrix of ternary plots of FFGs as dependent variables.
# Follows 'Analyzing Compositional Data with R' sec. 5.3; pp 122 ff
# Change stream name as necessary.
# load package from library
require(compositions)
# read in raw data
SnowRegr <- read.csv('snow-regression.dat', header=T)
# extract response variables
SnowY <- acomp(SnowRegr[,5:9])
# column headings; variables
names(SnowRegr)
# continuous explanatory co-variables
SnowCovars <- SnowRegr[,c("Year","NO3","SO4","pH")]
# first continuous co-variable
SnowX1 <- SnowCovars$NO3
# second continuous co-variable
SnowX2 <- SnowCovars$SO4
# third continuous co-variable
SnowX3 <- SnowCovars$pH
# discrete co-variable
SnowX4 <- factor(SnowCovars$Year, c("2005","2006","2010","2011","2012","2013"),
                 ordered=T)
# for the discrete co-var, ANOVA not specified in unique way so contrasts must
#   be specified; use the treatment contrasts.
contrasts(SnowX4) <- "contr.treatment"
# save figure parameters
opar <- par(xpd=NA,no.readonly=T)
# ternary plot matrix
plot(SnowY, pch=as.numeric(SnowX4),
     col=c("red","dark green","dark blue","dark goldenrod","dark orange",
           "dark grey")[SnowX4])
# add legend
legend(x=0.83, y=-0.165, abbreviate(levels(SnowX4), minlength=1),
       pch=as.numeric(SnowX4),
       col=c("red","dark green","dark blue","dark goldenrod","dark orange",
             "dark grey"), ncol=2, xpd=T, bty="n", yjust=0)
# reset plot parameters
par(opar)
# unload the package
detach('package:compositions')

   This data set with equivalent code produces plots with the other continuous
variables as bars with colors on the top points of the triangles:


"Year","NO3","SO4","pH","Fi","Ga","Gr","Pr","Sh"
"2004",1.70,2200,8.70,0.0444,0.6889,0.0222,0.,0.0222
"2005",2.50,5000,8.43,0.0182,0.5636,0.0909,0.3091,0.0182
"2006",1.80,6670,8.57,0.0370,0.6173,0.0741,0.2469,0.0247
"2010",0.54,4000,8.00,0.0870,0.6087,0.0870,0.2174,0.
"2011",2.70,4300,8.47,0.0449,0.5256,0.0897,0.2949,0.0449
"2012",0.76,595,8.21,0.,0.4231,0.0769,0.5000,0.
"2013",0.76,595,8.21,0.,0.4545,0.0455,0.4545,0.0455


# Create matrix of ternary plots of FFGs as dependent variables.
# Follows 'Analyzing Compositional Data with R' sec. 5.3; pp 122 ff
# Change stream name as necessary.
# load package from library
require(compositions)
# read in raw data
JerrittRegr <- read.csv('jerritt-regression.dat', header=T)
# extract response variables
JerrittY <- acomp(JerrittRegr[,5:9])
# column headings; variables
names(JerrittRegr)
# continuous explanatory co-variables
JerrittCovars <- JerrittRegr[,c("Year","NO3","SO4","pH")]
# firs

Re: [R] Missing axis labels

2015-05-01 Thread David L Carlson

I don't think you can tell in advance since the details of the plot are 
computed when you open the plot window and they change when you plot into the 
window. In principle you could estimate the size requirements for the axis 
labels if you know the plot window size and the character size. What gets 
plotted is also device dependent. For example, if you open a window using 
x11(3, 3) (I'm on Windows so I haven't tried this on OS X) and produce the 
plot, the last x-axis label is missing just as in the pdf file. But if you drag 
the window to make it larger, the label will appear when the device driver 
redraws the plot.

There is also a third option, in addition to your two, for getting all of the 
labels:

plot(0:100, 0:100, xaxp=c(0, 100, 4))

will plot at 0, 25, 50, 75, 100 which leaves room for the last label.
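
The forcing option from the question can also be applied to every tick. This is 
a minimal sketch, assuming that axis() only suppresses overlapping labels 
within a single call, so that one call per tick draws every label (at the risk 
of labels running into each other). The positions in ticks are my choice for 
this example, and pdf(NULL) is used only so that no file is written.

```r
pdf(NULL, width = 3, height = 3)            # small device, no file written
plot(0:100, 0:100, xaxt = "n", yaxt = "n")  # suppress the default axes
ticks <- seq(0, 100, by = 20)
for (tk in ticks) {
  axis(1, at = tk)   # one call per tick: each label is drawn on its own
  axis(2, at = tk)
}
invisible(dev.off())
```

Whether the result is readable still depends on the device size and the 
character size, so this trades omitted labels for possibly crowded ones.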

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fisher Dennis
Sent: Friday, May 1, 2015 9:11 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Missing axis labels

R 3.2.0
OS X
This is a general question, not specific to OS X.  

Colleagues

Often, one or more values on an axis will be omitted, presumably in order to 
prevent overlap.  However, there are situations where I would like to override 
that omission.  Sample code:
pdf("labels.pdf", width=3, height=3)
plot(0:100, 0:100)
graphics.off()
Here, 100 is omitted from the x-axis and 20, 60, and 100 from the y-axis. 

Is there an automated way to detect which values will be omitted (i.e., without 
seeing the graphic)?  

If so, I see two options:
1.  change the font size
2.  force the entry, e.g., axis(1, 100, at=100)

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


Re: [R] Plot Title: Adjusting Position

2015-05-01 Thread David L Carlson

There are half a dozen implementations of ternary plots in as many R packages 
so it is hard to be specific. Since you are using title(), try the "further 
graphical parameters from par" mentioned in the manual page such as adj=c(x, y) 
for position (or maybe the line= argument) and cex.main= for size.
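
A minimal sketch of the line= and cex.main= route, using an ordinary scatter 
plot as a stand-in for the ternary plot matrix (I have not tried it with any of 
the ternary plot packages). The values 2.5 and 0.9 are guesses to be adjusted 
by eye, and pdf(NULL) is there only so that no file is written.

```r
pdf(NULL)                        # no file written; use your own device
plot(1:10, 1:10, main = "")      # stand-in for the ternary plot matrix
# line= moves the title outward from the plot edge (larger is higher);
# cex.main= scales the title text (values below 1 shrink it).
title(main = "A long title placed a little higher", line = 2.5, cex.main = 0.9)
invisible(dev.off())
```

If line= pushes the title off the figure, a larger top margin set with 
par(mar=) before plotting may be needed.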

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rich Shepard
Sent: Friday, May 1, 2015 10:00 AM
To: r-help@r-project.org
Subject: [R] Plot Title: Adjusting Position

   Plots of compositional data ternary diagrams do not accept the main label
within the plot() function, but do print the label when it is specified
within the title() function. On some of these plots I need to raise the
position of the title just enough to move the text above the top row of
diagrams.

   Applying the outer=TRUE option moves the title too high; the top half of
the text is cut off from viewing. The help file, ?title, suggests that the
line option applies to sub-titles and axis labels, not the main title.
Setting character expansion to a negative value throws an error.

   How can I either move the main title slightly higher on the figure or
slightly reduce the text size so the title does not overlap part of the
ternary diagrams?

Rich


Re: [R] Editable plot

2015-04-30 Thread David L Carlson

Do not post in html. You need to change your email software so that it sends 
messages in plain text only. Look below to see why.

Your plot is edited by modifying the code you gave us to change the graph. Save 
the code in a script file, change it in any way you want and then run the code 
again to get a changed plot. You cannot edit the plot by selecting an element 
on the plot and changing its properties in some way. 
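
For illustration, this is roughly what the posted code looks like once the lost 
line breaks are restored. It is my reconstruction: the commented-out title and 
legend calls are left out, and pdf(NULL) is added only so the sketch runs 
without opening a window. Editing the plot then means changing a value in the 
script and running it again.

```r
x <- c(0.84, 1.03, 0.96)
y <- c(1.30, 1.46, 1.48)
z <- c(1.32, 1.47, 1.5)
w <- c(0.07, 0.07, 0.07)
r <- c(500, 1000, 2000)

pdf(NULL)  # assumption: replace with your own device, e.g. a real file name
# First series, with the y axis running from 0 to 1.5
plot(r, x, type = "o", col = "blue", ylim = c(0, 1.5), lwd = 2,
     xlab = "Number of iteration", ylab = "Bias")
# Remaining series as dashed lines with square points
lines(r, y, type = "o", pch = 22, lty = 2, col = "red", lwd = 2)
lines(r, z, type = "o", pch = 22, lty = 3, col = "green", lwd = 2)
lines(r, w, type = "o", pch = 22, lty = 4, col = "forestgreen", lwd = 2)
# Labels identifying each line
text(1000, 0.15, "PM")
text(1000, 1.10, "VM")
text(1000, 1.52, "WMSE")
text(1000, 1.40, "LT")
invisible(dev.off())
```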

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of IZHAK shabsogh 
via R-help
Sent: Thursday, April 30, 2015 2:04 AM
To: R.
Subject: [R] Editable plot


Hello,Kindly assist me on how to make the plot from the following programm to 
be editable 

x<-c(0.84,1.03,0.96)y<-c(1.30,1.46,1.48)z<-c(1.32,1.47,1.5)w<-c(0.07,0.07,0.07)r<-c(500,1000,2000)
# Graph cars using a y axis that ranges from 0 to 12plot(r,x, type="o", 
col="blue", ylim=c(0,1.5),lwd= 2, xlab = " Number of iteration",ylab=" Bias" )
# Graph trucks with red dashed line and square pointslines(r,y, type="o", 
pch=22, lty=2, col="red",lwd=2)lines(r,z, type="o", pch=22, lty=3, 
col="green",lwd=2)lines(r,w, type="o", pch=22, lty=4, col="forestgreen",lwd=2)
# Create a title with a red, bold/italic font#title(main="Estimated Bias for 
the optimal response ", col.main="red", font.main=4)
#legend("center", lty = 1:4, col = 1:4,       #legend = c("x","y", "z","w"))
text(1000, 0.15, "PM")text(1000, 1.10, "VM")text(1000, 1.52, "WMSE")text(1000, 
1.40, "LT")

Thank youIshaq

[[alternative HTML version deleted]]


Re: [R] Graphs for scientific publication ?

2015-04-30 Thread David L Carlson

More useful to the r-help list would be a reproducible example of the data you 
are using and a clear statement of what you are trying to accomplish. It is 
likely that all of your requirements can be easily met, but you spent most of 
your message talking about what you have tried without telling us where you 
want to end up. People on the list are familiar with base graphics, lattice 
graphics, and ggplot2. If you list your requirements clearly, you might end up 
with three solutions.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Thursday, April 30, 2015 1:41 PM
To: Jeremy Clark
Cc: r-help@r-project.org
Subject: Re: [R] Graphs for scientific publication ?

Jeremy:

I suggest you have a look at the latest edition of Paul Murrell's
book, "R Graphics", as you seem to be unaware that ggplot2 (as well as
a 3rd graphics paradigm, the lattice package) and base graphics are
built on 2 different and incompatible graphics engines.

Obviously, you are entitled to your opinions and graphical
predilections vary, but I do not think R-Help is a good venue for
these sorts of discussions. The R-devel list might be a better place
to discuss such matters.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Thu, Apr 30, 2015 at 5:05 AM, Jeremy Clark  wrote:
> Dear All,
>
> First of all, many thanks to all R contributors for a fantastic
> program, and especially to Hadley Wickham for creating ggplot2. The
> following is intended to be a warning that, if the apparently
> superficial problems described are not sorted out, R could well find
> itself being superseded. The reason is that a new user wants to draw a
> graph, and perhaps publish in a scientific journal a graph created
> using R, well before wanting to do a complex regression (and the
> latter is relatively easy). So here goes:
>
> 1) The saga of the straight line. I implemented a geom_abline - it
> looked superb. Unfortunately I had to disable clip to allow text - now
> my abline looked ridiculous. My search found plotrix: ablineclip -
> fantastic I thought - but it applies to plot and not geom_plot. I
> switched to geom_segment - the rendering looked trash. I switched to
> geom_smooth - should work but as I don't know the x values beforehand
> I'll have to clip a new dataframe - is that a hassle ? - Yes it is !
>
> So my general question is - why isn't ggplot2 already part
> of R base - or at least if someone is to create useful packages for
> plot - perhaps a subtle hint could be made that they should also apply
> to ggplot2 (and perhaps to lattice ?? - also personally I would scrap
> qplot as an unnecessary distraction which is not easier to implement
> than ggplot). In general duplication of packages for plot and ggplot
> doesn't seem like a good idea.
>
>
> 2) The saga of the italic letter. I found, to my dismay, that to
> insert an italic letter into my plot I had to learn a whole new
> language called plotmath - which wouldn't accept normal R coding, and
> didn't even have normal control functions such as \n for a new line.
> This is ridiculous (and I'm not sure how plotmath managed to get into
> R base).
>
> So my question is, when is plotmath going to have a
> complete overhaul to allow eg. "," instead of, or as well as, ~,~, and
> normal control functions such as \n ?
>
> 3) A related question to (2) is: where is geom_textbox ?
>
> 4) Where are examples with scientific graph defaults ?  (meaning a
> two-axis graph which is publishable - I will post my own after this is
> published in a year's time, but as suggested above, while the graph
> looks good the implementation of this is not pretty).
>
> Having said that - good luck with implementation - and many thanks for
> all your hard work !
>
> Yours sincerely,
>
> Abiologist
>

Re: [R] Problem with predict.lm()

2015-04-29 Thread David L Carlson

Since you passed a matrix to lm() and then a data.frame to predict(), predict 
can't match up what variables to use for the prediction so it falls back on the 
original data. This seems to work:

> set.seed(42)
> y <- rnorm(100)
> X <- matrix(rnorm(100*10), ncol=10)
> Xd <- data.frame(X)
> lm <- lm(y~., Xd)
> Xnew <- matrix(rnorm(100*20), ncol=10)
> Xnewd <- data.frame(Xnew)
> ynew <- predict(lm, newdata=Xnewd)
> head(ynew)
          1           2           3           4           5           6 
 0.35404067  0.14073495 -0.45442499  0.31065562 -0.02091366  0.25358175 
> head(predict(lm))
          1           2           3           4           5           6 
 0.75474817  0.06024122 -0.27221466 -0.20344713  0.20218135 -0.24045859 
>
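
To see the fallback in action, the following sketch (my own, with fit rather 
than lm as the model name so the function is not masked) compares the two 
calls. It relies on my understanding that predict() looks for a variable called 
X, does not find it in newdata, and picks up the original X from the workspace 
instead. The data.frame(X = I(Xnew)) form is an alternative that I believe 
works when the model was fitted with a matrix on the right-hand side.

```r
set.seed(42)
y    <- rnorm(100)
X    <- matrix(rnorm(100 * 10), ncol = 10)
Xnew <- matrix(rnorm(200 * 10), ncol = 10)   # 200 new rows
fit  <- lm(y ~ X)                             # the model has one term, "X"

# newdata has columns V1..V10 but nothing called X, so the old X is used.
p_old <- suppressWarnings(predict(fit, newdata = as.data.frame(Xnew)))
length(p_old)                                 # one value per original row
isTRUE(all.equal(unname(p_old), unname(fitted(fit))))

# Supplying a column that really is called X gives predictions for Xnew.
p_new <- predict(fit, newdata = data.frame(X = I(Xnew)))
length(p_new)                                 # one value per new row

# Cross-check against the linear predictor computed by hand.
by_hand <- drop(cbind(1, Xnew) %*% coef(fit))
isTRUE(all.equal(unname(p_new), unname(by_hand)))
```

Fitting from a data frame, as in the transcript above, is usually the simpler 
choice because the column names then carry over to newdata.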

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Spindler
Sent: Wednesday, April 29, 2015 9:21 AM
To: r-help@r-project.org
Subject: [R] Problem with predict.lm()

Dear all,
 
the following example somehow uses the "old data" (X) to make the predictions, 
but not the new data Xnew as intended.
 
y <- rnorm(100)
X <- matrix(rnorm(100*10), ncol=10)
lm <- lm(y~X)
Xnew <- matrix(rnorm(100*20), ncol=10)
ynew <- predict(lm, newdata=as.data.frame(Xnew)) # prediction is not made for Xnew
 
How can I force predict.lm to use the new data?
 
Thank you very much for your efforts in advance!
 
Best,
 
Martin


Re: [R] cite publications in the package help file

2015-04-28 Thread David L Carlson

Reproducible examples help. For package MASS do you mean?

http://cran.r-project.org/web/packages/MASS/index.html

Which provides information about the package and a link to the Reference manual:

http://cran.r-project.org/web/packages/MASS/MASS.pdf

In that manual data sets and functions contain a Source entry and/or References 
for that function or data set. For a large package such as MASS with over 150 
functions/data sets, it would be unwieldy to put them all on the web page.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of carol white via 
R-help
Sent: Tuesday, April 28, 2015 1:28 PM
To: Duncan Murdoch; R-help Help
Subject: Re: [R] cite publications in the package help file

By the main web page I mean the page shown when a package is accessed on CRAN. 
So is it possible to display the related publications on this page, where the 
content of DESCRIPTION is displayed, and also to have them appear in the help 
pdf file?

 On Tuesday, April 28, 2015 7:37 PM, Duncan Murdoch 
 wrote:

 On 28/04/2015 1:00 PM, carol white via R-help wrote:
> To cite related publications, it seems that they can't be mentioned in  
> DESCRIPTION. Where to mention so that it appears on the 1st page of  the pdf 
> help file and the package main web page? I'm not talking about what is 
> specified in  inst/citation.

The package help file (e.g. foo-package.Rd for package "foo") will be
displayed first in the PDF, and is the first entry linked in the help
page index for the package.

I don't know what page you mean as the "package main web page".

Duncan Murdoch


Re: [R] Question about base::rank results

2015-04-27 Thread David L Carlson

Apologies if this belabors the point, but let's look at your second example to 
see why order and rank are different:

> x <- c(12,34,15,77,78,22)
> names(x) <- 1:6
> x
 1  2  3  4  5  6 
12 34 15 77 78 22

I've added names to the values so we can watch how they change. If we sort the 
numbers we get them in increasing order with their original indices:

> sort(x)
 1  3  6  2  4  5 
12 15 22 34 77 78

The values are in order and the names show where each value came from 
originally. That sequence of index values is exactly what order(x) gives you:

> order(x)
[1] 1 3 6 2 4 5
> x[order(x)]
 1  3  6  2  4  5 
12 15 22 34 77 78

The rank function gives you the relative size of the value, not its position in 
the original vector:

> rank(x)
1 2 3 4 5 6 
1 4 2 5 6 3
> x[rank(x)]
 1  4  2  5  6  3 
12 77 34 78 22 15

The second value has rank 4, but that is not its index which is 2. The value 
with index 4 is 77 so it shows up in the second position.
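
The relationship between the two functions can be checked directly. This is a 
small sketch using the same vector; the names o and r are mine, and the last 
line assumes there are no ties in x.

```r
x <- c(12, 34, 15, 77, 78, 22)
o <- order(x)   # indices that sort x
r <- rank(x)    # position of each value in the sorted vector

identical(x[o], sort(x))   # indexing by order() sorts x
all(sort(x)[r] == x)       # indexing the sorted values by rank() recovers x
all(order(o) == r)         # with no ties, rank is the inverse permutation of order
```

So x[rank(x)] only looks sorted when the permutation happens to be its own 
inverse, as in the first example in the question.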

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of J 
Robertson-Burns
Sent: Monday, April 27, 2015 2:34 PM
To: Giorgio Garziano; r-help@r-project.org
Subject: Re: [R] Question about base::rank results

There is a blog post on this topic:

http://www.portfolioprobe.com/2012/07/26/r-inferno-ism-order-is-not-rank/

Pat

On 26/04/2015 09:17, Giorgio Garziano wrote:
> Hi,
>
> I cannot understand why rank(x) behaves as outlined below.
> Based on the results of first x vector values ranking, which is as expected 
> in my opinion,
> I cannot explain the following results.
>
>> x <- c(12,34,15,77,78)
>> x[rank(x)]
> [1] 12 15 34 77 78  (OK)
>
>> x <- c(12,34,15,77,78,22)
>> x[rank(x)]
> [1] 12 77 34 78 22 15   (?)
>
>> x <- c(12,34,77,15,78)
>> x[rank(x)]
> [1] 12 77 15 34 78  (?)
>
> Please any feedback ? Thanks.
>
> BR,
>
> Giorgio Garziano
>
>
>

Re: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R

2015-04-23 Thread David L Carlson

The function cld() in package multcomp generates compact letter displays, but 
does not format them as exponents of the group names.
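
If superscripts are wanted on a plot, one possibility is to build plotmath 
expressions from the letters yourself. This is only a sketch in base R: the 
group names, means and letters below are typed in by hand and are purely 
hypothetical (in practice the letters would come from the compact letter 
display), and pdf(NULL) just avoids writing a file.

```r
# Hypothetical group names, means and letters (typed in, not computed).
grp  <- c("Control", "Low", "High")
avg  <- c(10.2, 11.0, 15.7)
lets <- c("a", "a", "b")

# Build plotmath expressions such as "Control"^"a" for use as axis labels.
labs <- parse(text = sprintf('"%s"^"%s"', grp, lets))

pdf(NULL)                            # no file written
bp <- barplot(avg, ylab = "Mean")    # bp holds the bar midpoints
axis(1, at = bp, labels = labs, tick = FALSE)
invisible(dev.off())
```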

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Joachim 
Audenaert
Sent: Thursday, April 23, 2015 4:58 AM
To: r-help@r-project.org
Subject: [R] - Obtaining superscripts to affix to means that are not 
significantly different from each other with R

Hello all,

It is often time consuming to interpret p-values of multiple pairwise 
comparisons of groups and assign them a letter code for publication 
purposes. So I found this interesting link to a program that does this for 
you. 

http://www.jerrydallal.com/lhsp/similar.htm

I was wondering if something similar exists in R?


Met vriendelijke groeten - With kind regards,

Joachim Audenaert 
onderzoeker gewasbescherming - crop protection researcher

PCS | proefcentrum voor sierteelt - ornamental plant research

Schaessestraat 18, 9070 Destelbergen, België
T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be   


Re: [R] Vectorizing a task

2015-04-14 Thread David L Carlson

It is not vectorized, but it is simple:

EXPANDED <- unlist(mapply(":", START, END))
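
Here is the whole task run on the data from the question, as a minimal sketch. 
The only changes are SIMPLIFY = FALSE, which I add as a precaution so that 
mapply() returns a list even when every range has the same length, and the 
name missing_vals, which is mine.

```r
START <- c(1, 2, 3, 4, 8, 14, 15, 118, 118, 119, 202, 202, 203, 204)
END   <- c(1, 2, 3, 6, 13, 14, 117, 118, 118, 201, 202, 202, 203, 204)

# Apply ":" to each START/END pair, then flatten the list of ranges.
EXPANDED <- unlist(mapply(":", START, END, SIMPLIFY = FALSE))
length(EXPANDED) == sum(END - START + 1)   # every range fully expanded

# Values missing from the sequence, as in the original question.
missing_vals <- setdiff(1:max(EXPANDED), EXPANDED)
missing_vals
```

This avoids building and evaluating the text of the expression altogether.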

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Fisher
Sent: Tuesday, April 14, 2015 2:36 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] Vectorizing a task

R 3.1.3
OS X

Colleagues

I have data of this sort:
START   <- c(1, 2, 3, 4, 8, 14, 15, 118, 118, 119, 202, 202, 203, 204)
END <- c(1, 2, 3, 6, 13, 14, 117, 118, 118, 201, 202, 202, 203, 204)
I would like to create a vector that looks like this:
START.to.END<- 
c(1:1,2:2,3:3,4:6,8:13,14:14,15:117,118:118,118:118,119:201,202:202,202:202,203:203,204:204)
i.e., each pair of entries is linked with “:”, then these are concatenated.

Ultimately, this will be expanded into:
EXPANDED<- c(1L, 2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 
71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 
81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 
97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, 
106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 
118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 
127L, 128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 
140L, 141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 
149L, 150L, 151L, 152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 
162L, 163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 
171L, 172L, 173L, 174L, 175L, 176L, 177L, 178L, 179L, 180L, 181L, 182L, 183L, 
184L, 185L, 186L, 187L, 188L, 189L, 190L, 191L, 192L, 
193L, 194L, 195L, 196L, 197L, 198L, 199L, 200L, 201L, 202L, 202L, 203L, 204L)

The final step will be to find which values are missing from the sequence:
setdiff(1:max(EXPANDED), EXPANDED)

The command:
paste0("c(", paste(paste(ALLSTART, ALLEND, sep=":"), collapse=","), 
")") 
creates the text for START.to.END, but I can’t figure out how to evaluate that 
expression.  I could build the vector step-by-step but that seems quite 
inefficient.

Any suggestions?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting unique entries by a column

2015-04-14 Thread David L Carlson

Try all.equal(df[1,3], df[2,3])

This relates to how decimal numbers are stored in computers. It is not an R 
only issue, but it is described in the R-FAQ:

From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html

7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are 
integers and fractions whose denominator is a power of 2. Other numbers have to 
be rounded to (typically) 53 binary digits accuracy. As a result, two floating 
point numbers will not reliably be equal unless they have been computed by the 
same algorithm, and not always even then. For example

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of 
.Machine$double.eps ^ 0.5. If you want much greater accuracy than this you will 
need to consider error propagation carefully.

For more information, see e.g. David Goldberg (1991), "What Every Computer 
Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 
23/1, 5-48, also available via http://www.validlab.com/goldberg/paper.pdf.

To quote from "The Elements of Programming Style" by Kernighan and Plauger:

10.0 times 0.1 is hardly ever 1.0.
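
A small sketch of how this plays out with duplicated(), using a hypothetical three-row data frame rather than the poster's data. It assumes that values agreeing to 7 decimal places should be treated as the same, which is an assumption to adjust for your own data.

```r
# Hypothetical stand-in for the poster's data frame
df <- data.frame(id = c("L0005.01", "L0005.03", "L0008.02"),
                 ABSDIFF = c(0.1 + 0.2, 0.3, 0.5))

exact_same <- identical(df[1, 2], df[2, 2])          # exact comparison of the doubles
near_same  <- isTRUE(all.equal(df[1, 2], df[2, 2]))  # comparison within a tolerance

# Round before testing for duplicates so near-equal values collapse together
df_subset <- df[!duplicated(round(df$ABSDIFF, 7)), ]
```

Here the first two values print identically but differ in the last binary digits, so the exact comparison fails while all.equal() treats them as equal.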


-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Vikram Chhatre
Sent: Tuesday, April 14, 2015 2:40 PM
To: r-help
Subject: [R] Extracting unique entries by a column

I have a data frame of dim 3x600.  There are pairs of rows which have the
exact same value in column 3.

head(df)
POP1 POP2   ABSDIFF
L0005.01 0.98484848 0.688118812 0.2967297
L0005.03 0.01515152 0.311881188 0.2967297
L0008.02 0.97727273 0.004424779 0.9728479
L0008.04 0.02272727 0.995575221 0.9728479
L0012.03 0.98684211 0.004385965 0.9824561
L0012.01 0.01315789 0.995614035 0.9824561

I want to unique sort on df$ABSDIFF so that only one row per pair remains
in the subset.

>df_subset <- df[!duplicated(df$ABSDIFF), ]

This does not work. So I literally checked:

>identical(df[1,3], df[2,3])
FALSE

How is 0.2967297 different from 0.2967297?  I am puzzled.

Thanks for any insight.

Vikram

Re: [R] Sum of some months totals

2015-04-14 Thread David L Carlson

Don't use html formatted emails and always copy the list on your replies.

For example?

rainstats <- function(data, months=3) {
 # the number of months per group must divide 12 evenly
 if (! months %in% c(1, 2, 3, 4, 6, 12)) stop("Months must divide into 12!")
 period <- 12/months
 grps <- rep(1:period, each=months)
 # map each row's month (1-12) to its group number
 data$Group <- grps[data$Month]
 aggregate(Rain~Year+Group, data, function(x) c(sum=sum(x),
 days=sum(x>0)))
}

> rainstats(rainfall)
  Year Group Rain.sum Rain.days
1 1979 10 0

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: Frederic Ntirenganya [mailto:ntfr...@gmail.com] 
Sent: Tuesday, April 14, 2015 9:27 AM
To: David L Carlson
Subject: Re: [R] Sum of some months totals

Hi David,
I understand what you did. My aim is to make a function which takes a quarter 
as a default, i.e. I can compute, let's say, for 4 months by specifying it in the 
arguments of the function.
Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Apr 14, 2015 at 4:44 PM, David L Carlson  wrote:
You should read some beginning tutorials for R before you go further. You are 
wasting a lot of time writing complicated loops that you do not need. R is 
probably very different from the programming languages you are used to. In 
these examples I called your data "rainfall."

To get the sum of the rain for each month you need only:

aggregate(Rain~Year+Month, rainfall, sum)

To get the number of days with rain is slightly more complicated:

aggregate(Rain~Year+Month, rainfall, function(x) sum(x>0))

To get the sum for a quarter, you need to add quarters to your data frame. 
Notice that it does not require a loop to add an entire column to your existing 
data frame.

rainfall$Quarter <- (rainfall$Month+2) %/% 3
aggregate(Rain~Year+Quarter, rainfall, sum)

The command ?aggregate will bring up a manual page on the aggregate() function.
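
As a rough, self-contained illustration of the commands above: the data frame below is made up (one rainfall value per month for two years), and only the column names Year, Month and Rain are taken from this thread.

```r
# Hypothetical data: one rainfall value per month for two years
rainfall <- data.frame(Year  = rep(c(1979, 1980), each = 12),
                       Month = rep(1:12, times = 2),
                       Rain  = c(1:12, rep(0, 12)))

# Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on
rainfall$Quarter <- (rainfall$Month + 2) %/% 3

# Total rain and number of records with rain, per year and quarter
quarter_sum  <- aggregate(Rain ~ Year + Quarter, rainfall, sum)
quarter_days <- aggregate(Rain ~ Year + Quarter, rainfall, function(x) sum(x > 0))
```

With daily data the second call counts rain days, since each row is then one day.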

Read
"Introduction to R" at http://cran.r-project.org/manuals.html
and one or more of the contributed manuals at
http://cran.r-project.org/other-docs.html

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Frederic 
Ntirenganya
Sent: Tuesday, April 14, 2015 6:10 AM
To: Adams, Jean
Cc: r-help@r-project.org
Subject: Re: [R] Sum of some months totals

Hi Jean,

Thanks for the help!
How can I compute monthly total of rainfall?

I want to compute both the monthly total of rainfall and the number of rain days. In
the function below, month_tot is a table and I want to sum over some months; the
default is 3 months. The loop for the quarter is not working and I am wondering why
it is not working.

total = function(data, threshold = 0.85){
  month_tot=matrix(NA,length(unique(data$Year)),12)
  rownames(month_tot)=as.character(unique(data$Year))
  colnames(month_tot)=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  raindays=month_tot
  # loop over months and years to get summary statistics
  for (mon in 1:12) {
    rain=data[data[2]==mon,c(1,4)]   # rain just for a specific month
    for (yr in unique(data$Year)) {
      month_tot[yr-min(unique(data$Year)-1),mon]=sum(rain[rain[,1]==yr,2])
      #print(sum(rain[rain[,1]==yr,2]))
      raindays[yr-min(unique(data$Year)-1),mon]=sum(rain[rain[,1]==yr,2]>threshold)
    }
  }
  month_tot
  1:ncol(month_tot)
  #month_tot[,1] + month_tot[,2] + month_tot[,3]
  quarter <-c()
  i = 3
  for (i in 1:ncol(month_tot)){

    quarter[i] = sum(month_tot[,i])
  }
  quarter
}

total(kitale)

Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Apr 14, 2015 at 1:52 PM, Adams, Jean  wrote:

> If you want to calculate the number of days having greater than a certain
> threshold of rain within a range of months, a function like this might
> serve your needs.
>
> raindays <- function(data, monStart=1, monEnd=3, threshold=0.85) {
>   with(data, {
>     selRows <- Month >= monStart & Month <= monEnd & Rain > threshold
>     days <- tapply(selRows, Year, sum)
>     return(days)
>   })
> }
>
> raindays(kitale)
>
> Jean
>
> On Tue, Apr 14, 2015 at 2:46 AM, Frederic Ntirenganya 
> wrote:
>
>> I want to compute monthly summaries from daily data. I want to choose
>> which
>> month to start and how many months to total over.  Default could be to
>> start in Januar

Re: [R] R studio installation

2015-04-14 Thread David L Carlson

R Studio loads R if it can find it. Since you have installed R, the error 
message means that R Studio can't find it or is not sure which version to use. 
The part of the message that says "please select the version of R to use" 
should give you a dialog box to use to navigate to the directory that contains 
R. Once you have done this, R Studio will remember where it is. The most likely 
explanation is one of these:

1. You installed R in the default location "C:\Program Files\R" but you have 
multiple installations as a result of updating R. By default R creates a new 
subdirectory for each new version. As a result R Studio does not know which one 
you want.

2. You installed both 32-bit and 64-bit versions of R so R Studio does not know 
which one to use.

3. You installed R in a location other than the default location and R Studio 
cannot find it.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller
Sent: Tuesday, April 14, 2015 7:43 AM
To: John Kane; Sojood Malkawi; r-help@r-project.org
Subject: Re: [R] R studio installation

But if the answer to the question "Does R load on its own?" is "no" then this 
probably is the right place to ask for help. Of course, I would probably just 
suggest re-installing R, but someone else here might have better answers.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

On April 14, 2015 5:14:21 AM PDT, John Kane  wrote:
>You probably should go to the RStudio help/blog rather than here. This
>is not an RStudio list and the expertise is at the RStudio site.
>
>Does R load on its own?
>
>What OS are you using?
>
>John Kane
>Kingston ON Canada
>
>
>> -Original Message-
>> From: sojoodmlk1...@gmail.com
>> Sent: Tue, 14 Apr 2015 11:45:05 +0300
>> To: r-help@r-project.org
>> Subject: [R] R studio installation
>> 
>> I installed R and then R studio but it doesn't open every time i try
>to
>> open it it gives me this message  "Rstudio requires an existing
>> installation of R in order to work. please select the version of R to
>use
>> ".
>> i'm using R i386 3.1.3 and downloaded RStudio 0.98.1103 - Windows
>> XP/Vista/7/8. do you have any idea what the problem is?

Re: [R] Convert color hex code to color names

2015-04-13 Thread David L Carlson

Actually all 6 colors in rainbow(6) do have names. I missed the fact that 
rainbow() adds an alpha value that we need to strip off before comparing to the 
values in clrs$RGB:

> rain <- substr(rain, 1, 7)
> sum(clrs$RGB %in% rain)
[1] 12

So there are two color names for each color in rainbow(6):

> for (i in 1:6) cat(i, colors()[clrs$RGB==rain[i]], "\n")
1 red red1 
2 yellow yellow1 
3 green green1 
4 cyan cyan1 
5 blue blue1 
6 magenta magenta1
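
Putting the pieces together, a minimal sketch that returns one name per color. It assumes you only want exact matches and are happy with the first matching name; colors without a named equivalent would come back as NA.

```r
# Named colors and their hex codes (without alpha)
clrs <- data.frame(Color = colors(),
                   RGB = rgb(t(col2rgb(colors())), maxColorValue = 255),
                   stringsAsFactors = FALSE)

# Keep only "#RRGGBB", dropping any alpha suffix
rain <- substr(rainbow(6), 1, 7)

# match() returns the first row whose hex code equals each input
first_name <- clrs$Color[match(rain, clrs$RGB)]
```

Because match() stops at the first hit, this picks "red" rather than "red1" and so on.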

David C


Re: [R] Convert color hex code to color names

2015-04-13 Thread David L Carlson

To expand on that at a more elementary level: the reason you need to find the 
smallest difference is that not all of the possible colors have names. There are 
256^3 = 16,777,216 possible rgb color designations, but only 657 named colors. 
You can create a data frame of the named colors and their rgb designations using

> clrs <- data.frame(Color=colors(), RGB=rgb(t(col2rgb(colors())),
maxColorValue=255), stringsAsFactors=FALSE)
> str(clrs)
'data.frame':   657 obs. of  2 variables:
 $ Color: chr  "white" "aliceblue" "antiquewhite" "antiquewhite1" ...
 $ RGB  : chr  "#FFFFFF" "#F0F8FF" "#FAEBD7" "#FFEFDB" ...
> head(clrs)
          Color     RGB
1         white #FFFFFF
2     aliceblue #F0F8FF
3  antiquewhite #FAEBD7
4 antiquewhite1 #FFEFDB
5 antiquewhite2 #EEDFCC
6 antiquewhite3 #CDC0B0

So most colors do not have names. In your example, none of the colors in 
rainbow(6) have names: 

> rain <- rainbow(6)
> sum(clrs$RGB %in% rain)
[1] 0

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Boris Steipe
Sent: Monday, April 13, 2015 11:44 AM
To: Alejo C.S.
Cc: r-help@r-project.org
Subject: Re: [R] Convert color hex code to color names

To add slightly to that:

What you want to do is write a function that returns the named color that has 
the smallest difference to your input hex-triplet. But note that color 
difference is a large topic. Assuming you want to minimize *perceptual* 
differences, you want to calculate your differences in Lab color space. The 
function convertColor() has the option to convert hex to Lab. Example:
convertColor(t(col2rgb("thistle")), from="sRGB", to="Lab", scale.in=255)

Within Lab space, you can take the Euclidean distance.
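
A rough sketch of that idea, assuming plain Euclidean distance in Lab is an acceptable measure of perceptual difference. The helper name nearest_color_name() is made up for this example; it is not part of grDevices.

```r
# Hypothetical helper: nearest named color for each input color, measured in Lab
nearest_color_name <- function(hex) {
  named <- colors()
  # Convert color specifications to an n x 3 matrix of Lab coordinates
  to_lab <- function(cols) {
    convertColor(t(col2rgb(cols)), from = "sRGB", to = "Lab", scale.in = 255)
  }
  lab_named <- to_lab(named)
  lab_in <- to_lab(hex)
  vapply(seq_len(nrow(lab_in)), function(i) {
    # Euclidean distance from input i to every named color
    d <- sqrt(rowSums(sweep(lab_named, 2, lab_in[i, ])^2))
    named[which.min(d)]
  }, character(1))
}

nearest <- nearest_color_name(c("#FF0000", "#00FF00", "#0000FF"))
```

col2rgb() ignores the alpha part of a hex code by default, so the output of rainbow() can be passed in directly.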

That all said, I can't imagine why one would want to do this in the first place 
- color triplets are much more convenient than label strings :-)

B.

On Apr 13, 2015, at 11:45 AM, Thierry Onkelinx  wrote:

> A combination of rgb(), col2rgb() and colors() can give hex values for the
> named colors.
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> 
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> 
> 2015-04-13 17:28 GMT+02:00 Alejo C.S. :
> 
>> Hi all, I want to convert the output of:
>> 
>>> rainbow(6)
>> 
>>> [1] "#FF0000FF" "#FFFF00FF" "#00FF00FF" "#00FFFFFF" "#0000FFFF"
>> "#FF00FFFF"
>> 
>> To a vector of color names. Any tip?
>> 
>> 
>> Thanks in advance
>> 
>> C.

Re: [R] how to Subset based on partial matching of columns?

2015-04-09 Thread David L Carlson

From Sarah's data frame you can get what you want directly with the table() 
function which will create a table object, mydf.tbl. If you want a data frame 
you need to convert the table using as.data.frame.matrix() to make mydf.df. 
Finally combine the two data frames if your x column consists of unique values 
in ascending order to make mydf.all.

> mydf.tbl <- table(mydf$x, mydf$code)
> mydf.tbl
   
    LGTY MY GM+ RS TY
  1    0      1  0  0
  2    1      0  0  0
  3    0      0  1  0
  4    0      0  0  1
> mydf.df <- as.data.frame.matrix(mydf.tbl)
> mydf.df
  LGTY MY GM+ RS TY
1    0      1  0  0
2    1      0  0  0
3    0      0  1  0
4    0      0  0  1
> mydf.all <- data.frame(mydf, mydf.df)
> mydf.all
  x   code LGTY MY.GM. RS TY
1 1 MY GM+    0      1  0  0
2 2   LGTY    1      0  0  0
3 3     RS    0      0  1  0
4 4     TY    0      0  0  1
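
If what is wanted is one indicator column per token from the original question ("MY", "GM+", "TY", "RS", "LG") rather than one per distinct code, something along these lines may be closer. It is only a sketch: it assumes a plain substring match is what is meant by a partial match, and it uses fixed=TRUE so that the "+" in "GM+" is not treated as a regular-expression operator.

```r
tokens <- c("MY", "GM+", "TY", "RS", "LG")
code <- c("MY GM+", "LGTY", "RS", "TY")
mydf <- data.frame(x = seq_along(code), code, stringsAsFactors = FALSE)

# One 0/1 column per token: 1 if the token occurs anywhere in the code string
flags <- sapply(tokens, function(tok) as.integer(grepl(tok, mydf$code, fixed = TRUE)))

# check.names=FALSE keeps "GM+" as a column name instead of "GM."
mydf.flags <- data.frame(mydf, flags, check.names = FALSE)
```

Here "LGTY" is flagged for both "LG" and "TY", which is the behaviour the original poster's expected table shows.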


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of samarvir singh
Sent: Thursday, April 9, 2015 8:50 AM
To: Sarah Goslee
Cc: r-help
Subject: Re: [R] how to Subset based on partial matching of columns?

Thank you. Sarah Goslee. I am rather new in learning R. So people like you
are great support. Really appreciate you, taking the time to correct my
mistakes. Thanks

On Thu 9 Apr, 2015 6:54 pm Sarah Goslee  wrote:

> Hi,
>
> Please don't put quotes around your code. It makes it hard to copy and
> paste. Alternatively, don't post in HTML, because it screws up your
> code.
>
> On Wed, Apr 8, 2015 at 8:57 PM, samarvir singh 
> wrote:
> > So I have a list that contains certain characters as shown below
> >
> > `list <- c("MY","GM+" ,"TY","RS","LG")`
>
> That's a character vector, not a list. A list is a specific type of object
> in R.
>
> > And I have a variable named "CODE" in the data frame as follows
> >
> > `code <- c("MY GM+", ,"LGTY", "RS","TY")`
>
> That doesn't work, and I have no idea what you expect to have there,
> so I'm deleting the extra comma. Also, your vector is named code, not
> CODE.
>
> code <- c("MY GM+", "LGTY", "RS","TY")
> x <- c(1:4)
>
> > 'x <- c(1:5)
> > `df <- data.frame(x,code)`
>
> You probably actually want
> mydf <- data.frame(x, code, stringsAsFactors=FALSE)
>
> Note I changed the name, because df() is a base R function.
>
>
> > Now I want to create 5 new variables named "MY","GM+","TY","RS","LG"
> >
> > Which takes binary value, 1 if there's a match case in the CODE variable
> >
> > df
> >  x  code    MY GM+ TY RS LG
> >  1  MY GM+   1   1  0  0  0
> >  2           0   0  0  0  0
> >  3  LGTY     0   0  1  0  1
> >  4  RS       0   0  0  1  0
> >  5  TY       0   0  1  0  0
>
> grepl() will give you a logical match
>
> data.frame(mydf, sapply(code, function(x)grepl(x, mydf$code)),
> stringsAsFactors=FALSE, check.names=FALSE)
>
> Sarah
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>


Re: [R] sort adjacency matrix

2015-04-06 Thread David L Carlson

The answer depends on what kind of matrix/data frame you have. That is why we 
encourage people to use dput() to create a copy of the sample data in their 
email. Some combination of the order() and rowSums() functions will 
probably get you what you want. For example,

dat[order(rowSums(dat=="1"), decreasing=TRUE),]

or

dat[order(rowSums(dat), decreasing=TRUE),]

or

dat[order(rowSums(dat, na.rm=TRUE), decreasing=TRUE),]

Note that the order is not unique since there are ties in the number of 1s.
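
For instance, assuming the data are a plain numeric 0/1 matrix with the row labels shown in the question (that storage format is a guess, since no dput() output was posted), the following sketch sorts the rows and breaks ties by original row position so the result is reproducible:

```r
# Column positions of the 1s in each row, copied from the posted matrix
ones <- list(c(3, 6), 7, c(1, 5, 8, 10), 6, c(3, 10),
             c(1, 4, 9), c(2, 8), c(3, 7, 10), c(6, 10), c(3, 5, 8, 9))

dat <- matrix(0L, nrow = 10, ncol = 10,
              dimnames = list(as.character(1:10), as.character(1:10)))
for (i in seq_along(ones)) dat[i, ones[[i]]] <- 1L

# Most 1s first; ties keep their original row order
sorted <- dat[order(-rowSums(dat), seq_len(nrow(dat))), ]
rownames(sorted)
```

Adding the second key to order() is what makes the tie-breaking explicit instead of relying on the sort method.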

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ragia Ibrahim
Sent: Monday, April 6, 2015 12:18 PM
To: r-help@r-project.org
Subject: [R] sort adjacency matrix

Dear group 
i have the following matrix

1  . . 1 . . 1 . . . .
2  . . . . . . 1 . . .
3  1 . . . 1 . . 1 . 1
4  . . . . . 1 . . . .
5  . . 1 . . . . . . 1
6  1 . . 1 . . . . 1 .
7  . 1 . . . . . 1 . .
8  . . 1 . . . 1 . . 1
9  . . . . . 1 . . . 1
10 . . 1 . 1 . . 1 1 .

I want to sort it according to the number of ones in each row, in descending 
order (rows with the most ones first)

to be as follow

3  1 . . . 1 . . 1 . 1
10 . . 1 . 1 . . 1 1 .
6  1 . . 1 . . . . 1 .
8  . . 1 . . . 1 . . 1
1  . . 1 . . 1 . . . .
5  . . 1 . . . . . . 1
7  . 1 . . . . . 1 . .
9  . . . . . 1 . . . 1
2  . . . . . . 1 . . .
4  . . . . . 1 . . . .

how can I do this in R
thanks in advance
  

Re: [R] Again: A problem someone should know about

2015-03-30 Thread David L Carlson

In your first example you created logfat.lm and then tried to plot logfat so 
you got an error indicating that logfat did not exist. 

In your second example we have no idea what body.fat is. You must make your 
examples reproducible so that we can reproduce your error. It looks like you 
could also benefit from spending a little time learning about R using a free 
tutorial.
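
As an illustration of what a reproducible example might look like here, the sketch below defines the data inside the example itself. The numbers are invented for illustration and are not the poster's data. It also shows the point about the first example: the fitted model was stored as logfat.lm, so no object called logfat exists. Note the plain straight quotes around the plot title.

```r
# Made-up values so the example runs on its own
body.fat <- c(12.3, 6.1, 25.3, 10.4, 28.7)
BMI      <- c(23.7, 23.4, 24.9, 24.1, 29.9)

logfat.lm <- lm(body.fat ~ log(BMI))

has_logfat    <- exists("logfat")     # the name that was passed to plot()
has_logfat_lm <- exists("logfat.lm")  # the name that was actually created

# Titles must use straight quotes, not the curly quotes a word processor inserts
plot(body.fat, BMI, xlab = "Body fat", ylab = "BMI", main = "BMI vs Body fat")
```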

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Lester
Sent: Monday, March 30, 2015 12:59 AM
To: r-help@r-project.org
Subject: [R] Again: A problem someone should know about

i have no idea what to do

> plot(body.fat, BMI,xlab="Body fat",ylab="BMI",main=“Figure 2.1: BMI vs Body 
> fat (n=252)”)
Error: unexpected input in "plot(body.fat, BMI,xlab="Body 
fat",ylab="BMI",main=�"
> plot(body.fat, BMI,xlab="Body fat",ylab="BMI")
 serious error. This application, or a library it uses, is using an invalid 
context  and is thereby contributing to an overall degradation of system 
stability and reliability. This notice is a courtesy: please fix this problem. 
It will become a fatal error in an upcoming update.
> 
> Begin forwarded message:
> 
> From: Ian Lester 
> Reply-To: ihles...@mensa.org.au
> Subject: A problem someone should know about
> Date: 30 March 2015 9:52:54 am AEDT
> To: r-help@r-project.org
> 
> I’m a novice and this message looks like it shouldn’t be ignored. Someone who 
> knows what they’re doing should probably take a look.
> Thanks
> Ian Lester
> 
>> logfat.lm<-(lm(body.fat~log(BMI)))
>> plot(logfat)
> Error in plot(logfat) : object 'logfat' not found
>> plot(logfat.lm)
> Hit  to see next plot: 
> Hit  to see next plot: 
> Hit  to see next plot: 
> Mar 29 18:10:18 iansimac.gateway rsession[69550] : Error: this 
> application, or a library it uses, has passed an invalid numeric value (NaN, 
> or not-a-number) to CoreGraphics API. This is a serious error and contributes 
> to an overall degradation of system stability and reliability. This notice is 
> a courtesy: please fix this problem. It will become a fatal error in an 
> upcoming update.



Re: [R] textplot() in wordcloud package

2015-03-16 Thread David L Carlson

Another possibility is to use pointLabel() in package maptools. For your example

library(maptools)

plot(x,y)
pointLabel(x, y, text1)

Advantages of pointLabel() are that it returns a list of the x and y 
coordinates of the labels that you can tweak if necessary and, at least in your 
example, it does a better job of avoiding labels being chopped at the plot 
margins.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: [R] textplot() in wordcloud package

2015-03-16 Thread David L Carlson

You should contact the package maintainer about this. The problem is that the 
pos= argument is being passed to strwidth() and strheight() and those functions 
do not know what to do with it. In the meantime:

suppressWarnings(textplot(x,y, text1, new=F, show.lines=F,  
  pos=4))

will eliminate the warnings.
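
If silencing every warning feels too broad, another option is to muffle only the warnings whose message matches. The following is a minimal sketch in base R, not something wordcloud provides: muffle_matching() is a hypothetical helper, and noisy() is a stand-in for the textplot() call so that the example runs without the package.

```r
# Hypothetical helper: suppress only warnings whose message contains `pattern`;
# any other warning is still reported as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for the textplot() call: raises one "pos" warning and one other warning
noisy <- function() {
  warning('"pos" is not a graphical parameter')
  warning("something else")
  "done"
}

# Only the "something else" warning should still be reported
result <- muffle_matching(noisy(), "is not a graphical parameter")
```

With the real call, the first argument would be the textplot(...) expression from the original post; unrelated warnings would then still be visible.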

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: [R] Adding Column to a Data Frame

2015-03-12 Thread David L Carlson

The merge() function combines two data frames at a time, not three. Maybe

rich.stats2 = merge(rich.stats, Month, by="X.SampleID")
rich.stats3 = merge(rich.stats2, Location, by="X.SampleID")

Reading the manual page will help:

?merge
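
When there are more than two data frames, the pairwise merges can be chained with Reduce(). This is only a sketch under assumptions: the three small data frames below are made-up stand-ins for rich.stats, Month and Location, and the Location values are hypothetical. In the original call the third unnamed argument was most likely matched to by.x, which would explain the fix.by() error.

```r
# Made-up stand-ins for the data frames in the question
rich.stats <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                         mean = c(1421.34, 1336.24, 1418.75))
Month <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                    MonthSampleTaken = c("February", "February", "March"))
Location <- data.frame(X.SampleID = c("PE101", "PE102", "PE104"),
                       Location = c("SiteA", "SiteB", "SiteA"))  # hypothetical values

# Merge the first two, then merge that result with the third, and so on
combined <- Reduce(function(x, y) merge(x, y, by = "X.SampleID"),
                   list(rich.stats, Month, Location))
combined
```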

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Lauren O'Connell
Sent: Wednesday, March 11, 2015 11:02 AM
To: r-help@r-project.org
Subject: [R] Adding Column to a Data Frame

I am trying to add a column with data to a data frame that already has
information but am getting this error:

> rich.stats4 = merge(rich.stats,Location,Month,by="X.SampleID")
Error in fix.by(by.x, x) :
  'by' must specify one or more columns as numbers, names or logical


I have two separate data frames that contain my sample names with location
and with month:

#Create data frame of all sample names and month sample was taken

Month =
data.frame(X.SampleID=sample_data(Lauren5000)$X.SampleID,MonthSampleTaken=sample_data(Lauren5000)$MonthSampleTaken)

head(Month)



#Create data frame of all samples names and location of sample

Location =
data.frame(X.SampleID=sample_data(Lauren5000)$X.SampleID,Location=sample_data(Lauren5000)$Location)

head(Location)

I was able to add my "MonthSampleTaken" variable by using this command:

>rich.stats2 = merge(rich.stats, Month,by="X.SampleID")

> head(rich.stats2)
  X.SampleID    mean       sd MonthSampleTaken
1      PE101 1421.34 19.44961         February
2      PE102 1336.24 25.43882         February
3      PE104 1418.75 21.92889            March
4      PE105 1331.03 20.55712            March
5      PE107 1320.21 20.91942            March
6      PE108 1328.41 20.49247            March


I now want to add my sample site location, but can't figure out how to do
this. Any help would be greatly appreciated.


Cheers,

Lauren


Re: [R] calculate value in dependence of target value

2015-03-09 Thread David L Carlson

This works for your example data, but I'd recommend testing it carefully before 
using it.

> dat <- data.frame(ID=11:14, VALUE=c(1, 5, 3, 2)*10000)
> HURD <- c(50, 75, 100)*1000
> PCT <- c(.02, .04, .08, .1)
> dat$CVALUE <- cumsum(dat$VALUE)
> dat$LVALUE <- dat$CVALUE - dat$VALUE
> dat
  ID VALUE CVALUE LVALUE
1 11 10000  10000      0
2 12 50000  60000  10000
3 13 30000  90000  60000
4 14 20000 110000  90000
> 
> for (idx in seq_len(nrow(dat))) {
+ rng <- sort(c(HURD, unlist(dat[idx,3:4])))
+ a <- which(names(rng) == "LVALUE")
+ b <- which(names(rng) == "CVALUE")
+ diff(rng[a:b])
+ ng <- length(diff(rng[a:b]))
+ dat$MARGE[idx] <- sum(PCT[a:(a+ng-1)]* diff(rng[a:b]))
+ }
> dat
  ID VALUE CVALUE LVALUE MARGE
1 11 10000  10000      0   200
2 12 50000  60000  10000  1200
3 13 30000  90000  60000  1800
4 14 20000 110000  90000  1800
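
The same tiered calculation can also be written without sorting names into the hurdle vector. The sketch below is an alternative, not the code above: tier_marge() is a hypothetical helper that, for each row, measures how much of the cumulative range [LVALUE, CVALUE] falls inside each rate band and multiplies that overlap by the band's rate. It assumes the rates apply to the cumulative VALUE, as in the example.

```r
# Hypothetical helper: margin per row when rates apply to bands of the cumulative value
tier_marge <- function(value, hurdles, rates) {
  upper <- cumsum(value)      # cumulative value after each row
  lower <- upper - value      # cumulative value before each row
  band_lo <- c(0, hurdles)    # lower edge of each rate band
  band_hi <- c(hurdles, Inf)  # upper edge of each rate band
  sapply(seq_along(value), function(i) {
    # Part of [lower, upper] that lies in each band (zero where there is no overlap)
    overlap <- pmax(0, pmin(upper[i], band_hi) - pmax(lower[i], band_lo))
    sum(overlap * rates)
  })
}

marge <- tier_marge(value   = c(10000, 50000, 30000, 20000),
                    hurdles = c(50000, 75000, 100000),
                    rates   = c(0.02, 0.04, 0.08, 0.10))
marge
```

For the example data this should reproduce the MARGE column shown above (200, 1200, 1800, 1800), but it deserves the same careful testing on real data.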

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us] 
Sent: Monday, March 9, 2015 2:22 PM
To: Matthias Weber
Cc: David L Carlson; r-help@r-project.org
Subject: Re: [R] calculate value in dependence of target value


> target <- 100000
>
> breakpts <- data.frame( PctTarget=c(50,75,100,Inf), Mult=c(2,4,8,10) )
> breakpts$LastPct <- c( 0, breakpts$PctTarget[ -nrow( breakpts ) ] )
> breakpts$Range <- cut( breakpts$PctTarget, c( 0, breakpts$PctTarget ), 
include.lowest=TRUE )
> breakpts$DeltaPct <- with( breakpts, diff( c( 0, PctTarget ) ) )
> breakpts$CumMARGE <- target / 1e4 * with( breakpts, cumsum( DeltaPct * 
Mult ) )
> breakpts$LastCumMARGE <- c( 0, breakpts$CumMARGE[ -nrow( breakpts ) ] )
>
> dta <- data.frame( ID=11:14, VALUE=c(10000,50000,30000,20000) )
> dta$CumVALUE <- cumsum( dta$VALUE )
> dta$CumPct <- 100 * dta$CumVALUE / target
> dta$Range <- cut( dta$CumPct, c( 0, breakpts$PctTarget ), 
include.lowest=TRUE )
>
> dta
  ID VALUE CumVALUE CumPct     Range
1 11 10000    10000     10    [0,50]
2 12 50000    60000     60   (50,75]
3 13 30000    90000     90  (75,100]
4 14 20000   110000    110 (100,Inf]
> breakpts
  PctTarget Mult LastPct     Range DeltaPct CumMARGE LastCumMARGE
1        50    2       0    [0,50]       50     1000            0
2        75    4      50   (50,75]       25     2000         1000
3       100    8      75  (75,100]       25     4000         2000
4       Inf   10     100 (100,Inf]      Inf      Inf         4000
>
> #dta2 <- merge( dta, breakpts, all.x=TRUE, by="Range" )
> #dta2 <- dta2[ order( dta2$ID ), ]
>
> dta2 <- cbind( dta, breakpts[ match( dta$Range, breakpts$Range ), 
-which( "Range"==names( breakpts ) ) ] )
>
> dta2$CumMARGE <- with( dta2, Mult/100 * ( CumVALUE - target * LastPct / 
100 ) + LastCumMARGE )
> dta2$MARGE <- with( dta2, diff( c( 0, CumMARGE ) ) )
>
> dta2
  ID VALUE CumVALUE CumPct     Range PctTarget Mult LastPct DeltaPct CumMARGE LastCumMARGE MARGE
1 11 10000    10000     10    [0,50]        50    2       0       50      200            0   200
2 12 50000    60000     60   (50,75]        75    4      50       25     1400         1000  1200
3 13 30000    90000     90  (75,100]       100    8      75       25     3200         2000  1800
4 14 20000   110000    110 (100,Inf]       Inf   10     100      Inf     5000         4000  1800
>
>

Re: [R] calculate value in dependence of target value

2015-03-09 Thread David L Carlson

It is very hard to figure out what you are trying to do. 

1. All of the VALUEs are greater than the target of 100
2. Your description of what you want does not match your example.

Perhaps VALUE should be divided by 1000 (e.g. not 10000, but 10)?
Perhaps your targets do not apply to VALUE, but to cumulative VALUE?

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthias Weber
Sent: Monday, March 9, 2015 7:46 AM
To: r-help@r-project.org
Subject: [R] calculate value in dependence of target value

Hello everyone,

I have a little problem. Maybe someone can help me.

I have to calculate a new column depending on a target value.

As an example: my target value is 100.000.
At the moment I have a data.frame with the following values:

   ID  VALUE
1  11  10000
2  12  50000
3  13  30000
4  14  20000

The new column ("MARGE") should be calculated with the following graduated rates:
Until the VALUE reaches 50% of the target value (50.000) = 2%
Until the VALUE reaches 75% of the target value (75.000) = 4%
Until the VALUE reaches 100% of the target value (<100.000) = 8%
If the VALUE goes above 100% of the target value (>100.000) = 10%

The result looks like this:

   ID  VALUE  MARGE
1  11  10000    200  (result of 10.000 * 2%)
2  12  50000   1200  (result of 40.000 * 2% + 10.000 * 4%)
3  13  30000   1800  (result of 15.000 * 4% + 15.000 * 8%)
4  14  20000   1800  (result of 10.000 * 8% + 10.000 * 10%)

Is there any way to calculate the column "MARGE" automatically in R?

Thanks a lot for your help.

Best regards.

Mat



Re: [R] subset a data frame by largest frequencies of factors

2015-03-05 Thread David L Carlson

These two commands will compute the cell frequencies and then sort them:

e <- as.data.frame(xtabs(~ctry+member, Dataset))
f <- e[order(e$Freq, decreasing=TRUE),]

Then draw your subset

g <- head(f, 10)

or

g <- f[cumsum(f$Freq)/sum(f$Freq) <= .8,]

Finally merge the sample with the original data and delete the unused factor 
levels:

sample <- merge(Dataset, g[,-3])
sample$ctry <- factor(sample$ctry)
sample$member <- factor(sample$member)
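
For a version that can be run without the client's data, here is a sketch on simulated data. The data frame, the threshold of 5, and the names freq, top, common and kept are all assumptions made for illustration. The 80% rule here keeps cells up to and including the one that crosses 80%, so the retained share is never below the cutoff; that is a design choice, not the only reasonable reading.

```r
set.seed(1)
# Simulated stand-in for Dataset: two sparse factors and a binary response
Dataset <- data.frame(
  ctry = factor(sample(c("A", "B", "C", "D"), 200, replace = TRUE,
                       prob = c(0.6, 0.25, 0.1, 0.05))),
  member = factor(sample(c("m1", "m2", "m3"), 200, replace = TRUE,
                         prob = c(0.7, 0.2, 0.1))),
  negative = rbinom(200, 1, 0.5)
)

# Cell frequencies, largest first
freq <- as.data.frame(xtabs(~ ctry + member, Dataset))
freq <- freq[order(freq$Freq, decreasing = TRUE), ]

# Cumulative share reached *before* each cell; keep cells until 80% is covered
share <- cumsum(freq$Freq) / sum(freq$Freq)
reached <- c(0, head(share, -1))
top <- freq[reached < 0.8, ]

# Variant: cells occurring more often than a given value (5 is arbitrary)
common <- freq[freq$Freq > 5, ]

# Rows of the original data in the retained cells, with unused levels dropped
kept <- droplevels(merge(Dataset, top[, c("ctry", "member")]))
```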

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Thursday, March 5, 2015 12:45 PM
To: R-help
Subject: [R] subset a data frame by largest frequencies of factors

A consulting client has a large data set with a binary response 
(negative) and two factors (ctry and member) which have many levels,
but many occur with very small frequencies.  It is far too sparse with a 
model like glm(negative ~ ctry+member, family=binomial).

 > str(Dataset)
'data.frame':   10672 obs. of  5 variables:
  $ ctry: Factor w/ 31 levels "Barbados","Belize",..: 21 21 5 22 18 
18 18 18 26 18 ...
  $ member  : Factor w/ 163 levels "","ADHOPIA, PREETI ",..: 150 19 19 
111 120 1 1 4 55 18 ...
  $ negative: int  0 1 0 1 1 1 1 0 0 0 ...
 >

For analysis, we'd like to subset the data to include only those that 
occur with frequency greater than a given
value, or the top 10 (say) in frequency, or the highest frequency 
categories accounting for 80% (say) of the
total.  I'm not sure how to do any of these in R.  Can anyone help?

-- 
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


Re: [R] Using dates in R

2015-03-04 Thread David L Carlson

Wow! A bold prediction from someone who has done exactly zero investigation of 
the basic, built-in date/time features in R. Since your example did not include 
the first two digits of the year, I've used %y instead of %Y. That will assume 
"19" precedes values from 69-99 and "20" precedes values from 00 to 68. If you 
decide to implement this with a for loop, it means you have much more to learn.

> today <- "3/4/15"
> d1 <- "2/15/80"
> d2 <- "2/15/16"
> # Is d before today, if so 0, otherwise 1
> as.integer(strptime(today, "%m/%d/%y") < strptime(d1, "%m/%d/%y"))
[1] 0
> as.integer(strptime(today, "%m/%d/%y") < strptime(d2, "%m/%d/%y"))
[1] 1

?strptime for details
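
The same comparison works on a whole column at once, so no loop is needed. A minimal sketch, assuming the dates are stored as character strings like the ones in the question; the vector dates and the fixed reference date are made up so that the result is reproducible (Sys.Date() would give the actual current date). Note that it uses >=, so a date equal to the reference date gets 1.

```r
# Example dates in month/day/two-digit-year form (made up for illustration)
dates <- c("2/15/80", "3/4/15", "2/15/16")

# Fixed reference date so the example is reproducible
today <- as.Date("3/4/15", format = "%m/%d/%y")

# Convert the whole vector, then compare: 0 if earlier than today, 1 otherwise
d <- as.Date(dates, format = "%m/%d/%y")
indicator <- as.integer(d >= today)
indicator
```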

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Hamel
Sent: Wednesday, March 4, 2015 8:55 AM
To: r-help@r-project.org
Subject: [R] Using dates in R

Hi all,

I have a dataset that includes a "date" variable. Each observation includes
a date in the form of 2/15/15, for example. I'm looking to create a new
indicator variable that is based on the date variable. So, for example, if
the date is earlier than today, I would need a "0" in the new column, and a
"1" otherwise. Note that my dataset includes dates from 1979-2012, so it is
not one-year (this means I can't easily create a new variable 1-365).

How does R handle dates? My hunch is "not well," but perhaps there is a
package that can help me with this. Let me know if you have any
recommendations as to how this can be done relatively easily.

Thanks! Appreciate it.

Best,
Brian


Re: [R] sampling dataframe based upon number of record occurrences

2015-03-04 Thread David L Carlson

I'm not sure I understand, but I think you have a large data frame with records 
and you want to construct a sample of that data frame that includes no more 
than 3 records for each IDbyYear combination? You say there are 5589 unique 
combinations and your code uses a data frame called fitting_set. Assuming this 
is the data frame you are describing, your code will select all of the lines 
since fitting_set$IDbyYear[i] is always a vector of length 1.

We need a reproducible example. The best way for you to give us that would be 
to copy the result of dput(head(fitting_set, 10)). It would look something like 
this plus the 6 other columns you mention except that I've added dta <- in 
front of structure() to create a data frame:

dta <- structure(list(IDbyYear = c(42.24, 42.24, 42.24, 42.24, 42.24, 
42.24, 45.32, 45.32, 45.36, 45.4, 45.4), SiteID = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("A-Airport", 
"A-Bark Corral East"), class = "factor"), Year = c(2006L, 2006L, 
2006L, 2006L, 2006L, 2006L, 2008L, 2008L, 2009L, 2010L, 2010L
)), .Names = c("IDbyYear", "SiteID", "Year"), class = "data.frame", row.names = 
c(NA, 
-11L))

Now create a list of data frames, one for each IDbyYear:

dta.list <- split(dta, dta$IDbyYear)

Now a function that will select 3 rows or all of them if there are fewer:

smp <- function(dframe) {
ind <- seq_len(nrow(dframe))
dframe[sample(ind, ifelse(length(ind)>2, 3, length(ind))),]
}

Now take the samples and combine them into a single data frame:

sample <- do.call(rbind, lapply(dta.list, smp))
sample
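
A variation on the same idea samples row numbers instead of splitting the data frame, which keeps the original row order and avoids naming an object sample. This is a sketch only: the data frame below rebuilds the three columns from the dput() output above, and thinned is a made-up name.

```r
# The three columns shown in the question
dta <- data.frame(
  IDbyYear = c(rep(42.24, 6), 45.32, 45.32, 45.36, 45.40, 45.40),
  SiteID = c(rep("A-Airport", 6), rep("A-Bark Corral East", 5)),
  Year = c(rep(2006L, 6), 2008L, 2008L, 2009L, 2010L, 2010L)
)

set.seed(1)  # only so that the example is reproducible

# Row numbers grouped by IDbyYear
rows_by_group <- split(seq_len(nrow(dta)), dta$IDbyYear)

# From each group keep 3 randomly chosen rows, or all rows if there are fewer
keep <- unlist(lapply(rows_by_group, function(i) {
  i[sample.int(length(i), min(3, length(i)))]
}))

thinned <- dta[sort(keep), ]
table(thinned$IDbyYear)
```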

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Curtis 
Burkhalter
Sent: Tuesday, March 3, 2015 3:23 PM
To: r-help@r-project.org
Subject: [R] sampling dataframe based upon number of record occurrences

Hello everyone,

I'm having trouble performing a task that is probably very simple, but I
can't seem to figure out how to get my code to work. What I want to do is
use the sample function to pick records within a data frame, but only if
a column attribute value is repeated more than 3 times. If you look at
the data below, I have created a unique attribute value that corresponds to
every site by year combination (i.e. IDxYear). You can see that the
site called "A-Airport" was sampled 6 times in 2006, while "A-Bark Corral
East" was sampled twice in 2008. So what I want to do is randomly select 3
records for "A-Airport" in 2006 from the existing 6 records, but for "A-Bark
Corral East" in 2008 I just want to leave the records as they currently
are.

I've used the following code to try and  accomplish this, but like I said I
can't get it to work so I'm clearly doing something wrong. If you could
check out the code and provide any suggestions that would be great. It
should be noted that there are 5589 unique IDxYear combinations so that's
why that number is in the code. If any further clarification is needed also
let me know.

boom=data.frame()
for (i in 1:5589){

boom[i,]=ifelse(length(fitting_set$IDbyYear[i]>3),fitting_set[sample(nrow(fitting_set),3),],fitting_set)

}
boom


IDbyYear  SiteID              Year   (6 other column attributes)
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
42.24     A-Airport           2006
45.32     A-Bark Corral East  2008
45.32     A-Bark Corral East  2008
45.36     A-Bark Corral East  2009
45.40     A-Bark Corral East  2010
45.40     A-Bark Corral East  2010

 Thanks


-- 
Curtis Burkhalter

https://sites.google.com/site/curtisburkhalter/


Re: [R] 2D Timeseries trace plot

2015-03-03 Thread David L Carlson

You can do this with the animation package. Install the package and then

# Load the package
library(animation)
# This representation makes your data more portable using the dput() function:
pen <- structure(list(x = c(1073L, 1072L, 1066L, 1052L, 1030L, 1009L, 
994L), y = c(1058L, 1085L, 1117L, 1152L, 1196L, 1242L, 1286L), 
time = c(769.05, 769.07, 769.08, 769.1, 769.12, 769.13, 769.14
)), .Names = c("x", "y", "time"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7"))
# Compute the time between each step
diftime <- diff(pen$time)
# Draw a blank plot window using the ranges for x and y
with(pen, plot(NA, xlim=c(min(x), max(x)), ylim=c(min(y), max(y)),
xlab="", ylab="", axes=FALSE))
# Pause for a second
ani.pause(1)
# Draw the curve pausing between points.
for(i in seq_along(diftime)) {
ani.pause(diftime[i]*10) # Multiply by ten to slow things down
segments(pen$x[i], pen$y[i], pen$x[i+1], pen$y[i+1])
}
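
If installing a package is not wanted, a similar effect can be sketched in base R with Sys.sleep(). This is an assumption-laden alternative, not the animation package's method: trace_plot() is a hypothetical helper, and whether the segments appear one by one depends on the graphics device in use.

```r
# Same seven points as above
pen <- data.frame(
  x = c(1073, 1072, 1066, 1052, 1030, 1009, 994),
  y = c(1058, 1085, 1117, 1152, 1196, 1242, 1286),
  time = c(769.05, 769.07, 769.08, 769.10, 769.12, 769.13, 769.14)
)

# Hypothetical helper: draw the trace segment by segment, pausing between
# points in proportion to the recorded time gaps. Returns the pauses (seconds).
trace_plot <- function(pen, slow = 10, draw = TRUE) {
  pauses <- diff(pen$time) * slow  # works for any number of rows, regular or not
  if (draw) {
    plot(NA, xlim = range(pen$x), ylim = range(pen$y),
         xlab = "", ylab = "", axes = FALSE)
    for (i in seq_along(pauses)) {
      Sys.sleep(pauses[i])
      segments(pen$x[i], pen$y[i], pen$x[i + 1], pen$y[i + 1])
    }
  }
  invisible(pauses)
}

# Compute the pauses only; call trace_plot(pen) interactively to see the drawing
pauses <- trace_plot(pen, draw = FALSE)
```

Because the pauses come straight from diff(pen$time), irregular time stamps need no padding with duplicate rows.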

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of AjayT
Sent: Tuesday, March 3, 2015 8:59 AM
To: r-help@r-project.org
Subject: [R] 2D Timeseries trace plot

Hi,

I've got a 2D timeseries of handwriting samples, 

      x    y   time
1  1073 1058 769.05
2  1072 1085 769.07
3  1066 1117 769.08
4  1052 1152 769.10
5  1030 1196 769.12
6  1009 1242 769.13
7   994 1286 769.14

... up to row 500

I was just wondering how to plot this as an animation, so that the points
join up as they are rendered in time. Basically showing how the person who
generated the data writes. 

The time index is not regular and if possible I'd like to avoid padding the
data with duplicate entries if this is avoidable. For example adding a
duplicate of the first row, for a 'padded' time 769.06.

Thanks a lot for your help :)




Re: [R] Summing certain values within columns that satisfy a certain condition

2015-02-27 Thread David L Carlson

Here is another approach

> maxv <- apply(df, 2, max) # Get the column maximums
> maxv0 <- ifelse(maxv == 0, -1, maxv) # Replace 0 maximums with -1
> Sum <-  rowSums(sweep(df, 2, maxv0, "=="))
> data.frame(df, Sum)
   A B C D Sum
1  0 1 0 7   1
2  0 2 0 7   1
3  0 3 0 7   1
4  0 4 0 7   1
5  0 1 0 0   0
6  0 0 0 0   0
7  0 0 0 0   0
8  0 0 0 0   0
9  0 0 1 5   0
10 0 5 1 5   0
11 0 4 1 5   0
12 0 8 4 7   3
13 0 0 3 0   0
14 0 0 3 4   0
15 0 0 3 4   0
16 0 0 0 5   0
17 0 2 0 6   0
18 0 0 4 0   1
19 0 0 4 0   1
20 0 0 4 0   1
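
The original question also asked to ignore NAs, which the lines above do not address. A sketch of one way to handle both cases, assuming the data frame from the question (rebuilt here so the example is self-contained): column maxima that are zero are set to NA so they can never match, and rowSums() drops the resulting NA comparisons.

```r
# Data frame from the original question
df <- data.frame(
  A = rep(0, 20),
  B = c(1, 2, 3, 4, 1, 0, 0, 0, 0, 5, 4, 8, 0, 0, 0, 0, 2, 0, 0, 0),
  C = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 4, 3, 3, 3, 0, 0, 4, 4, 4),
  D = c(7, 7, 7, 7, 0, 0, 0, 0, 5, 5, 5, 7, 0, 4, 4, 5, 6, 0, 0, 0)
)

m <- as.matrix(df)
maxv <- apply(m, 2, max, na.rm = TRUE)  # column maxima, skipping NAs
maxv[maxv == 0] <- NA                   # a maximum of zero should never count

# TRUE where a cell equals its column maximum; NA where it cannot be compared
hits <- sweep(m, 2, maxv, "==")
Sum <- rowSums(hits, na.rm = TRUE)
data.frame(df, Sum)
```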


-------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Don McKenzie
Sent: Thursday, February 26, 2015 3:12 PM
To: Kate Ignatius
Cc: r-help
Subject: Re: [R] Summing certain values within columns that satisfy a certain 
condition

Kate — here is a transparent solution (tested but without NA treatment). 
Doubtless there are cleverer faster ones, which later posters will present.

HTH

# example with four columns and 20 rows
nrows <- 20

A <- sample(c(1:100), nrows, replace=T)
B <- sample(c(1:100), nrows, replace=T)
C <- sample(c(1:100), nrows, replace=T)
D <- sample(c(1:100), nrows, replace=T)

# Flag, column by column, every row that holds that column's maximum
# (ties included), then count the flags in each row
cols <- cbind(A, B, C, D)
mat1 <- sapply(seq_len(ncol(cols)), function(i) as.integer(cols[,i] == max(cols[,i])))
SUM <- rowSums(mat1)


> On Feb 26, 2015, at 12:23 PM, Kate Ignatius  wrote:
> 
> Hi,
> 
> Suppose I had a data frame like so:
> 
> A B C D
> 0 1 0 7
> 0 2 0 7
> 0 3 0 7
> 0 4 0 7
> 0 1 0 0
> 0 0 0 0
> 0 0 0 0
> 0 0 0 0
> 0 0 1 5
> 0 5 1 5
> 0 4 1 5
> 0 8 4 7
> 0 0 3 0
> 0 0 3 4
> 0 0 3 4
> 0 0 0 5
> 0 2 0 6
> 0 0 4 0
> 0 0 4 0
> 0 0 4 0
> 
> For each row, I want to count how many column maximums appear, to
> eventually get the following outcome, while ignoring zeros and NAs:
> 
> A B C D Sum
> 0 1 0 7 1
> 0 2 0 7 1
> 0 3 0 7 1
> 0 4 0 7 1
> 0 1 0 0 0
> 0 0 0 0 0
> 0 0 0 0 0
> 0 0 0 0 0
> 0 0 1 5 0
> 0 5 1 5 0
> 0 4 1 5 0
> 0 8 4 7 3
> 0 0 3 0 0
> 0 0 3 4 0
> 0 0 3 4 0
> 0 0 0 5 0
> 0 2 0 6 0
> 0 0 4 0 1
> 0 0 4 0 1
> 0 0 4 0 1
> 
> I've used the following code but it doesn't seem to work (my sum
> column is all 1s):
> 
> apply(df, 1, function(x) sum(x %in% c(pmax(x))))
> 
> Is this code too simple?
> 
> Thanks!
> 
> K.
> 

Re: [R] One column listing on wide monitors too

2015-02-24 Thread David L Carlson

Here are several ways:

> a <- paste("String", 1:16)
> a
 [1] "String 1"  "String 2"  "String 3"  "String 4"  "String 5"  "String 6" 
 [7] "String 7"  "String 8"  "String 9"  "String 10" "String 11" "String 12"
[13] "String 13" "String 14" "String 15" "String 16"
> matrix(a, length(a))
  [,1]   
 [1,] "String 1" 
 [2,] "String 2" 
. . .
[15,] "String 15"
[16,] "String 16"
> t(t(a))
  [,1]   
 [1,] "String 1" 
 [2,] "String 2" 
. . . 
[15,] "String 15"
[16,] "String 16"
> b <- a
> dim(b) <- c(16, 1)
> b
  [,1]   
 [1,] "String 1" 
 [2,] "String 2" 
. . .
[15,] "String 15"
[16,] "String 16"
> cat(a, sep="\n") # But no numbering
String 1
String 2
. . .
String 15
String 16
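
If the bracketed numbering from the question is wanted as well, the lines can be built by hand. A small sketch, with numbered as a made-up name:

```r
a <- paste("String", 1:16)

# One line per element, prefixed with its index in brackets and quoted like print()
numbered <- sprintf("[%d] \"%s\"", seq_along(a), a)
writeLines(numbered)
```

The first and last lines come out as [1] "String 1" and [16] "String 16", matching the layout asked for in the question.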

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of marekl
Sent: Tuesday, February 24, 2015 2:09 PM
To: r-help@r-project.org
Subject: [R] One column listing on wide monitors too

Hi,

It is probably a very basic question, but I still can't find an answer.

R shows listings in several columns on wider monitors, as in this picture:
http://i.imgur.com/GLF70r9.png

Is there a way to set R to show listings like this, in one column only?

[1] "String 1"
[2] "String 2"
[3] "String 3"
...
[16] "String 16"

Thank you




Re: [R] How to Deploy a 'poLCA' Model?

2015-02-24 Thread David L Carlson

Looking at package poLCA I see functions poLCA.predcell() and poLCA.table(). If 
these do not do what you want, you will need to be clearer and provide a 
reproducible example.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of sagnik 
chakravarty
Sent: Monday, February 23, 2015 6:06 AM
To: d...@votamatic.org
Cc: r-help
Subject: [R] How to Deploy a 'poLCA' Model?

Hi Drew,

I was working with 'poLCA' to fit latent-class model with covariates
[formula: f=cbind(y1,y2,y3) ~ x1*x2*x3*x4]. The output contains a fit table
with coefficients, t-value, std_error and P-value for different
combinations of the covariates.

Now if I want to deploy this model to a new dataset like we do for any
other model with 'predict' function, how to proceed?

I couldn't find any predict function described in the package
documentation. Kindly help.

Thanks,

-- 
Regards,

SAGNIK CHAKRAVARTY


Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

2015-02-23 Thread David L Carlson

The pattern matrix is easy to compute from the results of princomp(). First we 
need a reproducible example so we'll use the iris data set (use ?iris for 
details) that comes with R.

> data(iris)
> iris.pc <- princomp(iris[,-5], cor=TRUE)
> print(iris.pc$loadings, cutoff=0)

Loadings:
             Comp.1 Comp.2 Comp.3 Comp.4
Sepal.Length  0.521 -0.377  0.720  0.261
Sepal.Width  -0.269 -0.923 -0.244 -0.124
Petal.Length  0.580 -0.024 -0.142 -0.801
Petal.Width   0.565 -0.067 -0.634  0.524

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

The object iris.pc is a list with 7 elements. One of those, iris.pc$loadings 
contains the standardized loadings so that the sum of the squared values in 
each column is 1. The default print method suppresses the printing of small 
loadings (< .1) so I've set cutoff=0 so we see them all. 

To get the pattern matrix we just need to multiply each of the columns by the 
corresponding value of iris.pc$sdev (the square roots of the eigenvalues):

> iris.pat <- sweep(iris.pc$loadings, 2, iris.pc$sdev, "*")
> print(iris.pat, cutoff=0)

Loadings:
             Comp.1 Comp.2 Comp.3 Comp.4
Sepal.Length  0.890 -0.361  0.276  0.038
Sepal.Width  -0.460 -0.883 -0.094 -0.018
Petal.Length  0.992 -0.023 -0.054 -0.115
Petal.Width   0.965 -0.064 -0.243  0.075

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings     2.918  0.914  0.147  0.021
Proportion Var  0.730  0.229  0.037  0.005
Cumulative Var  0.730  0.958  0.995  1.000
> iris.pc$sdev^2
Comp.1 Comp.2 Comp.3 Comp.4 
2.91849782 0.91403047 0.14675688 0.02071484

The sweep() function multiplies each column by its standard deviation. Now the 
squared values in each column sum to the corresponding eigenvalue. 

Alternatively, you can install the "psych" package which computes the pattern 
(structure) matrix directly:

> library(psych)
> iris.pca <- principal(iris[,-5], nfactors=4, rotate="none")
> print(iris.pca$Structure, cutoff=0)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Scott Colwell
Sent: Monday, February 23, 2015 12:15 PM
To: r-help@r-project.org
Subject: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

Hello,

I am fairly new to R and coming from SAS IML. I am rewriting one of my MC
simulations in R and am stuck on extracting a factor pattern matrix as would
be done in IML using Proc Factor.  

I have found the princomp() command and read through the manual but can't
seem to figure out how to save the factor pattern matrix.  I am waiting for
the R for SAS Users book to arrive. What I would use in SAS IML to get at
what I am looking for is:

PROC FACTOR Data=MODELCOV15(TYPE=COV) NOBS=1 N=16 CORR
OUTSTAT=FAC.FACOUT15;
RUN;

DATA FAC.PATTERN15; SET FAC.FACOUT15;
IF _TYPE_='PATTERN';
DROP _TYPE_ _NAME_;
RUN;

Would any SAS IML to R converts be able to help me with this?

Thanks,

Scott Colwell, PhD




--
View this message in context: 
http://r.789695.n4.nabble.com/Extracting-Factor-Pattern-Matrix-Similar-to-Proc-Factor-tp4703704.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

2015-02-23 Thread David L Carlson

Function principal() in psych also accepts a correlation matrix, so use 
cov2cor() to convert the covariance matrix first:

library(psych)
iris.pca <- principal(cov2cor(cov(iris[,-5])), nfactors=4, rotate="none")
print(iris.pca$Structure, cutoff=0)
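
If you would rather stay in base R, the same idea can be sketched with 
eigen(). This is only an illustration under the assumption that all you have 
is a covariance matrix; S, R, e and pattern are made-up names, and the signs 
of the columns may differ from what princomp() or principal() report:

```r
# Sketch: pattern matrix starting from a covariance matrix only
S <- cov(iris[, -5])          # stand-in for a covariance matrix you already have
R <- cov2cor(S)               # convert to a correlation matrix
e <- eigen(R, symmetric = TRUE)

# Scale each eigenvector by the square root of its eigenvalue
pattern <- e$vectors %*% diag(sqrt(e$values))
dimnames(pattern) <- list(rownames(R), paste0("PC", seq_len(ncol(R))))
round(pattern, 3)

# With all components kept, the pattern matrix reproduces R
all.equal(pattern %*% t(pattern), R, check.attributes = FALSE)
```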

David
-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Scott Colwell
Sent: Monday, February 23, 2015 3:34 PM
To: r-help@r-project.org
Subject: Re: [R] Extracting Factor Pattern Matrix Similar to Proc Factor

Thanks David. What do you do when the input is a covariance matrix rather
than a dataset?



--
View this message in context: 
http://r.789695.n4.nabble.com/Extracting-Factor-Pattern-Matrix-Similar-to-Proc-Factor-tp4703704p4703719.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] Replacing 9999 and 999 values with NA

2015-02-23 Thread David L Carlson

Just for the record, you do not need cbind():

wind <- data.frame(windSpeed,windDirec)

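Using cbind() does not create a problem as long as the columns are all numeric, 
but cbind() on vectors builds a matrix, which can hold only one data type. If 
your data contain a mixture of numeric, factor, and character columns, cbind() 
will coerce them to a common type before data.frame() ever sees them.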
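
A small sketch of the difference, using made-up values (the station codes 
are hypothetical and only there to provide a character column):

```r
windSpeed <- c(5.2, 7.9, 3.1)
station   <- c("KCLL", "KCLL", "KAUS")   # hypothetical station codes

# cbind() makes a matrix, so the numbers are turned into text
m <- cbind(windSpeed, station)
is.character(m)

# Wrapping the matrix in data.frame() does not bring the numbers back
bad  <- data.frame(cbind(windSpeed, station))
# Passing the vectors directly keeps each column's own type
good <- data.frame(windSpeed, station)

sapply(bad, class)
sapply(good, class)
```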

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Alexandra Catena
Sent: Monday, February 23, 2015 11:50 AM
To: Frederic Ntirenganya
Cc: r-help@r-project.org
Subject: Re: [R] Replacing 9999 and 999 values with NA

The command, data[data == 9999] <- NA, worked! Thank you!
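
Because the thread is about two different missing-value codes (9999 for wind 
speed and 999 for wind direction), a column-by-column replacement may be 
safer than a single test on the whole data frame. A minimal sketch with 
invented values:

```r
# Made-up data: 9999 marks missing speed, 999 marks missing direction
wind <- data.frame(windSpeed = c(4.1, 9999, 6.3, 2.0),
                   windDirec = c(180, 270, 999, 45))

# Replace each code only in the column where it means "missing"
wind$windSpeed[wind$windSpeed == 9999] <- NA
wind$windDirec[wind$windDirec == 999]  <- NA

wind
colSums(is.na(wind))
```

Doing it per column avoids turning a legitimate value in one column into NA 
just because it matches the code used in the other.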

But just in case you wanted to know, I'm downloading the data and
unzipping it through readLines.  I then concatenate two columns ( wind
speed and direction) from the unzipped data through cbind but I make
it into a data frame.

wind = data.frame(cbind(windSpeed,windDirec))


Thanks,
Alexandra

On Sat, Feb 21, 2015 at 10:38 PM, Frederic Ntirenganya
 wrote:
> If you are reading the data frame using for instance read.csv, you can put
> in the argument na.strings = "9999".
> Another way to do that is data[data == 9999] <- NA.
>
> It should be good to tell us how you are reading your dataset.
>
> On Feb 21, 2015 6:49 AM, "Jeff Newmiller"  wrote:
>>
>> You did not say how you imported the data, but if you used one of the
>> read.table variants (including read.csv) then you can use the na.strings
>> argument as documented in the help file for read.table.
>>
>> Next time please read the posting guide, as there are some useful tips in
>> there, such as posting using plain text (a setting in your email program) so
>> we don't get garbled info from you, and providing a reproducible example.
>>
>> ---
>> Jeff Newmiller
>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>
>> ---
>> Sent from my phone. Please excuse my brevity.
>>
>> On February 20, 2015 10:55:30 AM PST, Alexandra Catena 
>> wrote:
>> >Hello All,
>> >
>> >I have a data frame of two columns for wind.  The first column is for
>> >wind
>> >speed and the second wind direction.  I'm trying to replace the 9999
>> >values
>> >in the first column and the 999 values in the second column with NA.  I
>> >tried to use the function ltdl.fix.df but it doesn't seem to do
>> >anything.
>> >
>> >> ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)
>> >
>> >  n = 9432 by p = 4 matrix checked, 0 NA(s) present
>> >
>> >  0 factor variable(s) present
>> >
>> >  5675 value(s) coded 999 set to NA
>> >
>> >  0 -ve value(s) set to +ve half the negative value
>> >
>> >
>> >I have R version 3.1.1
>> >
>> >Thanks,
>> >Alexandra

Re: [R] Correlation question

2015-02-22 Thread David L Carlson

As Kehl pointed out, any linear function of the independent variable (speed) 
will have the same squared correlation with the dependent variable (dist), but 
only one linear function minimizes the squared deviations between the fitted 
values and the original values. The equation you are using is only applicable 
to that function, not to any of the others. In fact, some linear functions will 
produce negative values:

> fitted.new <- 6*cars$speed
> cor(cbind(fitted.new, fitted.right, fitted.wrong, cars$dist))
             fitted.new fitted.right fitted.wrong
fitted.new    1.0000000    1.0000000    1.0000000 0.8068949
fitted.right  1.0000000    1.0000000    1.0000000 0.8068949
fitted.wrong  1.0000000    1.0000000    1.0000000 0.8068949
              0.8068949    0.8068949    0.8068949 1.0000000
> 1-sum((cars$dist-fitted.new)^2)/sum((cars$dist-mean(cars$dist))^2)
[1] -3.281849
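
The two quantities can be put side by side with a short sketch. The helper 
name r2.resid is made up for this illustration. It computes 1 - SSE/SST, 
which depends on how close the predictions are to the data, while the squared 
correlation only depends on the predictions being a linear function of speed:

```r
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)
fitted.wrong <- -17 + 5 * cars$speed

# 1 - SSE/SST: sensitive to the actual size of the prediction errors
r2.resid <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Differs between the two sets of predictions
c(right = r2.resid(cars$dist, fitted.right),
  wrong = r2.resid(cars$dist, fitted.wrong))

# Identical for any linear function of speed
c(right = cor(cars$dist, fitted.right)^2,
  wrong = cor(cars$dist, fitted.wrong)^2)
```

Only for the least-squares fit do the two agree, which is why 1 - SSE/SST is 
reported as R-squared for lm() models.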

David L. Carlson
Department of Anthropology
Texas A&M University

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Thayn
Sent: Sunday, February 22, 2015 12:01 AM
To: Kehl Dániel
Cc: r-help@r-project.org
Subject: Re: [R] Correlation question

Of course! Thank you, I knew I was missing something painfully obvious. It 
seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a 
lecture introducing correlation, but now I'm not sure what it is. It does do 
a better job of showing that the fitted.wrong variable is not a good prediction 
of the distance. 

On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:

> Hi,
> 
> try
> 
> cor(fitted.right,fitted.wrong)
> 
> should give 1 as both are a linear function of speed! Hence 
> cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be 
> the same.
> 
> HTH
> d
> 
> From: R-help [r-help-boun...@r-project.org] on behalf of Jonathan 
> Thayn [jth...@ilstu.edu]
> Sent: February 21, 2015 22:42
> To: r-help@r-project.org
> Subject: [R] Correlation question
> 
> I recently compared two different approaches to calculating the correlation 
> of two variables, and I cannot explain the different results:
> 
> data(cars)
> model <- lm(dist~speed,data=cars)
> coef(model)
> fitted.right <- model$fitted
> fitted.wrong <- -17+5*cars$speed
> 
> 
> When using the OLS fitted values, the lines below all return the same R2 
> value:
> 
> 1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(cars$dist,fitted.right)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2
> 
> 
> However, when I use my estimated parameters to find the fitted values, 
> "fitted.wrong", the first equation returns a much lower R2 value, which I 
> would expect since the fit is worse, but the other lines return the same R2 
> that I get when using the OLS fitted values.
> 
> 1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(x=cars$dist,y=fitted.wrong)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2
> 
> 
> I'm sure I'm missing something simple, but can someone explain the difference 
> between these two methods of finding R2? Thanks.
> 
> Jon

Re: [R] Chi-square test

2015-02-20 Thread David L Carlson

And probably why chisq.test has the rescale.p= argument. Your second problem 
with small expected values can be handled with simulate.p.value=.

> chisq.test(f, p=p11)
Error in chisq.test(f, p = p11) : probabilities must sum to 1.
> 1-sum(p11)
[1] 4.3036e-08
> chisq.test(f, p=p11, rescale.p=TRUE)

Chi-squared test for given probabilities

data:  f
X-squared = 7.6268, df = 14, p-value = 0.9078

Warning message:
In chisq.test(f, p = p11, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect
> chisq.test(f, p=p11, rescale.p=TRUE, simulate.p.value=TRUE)

Chi-squared test for given probabilities with simulated p-value (based
on 2000 replicates)

data:  f
X-squared = 7.6268, df = NA, p-value = 0.7996
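
To see what is going on, here is a small sketch using the f and p11 vectors 
from the original question. As far as I know, chisq.test() stops when the 
sum differs from 1 by more than sqrt(.Machine$double.eps), and the rounded 
probabilities miss 1 by more than that. Rescaling by hand should give the 
same statistic as rescale.p=TRUE:

```r
f <- c(0, 0, 0, 2, 3, 6, 17, 15, 21, 21, 14, 10, 5, 1, 5)
p11 <- c(7.577864e-06, 1.999541e-04, 1.833510e-03, 9.059845e-03, 2.886977e-02,
         6.546229e-02, 1.124083e-01, 1.525880e-01, 1.689712e-01, 1.563522e-01,
         1.232031e-01, 8.395000e-02, 5.009534e-02, 2.645857e-02, 0.0205403)

# The sum misses 1 by a few parts in 10^8
1 - sum(p11)
sqrt(.Machine$double.eps)

# Rescaling by hand versus letting chisq.test() do it
# (warnings about small expected counts are suppressed here)
by.hand  <- suppressWarnings(chisq.test(f, p = p11 / sum(p11)))
built.in <- suppressWarnings(chisq.test(f, p = p11, rescale.p = TRUE))
all.equal(by.hand$statistic, built.in$statistic)
```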

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Berend Hasselman
Sent: Friday, February 20, 2015 12:13 PM
To: pari hesabi
Cc: r-help@r-project.org
Subject: Re: [R] Chi-square test


> On 20-02-2015, at 19:05, pari hesabi  wrote:
> 
> Hello,
> If the vector of observed frequencies is:  
> f<-c(0,0,0,2,3,6,17,15,21,21,14,10,5,1,5)
> and the vector of probability :p11<-c(7.577864e-06, 1.999541e-04  
> ,1.833510e-03,  9.059845e-03, 2.886977e-02, 6.546229e-02 ,1.124083e-01, 
> 1.525880e-01, 1.689712e-01, 1.563522e-01,   1.232031e-01, 8.395000e-02, 
> 5.009534e-02, 2.645857e-02,0.0205403)
> The sum of the probabilities is equal to one. But when I want to do the the 
> Chi-square test, I get this error: probabilities must sum to one.

print  sum(p11)-1

> Does anybody know the reason?

R FAQ 7.31  (http://cran.r-project.org/doc/FAQ/R-FAQ.html)

Berend

> Best Regards,
> pari

Re: [R] subsamples and regressions for 100 times

2015-02-17 Thread David L Carlson

Expanding a bit on Michael's answer, you don't need the sampling package for 
this, just the sample.int() function to draw a random set of integers that you 
will use to extract rows from each of your groups. Then write a function that 
returns what you want (the regression slopes from each group) and use that 
function with the replicate() function. Your problem is a good way to 
illustrate the lapply(), sapply(), replicate() family of functions in R:

# Split the data into a list of data frames
datlist <- split(dat, dat$L_group)
# Write a function to draw the sample and perform the regression on each group
slopes <- function(lst) {
  # Get the minimum sample size
  minsize <- min(sapply(lst, nrow))
  # Draw sample (row numbers) of size minsize from each group
  samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
  # Extract sample from each group
  samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
  # Run the regressions for each group and extract the slopes
  results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
  # Use the group names of the list that was passed in to label the slopes
  names(results) <- names(lst)
  return(results)
}
# You can get a single set of results with
(results <- slopes(datlist))
# A B C 
# 1.0128392 0.2658041 1.3423786

# To get 100 runs
many <- t(replicate(100, slopes(datlist)))
head(many)
#              A         B        C
# [1,] 1.4326103 0.2658041 1.357475
# [2,] 1.4754324 0.2658041 1.309208
# [3,] 0.9838589 0.2658041 1.408987
# [4,] 0.9993144 0.2658041 1.354297
# [5,] 1.0134187 0.2658041 1.397112
# [6,] 1.4922856 0.2658041 1.312531
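
Note that column B never changes in the output above: group B has only six 
rows, the smallest group, so every "subsample" of B is the whole group. To 
compare the slopes across runs, the matrix of results can be summarized by 
column. The sketch below is self-contained and uses simulated data, not the 
data from the question; the group sizes, seed and coefficients are arbitrary 
assumptions:

```r
set.seed(42)
# Simulated stand-in for the real data: three groups of unequal size
dat <- data.frame(temp = runif(30),
                  L_group = rep(c("A", "B", "C"), times = c(12, 8, 10)))
dat$co2 <- 0.1 + dat$temp + rnorm(30, sd = 0.05)

# Compact variant of the slopes() idea: sample and fit inside one sapply()
slopes <- function(lst) {
  minsize <- min(sapply(lst, nrow))
  sapply(lst, function(d) {
    rows <- sample.int(nrow(d), size = minsize)
    coef(lm(co2 ~ temp, data = d[rows, ]))[["temp"]]
  })
}

many <- t(replicate(100, slopes(split(dat, dat$L_group))))

# Median and a 95% range of the slope for each group
apply(many, 2, quantile, probs = c(0.025, 0.5, 0.975))
# Spread of the slopes; the smallest group shows essentially none
apply(many, 2, sd)
```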

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Dewey
Sent: Tuesday, February 17, 2015 9:52 AM
To: Angela Smith; r-help@r-project.org
Subject: Re: [R] subsamples and regressions for 100 times

Comment inline

On 17/02/2015 12:40, Angela Smith wrote:
>
>
> Hi R user,
> I'm new to R so my problem is probably pretty simple, but I'm stuck:
>
> My data consist of 2 variables, co2 and temp, and one treatment (L_group).
> The sample size differs among the treatments, so I want to make the sample
> size equal among the three groups (A, B and C) of the treatment.
>

Not sure whether that is necessary for regression but you did not tell 
us why you want to do that.

> For this, I used a subsampling technique. With subsampling, the data differ
> among the three groups of the treatment each time.
>
> So I want to run the regression (co2~temp) on 100 subsamples for each
> treatment group.
>

The usual way to do this is to store the subsamples in a list and then 
write a function and use lapply, say to store your models. You then have 
another list to which you can then apply the extractor function of your 
choice.


> That means I will have 100 regression equations. Later, I want to compare 
> the slope of the regression among the three groups. Is there a simple way to 
> make a loop so that I can compare them?
>
> Thanks in advance!
>
>
>
> Angela
>
> 
> Here is the example:
>
> dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
> 0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
> 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
> 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
> 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
> 0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), 
> .Names = c("co2",
> "temp", "L_group"), class = "data.frame", row.names = c(NA, -24L
> ))
>
> head(dat)
> library(sampling)
>
> # strata.sampling -
> strata.sampling <- function(data, group,size, method = NULL) {
>   require(sampling)
>   if (is.null(method)) method <- "srswor"
>   temp <- data[order(data[[group]]), ]
>   ifelse(length(size)> 1,
>          size <- size,
>          ifelse(size < 1,
>                 size <- round(table(temp[group]) * size),
>                 size <- rep(size, times=length(table(temp[group])))))
>   strat = strata(temp, stratanames = names(temp[group]),
>                  size = size, method = method)
>   getdata(temp, strat)
> }
>
> #--

Re: [R] Picking Best Discriminant Function Variables

2015-02-16 Thread David L Carlson

Look at the function stepclass() in package klaR.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Moskowitz
Sent: Sunday, February 15, 2015 11:34 AM
To: n omranian via R-help
Subject: [R] Picking Best Discriminant Function Variables

Is there a way to have the LDA function give me the best 3 (or 4) predictor 
variables? When I put in all the variables, LDA uses all the variables, but I 
would like to know what would be the 3 (or 4) best to use out of all the 
available variables and the coefficients for those.




Here is the code I am using for Linear Discriminant Function

library("MASS") 



results <- lda(data$V1 ~ data$V2 + data$V3 + data$V4 + data$V5 + data$V6 + 
data$V7 + data$V8 + data$V9 + data$V10 + data$V11 + data$V12 + data$V13 + 
data$V14)



Output:

Coefficients of linear discriminants:
                  LD1           LD2
data$V2  -0.403399781  0.8717930699
data$V3   0.165254596  0.3053797325
data$V4  -0.369075256  2.3458497486
data$V5   0.154797889 -0.1463807654
data$V6  -0.002163496 -0.0004627565
data$V7   0.618052068 -0.0322128171
data$V8  -1.661191235 -0.4919980543
data$V9  -1.495818440 -1.6309537953
data$V10  0.134092628 -0.3070875776
data$V11  0.355055710  0.2532306865
data$V12 -0.818036073 -1.5156344987
data$V13 -1.157559376  0.0511839665
data$V14 -0.002691206  0.0028529846




So in the above example, I would like the LDA to return to me the 3 best 
predictors out of the 13 available.


Thank you

Re: [R] Loop over regression results

2015-02-16 Thread David L Carlson

Or for the slopes and t-values:

> do.call(rbind, lapply(mod, function(x) summary(x)[["coefficients"]][2,]))
            Estimate Std. Error  t value     Pr(>|t|)
setosa     0.8371922  0.5049134 1.658091 1.038211e-01
versicolor 1.0536478  0.1712595 6.152348 1.41e-07
virginica  0.6314052  0.1428938 4.418702 5.647610e-05
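
The same approach extends to the kind of results table asked for in the 
original question. The following is only a sketch; the column names and the 
use of split() are my own choices for the illustration:

```r
# One model per species, stored in a named list
mods <- lapply(split(iris, iris$Species),
               function(d) lm(Sepal.Width ~ Petal.Width, data = d))

# Pull intercept, slope, p-value of the slope and R-squared from each model
tab <- t(sapply(mods, function(m) {
  s <- summary(m)
  c(Intercept = unname(coef(m)[1]),
    Slope     = unname(coef(m)[2]),
    p.slope   = s$coefficients[2, "Pr(>|t|)"],
    R2        = s$r.squared)
}))
round(tab, 4)
```

The result is an ordinary numeric matrix with one row per model, so it can 
be passed on to write.csv() or a table-formatting package.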

David C

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Monday, February 16, 2015 8:52 AM
To: Ronald Kölpin; r-help@r-project.org
Subject: Re: [R] Loop over regression results

In R you would want to combine the results into a list. This could be done when 
you create the regressions or afterwards. To repeat your example using a list:

data(iris)
taxon <- levels(iris$Species)
mod <- lapply(taxon, function (x) lm(Sepal.Width ~ Petal.Width, 
data=iris, subset=Species==x))
names(mod) <- taxon
lapply(mod, summary)
coeffs <- do.call(rbind, lapply(mod, coef))
coeffs
#            (Intercept) Petal.Width
# setosa        3.222051   0.8371922
# versicolor    1.372863   1.0536478
# virginica     1.694773   0.6314052

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ronald Kölpin
Sent: Monday, February 16, 2015 7:37 AM
To: r-help@r-project.org
Subject: [R] Loop over regression results

Dear all,

I have a problem when trying to present the results of several
regressions. Say I have run several regressions on a dataset and saved
the different results (as in the mini example below). I then want to
loop over the regression results in order to save certain values to a
matrix (in order to put them into a paper or presentation).

Aside from the question of how to access certain information stored by
lm() (or printed by summary()) I can't seem to loop over lm()
objects -- no matter whether they are stored in a vector or a list.
They are always evaluated immediately when called. I tried quote() or
substitute() but that didn't work either as "Objects of type 'symbol'
cannot be indexed."

In Stata I would simply do something like

forvalues k = 1/3 {
 quietly estimates restore mod`k'
// [...]
}

and I am looking for the R equivalent of that syntax.

Kind regard and thanks

RK


attach(iris)
mod1 <- lm(Sepal.Width ~ Petal.Width, data=iris, subset=Species=="setosa")
mod2 <- lm(Sepal.Width ~ Petal.Width, data=iris,
subset=Species=="versicolor")
mod3 <- lm(Sepal.Width ~ Petal.Width, data=iris,
subset=Species=="virginica")

summary(mod1); summary(mod2); summary(mod3)

mat <- matrix(data=NA, nrow=3, ncol=5,
  dimnames=list(1:3, c("Model", "Intercept", "p(T > |T|)",
"Slope", "R^2")))

mods <- c(mod1, mod2, mod3)

for(k in 1:3)
{
mod <- mods[k]
mat[2,k] <- as.numeric(coef(mod))[1]
mat[3,k] <- as.numeric(coef(mod))[1]
}

Re: [R] Loop over regression results

2015-02-16 Thread David L Carlson

In R you would want to combine the results into a list. This could be done when 
you create the regressions or afterwards. To repeat your example using a list:

data(iris)
taxon <- levels(iris$Species)
mod <- lapply(taxon, function (x) lm(Sepal.Width ~ Petal.Width, 
data=iris, subset=Species==x))
names(mod) <- taxon
lapply(mod, summary)
coeffs <- do.call(rbind, lapply(mod, coef))
coeffs
#            (Intercept) Petal.Width
# setosa        3.222051   0.8371922
# versicolor    1.372863   1.0536478
# virginica     1.694773   0.6314052
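
A short sketch of why the loop in the original post did not work: c() 
concatenates the components of the fitted models into one long list, so the 
individual lm objects are lost, while list() keeps each fit intact. The 
object names flat and mods are only for illustration:

```r
mod1 <- lm(Sepal.Width ~ Petal.Width, data = iris, subset = Species == "setosa")
mod2 <- lm(Sepal.Width ~ Petal.Width, data = iris, subset = Species == "versicolor")

flat <- c(mod1, mod2)      # concatenates the components of both fits
class(flat)                # a plain list, no longer an "lm" object
length(flat)

mods <- list(mod1, mod2)   # keeps each fit intact
for (k in seq_along(mods)) {
  print(coef(mods[[k]]))   # [[k]] extracts the model; [k] would return a list
}
```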

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ronald Kölpin
Sent: Monday, February 16, 2015 7:37 AM
To: r-help@r-project.org
Subject: [R] Loop over regression results

Dear all,

I have a problem when trying to present the results of several
regressions. Say I have run several regressions on a dataset and saved
the different results (as in the mini example below). I then want to
loop over the regression results in order to save certain values to a
matrix (in order to put them into a paper or presentation).

Aside from the question of how to access certain information stored by
lm() (or printed by summary()) I can't seem to loop over lm()
objects -- no matter whether they are stored in a vector or a list.
They are always evaluated immediately when called. I tried quote() or
substitute() but that didn't work either as "Objects of type 'symbol'
cannot be indexed."

In Stata I would simply do something like

forvalues k = 1/3 {
 quietly estimates restore mod`k'
// [...]
}

and I am looking for the R equivalent of that syntax.

Kind regard and thanks

RK


attach(iris)
mod1 <- lm(Sepal.Width ~ Petal.Width, data=iris, subset=Species=="setosa")
mod2 <- lm(Sepal.Width ~ Petal.Width, data=iris,
subset=Species=="versicolor")
mod3 <- lm(Sepal.Width ~ Petal.Width, data=iris,
subset=Species=="virginica")

summary(mod1); summary(mod2); summary(mod3)

mat <- matrix(data=NA, nrow=3, ncol=5,
  dimnames=list(1:3, c("Model", "Intercept", "p(T > |T|)",
"Slope", "R^2")))

mods <- c(mod1, mod2, mod3)

for(k in 1:3)
{
mod <- mods[k]
mat[2,k] <- as.numeric(coef(mod))[1]
mat[3,k] <- as.numeric(coef(mod))[1]
}

Re: [R] Coordinate or top left corner + offset

2015-02-10 Thread David L Carlson

Thanks, I didn't know about corner.label. I started with legend but I couldn't 
find a way to make the box small enough. It always covered much more of the 
corner than the letter which could have obscured data points.

David

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: Monday, February 9, 2015 5:43 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Coordinate or top left corner + offset

David L Carlson writes:

> 
> This is more complicated, but it could be rolled up into a function.
> Replace your mtext() call with the following:
> 
> # Set character expansion size
> cx <- 2.5
> # Get the plot coordinates and the character size
> ur <- par("usr")[c(1, 4)]
> chr <- par("cxy")
> rect(ur[1]+chr[1]/10, ur[2]-chr[2]*cx, ur[1]+chr[1]*cx, ur[2]-chr[1]/10, 
>  border=NA, col="white")
> text(ur[1]+chr[1]*cx/2, ur[2]-chr[2]*cx/2, "a", font=2, cex=2.5, col="red")
> 
> 1) Assign to cx the cex= value that you are using in text().
> 2) Then get the upper left corner of the plot window and the size of the
> default character in user coordinate units.
> 3) Draw a white rectangle the size of the character you are plotting (in
> this case cex=2.5). Shrink the left and top edge so that the box around
> the plot area is not obscured.
> 4) Plot your character in the center of the box.
> 

  There are two more tricks you can use here:

  (1) cheat by using legend()

plot(0:10,0:10)
legend("topleft",legend=NA,title="hello",bty="n")

  (2) use plotrix::corner.label

Re: [R] Variance is different in R vs. Excel?

2015-02-09 Thread David L Carlson

Time for a new version of Excel? I cannot duplicate your results in Excel 2013.

R:
> apply(dat, 2, var)
[1] 21290.80 24748.75

Excel 2013:
=VAR.S(A2:A21)   =VAR.S(B2:B21)
21290.8  24748.74737
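
For reference, var() in R computes the sample variance (divisor n - 1), 
which is also what Excel documents for VAR.S. The sketch below reproduces 
the R result by hand from the data in the original post; sample.var and 
pop.var are illustrative names, and the population version (divisor n) is 
included only for comparison:

```r
dat <- matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,
                597,872,476,1026,423,476,419,591,376,640,550,601,588,499,
                646,693,351,730,632,707,779,838,814,771,533,818),
              nrow = 20, ncol = 2, byrow = TRUE)

x <- dat[, 1]
n <- length(x)

sample.var <- sum((x - mean(x))^2) / (n - 1)   # what var() computes
pop.var    <- sum((x - mean(x))^2) / n         # population variance

c(var = var(x), by.hand = sample.var, population = pop.var)
```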

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Fetter
Sent: Monday, February 9, 2015 3:33 PM
To: r-help@r-project.org
Subject: [R] Variance is different in R vs. Excel?

Hello everyone, I have a simple question. when I use the var() function in
R to find a variance, it differs greatly from the variance found in excel
using the =VAR.S function. Any explanations on what those two functions are
actually doing?

Here is the data and the results:

dat<-matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,597,872,476,1026,423,476,419,591,376,640,550,601,588,499,646,693,351,730,632,707,779,838,814,771,533,818),
nrow=20, ncol=2, byrow=T)

var(dat[,1])
#21290.8

var(dat[,2])
#24748.75

#in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2

Thanks,

Karl

Re: [R] Coordinate or top left corner + offset

2015-02-09 Thread David L Carlson

This is more complicated, but it could be rolled up into a function. Replace 
your mtext() call with the following:

# Set character expansion size
cx <- 2.5
# Get the upper left corner of the plot region (x1, y2) and the default
# character width and height in user coordinates
ur <- par("usr")[c(1, 4)]
chr <- par("cxy")
# White rectangle sized to the label, inset slightly from the left and top
rect(ur[1]+chr[1]/10, ur[2]-chr[2]*cx, ur[1]+chr[1]*cx, ur[2]-chr[2]/10, 
 border=NA, col="white")
text(ur[1]+chr[1]*cx/2, ur[2]-chr[2]*cx/2, "a", font=2, cex=cx, col="red")

1) Assign to cx the cex= value that you are using in text().
2) Then get the upper left corner of the plot window and the default 
character width and height in user coordinate units.
3) Draw a white rectangle the size of the character you are plotting (in this 
case cex=2.5). Shrink the left and top edge so that the box around the plot 
area is not obscured.
4) Plot your character in the center of the box.
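Rolled up into a function, the steps above might look like the sketch below. The function name panel_label and its arguments are hypothetical, it assumes linear (not log) axes, and it opens a null pdf device only so that it runs without a screen.

```r
# Hypothetical helper: draw a bold label on a white patch in the upper left
# corner of the current plot region. Assumes linear axes.
panel_label <- function(label, cx = 2.5, col = "red") {
  usr <- par("usr")   # c(x1, x2, y1, y2) of the plot region
  chr <- par("cxy")   # default character width and height in user units
  left <- usr[1]
  top  <- usr[4]
  patch <- c(xleft   = left + chr[1] / 10,
             ybottom = top  - chr[2] * cx,
             xright  = left + chr[1] * cx,
             ytop    = top  - chr[2] / 10)
  # White patch first, then the label centred on it
  rect(patch["xleft"], patch["ybottom"], patch["xright"], patch["ytop"],
       border = NA, col = "white")
  text(left + chr[1] * cx / 2, top - chr[2] * cx / 2, label,
       font = 2, cex = cx, col = col)
  invisible(patch)
}

pdf(file = NULL)   # null device; drop this line to draw on screen
plot(rnorm(10), rnorm(10), xlim = c(-5, 5), xaxt = "n", yaxt = "n")
patch <- panel_label("a")
invisible(dev.off())
```

Because the offsets are taken from par("cxy") after plot() is called, the label should sit at the same visual distance from the corner whatever the xlim of the panel.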

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Pascal A. 
Niklaus
Sent: Monday, February 9, 2015 10:27 AM
To: r-help@r-project.org
Subject: [R] Coordinate or top left corner + offset

Dear all,

I am struggling to add annotations to panels of a series of plots 
arranged on a page.

Basically, I'd like to add letters enumerating the panels 
("a","b","c",...), at a fixed distance from the top left corner of the 
plot's "box".

I succeeded partly with "mtext" (see below), but the "at" option is in 
user coordinates, which makes it difficult to specify a given offset 
from the corner (e.g. 1cm from top and left).

I tried grid's "npc" but these coordinates refer to the entire plot 
instead of the current inner plotting region.

Phrased differently, I'd like to place text (and ideally also be able to 
plot, e.g. a white disc to cover background items) at position 
(top-1cm,left+1cm)

Here is a minimum working example illustrating what I try to achieve:


pdf("example.pdf",width=15,height=15)

m <- rbind( c(0.1,0.9,0.1,0.6),
 c(0.1,0.9,0.6,0.9)
  );

split.screen(m)

screen(1);
par(mar=c(0,0,0,0));
plot(rnorm(10),rnorm(10),xlim=c(-5,5),xaxt="n",yaxt="n");
mtext(quote(bold(a)),side=3,line=-2.5,at=-5,cex=2.5)

screen(2);
par(mar=c(0,0,0,0));
plot(rnorm(10),rnorm(10),xlim=c(-3,3),xaxt="n",yaxt="n");
mtext(quote(bold(a)),side=3,line=-2.5,at=-3,cex=2.5)


close.screen(all.screens=TRUE)

dev.off()


Thanks for your help

Pascal Niklaus


Re: [R] Still trying to avoid loops

2015-02-04 Thread David L Carlson

How about?

> ave(dat$D, dat$S, FUN=order)
[1] 2 1 1 1 2 3
> ave(dat_2$D, dat_2$S, FUN=order)
[1] 2 2 1 1 1 3

Note, your answer for the second example is incorrect since row 2 (c, 3) and 
row 5 (c, 2) are both assigned 2.
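One caveat, offered as a sketch rather than a correction: order() returns the positions that would sort a group, while rank() returns each element's place within the group. The two agree for the examples in this thread but can differ for groups of three or more, so rank() may be the safer choice for a visit number. The names visit_order, visit_rank and d below are only for illustration.

```r
# Second example from the thread
dat_2 <- data.frame(S = factor(c("a", "c", "a", "b", "c", "c")),
                    D = c(5, 3, 1, 3, 2, 4))

visit_order <- ave(dat_2$D, dat_2$S, FUN = order)
visit_rank  <- ave(dat_2$D, dat_2$S, FUN = rank)

# Both give 2 2 1 1 1 3 for this data
print(visit_order)
print(visit_rank)

# A single group where the two differ
d <- c(3, 1, 2)
print(order(d))   # 2 3 1: positions of the sorted values
print(rank(d))    # 3 1 2: visit number of each row
```

With tied dates, rank() averages the tied positions by default; ties.method = "first" would give distinct integers instead.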

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Tom Wright
Sent: Wednesday, February 4, 2015 2:08 PM
To: Rui Barradas
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] Still trying to avoid loops

Thanks, I was not aware of order().
I did deliberately mess up the order of S. The following example breaks
your solution
dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
  D=c(5,3,1,3,2,4))

which should give the answer c(2,2,1,1,2,3)

Your solution does indicate that sorting the data correctly before
starting might solve the problem.

On Wed, 2015-02-04 at 19:49 +, Rui Barradas wrote:
> Hello,
> 
> Aren't the levels of your example wrong? If the levels are 
> levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> the job.
> 
> unname(unlist(tapply(dat$D, dat$S, order)))
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> Em 04-02-2015 19:34, Tom Wright escreveu:
> > Given a dataframe:
> > dat<-data.frame(S=factor(c('a','b','a','c','c','c'),levels=c('b','a','c')),
> > D=c(1,5,3,2,3,4))
> >
> > where S is a subject identifier and D a visit (actually a date in my
> > real dataset). I would like to generate another column giving the visit
> > number
> >
> > R=c(2,1,1,1,2,3)
> >
> > My current solution uses nested loops and is slow and ugly. I've looked
> > at by() but can't see how to keep the order of R correct.
> >
> > Thanks,
> > Tom
> >
> >


Re: [R] naming rows/columns in 'array of matrices' | solved

2015-01-31 Thread David L Carlson

You can also add names to the dimensions:

> dimnames(P)[[1]] <- c("live","dead")
> dimnames(P)[[2]] <- c("old","young")
> names(dimnames(P)) <- c("status", "age", NULL)
> P
, , 1

      age
status old young
  live   1     2
  dead   3     4

, , 2

      age
status old young
  live   5     6
  dead   7     8
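The same named array can also be built in a single call by passing a named list as dimnames. This is only a sketch; the values are listed in column-major order so that each slice matches the byrow=T matrices used earlier in the thread.

```r
# Values in column-major order: slice 1 is rbind(c(1,2), c(3,4)),
# slice 2 is rbind(c(5,6), c(7,8))
P <- array(c(1, 3, 2, 4, 5, 7, 6, 8), dim = c(2, 2, 2),
           dimnames = list(status = c("live", "dead"),
                           age    = c("old", "young"),
                           NULL))

print(P)

# Elements can then be addressed by name
P["dead", "young", 2]
```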

David L. Carlson
Department of Anthropology
Texas A&M University

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of peter dalgaard
Sent: Saturday, January 31, 2015 2:19 AM
To: Evan Cooch
Cc: r-help@r-project.org
Subject: Re: [R] naming rows/columns in 'array of matrices' | solved

> On 30 Jan 2015, at 20:34 , Evan Cooch  wrote:
> 
> The (obvious, after the fact) solution at the bottom. D'oh...
> 
[snip]
> Forgot I was dealing with a multi-dimensional array, not a list. So, 
> following works fine. I'm sure there are better approaches (where 'better' is 
> either 'cooler', or 'more flexible'), but for the moment...)
> 
> P <- array(0, c(2,2,2),dimnames=list(c("live","dead"),c("old","young"),NULL))
> 
> P[,,1] <- matrix(c(1,2,3,4),2,2,byrow=T);
> P[,,2] <- matrix(c(5,6,7,8),2,2,byrow=T);
> 
> print(P);
> 

Just for completeness, this also works:

> P <- array(0, c(2,2,2)) 
> P[,,1] <- matrix(c(1,2,3,4),2,2,byrow=T);
> P[,,2] <- matrix(c(5,6,7,8),2,2,byrow=T);

> dimnames(P)[[1]] <- c("live","dead")
> dimnames(P)[[2]] <- c("old","young")

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Your personal email on the R-help mail list

2015-01-29 Thread David L Carlson

Yes. I thought I was replying to a different message. Sorry.

David

-Original Message-
From: Chel Hee Lee [mailto:chl...@mail.usask.ca] 
Sent: Thursday, January 29, 2015 11:33 AM
To: David L Carlson
Subject: Your personal email on the R-help mail list

Hi David,

I am not sure if you noticed that your personal conversation is on the 
R-help mailing list.

Chel Hee Lee, PhD

Biostatistician and Manager
Clinical Research Support Unit
College of Medicine
University of Saskatchewan
Canada

On 1/29/2015 11:28 AM, David L Carlson wrote:
> That's fine, but I'm here in town if you want me to pick her up at the 
> airport.
>
> David
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee Lee
> Sent: Thursday, January 29, 2015 9:18 AM
> To: Jeff Newmiller; Alan Yong; r-help@r-project.org
> Subject: Re: [R] Passing a Data Frame Name as a Variable in a Function
>
> I like Jeff's comments on the previous post.
>
> Regarding Alan's question, please see the following example.
>
>   > df.1 <- data.frame(v1=1:5, v2=letters[1:5])
>   > df.2 <- data.frame(v1=LETTERS[1:3], v2=11:13)
>   > DFName <- ls(pattern = glob2rx("df.*"))[1]
>   > DFName
> [1] "df.1"
>   > length(DFName[,1])
> Error in DFName[, 1] : incorrect number of dimensions
>
> 'DFName' is a character vector of length 1 (it is neither a matrix nor a
> data frame).  In this case, you may try 'eval()' as below:
>
>   > eval(parse(text=DFName))
> v1 v2
> 1  1  a
> 2  2  b
> 3  3  c
> 4  4  d
> 5  5  e
>   > eval(parse(text=DFName))[,1]
> [1] 1 2 3 4 5
>   > length(eval(parse(text=DFName))[,1])
> [1] 5
>   >
>
> Is this what you are looking for?  I hope this helps.
>
> Chel Hee Lee
>
>
> On 1/29/2015 12:34 AM, Jeff Newmiller wrote:
>> This approach is fraught with dangers.
>>
>> I recommend that you put all of those data frames into a list and have your 
>> function accept the list and the name and use the list indexing operator 
>> mylist[[DFName]] to refer to it. Having functions that go fishing around in 
>> the global environment will be hard to maintain at best, and buggy at worst.
>>
>> That said, I usually work with all of my data frames combined as one and use 
>> the plyr, dplyr, or data.table packages to apply my algorithms to each group 
>> of rows identified by a character or factor column.
>> ---
>> Jeff Newmiller
>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>> ---
>> Sent from my phone. Please excuse my brevity.
>>
>> On January 28, 2015 5:37:34 PM PST, Alan Yong  wrote:
>>> Dear R-help,
>>> I have df.1001 as a data frame with rows & columns of values.
>>>
>>> I also have other data frames named similarly, i.e., df.*.
>>>
>>> I used DFName from:
>>>
>>> DFName <- ls(pattern = glob2rx("df.*"))[1]
>>>
>>> & would like to pass on DFName to another function, like:
>>>
>>> length(DFName[, 1])
>>>
>>> however, when I run:
>>>
>>>> length(DFName[, 1])
>>> Error in DFName[, 1] : incorrect number of dimensions
>>>
>>> and
>>>
>>> length(df.1001[, 1])
>>> [1] 104
>>>
>>> do not provide the same expected answer.
>>>
>>> How can I successfully pass the data frame name of df.1001 as a
>>> variable named DFName in a function?
>>>
>>> Thanks,
>>> Alan
>>>
>

Re: [R] Passing a Data Frame Name as a Variable in a Function

2015-01-29 Thread David L Carlson

That's fine, but I'm here in town if you want me to pick her up at the airport.

David

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee Lee
Sent: Thursday, January 29, 2015 9:18 AM
To: Jeff Newmiller; Alan Yong; r-help@r-project.org
Subject: Re: [R] Passing a Data Frame Name as a Variable in a Function

I like Jeff's comments on the previous post.

Regarding Alan's question, please see the following example.

 > df.1 <- data.frame(v1=1:5, v2=letters[1:5])
 > df.2 <- data.frame(v1=LETTERS[1:3], v2=11:13)
 > DFName <- ls(pattern = glob2rx("df.*"))[1]
 > DFName
[1] "df.1"
 > length(DFName[,1])
Error in DFName[, 1] : incorrect number of dimensions

'DFName' is a character vector of length 1 (it is neither a matrix nor a 
data frame).  In this case, you may try 'eval()' as below:

 > eval(parse(text=DFName))
   v1 v2
1  1  a
2  2  b
3  3  c
4  4  d
5  5  e
 > eval(parse(text=DFName))[,1]
[1] 1 2 3 4 5
 > length(eval(parse(text=DFName))[,1])
[1] 5
 >

Is this what you are looking for?  I hope this helps.
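For comparison, here is a sketch of two alternatives to eval(parse()): get(), which looks an object up by name, and the list approach recommended in the quoted reply below. The list name dfs and the variables n_get and n_list are only for illustration.

```r
df.1 <- data.frame(v1 = 1:5, v2 = letters[1:5])
df.2 <- data.frame(v1 = LETTERS[1:3], v2 = 11:13)
DFName <- "df.1"

# get() retrieves the object whose name is stored in DFName
n_get <- nrow(get(DFName))

# Keeping the data frames in a named list avoids searching the workspace
dfs <- list(df.1 = df.1, df.2 = df.2)
n_list <- nrow(dfs[[DFName]])

# A function can then take the list and the name as ordinary arguments
count_rows <- function(frames, name) nrow(frames[[name]])
count_rows(dfs, "df.2")
```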

Chel Hee Lee


On 1/29/2015 12:34 AM, Jeff Newmiller wrote:
> This approach is fraught with dangers.
>
> I recommend that you put all of those data frames into a list and have your 
> function accept the list and the name and use the list indexing operator 
> mylist[[DFName]] to refer to it. Having functions that go fishing around in 
> the global environment will be hard to maintain at best, and buggy at worst.
>
> That said, I usually work with all of my data frames combined as one and use 
> the plyr, dplyr, or data.table packages to apply my algorithms to each group 
> of rows identified by a character or factor column.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On January 28, 2015 5:37:34 PM PST, Alan Yong  wrote:
>> Dear R-help,
>> I have df.1001 as a data frame with rows & columns of values.
>>
>> I also have other data frames named similarly, i.e., df.*.
>>
>> I used DFName from:
>>
>> DFName <- ls(pattern = glob2rx("df.*"))[1]
>>
>> & would like to pass on DFName to another function, like:
>>
>> length(DFName[, 1])
>>
>> however, when I run:
>>
>>> length(DFName[, 1])
>> Error in DFName[, 1] : incorrect number of dimensions
>>
>> and
>>
>> length(df.1001[, 1])
>> [1] 104
>>
>> do not provide the same expected answer.
>>
>> How can I successfully pass the data frame name of df.1001 as a
>> variable named DFName in a function?
>>
>> Thanks,
>> Alan
>>
>


Re: [R] Working with < and > is data sets

2015-01-26 Thread David L Carlson

Here is one way to fix the data:

# First note that "value" is a factor so we need to convert it to character
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : Factor w/ 19 levels "<0.030","<1.2",..: 3 4 2 1 7 8 6 5 12 11 ...
> zp$value <- as.character(zp$value)
> str(zp)
'data.frame':   20 obs. of  2 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : chr  "1160" "27.3" "<1.2" "<0.030" ...

# Next we need to see which values are preceded by "<", and record that in 
# a new variable, "note"
> zp$note <- ifelse(grepl("<", zp$value), "Limit", "Measured")

# Finally we strip the "<" off and convert "value" to numeric
> zp$value <- as.numeric(gsub("<", "", zp$value))
> str(zp)
'data.frame':   20 obs. of  3 variables:
 $ variable: Factor w/ 5 levels "ZP.1","ZP.3",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : num  1160 27.3 1.2 0.03 1870 45.7 0.85 0.025 695 31.9 ...
 $ note    : chr  "Measured" "Measured" "Limit" "Limit" ...
> head(zp)
  variable   value     note
1     ZP.1 1160.00 Measured
2     ZP.1   27.30 Measured
3     ZP.1    1.20    Limit
4     ZP.1    0.03    Limit
5     ZP.3 1870.00 Measured
6     ZP.3   45.70 Measured
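Since the subject line mentions ">" as well as "<", the same steps can be wrapped in a small function that flags both. This is a sketch only: the function name split_censored and the labels "Below limit" and "Above limit" are hypothetical, and the sample vector is made up for illustration.

```r
# Hypothetical helper: split values such as "<1.2" or ">500" into a numeric
# value and a note describing how the value was reported
split_censored <- function(x) {
  x <- trimws(as.character(x))   # factors must become character first
  note <- ifelse(grepl("^<", x), "Below limit",
          ifelse(grepl("^>", x), "Above limit", "Measured"))
  value <- as.numeric(sub("^[<>]", "", x))   # drop a leading < or >
  data.frame(value = value, note = note, stringsAsFactors = FALSE)
}

res <- split_censored(factor(c("1160", "<1.2", ">500", "<0.030")))
print(res)
```

Anything that still fails to convert comes back as NA with a warning, which makes unexpected codes in a large data set easy to find with is.na().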

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sam Albers
Sent: Monday, January 26, 2015 12:41 PM
To: r-help@r-project.org
Subject: [R] Working with < and > is data sets

Hello,

I am having some trouble figuring out how to deal with data in which some
observations are measured values and others are detection limits denoted
by greater-than and less-than symbols. Ideally I would like a column that has
the data as numbers then another column with values "Measured" or "Limit"
or something like that. Data and further clarification below.

##Data
zp<-structure(list(variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L),
.Label = c("ZP.1", "ZP.3", "ZP.5",
"ZP.7", "ZP.9"), class = "factor"),
   value = structure(c(3L, 4L, 2L, 1L, 7L, 8L, 6L, 5L, 12L,
11L, 10L, 9L, 15L, 16L, 14L, 13L, 19L, 18L, 17L, 9L),
 .Label = c("<0.030", "<1.2", "1160",
"27.3", "<0.025", "<0.85", "1870", "45.7", "<0.0020",
"<0.050", "31.9", "695",
"<0.0060", "<0.20", "311", "8.84", "<0.090", "12", "646"), class =
"factor")),
  .Names = c("variable", "value"), row.names = c(NA, -20L),
class = "data.frame")

## As expected, converting everything to numeric results in a slew of NA
## values
zp$valuefactor<-as.numeric(as.character(zp$value))

## At this point I am unsure how to proceed.

zp

###

So I am just wondering how folks deal with this type of data. Any advice
would be much appreciated as I am looking for something that will reliably
work on a large data set.

Thanks in advance!

Sam


Re: [R] Complex merging problems

2015-01-13 Thread David L Carlson

I think the OP does not want to list duplicate records. Perhaps

> merge(unique(df1), df2, all.y=TRUE)
  v1 v2 ind
1  1 83   1
2  1 84   1
3  2 83  NA
4  2 84  NA
5  3 83  NA
6  3 84  NA
7  4 83  NA
8  4 84  NA
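If the goal is a 0/1 indicator added to df2 without changing its row order, another option is to compare row keys with %in%. The sketch below rebuilds the two data frames from the original post; the names key1 and key2 are only for illustration, and v3 is the indicator name used by the original poster.

```r
df1 <- data.frame(v1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4),
                  v2 = c(83, 83, 84, 84, 85, 85, 90, 91, 91, 91, 92,
                         89, 89, 90, 90))
df2 <- expand.grid(v1 = 1:4, v2 = 83:84)

# One text key per row; duplicates in df1 do not matter for %in%
key1 <- paste(df1$v1, df1$v2)
key2 <- paste(df2$v1, df2$v2)

# 1 if the (v1, v2) pair occurs at least once in df1, otherwise 0
df2$v3 <- as.integer(key2 %in% key1)
print(df2)
```

Only the pairs (1, 83) and (1, 84) occur in df1, so those two rows get 1 and the rest get 0.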

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL Petr
Sent: Tuesday, January 13, 2015 2:14 AM
To: npretnar; r-help@r-project.org
Subject: Re: [R] Complex merging problems

Hi

I do not understand what you want to achive with this.

> df2$v3 <- ifelse(df2$v1 %in% df1$v1 & df2$v2==df2$v1, 1, 0).

You compare v1 and v2 from data frame df2 to column v1 in data frame df1?

It is true only in case where df2$v1 equals df2$v2.

In case you mean that you want check equality of rows in both data frames you 
can use this

> df1$ind<-1
> merge(df1, df2, all.y=T)
   v1 v2 ind
1   1 83   1
2   1 83   1
3   1 84   1
4   1 84   1
5   2 83  NA
6   2 84  NA
7   3 83  NA
8   3 84  NA
9   4 83  NA
10  4 84  NA

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> npretnar
> Sent: Tuesday, January 13, 2015 7:07 AM
> To: r-help@r-project.org
> Subject: [R] Complex merging problems
>
> Hello,
>
> I have two data frames structured as follows:
>
> df1
>
> v1 v2
> 1 83
> 1 83
> 1 84
> 1 84
> 1 85
> 1 85
> 2 90
> 2 91
> 2 91
> 2 91
> 2 92
> 4 89
> 4 89
> 4 90
> 4 90
>
> df2
>
> v1 v2
> 1 83
> 2 83
> 3 83
> 4 83
> 1 84
> 2 84
> 3 84
> 4 84
>
> ... etc.
>
> I am trying to create an indicator variable in df2 to indicate whether
> the record is identified in df1. I just want to know if it appears
> once. The problem seems to be that df1 contains multiple records with
> the same data. I am attempting the following:
>
> df2$v3 <- ifelse(df2$v1 %in% df1$v1 & df2$v2==df2$v1, 1, 0).
>
> However, I get the following warning message:
>
> Warning message:
> In df2$v2 == df1$v1 :
>   longer object length is not a multiple of shorter object length
>
> Nonetheless, the function outputs all 0's to df2$v3. If anybody has any
> suggestions with this, I would greatly appreciate it.
>
> Thanks,
>
> - Nick Pretnar
> npret...@gmail.com
>



Re: [R] number of individuals where X=0 during all periods (longitudinal data)

2014-12-22 Thread David L Carlson

Spend a little time with aggregate()

?aggregate
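As a starting point, here is a sketch using the example data from the question. It assumes that "at least 3 weeks of observations" means at least three rows for that individual; the names all_zero, n_obs, n_all_zero and n_all_zero_3 are only for illustration.

```r
ID   <- c(1, 1, 1, 1, 2, 2, 3, 3, 3)
X    <- c(0, 1, 2, 1, 0, 0, 0, 0, 0)
Time <- c(1, 2, 3, 4, 1, 2, 1, 2, 3)
Test <- data.frame(ID, X, Time)

# One row per ID: TRUE if X is 0 in every period
all_zero <- aggregate(X ~ ID, data = Test, FUN = function(x) all(x == 0))

# One row per ID: number of observed periods
n_obs <- aggregate(Time ~ ID, data = Test, FUN = length)

# Individuals with X = 0 throughout
n_all_zero <- sum(all_zero$X)

# ... and with at least 3 observations (both results are sorted by ID)
n_all_zero_3 <- sum(all_zero$X & n_obs$Time >= 3)

print(c(n_all_zero, n_all_zero_3))
```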

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of najuzz
Sent: Monday, December 22, 2014 7:45 AM
To: r-help@r-project.org
Subject: [R] number of individuals where X=0 during all periods (longitudinal 
data)

#Hi guys, 

#I would like to count the number of individuals that receive X=0 throughout
#their observational period.
#example dataset:

ID<-c(1,1,1,1,2,2,3,3,3)
X<-c(0,1,2,1,0,0,0,0,0)
Time<-c(1,2,3,4,1,2,1,2,3)
Test<-data.frame(ID,X,Time)

# Individuals 2 and 3 have X=0 during all their periods. The count should
# hence equal two. I simply have no clue how R could solve this for me. As an
# add-on, I would also like to know the number of individuals that report X=0
# during all periods and have at least 3 weeks of observations. The answer
# would be one in this sample dataset.

#Thank you




Re: [R] combinations between two vectors

2014-12-18 Thread David L Carlson

Depending on what you want, you probably want to start with expand.grid():

# All combinations of test with test
> pairs1 <- expand.grid(test, test)
> nrow(pairs1)
[1] 36
# Exclude cases that differ only in the order of the values
# E.g. (1, 5001), but not (5001, 1), also (1, 1), etc are included
> pairs2 <- pairs1[pairs1[,1] <= pairs1[,2],]
> nrow(pairs2)
[1] 21
# Same as pairs2 but (1, 1), etc are not included
> pairs3 <- pairs1[pairs1[,1] < pairs1[,2],]
> nrow(pairs3)
[1] 15
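If only the unordered pairs without repeats are needed (the pairs3 case), combn() gives them directly. A small sketch follows; it assumes test holds the six values printed in the original post, and the name unordered is only for illustration.

```r
# Assumed to match the values shown in the question: 1, 5001, ..., 25001
test <- seq(1, 25001, by = 5000)

# All ordered pairs, then keep those with first value < second value
pairs1 <- expand.grid(test, test)
pairs3 <- pairs1[pairs1[, 1] < pairs1[, 2], ]

# combn() returns one pair per column; transpose to one pair per row
unordered <- t(combn(test, 2))

nrow(pairs3)
nrow(unordered)
```

Both counts equal choose(6, 2), so the two approaches select the same set of pairs, although the rows come out in a different order.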

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, December 18, 2014 9:06 AM
To: Alaios
Cc: R-help@r-project.org
Subject: Re: [R] combinations between two vectors

I can't quite tell what you want: your example output is either
unclear to me or mangled by posting in HTML (please don't).

Is
expand.grid(test, test)
what you want, or partway to what you want?

Sarah

On Thu, Dec 18, 2014 at 9:56 AM, Alaios via R-help  wrote:
> Hi all,I am looking for a function that would give me all the combinations 
> between two vectors.Lets take as example the
>
> test<-seq(1,3,by=5000)
> Browse[2]> test
> [1] 1  5001 10001 15001 20001 25001
> I want all the combinations between two times the test... I think this is  
> called permutation so a function that could do permutation(test,test)and 
> produce the following
> 1,11,50011,100011,15001
> 3,13,5001...25001,20001,25001,25001
> is there such a function ?
> RegardsAlex
>
>
> [[alternative HTML version deleted]]
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Extract values from multiple lists

2014-12-16 Thread David L Carlson

Something like

scens <- paste0("scen", 1:N)
new.df <- data.frame(sapply(scens, function(x) get(x)[["pop.inf.r"]]))
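A variant of the same idea, as a sketch: mget() collects the scenario objects into a named list first, and unlist() flattens each pop.inf.r component before the data frame is built. The toy scen1 to scen3 objects below are made up for illustration and are far smaller than the real ones, and the name scen_list is hypothetical.

```r
# Toy stand-ins for the real scenario lists
scen1 <- list(pop.inf.r = list(0.10, 0.20, 0.30), n.sim = 3)
scen2 <- list(pop.inf.r = list(0.15, 0.25, 0.35), n.sim = 3)
scen3 <- list(pop.inf.r = list(0.12, 0.22, 0.32), n.sim = 3)

scens <- paste0("scen", 1:3)

# Named list of the scenario objects, looked up by name
scen_list <- mget(scens)

# One numeric column per scenario
new.df <- data.frame(lapply(scen_list,
                            function(s) unlist(s[["pop.inf.r"]])))
print(new.df)
```

This assumes every pop.inf.r has the same length and holds one number per element; otherwise data.frame() will stop with an error about differing numbers of rows.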

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of SH
Sent: Tuesday, December 16, 2014 11:06 AM
To: r-help
Subject: [R] Extract values from multiple lists

Dear List,

I hope this posting is not redundant.  I have several list outputs with the
same components.  I ran a function with three different scenarios below
(e.g., scen1, scen2, and scen3,...,scenN).  I would like to extract the
same components and group them as a data frame.  For example,
pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN<-scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)

My final output would be 'new.df'.  Could you help me how I can do that
efficiently?

Thanks in advance,

Steve

P.S.:  Below are some examples of summary outputs.


> summary(scen1)
                Length Class  Mode   
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list   
pop.inf.r       2000   -none- list   
pop.decision.t1 2000   -none- list   
pop.decision.t2 2000   -none- list   
sp.inf.n        2000   -none- list   
sp.inf.r        2000   -none- list   
sp.decision     2000   -none- list   
> summary(scen2)
                Length Class  Mode   
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list   
pop.inf.r       2000   -none- list   
pop.decision.t1 2000   -none- list   
pop.decision.t2 2000   -none- list   
sp.inf.n        2000   -none- list   
sp.inf.r        2000   -none- list   
sp.decision     2000   -none- list   
> summary(scen3)
                Length Class  Mode   
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list   
pop.inf.r       2000   -none- list   
pop.decision.t1 2000   -none- list   
pop.decision.t2 2000   -none- list   
sp.inf.n        2000   -none- list   
sp.inf.r        2000   -none- list   
sp.decision     2000   -none- list   


Re: [R] dotplot axes labelling

2014-12-15 Thread David L Carlson

You are very close. The scales=list(y=list()) argument supports multiple 
settings for the y axis, so you need to tell lattice how to use testylabels 
by naming the component (labels=):

dotplot(testmatrix, scales=list(y=list(labels=testylabels)), xlab=NULL)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of 
r...@openmailbox.org
Sent: Monday, December 15, 2014 10:03 AM
To: r-help@r-project.org
Subject: [R] dotplot axes labelling

Subscribers,

What is my mistake with the following example:

library(lattice)
testmatrix<-matrix(c(1,2,3,4,3,6,12,24),nrow=4,ncol=2)
testylabels<-c('w1','x1','y1','z1')
dotplot(testmatrix, scales=list(y=list(testylabels)), xlab=NULL)
#testylabels not shown, instead 'D' 'C' 'B' 'A'

Thanks in advance.

--


Re: [R] create matrices with constraint

2014-12-15 Thread David L Carlson

Actually there are not so many matrices as you suggest.

> comb <- combn(28, 4)
> dim(comb)
[1] 4 20475
> sum(comb[1,]==1)
[1] 2925
> comb[, 1]
[1] 1 2 3 4

There are 20,475 combinations, but you cannot choose any four to make a 4x7 
matrix since each value can be used only once. The combn() function returns the 
combinations sorted, so we can get the number of combinations that contain 1 
with sum(comb[1,]==1) and that is 2,925. The set of 4x7 matrices cannot use the 
same combination more than once, so 2,925 is the maximum possible number of 
matrices and there may be fewer. As a first approach to finding them, you could 
take the first combination comb[, 1] which is 1, 2, 3, 4. Now add a second 
combination that does not include 1:4 and then a third combination that does 
not include any in the first two combinations and finally a fourth that does 
not include any in the first three combinations. Actually this is easy since we 
will just take 1:4, 5:8, 9:12, 13:16, 17:20, 21:24, 25:28.

> cols <- sapply(c(1, 5, 9, 13, 17, 21, 25), function(x)
+  head(which(comb[1,]==x), 1))
> cols
[1]     1  9850 15631 18656 19981 20406 20475
> comb[,cols]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    5    9   13   17   21   25
[2,]    2    6   10   14   18   22   26
[3,]    3    7   11   15   19   23   27
[4,]    4    8   12   16   20   24   28

But now it gets more complicated. While building the second matrix, we have to 
make sure that it does not use any combinations that have already been used.  
Combinations used on earlier matrices may be necessary to complete later 
matrices and that is why the number of sets may be less than 2,925. This 
sequential approach would guarantee to obtain matrices meeting the OP's 
criteria, but would not necessarily produce the maximum number of matrices 
possible. 
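
As a rough sketch of one way to generate several such matrices without reusing a combination, the code below swaps the sequential search described above for simple random rejection sampling: shuffle 1:28, cut the shuffle into seven columns of four, and keep the result only if none of its columns has been used before. The helper name make_matrices and its arguments are made up for illustration, and like the sequential approach this does not try to find the maximum possible number of matrices.

```r
# Hypothetical helper: draw random 4x7 matrices from 1:28 and keep a draw
# only if none of its seven columns (as a sorted combination) was used before.
make_matrices <- function(n_mat, max_tries = 1000) {
  used <- character(0)   # keys of the combinations already used
  out <- list()
  tries <- 0
  while (length(out) < n_mat && tries < max_tries) {
    tries <- tries + 1
    # shuffle 1:28, fill a 4x7 matrix, then sort within each column
    m <- apply(matrix(sample(28), nrow = 4), 2, sort)
    keys <- apply(m, 2, paste, collapse = "-")
    if (!any(keys %in% used)) {
      used <- c(used, keys)
      out[[length(out) + 1]] <- m
    }
  }
  out
}

set.seed(1)
mats <- make_matrices(5)
length(mats)
mats[[1]]
```

Because only a tiny fraction of the 20,475 combinations is used at any time, rejections are rare for small numbers of matrices; they become the bottleneck only when you ask for a large share of the theoretical maximum.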

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of John McKown
Sent: Monday, December 15, 2014 9:23 AM
To: Kathryn Lord
Cc: r-help
Subject: Re: [R] create matrices with constraint

On Fri, Dec 12, 2014 at 11:00 AM, Kathryn Lord 
wrote:

> Dear all,
>
> Suppose that I have natural numbers 1 through 28.
>
> Based on these numbers, choose 4 numbers 7 times without replacement and
> make a 4 by 7 matrix, for example,
>
>
After a relaxing weekend, it came to me that these 4x7 matrices are really
just a subset of all the possible permutations of the vector 1:28, recast
as  4x7 matrices. Of course, there are factorial(28) (about 3*10^29 ) such
4x7 matrices. But given your constraints, I think that these can be
subsetted to only those permutations in which the values in each row are
sorted in ascending (or descending) order. I am fairly certain that this
subset would be exhaustive for your purposes. I not really certain how big
that subset would be. I think it would be 1/168th ( 1 out of 7*factorial(4)
) of the 3*10^29 permutations, or about 1.8*10^27. Which is still way to
big to actually instantiate all at once. You might be able store such a
thing in a huge data base. If you're lucky, you have access to a massive
supercomputer so that you can get the results before the heat death of this
universe. (exaggeration?)

Two R libraries seems to address this. One is combinat. The other is
permute. The permute library seems, to me, to be the more likely
candidate. It contains a "how()" function which __appears to me__ to
perhaps be a way to subset the permutations as they are being generated.
But all that I get from reading the documentation is a bad headache. I
never studied combinatorics. And I got a milder headache trying to read the
Wikipedia article on it.

I am curious about what you will do with such a set of matrices, once you
have them. If you are permitted to say.

-- 

While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity.  In other words, eschew obfuscation.

Maranatha! <><
John McKown

[[alternative HTML version deleted]]


Re: [R] if else for cumulative sum error

2014-12-02 Thread David L Carlson

Let's try a different approach. You don't need a loop for this. First we need a 
reproducible example:

> set.seed(42)
> dadosmax <- data.frame(above=runif(150) + .5)

Now compute your sums using cumsum() and diff() and then compute enchday using 
ifelse(). See the manual pages for each of these functions to understand how 
they work:

> sums <- diff(c(0, cumsum(dadosmax$above)), 45)
> dadosmax$enchday <- c(ifelse(sums >= 45, 1, 0), rep(NA, 44))

> dadosmax$enchday
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [26]  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [76]  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[101]  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

See the NA's? Those are what David Winsemius is talking about. For the 106th 
value, 106+44 is 150, but for the 107th value 107+44 is 151, which does not 
exist. Fortunately diff() understands that and stops at 106, but we have to add 
44 NA's to pad the result to 150, the number of rows in the data frame.

You might find this plot informative as well:

> plot(sums, typ="l")
> abline(h=45)

Another way to get there is to use sapply() which will add the NA's for us:

> sums <- sapply(1:150, function(x) sum(dadosmax$above[x:(x+44)]))
> dadosmax$enchday <- ifelse(sums >= 45, 1, 0)

But it won't be as fast if you have a large data set.
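
To see why the cumsum()/diff() version gives the same window sums, here is a small self-contained check. It is only a sketch: the window length k and the vector name above are taken from the example in this message, while sums_fast and sums_slow are names introduced here for the comparison.

```r
set.seed(42)
above <- runif(150) + 0.5
k <- 45   # window length: the current row plus the next 44

# diff() with lag k on the padded cumulative sum gives
# cumsum[i + k - 1] - cumsum[i - 1], i.e. sum(above[i:(i + k - 1)])
sums_fast <- diff(c(0, cumsum(above)), lag = k)

# The same sums computed window by window, only for complete windows
n_full <- length(above) - k + 1
sums_slow <- sapply(seq_len(n_full), function(i) sum(above[i:(i + k - 1)]))

all.equal(sums_fast, sums_slow)

# Pad with NA for the last k - 1 rows, which have no complete window
enchday <- c(ifelse(sums_fast >= 45, 1, 0), rep(NA, k - 1))
length(enchday)
```

The loop-free version does one pass over the data regardless of k, whereas the sapply() version re-adds k values for every row.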

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, December 2, 2014 2:50 PM
To: Jefferson Ferreira-Ferreira
Cc: r-help@r-project.org
Subject: Re: [R] if else for cumulative sum error

On Dec 2, 2014, at 12:26 PM, Jefferson Ferreira-Ferreira wrote:

> Thank you for replies.
> 
> David,
> 
> I tried your modified form
> 
> for (i in 1:seq_along(rownames(dadosmax))){

No. It is either 1:  or seq_along(...). In this case perhaps 
1:(nrow(dadosmax)-44) would be safer.

You do not seem to have understood that you cannot use an index of i+44 when i 
is going to be the entire set of rows of the dataframe. There is "no there 
there" to quote Gertrude Stein's slur against Oakland. In fact there is no 
there there at i+1 when you get to the end. You either need to only go to row

>  dadosmax$enchday[i] <- if ( (sum(dadosmax$above[i:(i+44)])) >= 45) 1 else
> 0
> }
> 
> However, I'm receiving this warning:
> Warning message:
> In 1:seq_along(rownames(dadosmax)) :
>  numerical expression has 2720 elements: only the first used
> 
> I can't figure out why only the first row was calculated...

You should of course read these, but the error is not from your if-statement 
but rather from your for-loop indexing.

?'if'
?ifelse

> Any ideas?
> 
> 
> 
> Em Tue Dec 02 2014 at 15:22:25, John McKown 
> escreveu:
> 
>> On Tue, Dec 2, 2014 at 12:08 PM, Jefferson Ferreira-Ferreira <
>> jeco...@gmail.com> wrote:
>> 
>>> Hello everybody;
>>> 
>>> I'm writing a code where part of it is as follows:
>>> 
>>> for (i in nrow(dadosmax)){
>>>  dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(i+44)]) >= 45) 1 else 0
>>> }
>>> 
>> 
>> Without some test data for any validation, I would try the following
>> formula
>> 
>> dadosmax$enchday[i] <- if
>> (sum(dadosmax$above[i:(min(i+44,nrow(dadosmax)))]) >= 45) 1 else 0
>> 
>> 
>> 
>>> 
>>> That is, for each row of my data frame, sum a specific column (0 or 1) over
>>> that row plus the next 44 rows. If it is >= 45 then enchday is 1, else 0.
>>> 
>>> The following error is returned:
>>> 
>>> Error in if (sum(dadosmax$above[i:(i + 44)]) >= 45) 1 else 0 :
>>>  missing value where TRUE/FALSE needed
>>> 
>>> I've tested the if/else statement assigning different values to i and it
>>> works. So I'm wondering if this error is due to the fact that at the end of
>>> my data frame there aren't 45 rows to sum anymore. I tried to use "try"
>>> but it simply hides the error.
>>> 
>>> How can I deal with this? Any ideas?
>>> Thank you very much.
>>> 

Re: [R] Creating submatrices from a dataframe, depending on factors in sample names

2014-12-01 Thread David L Carlson

I may have misunderstood, but does this do what you want?

> df.mat <- as.matrix(df)
> same <- lapply(1:3, function(x) df.mat[grep(paste0("_", x), 
+ rownames(df.mat)), grep(paste0("_", x), colnames(df.mat))])
> same
[[1]]
   HQ673618_1 HQ674317_1 EU686630_1
HQ673618_1 NA   90.8   89.8
HQ674317_1   90.8 NA   98.6
EU686630_1   89.8   98.6 NA

[[2]]
   EU686593_2 JN166322_2 EU491340_2
EU686593_2 NA   98.1   96.8
JN166322_2   98.1 NA   97.5
EU491340_2   96.8   97.5 NA

[[3]]
   AB694259_3 AB694258_3 AB694462_3
AB694259_3 NA   98.3   95.9
AB694258_3   98.3 NA   95.8
AB694462_3   95.9   95.8 NA

> Diff <- as.matrix(expand.grid(1:3, 1:3))
> Diff <- Diff[Diff[,1] < Diff[,2], ]
> different <- lapply(seq_len(nrow(Diff)), function(x) 
+ df.mat[grep(paste0("_", Diff[x,1]), rownames(df.mat)),
+ grep(paste0("_", Diff[x,2]), colnames(df.mat))])
> different
[[1]]
   EU686593_2 JN166322_2 EU491340_2
HQ673618_1   89.6   89.8   88.9
HQ674317_1   97.7   98.4   97.4
EU686630_1   98.4   98.9   97.7

[[2]]
   AB694259_3 AB694258_3 AB694462_3
HQ673618_1   87.8   88.2   88.3
HQ674317_1   94.9   96.2   95.1
EU686630_1   95.4   96.4   95.8

[[3]]
   AB694259_3 AB694258_3 AB694462_3
EU686593_2   94.4   95.6   94.8
JN166322_2   95.3   96.5   95.9
EU491340_2   96.5   97.7   96.0
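
The original poster also asked for the pieces to be named "_n1 vs _n2". A minimal sketch of that, assuming the group is always the text after the last underscore, is shown below on a small made-up matrix; the sample names, the values, and the object names grp, pairs, and between are illustrative only and do not come from the thread.

```r
# Toy symmetric similarity matrix; sample names and values are made up
ids <- c("a_1", "b_1", "c_2", "d_2", "e_3", "f_3")
set.seed(1)
m <- matrix(round(runif(36, 85, 99), 1), 6, 6, dimnames = list(ids, ids))
m[lower.tri(m)] <- t(m)[lower.tri(m)]   # mirror the upper triangle
diag(m) <- NA

grp <- sub(".*_", "", rownames(m))      # group suffix of each sample
pairs <- combn(unique(grp), 2)          # one column per pair of groups

# One between-group block per pair, rows from the first group,
# columns from the second
between <- lapply(seq_len(ncol(pairs)), function(j)
  m[grp == pairs[1, j], grp == pairs[2, j], drop = FALSE])
names(between) <- paste0("_", pairs[1, ], " vs _", pairs[2, ])

names(between)
between[["_1 vs _3"]]
```

Matching on the extracted suffix rather than with grep() avoids accidental hits when one group label is a prefix of another (for example "_1" and "_10").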

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Monday, December 1, 2014 11:46 AM
To: Tim Richter-Heitmann
Cc: r-help@r-project.org
Subject: Re: [R] Creating submatrices from a dataframe, depending on factors in 
sample names

I do not have the patience to study your request carefully, but does
the following help?

> a <- 1:3
> x <- outer(a,a,paste,sep=".")
> x
 [,1]  [,2]  [,3]
[1,] "1.1" "1.2" "1.3"
[2,] "2.1" "2.2" "2.3"
[3,] "3.1" "3.2" "3.3"
> x[upper.tri(x)]
[1] "1.2" "1.3" "2.3"

> x[upper.tri(x,diag=TRUE)]
[1] "1.1" "1.2" "2.2" "1.3" "2.3" "3.3"

This gives you a vector of all possible pairs (including identical pairs
or not) of values of a, which you could then loop over as an index to
do what you want, I think.

If this is not what you want, just ignore without replying.

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Mon, Dec 1, 2014 at 8:47 AM, Tim Richter-Heitmann
 wrote:
> Hello there,
>
> this is a cross-post of a Stack Overflow question, which wasn't answered, but
> is very important for my work. Apologies for breaking any rules, but I do
> hope for some help from the list instead:
>
> I have a huge matrix of pairwise similarity percentages between different
> samples. The samples belong to groups. The groups are determined by
> the suffix "_n" in the row.names/header names.
> In the first step, I wanted to create submatrices consisting of all pairs
> within single groups (i.e. for all samples from "_1").
> However, I realized that I need to know all pairwise submatrices, between
> all combinations of groups. So, I want to create (a list of) vectors that are
> named "_n1 vs _n2" (or similar) for all combinations of n, as illustrated by
> the colored rectangles:
>
> http://i.stack.imgur.com/XMkxj.png
>
> Reproducible code, as provided by helpful Stack Overflow members, dealing
> with identical "_n"s.
>
>
> df <- structure(list(HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8,
> 88.9,
> 87.8, 88.2, 88.3), HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4,
> 97.4, 94.9, 96.2, 95.1), EU686630_1 = c(89.8, 98.6, NA, 98.4,
> 98.9, 97.7, 95.4, 96.4, 95.8), EU686593_2 = c(89.6, 97.7, 98.4,
> NA, 98.1, 96.8, 94.4, 95.6, 94.8), JN166322_2 = c(89.8, 98.4,
> 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9), EU491340_2 = c(88.9,
> 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96), AB694259_3 = c(87.8,
> 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9), AB694258_3 = c(88.2,
> 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8), AB694462_3 = c(88.3,
> 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)), .Names =
> c("HQ673618_1",
> &qu

Re: [R] Converting list to character

2014-11-25 Thread David L Carlson

Or just modify your aggregate() command:

> TAB <- aggregate(mydata$CODE, by=list(ID=mydata$ID, 
+YEAR=mydata$YEAR), FUN=paste0, collapse=", ")
> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
6  1251 2009 GR.3.1, GR.3.8
7  1801 2009         GR.3.8
8    11 2010         GR.3.7
9   460 2010         GR.3.1
10  986 2010         GR.3.8
11 1251 2010 GR.3.1, GR.3.8
12 1801 2010         GR.3.8
13  460 2011         GR.3.1
14  986 2011         GR.3.8
15 1251 2011 GR.3.1, GR.3.8
16 1801 2011         GR.3.8
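
For anyone who wants to check the column type, here is a small sketch on a cut-down version of the data. The six rows are a subset chosen for illustration, and TAB2 is a name introduced here to show one possible way of repairing a list column after the fact; neither comes from the original post.

```r
mydata <- data.frame(
  ID   = c(11, 11, 1251, 1251, 1251, 1251),
  YEAR = c(2009, 2010, 2008, 2008, 2009, 2009),
  CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8"),
  stringsAsFactors = FALSE
)

# Collapsing inside aggregate() returns one string per group,
# so x is an ordinary character column
TAB <- aggregate(mydata$CODE, by = list(ID = mydata$ID, YEAR = mydata$YEAR),
                 FUN = paste, collapse = ", ")
str(TAB$x)

# Without collapse the groups have different lengths and x stays a list;
# it can be collapsed afterwards, one element at a time
TAB2 <- aggregate(mydata$CODE, by = list(ID = mydata$ID, YEAR = mydata$YEAR),
                  FUN = paste0)
TAB2$x <- vapply(TAB2$x, paste, character(1), collapse = ", ")
str(TAB2$x)
```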

-------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Lee, Chel Hee
Sent: Tuesday, November 25, 2014 11:23 AM
To: Massimiliano Tripoli; r-help@r-project.org
Subject: Re: [R] Converting list to character

 > do.call("rbind", TAB$x)
[,1] [,2]
1  "GR.3.8" "GR.3.8"
2  "GR.3.1" "GR.3.8"
4  "GR.3.8" "GR.3.8"
5  "GR.3.7" "GR.3.7"
6  "GR.3.8" "GR.3.8"
7  "GR.3.1" "GR.3.8"
9  "GR.3.8" "GR.3.8"
10 "GR.3.7" "GR.3.7"
11 "GR.3.1" "GR.3.1"
12 "GR.3.8" "GR.3.8"
13 "GR.3.1" "GR.3.8"
15 "GR.3.8" "GR.3.8"
16 "GR.3.1" "GR.3.1"
17 "GR.3.8" "GR.3.8"
18 "GR.3.1" "GR.3.8"
20 "GR.3.8" "GR.3.8"
 >

Is this what you are looking for?  I hope this helps.

Chel Hee Lee

On 11/25/2014 6:07 AM, Massimiliano Tripoli wrote:
>
>
> Dear all,
>
> I can't convert the result of aggregate function in a dataframe. My data
> looks like:
>
> mydata <- structure(list(ID = c(11, 11, 460, 460, 986, 986, 986, 986, 1251,
> 1251, 1251, 1251, 1251, 1251, 1251, 1251, 1801, 1801, 1801, 1801
> ), YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011, 2008,
> 2008, 2009, 2009, 2010, 2010, 2011, 2011, 2008, 2009, 2010, 2011
> ), Y = c(158126, 153015, 3701, 5880, 718663, 661112, 527233,
> 558281, 450, 131714, 427, 124648, 425, 116500, 434, 123853, 17400,
> 16493, 8057, 8329), CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1",
> "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1",
> "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8",
> "GR.3.8", "GR.3.8")), .Names = c("ID", "YEAR", "Y", "CODE"), row.names = c(NA,
> 20L), class = "data.frame")
>
> and by using aggregate function
>
> TAB <- 
> aggregate(mydata$CODE,by=list(ID=mydata$ID,YEAR=mydata$YEAR),FUN=paste0)
>
> What I want is a data frame that looks like the printout of TAB:
>> TAB
>      ID YEAR              x
> 1   986 2008         GR.3.8
> 2  1251 2008 GR.3.1, GR.3.8
> 3  1801 2008         GR.3.8
> 4    11 2009         GR.3.7
> 5   986 2009         GR.3.8
> 6  1251 2009 GR.3.1, GR.3.8
> 7  1801 2009         GR.3.8
> 8    11 2010         GR.3.7
> 9   460 2010         GR.3.1
> 10  986 2010         GR.3.8
> 11 1251 2010 GR.3.1, GR.3.8
> 12 1801 2010         GR.3.8
> 13  460 2011         GR.3.1
> 14  986 2011         GR.3.8
> 15 1251 2011 GR.3.1, GR.3.8
> 16 1801 2011         GR.3.8
>
>> str(TAB)[1:10]
> 'data.frame':16 obs. of  3 variables:
>   $ ID  : num  986 1251 1801 11 986 ...
>   $ YEAR: num  2008 2008 2008 2009 2009 ...
>   $ x   :List of 16
>..$ 1 : chr "GR.3.8"
>..$ 2 : chr  "GR.3.1" "GR.3.8"
>..$ 4 : chr "GR.3.8"
>..$ 5 : chr "GR.3.7"
>..$ 6 : chr "GR.3.8"
>..$ 7 : chr  "GR.3.1" "GR.3.8"
>..$ 9 : chr "GR.3.8"
>..$ 10: chr "GR.3.7"
>..$ 11: chr "GR.3.1"
>..$ 12: chr "GR.3.8"
>..$ 13: chr  "GR.3.1" "GR.3.8"
>..$ 15: chr "GR.3.8"
>..$ 16: chr "GR.3.1"
>..$ 17: chr "GR.3.8"
>..$ 18: chr  "GR.3.1" "GR.3.8"
>..$ 20: chr "GR.3.8"
> NULL
>
> As you can see, the "x" column is a list and I want to change it to a 
> character variable.
> Can anyone help me?
> Thanks,
>
> Massimiliano
>


Re: [R] Rose Diagrams for Geology

2014-11-21 Thread David L Carlson

No. Just use the circular() function to specify that your data are in degrees 
and clockwise and the graph will be labeled that way.
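
A minimal sketch of what that could look like, assuming the circular package is installed and the measurements are azimuths in degrees. The vector az is made up for illustration, and the template argument is one way (as I understand the package documentation) to get zero at the top with clockwise rotation; check ?circular for the options available in your version.

```r
# Made-up azimuths in degrees, measured clockwise from north
az <- c(10, 25, 40, 95, 170, 355)

if (requireNamespace("circular", quietly = TRUE)) {
  # template = "geographics" places zero at the top and counts clockwise;
  # units = "degrees" declares how the numbers should be read
  x <- circular::circular(az, units = "degrees", template = "geographics")
  circular::rose.diag(x, bins = 36, main = "Rose diagram (azimuths)")
}
```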

David C (I was beginning to think that this thread was only for Davids).

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of jwd
Sent: Friday, November 21, 2014 1:58 AM
To: r-help@r-project.org
Subject: Re: [R] Rose Diagrams for Geology

On Tue, 18 Nov 2014 22:06:03 -0600
David Doyle  wrote:

> Thank you to David and David for their help.  The code below
> generated what I needed.
> 
> 
> library(circular)
> mydata <- read.table("http://doylesdartden.com/R/Joints.csv",
> header=TRUE, sep=",")
> x <- circular(mydata$JointsRad)
> rose.diag(x,
> 
>   #Set point character to use
>   pch = 20,
>   #sets font size
>   cex = 1,
>   #parameter that controls the size of the circle.
>   #1 = default, <1 makes it larger, >1 makes it smaller
>   shrink = 1,
>   #the color for filling the rose diagram.
>   col=2,
>   prop = 2,
>   # number of bins.  36 = 10 degrees each.  18 = 20 degrees each
>   bins=36,
>   # Ticks showing bins
>   ticks=TRUE,
>   # Units.
>   units="degrees",
>   # list main title
>   main="Rose Diagram of XXX")
> # for more info see
> http://www.inside-r.org/packages/cran/circular/docs/rose.diag
> 
I've been following this thread with some interest.  One problem that I
might have with the code above is that as it is, the plot is labeled
with 0-deg to the left, and numbered counter clockwise (standard
trigonometric format). Most field mapping data I have collected has been
either in quadrant form (rarely) or more commonly in azimuthal form
(0-360 degrees, ordered clockwise from the top).  Is that an issue? 

jwdougherty


Re: [R] Rose Diagrams for Geology

2014-11-18 Thread David L Carlson

Look at circular more carefully. It accepts both degrees and radians, but you 
have to create a circular object with circular() to specify what kind of 
circular data you have. Then you can plot and get circular statistics on your 
data.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David Doyle
Sent: Tuesday, November 18, 2014 3:42 PM
To: r-help@r-project.org
Subject: [R] Rose Diagrams for Geology

Hello everyone,

In geology we often do rose diagrams showing the number of features along a
certain compass direction within a given range (bin) of angle (0-180
degrees).  I was wondering if anybody has had experience with this in R and
if they could recommend a package.

I looked at the circular package but it seems to deal only in radians and we
normally use degrees.

I've also looked a little at openair, since rose diagrams are often used for
wind directions.

Any suggestions / guidance would be greatly appreciated.

Thank you for your time.
David Doyle

[[alternative HTML version deleted]]


Re: [R] Help for x axis

2014-11-18 Thread David L Carlson

This should get you started:

> aggData <- aggregate(age~smoke+gender, sampleData, mean)
> aggData
  smoke gender      age
1     0      0 39.47231
2     1      0 40.09020
3     0      1 39.59814
4     1      1 42.04092
> plotInfo <- barplot(aggData$age)
> axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75, 
+ lwd=0, cex.axis=1.25, xpd=TRUE)
> axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2, 
+ lwd=0, cex.axis=1.25, xpd=TRUE)

To adapt it you will need to read the manual pages for barplot() and axis() and 
the page on graphical parameters par(). In particular, you will have to 
allocate more space at the bottom of the plot if you want to add more lines.
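
As a rough illustration of the last point, the sketch below enlarges the bottom margin with par(mar=) and writes the label rows in a loop, so adding a third trigger only means adding an element to the list. The names rows, mids, and op are introduced here for illustration, the margin and line values are guesses that will probably need tuning for your device, and the +/- signs are derived from the aggregated data rather than typed by hand.

```r
set.seed(3)
sampleData <- data.frame(gender = sample(c("0", "1"), 100, replace = TRUE),
                         smoke  = sample(c("0", "1"), 100, replace = TRUE),
                         age    = rnorm(100, 40, 10))
aggData <- aggregate(age ~ smoke + gender, sampleData, mean)

# One entry per label row, in the order they should appear under the bars
rows <- list(Gender = ifelse(aggData$gender == "1", "+", "-"),
             Smoke  = ifelse(aggData$smoke  == "1", "+", "-"))

# Bottom margin grows with the number of label rows (values are guesses)
op <- par(mar = c(2 + 1.5 * length(rows), 4, 2, 1))
mids <- barplot(aggData$age)
for (i in seq_along(rows)) {
  axis(1, at = c(0, mids), labels = c(names(rows)[i], rows[[i]]),
       line = 0.75 + 1.25 * (i - 1), lwd = 0, cex.axis = 1.25, xpd = TRUE)
}
par(op)   # restore the previous margins
```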

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Olivier
Sent: Monday, November 17, 2014 9:24 AM
To: r-help@r-project.org
Subject: [R] Help for x axis

Hi,
I want to customize the x axis for scientific data. I do experiments with 
different triggers. As in other publications, I want one 
line for each trigger with the sign "-" or "+" to show whether the trigger is 
used or not. You will find an example attached.
Please find below a data.frame you could use in your explanation.
Thank you for your response,
Olivier



set.seed(3)
sampleData <- data.frame(id = 1:100,gender = sample(c("0", "1"), 100, 
replace = TRUE), smoke = sample (c("0","1"), 100, replace=TRUE), age = 
rnorm(100, 40, 10))
summary(sampleData)

-> I want to give results with histograms or box.plot (age according to 
sex and smoking status)
-> x axis may be like something like this :

Gender  -  -  +  +
Smoke   -  +  -  +


Re: [R] Help for x axis

2014-11-18 Thread David L Carlson

Try these:

aggData <- aggregate(age~smoke+gender, sampleData, function(x) c(mean=mean(x), 
stderr=sd(x)/sqrt(length(x))))
aggData
plotInfo <- barplot(aggData$age[,1], ylim=c(0, max(rowSums(aggData$age))))
axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75, 
lwd=0, cex.axis=1.25, xpd=TRUE)
axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2, 
lwd=0, cex.axis=1.25, xpd=TRUE)
top <- aggData$age[,1]+aggData$age[,2]
bottom <- aggData$age[,1]-aggData$age[,2]
arrows(plotInfo, bottom, plotInfo, top, length=.15, angle=90, code=3)


boxplot(age~smoke+gender, sampleData, xaxt="n")
axis(1, 0:4, c("Gender", "-", "-", "+", "+"), line=.75, 
lwd=0, cex.axis=1.25, xpd=TRUE)
axis(1, 0:4, c("Smoke", "-", "+", "-", "+"), line=2, 
lwd=0, cex.axis=1.25, xpd=TRUE)


David

-Original Message-
From: Olivier [mailto:olivier.lerou...@ymail.com] 
Sent: Monday, November 17, 2014 4:39 PM
To: David L Carlson
Subject: Re: [R] Help for x axis

Thank you very much, it is exactly what I wanted to do. Is it possible to show 
error bars, or to do the same with a boxplot?
Best regards,

Olivier Le Rouzic

On 2014-11-17, 4:07 PM, David L Carlson wrote:
> This should get you started:
>
>> aggData <- aggregate(age~smoke+gender, sampleData, mean)
>> aggData
>   smoke gender      age
> 1     0      0 39.47231
> 2     1      0 40.09020
> 3     0      1 39.59814
> 4     1      1 42.04092
>> plotInfo <- barplot(aggData$age)
>> axis(1, c(0, plotInfo), c("Gender", "-", "-", "+", "+"), line=.75,
> + lwd=0, cex.axis=1.25, xpd=TRUE)
>> axis(1, c(0, plotInfo), c("Smoke", "-", "+", "-", "+"), line=2,
> + lwd=0, cex.axis=1.25, xpd=TRUE)
>
> To adapt it you will need to read the manual pages for barplot() and axis() 
> and the page on graphical parameters par(). In particular, you will have to 
> allocate more space at the bottom of the plot if you want to add more lines.
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Olivier
> Sent: Monday, November 17, 2014 9:24 AM
> To: r-help@r-project.org
> Subject: [R] Help for x axis
>
> Hi,
> I want to customize the x axis for scientific data. I do experiments with
> different triggers. As in other publications, I want one
> line for each trigger with the sign "-" or "+" to show whether the trigger is
> used or not. You will find an example attached.
> Please find below a data.frame you could use in your explanation.
> Thank you for your response,
> Olivier
>
>
>
> set.seed(3)
> sampleData <- data.frame(id = 1:100,gender = sample(c("0", "1"), 100,
> replace = TRUE), smoke = sample (c("0","1"), 100, replace=TRUE), age =
> rnorm(100, 40, 10))
> summary(sampleData)
>
> -> I want to give results with histograms or box.plot (age according to
> sex and smoking status)
> -> x axis may be like something like this :
>
> Gender  -  -  +  +
> Smoke   -  +  -  +
>
>


Re: [R] Help with ddply/summarize

2014-11-14 Thread David L Carlson

I think this is what you want:

> MyVar <- 1:10
> MyVar
 [1]  1  2  3  4  5  6  7  8  9 10
> mean(MyVar)
[1] 5.5
> txt <- "MyVar"
> mean(txt)
[1] NA
Warning message:
In mean.default(txt) : argument is not numeric or logical: returning NA
> mean(get(txt))
[1] 5.5
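
If the goal is a grouped summary with the column chosen by name, one base R alternative to get() is to index the data frame with the string. The sketch below uses aggregate() rather than ddply(), so it is a swapped-in technique and not the plyr call from the question; the data in MyFrame are made up, and vars and results are names introduced here.

```r
# Made-up data with the same column layout as in the question
MyFrame <- data.frame(
  Treatment = rep(c("MyDrug", "Placebo"), each = 4),
  Week      = rep(c("BASELINE", "WEEK 1"), times = 4),
  MyVar     = c(6, 5, 4, 3, 9, 8, 7, 6),
  OtherVar  = 1:8
)

txt <- "MyVar"

# MyFrame[txt] is a one-column data frame selected by the string in txt
res <- aggregate(MyFrame[txt], by = MyFrame[c("Treatment", "Week")], FUN = mean)
res

# The same call works inside a loop over a vector of column names
vars <- c("MyVar", "OtherVar")
results <- lapply(vars, function(v)
  aggregate(MyFrame[v], by = MyFrame[c("Treatment", "Week")], FUN = mean))
names(results) <- vars
```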


---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of John Posner
Sent: Thursday, November 13, 2014 5:32 PM
To: 'r-help@r-project.org'
Subject: [R] Help with ddply/summarize

I have a straightforward application of ddply() and summarize():

   ddply(MyFrame, .(Treatment, Week), summarize, MeanValue=mean(MyVar))

This works just fine:

   Treatment     Week MeanValue
1     MyDrug BASELINE      5.91
2     MyDrug   WEEK 1      4.68
3     MyDrug   WEEK 2      4.08
4     MyDrug   WEEK 3      3.67
5     MyDrug   WEEK 4      2.96
6     MyDrug   WEEK 5      2.57
7     MyDrug   WEEK 6      2.50
8    Placebo BASELINE      8.58
9    Placebo   WEEK 1      8.25
...

But I want to specify the variable (MyVar) as a character string:

   ddply(MyFrame, .(Treatment, Week), summarize, MeanValue=mean("MyVar"))

(Actually, the character string "MyVar" will be selected from a vector of 
character strings.)

The code above produces no joy:

   Treatment     Week MeanValue
1     MyDrug BASELINE        NA
2     MyDrug   WEEK 1        NA
3     MyDrug   WEEK 2        NA
4     MyDrug   WEEK 3        NA
...
I tried a few things, including:

  as.name("MyVar")
  as.quoted("MyVar")

... but they all produced the same results: NAs

I'm obviously thrashing around in the dark! Any advice would be greatly 
appreciated.

-John


[[alternative HTML version deleted]]


Re: [R] factor levels > numeric values

2014-11-12 Thread David L Carlson

Also look at the Frequently Asked Questions document that comes with your R 
installation:

7.10 How do I convert factors to numeric?

It may happen that when reading numeric data into R (usually, when reading in a 
file), they come in as factors. If f is such a factor object, you can use

as.numeric(as.character(f))

to get the numbers back. More efficient, but harder to remember, is

as.numeric(levels(f))[as.integer(f)]

In any case, do not call as.numeric() or their likes directly for the task at 
hand (as as.numeric() or unclass() give the internal codes).
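
A short example of the difference, using a made-up factor f whose levels sort differently as text than as numbers:

```r
# Numbers that were read in as text and turned into a factor
f <- factor(c("10", "2", "0.5"))
levels(f)                              # sorted as strings, not as numbers

codes   <- as.numeric(f)               # internal codes, not the values
values1 <- as.numeric(as.character(f))
values2 <- as.numeric(levels(f))[as.integer(f)]

codes
values1
values2
```

Both conversions give back 10, 2, and 0.5, while as.numeric(f) on its own returns the positions of those values in levels(f).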

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Gerrit Eichner
Sent: Wednesday, November 12, 2014 8:06 AM
To: David Studer
Cc: r-help@r-project.org
Subject: Re: [R] factor levels > numeric values

Hello, David,

take a look at the beginning of the "Warning" section of ?factor.

  Hth  --  Gerrit

> Hi everybody,
>
> I have another question (to which I could not find an answer in my R books).
> I am sure it's not a great issue, but I simply lack a good idea of how to
> solve this:
>
> One of my variables gets imported as a factor instead of a numeric variable.
> Now I have a...
> Factor w/ 63 levels "0","0.02","0.03",..: 1 NA NA 1 NA NA 1 1 53 10 ...
>
> How can I transform these factor levels into actual values?
>
> Thank you very much for any help!
> David
>
>   [[alternative HTML version deleted]]
>

Re: [R] Counting within groups / means by groups

2014-11-10 Thread David L Carlson

In addition to Jeff's recommendation, you need to read a basic introduction to 
R. Your data frame is probably not what you think it is:

> group<-c("A", "A", "A", "B", "B", "B", "B", "C")
> value<-c(1,3,2,2,2,4,4,1)
> df<-as.data.frame(cbind(group, value))
> str(df)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: Factor w/ 4 levels "1","2","3","4": 1 3 2 2 2 4 4 1

By using cbind() you combined a character vector and a numeric vector into a 
matrix so R converted the numeric value to characters since a matrix can hold 
only a single data type. The cbind() function is generic and which version you 
get depends on the first argument.
> cbind(group, value)
 group value
[1,] "A"   "1"  
[2,] "A"   "3"  
[3,] "A"   "2"  
[4,] "B"   "2"  
[5,] "B"   "2"  
[6,] "B"   "4"  
[7,] "B"   "4"  
[8,] "C"   "1"  

Then you used as.data.frame() to convert the character matrix to a data.frame. 
The default for character variables is to convert those to factors. All you 
need is
> dfa <- data.frame(group, value)
> str(dfa)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: num  1 3 2 2 2 4 4 1
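
The difference can also be checked directly. This is only a sketch using the same two vectors; whether group comes out as a factor or as a character column depends on the R version and the stringsAsFactors setting, so only the value column is inspected here:

```r
group <- c("A", "A", "A", "B", "B", "B", "B", "C")
value <- c(1, 3, 2, 2, 2, 4, 4, 1)

# cbind() on two vectors builds a matrix, which holds one type only,
# so the numbers are coerced to character
m <- cbind(group, value)
matrix_type <- typeof(m)

# data.frame() keeps each column's own type
dfa <- data.frame(group, value)
value_class <- class(dfa$value)
```

matrix_type is "character" and value_class is "numeric".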

I changed df to dfa since df() is the density function for the F distribution. 
R is not likely to get confused, but you might.

Then read the manual page on ave() to see why these work and how to adapt them:

> ave(dfa$value, dfa$group, FUN=length)
[1] 3 3 3 4 4 4 4 1
> ave(dfa$value, dfa$group)
[1] 2 2 2 3 3 3 3 1
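
To store the results as the new columns asked for, something along these lines should work. It is a sketch; the column names count_group and mean_group are my own choice, and the aggregate() call is an extra that gives one row per group instead of one value per case:

```r
dfa <- data.frame(group = c("A", "A", "A", "B", "B", "B", "B", "C"),
                  value = c(1, 3, 2, 2, 2, 4, 4, 1))

# ave() returns one value per row, so the result fits straight into a column
dfa$count_group <- ave(dfa$value, dfa$group, FUN = length)
dfa$mean_group  <- ave(dfa$value, dfa$group)   # FUN defaults to mean

# One row per group rather than one value per case
group_means <- aggregate(value ~ group, data = dfa, FUN = mean)
```

Any other summary function (median, sd, ...) can be passed as FUN in the same way.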

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jeff Newmiller
Sent: Monday, November 10, 2014 9:19 AM
To: stude...@gmail.com; r-help@r-project.org
Subject: Re: [R] Counting within groups / means by groups

Help file ?ave should apply here.

Please read the Posting Guide mentioned in the footer of every email on this 
list and on the list manager page for this mailing list. It warns you to read 
the archives before posting and to post in plain text format rather than HTML 
format.
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On November 10, 2014 6:39:47 AM PST, David Studer  wrote:
>Hi everyone!
>
>I have problems finding a solution to the following two problems:
>
>My sample-dataframe consists of two variables "group" and "value":
>
>group<-c("A", "A", "A", "B", "B", "B", "B", "C")
>value<-c(1,3,2,2,2,4,4,1)
>df<-as.data.frame(cbind(group, value))
>
>Problem 1:
>**
>
>Now I'd like to count the number of group-A-cases, group-B-cases etc
>and
>write
>this number into a new column. It should be like:
>
>count_group<-c(3, 3, 3, 4, 4, 4, 4, 1)
>
>Problem 2:
>***
>
>I'd like to add new column with the mean values (or any other function)
>within
>my groups. E.g:
>
>Group A: (1+3+2)/3=2
>Group B: (2+2+4+4)/4=3
>Group C: =1
>
>Now I'd add another column 2 2 3 3 3 3 1
>
>
>Can anyone help me, how this can be done best?
>
>Thank you!
>David

Re: [R] limit of cmdscale function

2014-11-06 Thread David L Carlson

You avoid the call to cmdscale() by supplying your own starting configuration 
(see the manual page for the y= argument). You could still hit other barriers 
within isoMDS() or insufficient memory on your computer.
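
A minimal sketch of what I mean, on a small made-up data set (the random starting configuration is just an assumption for illustration; a poor start can leave isoMDS() in a poor local minimum, so a better informed start is preferable). The last lines are rough storage arithmetic for the sizes mentioned in the question:

```r
library(MASS)                            # isoMDS() is in package MASS

set.seed(1)
pts  <- matrix(rnorm(50 * 4), 50, 4)     # made-up data: 50 points, 4 variables
d    <- dist(pts)
init <- matrix(rnorm(50 * 2), 50, 2)     # arbitrary starting configuration

# Because y is supplied, the default y = cmdscale(d, k) is never evaluated
fit <- isoMDS(d, y = init, k = 2, trace = FALSE)
fit_dim <- dim(fit$points)

# The limit in cmdscale() matches the largest n with n^2 <= 2^31 - 1
n_max <- floor(sqrt(.Machine$integer.max))

# Rough storage needs for n = 150000 at 8 bytes per double
n <- 150000
full_gb <- n^2 * 8 / 1e9                 # full n x n matrix
dist_gb <- n * (n - 1) / 2 * 8 / 1e9     # lower triangle only (dist object)
```

n_max comes out as 46340, the full matrix needs about 180 GB, and the dist object alone about 90 GB, so memory is likely to be the real barrier.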

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Kawashima, Masayuki
Sent: Wednesday, November 5, 2014 10:51 PM
To: r-help@r-project.org
Subject: [R] limit of cmdscale function

Hi

We have a few questions regarding the use of the "isoMDS" function.

When we run "isoMDS" function using 60,000 x 60,000 data matrix, 
we get the following error message:


cmdscale(d, k) : invalid value of 'n'
Calls: isoMDS -> cmdscale


We checked the source code of "cmdscale" and found the following limitation:

## we need to handle nxn internally in dblcen
if(is.na(n) || n > 46340) stop("invalid value of 'n'")


1. This cmdscale limitation ('n > 46340') is due to the limitation of BLAS and 
LAPACK variables(int4) which can only handle '2^31-1' amount of data?

2. Is there any workaround to run isoMDS using large data (i.e. greater than 
46340)?
   We would like to run isoMDS using a maximum of 150,000x150,000 data matrix.

Best regards

Masayuki Kawashima
Email: kawasim...@jp.fujitsu.com

Re: [R] reading data from a web

2014-10-29 Thread David L Carlson

You did not read the data with the commands you provided since c1 is not 
defined so read.fwf() fails immediately. Here is a solution that works for the 
link you provided, but would need to be modified for months that do not have 30 
days:

> lnk <- "http://www.data.jma.go.jp/gmd/env/data/radiation/data/geppo/201004/DR201004_sap.txt"
> raw <- readLines(lnk)               # Read the file as text lines
> raw <- raw[19:48]                   # Pull out the 30 data lines
> raw <- substr(raw, 16, nchar(raw))  # Strip the leading blanks
> raw <- gsub("  +", ",", raw)        # Replace two or more blanks with a comma
> raw <- gsub("\\.\\.\\.", "NA", raw) # Replace ... with NA
> Solar <- read.csv(text=raw, header=FALSE, colClasses=c("character", 
+   rep("numeric", 25)))
> str(Solar)
'data.frame':   30 obs. of  26 variables:
 $ V1 : chr  "4 1" "4 2" "4 3" "4 4" ...
 $ V2 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ V3 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ V4 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ V5 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ V6 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ V7 : num  0 0 0 2 0 8 0 75 2 0 ...
 $ V8 : num  0 0 17 133 0 27 36 218 1 1 ...
 $ V9 : num  0 98 29 205 0 23 4 280 1 0 ...
 $ V10: num  2 190 62 100 0 9 0 310 7 12 ...
 $ V11: num  0 237 49 227 86 9 0 321 0 0 ...
 $ V12: num  0 303 21 151 177 13 1 304 52 0 ...
 $ V13: num  0 286 72 199 131 8 2 320 33 6 ...
 $ V14: num  0 318 203 284 30 1 102 285 9 130 ...
 $ V15: num  0 314 241 282 10 0 43 286 93 107 ...
 $ V16: num  1 270 171 256 6 1 0 272 181 27 ...
 $ V17: num  3 190 100 214 34 0 11 255 177 0 ...
 $ V18: num  0 89 69 129 24 0 8 205 138 0 ...
 $ V19: num  0 7 2 27 2 0 0 80 30 0 ...
 $ V20: num  0 0 0 0 0 0 0 0 0 0 ...
 $ V21: num  NA NA NA NA NA NA NA NA NA NA ...
 $ V22: num  NA NA NA NA NA NA NA NA NA NA ...
 $ V23: num  NA NA NA NA NA NA NA NA NA NA ...
 $ V24: num  NA NA NA NA NA NA NA NA NA NA ...
 $ V25: num  NA NA NA NA NA NA NA NA NA NA ...
 $ V26: num  6 2302 1036 2209 500 ...
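
The same cleaning steps can be tried without the download. This is a sketch on three invented lines that only imitate the layout (fields separated by two or more blanks, ... for missing); the real file is wider, and here I strip leading blanks with sub() rather than substr() because the invented lines do not have the real column offsets:

```r
# Invented lines imitating the file layout; not the real data
raw <- c("      4 1  ...    0   17    6",
         "      4 2  ...   98  190 2302",
         "      4 3  ...   29   62 1036")

raw <- sub("^ +", "", raw)                   # strip the leading blanks
raw <- gsub("  +", ",", raw)                 # two or more blanks become a comma
raw <- gsub("...", "NA", raw, fixed = TRUE)  # ... marks a missing value

solar <- read.csv(text = raw, header = FALSE,
                  colClasses = c("character", rep("numeric", 4)))
```

The first column keeps the "month day" label as text and the remaining columns are numeric, with NA where the file had dots.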

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Alemu Tadesse
Sent: Wednesday, October 29, 2014 2:21 PM
To: r-help@r-project.org
Subject: [R] reading data from a web

Dear All,

I have data of the format shown in the link
http://www.data.jma.go.jp/gmd/env/data/radiation/data/geppo/201004/DR201004_sap.txt
 that I need to read. I have downloaded all the data from the link and I
have it on my computer. I used the following script (got it from web) and
was able to read the data. But, it is not in the format that I wanted it to
be. I want it a data frame and clean numbers.
asNumeric <- function(x) as.numeric(as.character(x))
factorsNumeric <- function(data) modifyList(data, lapply(data[,
sapply(data, is.logical)],asNumeric))

data=read.fwf(filename, widths=c(c1),skip=18, header=FALSE)
data$V2<-as.numeric(gsub(" ","", as.character(data$V2) , fixed=TRUE))
f <- factorsNumeric(data)

Any help is appreciated.

Best,

Alemu

Re: [R] "inahull" from package alphahull not working when used with lapply

2014-10-27 Thread David L Carlson

Why not just

> library (alphahull)
> DT=data.frame(x=c(0.25,0.25,0.75,0.75),y=c(0.25,0.75,0.75,0.25))
> Hull <- ahull(DT, alpha = 0.5)
> TEST<- data.frame(x=c(0.25,0.5),y=c(0.5,0.5))
> apply(TEST, 1, function(x) inahull(Hull, x))
[1] FALSE  TRUE
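
The pattern does not depend on alphahull. In the sketch below in_disc() is a hypothetical stand-in for inahull(Hull, p) so that it runs without the package; the point is that apply() hands each row to the function as one numeric vector, and that quoted names such as "x" are just character strings:

```r
# Hypothetical stand-in for inahull(Hull, p): is the point inside the unit circle?
in_disc <- function(p) sum(p^2) < 1

TEST <- data.frame(x = c(0.25, 1.5), y = c(0.5, 0.5))

# Each row arrives as a single numeric vector c(x, y)
inside <- apply(TEST, 1, function(p) in_disc(p))

# Passing the strings "x" and "y" instead of the values fails
quoted <- tryCatch(in_disc(c("x", "y")), error = function(e) "error")
```

inside is TRUE for the first point and FALSE for the second, and quoted ends up as "error".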

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Bart Kastermans
Sent: Monday, October 27, 2014 2:10 PM
To: r-help@r-project.org
Subject: Re: [R] "inahull" from package alphahull not working when used with 
lapply

On 27/10/14 19:42, Camilo Mora wrote:
> Hi Bart,
> 
> Even after putting the variables in the apply function, the results come not 
> right:
> 
> library (alphahull)
> DT=data.frame(x=c(0.25,0.25,0.75,0.75),y=c(0.25,0.75,0.75,0.25))
> Hull <- ahull(DT, alpha = 0.5)
> 
> TEST<- data.frame(x=c(0.25,0.5),y=c(0.5,0.5))
> plot(Hull)
> points(TEST)
> 
> InHul2D <- function(Val1, Val2, Hull) inahull(Hull, p = c(Val1, Val2))
> 
> IN <- apply(TEST, 1, function(x,y) InHul2D("x","y",Hull))
> 
> 

Try with this version of your function:

InHul2D <- function(Val1, Val2, Hull) {
    stopifnot(is.numeric(Val1),
              is.numeric(Val2))
    inahull(Hull, p = c(Val1, Val2))
}

And answer the question; why would you put quotes around x and y in
InHul2D call in apply?  Once you remove the quotes, and get the error
"Error: argument "y" is missing, with no default" that I mentioned in my
last email, look at my last email to find out why.

I'll be happy to help you further with this, but then you have to
explain the output you get from using my version of InHul2D (before you
remove the quotes), and why my last email didn't solve the problem after
you removed the quotes.

Check ?stopifnot, and ?is.numeric

Best,
Bart

Re: [R] Selecting rows/columns of a matrix

2014-10-26 Thread David L Carlson

Note that you do not have to create the vector of 1's (TRUE) and 0's (FALSE) if 
you know the index values:

> j <- c(2, 4, 6)
> a[j, j]
     [,1] [,2] [,3]
[1,]    8   20   32
[2,]   10   22   34
[3,]   12   24   36
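
If you start from the 0/1 indicator, which() converts it to index values. A short sketch (the name flag is mine):

```r
a    <- matrix(1:36, 6, 6)
flag <- c(0, 1, 0, 1, 0, 1)

j <- which(flag == 1)                  # positions of the 1's

by_index   <- a[j, j]
by_logical <- a[as.logical(flag), as.logical(flag)]
same <- identical(by_index, by_logical)

# drop = FALSE keeps the result a matrix when only one row is selected
one_row <- a[2, j, drop = FALSE]
```

Both forms of indexing return the same 3 x 3 matrix, and one_row stays a 1 x 3 matrix instead of collapsing to a vector.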

==========
David L. Carlson
Department of Anthropology
Texas A&M University


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Steven Yen
Sent: Sunday, October 26, 2014 1:57 PM
To: Rui Barradas; r-help
Subject: Re: [R] Selecting rows/columns of a matrix

Rui

Thanks. This works great. Below, I get the 2nd, 4th, and 6th rows/columns:

 > (a<-matrix(1:36,6,6))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    7   13   19   25   31
[2,]    2    8   14   20   26   32
[3,]    3    9   15   21   27   33
[4,]    4   10   16   22   28   34
[5,]    5   11   17   23   29   35
[6,]    6   12   18   24   30   36
 > (j<-matrix(c(0,1,0,1,0,1)))
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    1
[5,]    0
[6,]    1
 > ((a[as.logical(j), as.logical(j)]))
     [,1] [,2] [,3]
[1,]    8   20   32
[2,]   10   22   34
[3,]   12   24   36

Steven Yen

At 02:49 PM 10/26/2014, Rui Barradas wrote:
>Sorry, that should be
>
>t(a[as.logical(j), as.logical(j)])
>
>Rui Barradas
>
>Em 26-10-2014 18:45, Rui Barradas escreveu:
>>Hello,
>>
>>Try the following.
>>
>>a[as.logical(j), as.logical(j)]
>>
>># or
>>b <- a[as.logical(j), ]
>>t(b)[as.logical(j), ]
>>
>>
>>Hope this helps,
>>
>>Rui Barradas
>>
>>Em 26-10-2014 18:35, Steven Yen escreveu:
>>>Dear
>>>
>>>I am interested in selecting rows and columns of a matrix with a 
>>>criterion defined by a binary indicator vector. Let  matrix a be
>>>
>>>  > a<-matrix(1:16, 4,4,byrow=T)
>>>  > a
>>>     [,1] [,2] [,3] [,4]
>>>[1,]    1    2    3    4
>>>[2,]    5    6    7    8
>>>[3,]    9   10   11   12
>>>[4,]   13   14   15   16
>>>
>>>Elsewhere in Gauss, I select the first and third rows and columns of 
>>>a by defining a column vector j = [1,0,1,0]. Then, select the rows of 
>>>a using j, and then selecting the rows of the transpose of the 
>>>resulting matrix using j again. I get the 2 x 2 matrix as desired. Is 
>>>there a way to do this in R? below are my Gauss commands. Thank you.
>>>
>>>---
>>>
>>>j
>>>
>>>1
>>>0
>>>1
>>>0
>>>
>>>a=selif(a,j); a
>>>
>>>1  2  3  4
>>>9 10 11 12
>>>
>>>a=selif(a',j); a
>>>
>>>1  9
>>>3 11

Re: [R] how to calculate a numeric's digits count?

2014-10-24 Thread David L Carlson

Where do these numbers come from? If they are calculated values, they are 
actually many decimal places longer than your examples. They are represented on 
your terminal with fewer decimals according to the setting of 
options("digits"). 

For example:

> sqrt(2)*sqrt(2)
[1] 2
> sqrt(2)*sqrt(2) == 2  
[1] FALSE
# FAQ 7.31 Why doesn’t R think these numbers are equal?
> options("digits")
$digits
[1] 7
> options(digits=22)
> sqrt(2)*sqrt(2)
[1] 2.000000000000000444089

If the numbers were read from a plain text file and you are talking about how 
they are represented in the file, analyze them as character strings.
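
As a sketch of the character-string approach (count_decimals is a name I made up), the number of decimals is just the number of characters after the decimal point:

```r
# Count characters after the decimal point in numbers stored as text
count_decimals <- function(s) {
  ifelse(grepl(".", s, fixed = TRUE),
         nchar(sub("^[^.]*\\.", "", s)),   # drop everything up to the point
         0L)
}

n_dec <- count_decimals(c("1.234", "1", "1.3045", "1.3450"))

# Once the value is numeric the trailing zero is gone
as_num <- as.character(1.3450)
```

n_dec is 3 0 4 4, while as_num is "1.345", which is why the values have to be kept as text (for example by reading the column with colClasses="character").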

-------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of PO SU
Sent: Thursday, October 23, 2014 10:35 PM
To: R. Help
Subject: [R] how to calculate a numeric's digits count?


Dear usRers,
  Now i want to cal ,e.g. 
 cal(1.234)  will get 3
 cal(1) will get 0
 cal(1.3045) will get 4
 But the difficult part is cal(1.3450) will get 4 not 3.
So, is there anyone happen to know the solution to this problem, or it can't be 
solved in R, because 1.340 will always be transformed autolly to 1.34?






--

PO SU
mail: desolato...@163.com 
Majored in Statistics from SJTU

Re: [R] assigning letter to a column

2014-10-17 Thread David L Carlson

Minor correction, given your code, values less than 3 will be coded as "S" 
since they are less than 15.23. In the code I suggested, values less than 3 
will be coded as missing (NA).
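
A quick check of both behaviours on five made-up values (the last break, 25.54, is taken from the original poster's code):

```r
x <- c(1, 5, 16, 20, 27)

# With | every value satisfies at least one of the two conditions
or_code <- ifelse(x >= 3 | x < 15.23, "S", "other")

# cut() with right=FALSE uses intervals closed on the left: [3, 15.23), ...
cut_code <- cut(x, breaks = c(3, 15.23, 19.81, 25.54, Inf),
                labels = c("S", "I1", "I2", "F"), right = FALSE)
```

or_code is "S" for all five values, while cut_code is NA, S, I1, I2, F.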

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David L Carlson
Sent: Friday, October 17, 2014 9:15 AM
To: Monaly Mistry; r-help@r-project.org
Subject: Re: [R] assigning letter to a column

I think it is doing exactly what you have told it to do, but that is probably 
not what you want it to do.

First, you do not need a loop since the ifelse() function is vectorized. Read 
the manual page and the examples carefully. Also you are coding ifelse() as if 
it were the same as if () {} else {}. Again you need to refer to the 
documentation.

Second, this seems like a job for cut() not ifelse().

Third, look at your code. The first statement is x$COR_LOC>=3 | 
x$COR_LOC<15.230 so everything greater than 3 will be coded as "S." That is 
probably all of your data. You probably want to use & (and) instead of | (or). 
It is not clear what you want to happen for values less than 3 but they will be 
NA (missing).

Your entire ifelse() boils down to

set.seed(42)
x <- data.frame(COR_LOC=runif(100, 0, 30))
x$ForS <- cut(x$COR_LOC, breaks=c(3, 15.23, 19.81, 25.54, Inf),
    labels=c("S", "I1", "I2", "F"), right=FALSE)

No loops, no ifelse's. Anything below 3 will be coded as NA (missing).

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Monaly Mistry
Sent: Friday, October 17, 2014 8:27 AM
To: r-help@r-project.org
Subject: [R] assigning letter to a column

Hi,

I'm having trouble with assigning a letter to a column based on the value
of another column.
Since I have separate data files I've saved then into one folder and I'm
reading them in separately into the function.

The code is below.

#F= fast; S= slow; I1= Intermediate score 1; I2=Intermediate score 2
filename<-list.files(pattern="*.txt")
filename
corloc<- function(x){
  x<-read.table(filename[x], sep="\t", header=TRUE) #will extract the
relevant data file from folder 1998. ex. corloc(1) will return 1998
breeding year data
  x[,"ForS"]<-0 #new column
  for (i in length(x$CORLOC)){ #this is the bit that I'm having a problem
with since it's not assigning the appropriate letter into the "ForS" column
  ifelse(x$COR_LOC>=3 | x$COR_LOC<15.230, ForS<-"S",
 ifelse(x$COR_LOC>=15.230 | x$COR_LOC<19.810, ForS<-"I1",
ifelse(x$COR_LOC>=19.810 | x$COR_LOC<25.540,
FS<-"I2",ForS<-"F")))}
  print(x)
}

I've tried some of the solutions on stackoverflow but still was
unsuccessful.

Best,

Monaly.

Re: [R] assigning letter to a column

2014-10-17 Thread David L Carlson

I think it is doing exactly what you have told it to do, but that is probably 
not what you want it to do.

First, you do not need a loop since the ifelse() function is vectorized. Read 
the manual page and the examples carefully. Also you are coding ifelse() as if 
it were the same as if () {} else {}. Again you need to refer to the 
documentation.

Second, this seems like a job for cut() not ifelse().

Third, look at your code. The first statement is x$COR_LOC>=3 | 
x$COR_LOC<15.230 so everything greater than 3 will be coded as "S." That is 
probably all of your data. You probably want to use & (and) instead of | (or). 
It is not clear what you want to happen for values less than 3 but they will be 
NA (missing).

Your entire ifelse() boils down to

set.seed(42)
x <- data.frame(COR_LOC=runif(100, 0, 30))
x$ForS <- cut(x$COR_LOC, breaks=c(3, 15.23, 19.81, 25.54, Inf),
    labels=c("S", "I1", "I2", "F"), right=FALSE)

No loops, no ifelse's. Anything below 3 will be coded as NA (missing).

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Monaly Mistry
Sent: Friday, October 17, 2014 8:27 AM
To: r-help@r-project.org
Subject: [R] assigning letter to a column

Hi,

I'm having trouble with assigning a letter to a column based on the value
of another column.
Since I have separate data files I've saved then into one folder and I'm
reading them in separately into the function.

The code is below.

#F= fast; S= slow; I1= Intermediate score 1; I2=Intermediate score 2
filename<-list.files(pattern="*.txt")
filename
corloc<- function(x){
  x<-read.table(filename[x], sep="\t", header=TRUE) #will extract the
relevant data file from folder 1998. ex. corloc(1) will return 1998
breeding year data
  x[,"ForS"]<-0 #new column
  for (i in length(x$CORLOC)){ #this is the bit that I'm having a problem
with since it's not assigning the appropriate letter into the "ForS" column
  ifelse(x$COR_LOC>=3 | x$COR_LOC<15.230, ForS<-"S",
 ifelse(x$COR_LOC>=15.230 | x$COR_LOC<19.810, ForS<-"I1",
ifelse(x$COR_LOC>=19.810 | x$COR_LOC<25.540,
FS<-"I2",ForS<-"F")))}
  print(x)
}

I've tried some of the solutions on stackoverflow but still was
unsuccessful.

Best,

Monaly.

Re: [R] Ternary Plots Do Not Display Ellipses in PDF

2014-10-15 Thread David L Carlson

I haven't looked at the source so I don't know exactly what is going on, but I 
think I have a workaround. While running your example I noticed that ellipses() 
does not just add the ellipses to the plot produced by plot(); it replots the 
figure. However, just running ellipses() without plot() generates an error 
"Error in if (coorgeo == "acomp") { : argument is of length zero" so ellipses() 
needs the plot environment produced by plot(). Moving the pdf() call so that it 
comes after plot() works on my Windows machine:

> plot(winters.acomp, main="Winters Creek", cex=0.5)
> pdf("winters-pdf.pdf")
> ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE,
+  col='red', lwd=2)
> dev.off()

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rich Shepard
Sent: Tuesday, October 14, 2014 4:21 PM
To: r-help@r-project.org
Subject: [R] Ternary Plots Do Not Display Ellipses in PDF

   A rather strange situation here and I've not found the source of the
problem.

   The point is to print a ternary plot matrix of compositional data with
ellipses enclosing 95% of the variance in each plot. The ellipses display on
the monitor, dev = x11cairo (see attached winters-x11cairo.pdf), but not when
sent directly to a file, dev = pdf (see attached winters-pdf.pdf).

   Here's winters.acomp:

structure(c(0.0667, 0.0612244897959184, 0.0434782608695652, 
0.043956043956044, 0.05, 0.0161290322580645, 0.6, 0.571428571428571, 
0.623188405797101, 0.593406593406593, 0.433, 0.629032258064516,
0.0667, 0.0612244897959184, 0.101449275362319, 0.0659340659340659, 
0.0667, 0.032258064516129, 0.244, 0.26530612244898,
0.217391304347826, 0.263736263736264, 0.367, 0.290322580645161,
0.0222, 0.0408163265306122, 0.0144927536231884, 0.032967032967033, 
0.0833, 0.032258064516129), .Dim = c(6L, 5L), .Dimnames = list(
 NULL, c("filter", "gather", "graze", "predate", "shred")), class = "acomp")

   And this is the command sequence:

> library(compositions)
> plot(winters.acomp, main="Winters Creek", cex=0.5)
> r <- sqrt(qchisq(p=0.95, df=4))
> mn <- mean(winters.acomp)
> vr <- var(winters.acomp)
> plot(winters.acomp, main="Winters Creek", cex=0.5)
> ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE,
col='red', lwd=2)
# monitor plot window is manually closed.
> pdf("winters-pdf.pdf")
> plot(winters.acomp, main="Winters Creek", cex=0.5)
> ellipses(mean=mn, var=vr, r=r, steps=72, thinRatio=NULL, aspanel=FALSE,
col='red', lwd=2)
> dev.off()

   What am I not seeing here that causes the different outputs?

Rich

Re: [R] Storing vectors as vectors in a list without losing each individual vector

2014-10-14 Thread David L Carlson

If you just want to plot the various combinations of a set of 
variables/columns, you don't need a list, just another data frame/matrix with 
the combinations of the column numbers you want to plot:

> df <- matrix(rnorm(100), 10, 10)
> df <- data.frame(df)
> comb <- expand.grid(7:10, 7:10)
> comb <- comb[comb[,1] < comb[,2],]
> rownames(comb) <- NULL
> comb
  Var1 Var2
1    7    8
2    7    9
3    8    9
4    7   10
5    8   10
6    9   10
> windows(record=TRUE)
> apply(comb, 1, function(x) plot(df[,x[1]], df[,x[2]], 
+ main=paste("Plot of", x[1], "with", x[2])))
NULL
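
If you do want the list itself, a data frame already is a list of its columns, so nothing has to be typed by hand. A sketch on made-up data (columns come out named X1 to X10 here, not v7 to v10; combn() is an alternative to the expand.grid() step above, and pdf(file = NULL) is only used so that the plots go to a throwaway device on any platform):

```r
set.seed(1)
df <- data.frame(matrix(rnorm(100), 10, 10))   # columns X1 ... X10

# Each selected column becomes one list element, kept separate
Indexes <- as.list(df[, 7:10])

# All pairs of column numbers with i < j, one pair per row
pair_idx <- t(combn(7:10, 2))

pdf(file = NULL)                               # throwaway graphics device
for (r in seq_len(nrow(pair_idx))) {
  i <- pair_idx[r, 1]
  j <- pair_idx[r, 2]
  plot(df[[i]], df[[j]], main = paste("Plot of", i, "with", j))
}
invisible(dev.off())
```

Indexes has four elements, one per column, and pair_idx has six rows, one per plot.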

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Patricia Seo
Sent: Monday, October 13, 2014 6:28 PM
To: r-help@r-project.org
Subject: [R] Storing vectors as vectors in a list without losing each 
individual vector

Hi everyone,

My help request is similar to what was asked by Ken Termiso on April 18th, 
2005. Link here: https://stat.ethz.ch/pipermail/r-help/2005-April/069729.html

Matt Wiener answered with suggesting a vector list where you hand type each of 
the vectors. This is not what I want to do. What I want to do is automate the 
process. So, in other words creating a list through a loop. 

For example:

My data frame is called "df" and I have four variables/vectors that are v7, v8, 
v9, 10. Each variable/vector is an integer (no character strings). I want to 
create a list called "Indexes" so that I can use this list for "for-in" loops 
to SEPARATELY plot each and every variable/vector. 

If I followed Matt Wiener's suggestion, I would input this:


Indexes = list()
Indexes[[1]] = df$v7 
Indexes[[2]] = df$v8
Indexes[[3]] = df$v9
Indexes[[4]] = df$v10

But if I want to include more than four variable/vectors (let's say I want to 
include 25 of them!), I do not want to have to type all of it. If I do the 
following command:

Indexes <- c(df$v7, df$v8, df$v9, df$v10)

then I run into the same problem as Ken Termiso with having all the integers in 
one vector. I need to keep the variables/vectors separate. 

Is this just not possible in R? Any help would be great. Thank you!

Re: [R] cbind in a loop...better way? | summary

2014-10-09 Thread David L Carlson

Actually Jeff Laake's can be made even shorter with

sapply(mat_list, as.vector)
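
A quick comparison of the two forms on the matrices from the original post (a sketch; note that sapply() only simplifies to a matrix when every element has the same length, otherwise it returns a list):

```r
a <- matrix(c(0, 20, 50, 0.05, 0, 0, 0, 0.1, 0), 3, 3, byrow = TRUE)
b <- matrix(c(0, 15, 45, 0.15, 0, 0, 0, 0.2, 0), 3, 3, byrow = TRUE)
mat_list <- list(a, b)

# as.vector() stacks the columns of each matrix; sapply() binds the results
r1 <- sapply(mat_list, as.vector)
r2 <- do.call(cbind, lapply(mat_list, as.vector))

same_values <- all.equal(r1, r2, check.attributes = FALSE)
```

Both give a 9 x 2 matrix with one column per input matrix.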

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Evan Cooch
Sent: Thursday, October 9, 2014 7:37 AM
To: Evan Cooch; r-help@r-project.org
Subject: Re: [R] cbind in a loop...better way? | summary

Two solutions proposed -- not entirely orthogonal, but both do the 
trick. Instead of nesting cbind in a loop (as I did originally -- OP, 
below),

1\   do.call(cbind, lapply(mat_list, as.vector))

or

2\   sapply(mat_list,function(x) as.vector(x))


Both work fine. Thanks to Jeff Laake (2) + David Carlson (1) for their 
suggestions.


On 10/8/2014 3:12 PM, Evan Cooch wrote:
> ...or some such. I'm trying to work up a function wherein the user 
> passes a list of matrices to the function, which then (1) takes each 
> matrix, (2) performs an operation to 'vectorize' the matrix (i.e., 
> given an (m x n) matrix x, this produces the vector Y of length  m*n 
> that contains the columns of the matrix x, stacked below each other), 
> and then (3) cbinds them together.
>
> Here is an example using the case where I know how many matrices I 
> need to cbind together. For this example, 2 square (3x3) matrices:
>
>  a <- matrix(c(0,20,50,0.05,0,0,0,0.1,0),3,3,byrow=T)
>  b <- matrix(c(0,15,45,0.15,0,0,0,0.2,0),3,3,byrow=T)
>
> I want to vec them, and then cbind them together. So,
>
> result  <- cbind(matrix(a,nr=9), matrix(b,nr=9))
>
> which yields the following:
>
>   [,1]  [,2]
>  [1,]  0.00  0.00
>  [2,]  0.05  0.15
>  [3,]  0.00  0.00
>  [4,] 20.00 15.00
>  [5,]  0.00  0.00
>  [6,]  0.10  0.20
>  [7,] 50.00 45.00
>  [8,]  0.00  0.00
>  [9,]  0.00  0.00
>
> Easy enough. But, I want to put it in a function, where the number and 
> dimensions  of the matrices is not specified. Something like
>
> Using matrices (a) and (b) from above, let
>
>   env <- list(a,b).
>
> Now, a function (or attempt at same) to perform the desired operations:
>
>   vec=function(matlist) {
>
>   n_mat=length(matlist);
>   size_mat=dim(matlist[[1]])[1];
>
>   result=cbind()
>
>for (i in 1:n_mat) {
>  result=cbind(result,matrix(matlist[[i]],nr=size_mat^2))
>   }
>
>  return(result)
>
>}
>
>
> When I run vec(env), I get the *right answer*, but I am wondering if 
> there is a *better* way to get there from here than the approach I use 
> (above). I'm not so much interested in 'computational efficiency' as I 
> am in stability, and flexibility.
>
> Thanks...
>
> .
>

Re: [R] cbind in a loop...better way?

2014-10-08 Thread David L Carlson

How about

> do.call(cbind, lapply(env, as.vector))
       [,1]  [,2]
 [1,]  0.00  0.00
 [2,]  0.05  0.15
 [3,]  0.00  0.00
 [4,] 20.00 15.00
 [5,]  0.00  0.00
 [6,]  0.10  0.20
 [7,] 50.00 45.00
 [8,]  0.00  0.00
 [9,]  0.00  0.00
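
Since stability and flexibility were the stated goals, the one-liner above could be wrapped with a few input checks. This is only a sketch: the name vec_list and the particular checks are assumptions made for illustration, not part of the original post or of base R.

```r
# Hypothetical wrapper (vec_list is an illustrative name, not a base R function).
vec_list <- function(matlist) {
  stopifnot(is.list(matlist), length(matlist) > 0)
  # every element must be a matrix
  if (!all(vapply(matlist, is.matrix, logical(1)))) {
    stop("every element of matlist must be a matrix")
  }
  # every matrix must have the same number of cells, otherwise cbind() would recycle
  sizes <- vapply(matlist, length, integer(1))
  if (length(unique(sizes)) != 1L) {
    stop("all matrices must have the same number of elements")
  }
  # as.vector() stacks the columns of each matrix; cbind() joins the results
  do.call(cbind, lapply(matlist, as.vector))
}

a <- matrix(c(0, 20, 50, 0.05, 0, 0, 0, 0.1, 0), 3, 3, byrow = TRUE)
b <- matrix(c(0, 15, 45, 0.15, 0, 0, 0, 0.2, 0), 3, 3, byrow = TRUE)
env <- list(a, b)
result <- vec_list(env)
dim(result)   # one row per matrix cell, one column per matrix
```

The checks mean the matrices need not be square, only equal in size, which removes the size_mat^2 assumption in the looped version.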

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Evan Cooch
Sent: Wednesday, October 8, 2014 2:13 PM
To: r-help@r-project.org
Subject: [R] cbind in a loop...better way?

...or some such. I'm trying to work up a function wherein the user 
passes a list of matrices to the function, which then (1) takes each 
matrix, (2) performs an operation to 'vectorize' the matrix (i.e., given 
an (m x n) matrix x, this produces the vector Y of length  m*n that 
contains the columns of the matrix x, stacked below each other), and 
then (3) cbinds them together.

Here is an example using the case where I know how many matrices I need 
to cbind together. For this example, 2 square (3x3) matrices:

  a <- matrix(c(0,20,50,0.05,0,0,0,0.1,0),3,3,byrow=T)
  b <- matrix(c(0,15,45,0.15,0,0,0,0.2,0),3,3,byrow=T)

I want to vec them, and then cbind them together. So,

result  <- cbind(matrix(a,nr=9), matrix(b,nr=9))

which yields the following:

        [,1]  [,2]
  [1,]  0.00  0.00
  [2,]  0.05  0.15
  [3,]  0.00  0.00
  [4,] 20.00 15.00
  [5,]  0.00  0.00
  [6,]  0.10  0.20
  [7,] 50.00 45.00
  [8,]  0.00  0.00
  [9,]  0.00  0.00

Easy enough. But, I want to put it in a function, where the number and 
dimensions  of the matrices is not specified. Something like

Using matrices (a) and (b) from above, let

   env <- list(a,b).

Now, a function (or attempt at same) to perform the desired operations:

   vec=function(matlist) {

     n_mat=length(matlist);
     size_mat=dim(matlist[[1]])[1];

     result=cbind()

     for (i in 1:n_mat) {
       result=cbind(result,matrix(matlist[[i]],nr=size_mat^2))
     }

     return(result)

   }


When I run vec(env), I get the *right answer*, but I am wondering if 
there is a *better* way to get there from here than the approach I use 
(above). I'm not so much interested in 'computational efficiency' as I 
am in stability, and flexibility.

Thanks...


Re: [R] Conditional Data Manipulation -Cumulative Product

2014-10-07 Thread David L Carlson

I think this works, at least for your example data. The function SSRuns gets 
the index values of the starting points and, for each one, finds the first 
ending point whose index is greater than or equal to it. It then cycles through 
the starting points and generates the index values from start to stop. Those 
are combined into a single vector, which is used to create each column of the 
mask for the data.

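SSRuns <- function(x, y, rows) {
    a <- which(x>0)    # index values of the start signals
    b <- which(y>0)    # index values of the stop signals
    # for each start, run from the start to the first stop at or after it
    d <- unlist(lapply(seq_along(a), function(i) 
        a[i]:head(b[a[i] <= b], 1)))
    v <- rep(0, rows)
    v[d] <- 1          # 1 inside a start-to-stop run, 0 elsewhere
    return(v)
}
mask <- sapply(StartSignals[,-1], SSRuns, y=StopSignals$Stop, 
    rows=nrow(MainData))
Results <- data.frame(Date=MainData$Date, MainData[,-1]*mask)
Results
         Date   X1   X2   X3   X4   X5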
1  2014-01-01 0.00 0.00 0.00 0.00 0.00
2  2014-01-02 0.00 1.51 0.00 0.00 1.24
3  2014-01-03 0.00 0.09 0.20 0.00 0.30
4  2014-01-04 0.00 0.00 0.00 0.00 0.00
5  2014-01-05 1.04 0.00 0.00 0.00 1.23
6  2014-01-06 0.00 0.00 0.76 0.00 0.00
7  2014-01-07 0.00 0.00 1.22 0.66 0.00
8  2014-01-08 0.00 0.00 0.27 0.09 0.00
9  2014-01-09 0.00 0.00 0.00 0.00 0.00
10 2014-01-10 0.00 0.00 1.68 0.98 0.00
11 2014-01-11 0.43 0.00 1.98 1.46 0.00
12 2014-01-12 1.51 0.78 1.63 0.46 1.84
13 2014-01-13 0.26 0.34 0.34 0.97 1.13
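
The mask above keeps the raw values inside each run. The expected results posted elsewhere in this thread look like running products within each run instead (for X3, 0.76 followed by 0.76*1.22 = 0.93, and so on). A minimal sketch of that reading follows; the helper name run_cumprod is hypothetical, and the handling of a run with no later stop signal is an assumption, since the thread does not cover that case.

```r
# Hypothetical helper: cumulative product within each start-to-stop run.
run_cumprod <- function(x, start_sig, stop_sig) {
  out <- numeric(length(x))
  stops <- which(stop_sig > 0)
  for (s in which(start_sig > 0)) {
    e <- stops[stops >= s][1]        # first stop at or after the start
    if (is.na(e)) e <- length(x)     # assumption: an unclosed run extends to the end
    out[s:e] <- cumprod(x[s:e])
  }
  out
}

# X3 column of MainData and StartSignals, plus the Stop column, copied from the thread
x3       <- c(0.83, 1.21, 0.2, 1.57, 1.72, 0.76, 1.22, 0.27, 0.59, 1.68, 1.98, 1.63, 0.34)
start3   <- c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
stop_sig <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1)

res3 <- round(run_cumprod(x3, start3, stop_sig), 2)
res3
```

With data frames like those in the thread, the same helper could be applied column by column, for example with mapply() over MainData[,-1] and StartSignals[,-1].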

David C

-Original Message-
From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com] 
Sent: Tuesday, October 7, 2014 8:06 PM
To: David L Carlson
Subject: RE: [R] Conditional Data Manipulation -Cumulative Product

Hi David,
I also made a dput of the Expected Results in case you want to read it in:
> dput(ExpResults)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0, 0, 0, 0, 1.04, 0, 0, 0, 0, 0, 
0.43, 0.65, 0.17), X2 = c(0, 1.51, 0.14, 0, 0, 0, 0, 0, 0, 0, 0, 0.78, 0.27), 
X3 = c(0, 0, 0.2, 0, 0, 0.76, 0.93, 0.25, 0, 1.68, 3.33, 5.42, 1.84), X4 = c(0, 
0, 0, 0, 0, 0, 0.66, 0.06, 0, 0.98, 1.43, 0.66, 0.64), X5 = c(0, 1.24, 0.37, 0, 
1.23, 0, 0, 0, 0, 0, 0, 1.84, 2.08)), .Names = c("Date", "X1", "X2", "X3", 
"X4", "X5"), class = "data.frame", row.names = c(NA,
-13L))

-Original Message-
From: David L Carlson [mailto:dcarl...@tamu.edu]
Sent: Tuesday, October 07, 2014 5:03 PM
To: Pooya Lalehzari
Cc: R help
Subject: RE: [R] Conditional Data Manipulation -Cumulative Product

More clear to read, but this is much easier to load into R. Then adding 

StartSignals$Date <- as.Date(StartSignals$Date, "%m/%d/%Y")
MainData$Date <- as.Date(MainData$Date, "%m/%d/%Y")
StopSignals$Date <- as.Date(StopSignals$Date, "%m/%d/%Y")

Creates date objects out of the character strings.

But what should the final result look like? For example X1 has two start dates, 
"2014-01-05" and "2014-01-11" and you have stop dates of "2014-01-03", 
"2014-01-05", "2014-01-08", and "2014-01-13". So for X1 "2014-01-05" is both a 
start and stop date (value 1.04) and the second start/end would be "2014-01-11" 
to "2014-01-13" (values .43, 1.51, .26). What do you mean by compounding?

David C


-Original Message-
From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com]
Sent: Tuesday, October 7, 2014 2:59 PM
To: David L Carlson
Subject: RE: [R] Conditional Data Manipulation -Cumulative Product

Dear David,
This is the dput output but I think the previous email had it more clearly.


> dput(StartSignals)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L), X2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L), X3 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), X4 = c(0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L), X5 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), 
class = "data.frame", row.names = c(NA, -13L))
> dput(MainData)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), X1

Re: [R] Conditional Data Manipulation -Cumulative Product

2014-10-07 Thread David L Carlson

More clear to read, but this is much easier to load into R. Then adding 

StartSignals$Date <- as.Date(StartSignals$Date, "%m/%d/%Y")
MainData$Date <- as.Date(MainData$Date, "%m/%d/%Y")
StopSignals$Date <- as.Date(StopSignals$Date, "%m/%d/%Y")

Creates date objects out of the character strings.
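
As a small illustration of what the conversion does, here is a sketch on a two-row stand-in for StopSignals (the two rows are copied from the thread; the cut-down data frame itself is only for illustration):

```r
# Two rows of StopSignals, with dates stored as month/day/year character strings
StopSignals <- data.frame(Date = c("1/1/2014", "1/13/2014"),
                          Stop = c(0L, 1L),
                          stringsAsFactors = FALSE)

# "%m/%d/%Y" tells as.Date() how to read the strings
StopSignals$Date <- as.Date(StopSignals$Date, format = "%m/%d/%Y")

class(StopSignals$Date)      # now a Date object rather than character
format(StopSignals$Date)     # printed in ISO order, year-month-day
diff(StopSignals$Date)       # Date objects support arithmetic (difference in days)
```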

But what should the final result look like? For example X1 has two start dates, 
"2014-01-05" and "2014-01-11" and you have stop dates of "2014-01-03", 
"2014-01-05", "2014-01-08", and "2014-01-13". So for X1 "2014-01-05" is both a 
start and stop date (value 1.04) and the second start/end would be "2014-01-11" 
to "2014-01-13" (values .43, 1.51, .26). What do you mean by compounding?

David C


-Original Message-
From: Pooya Lalehzari [mailto:plalehz...@platinumlp.com] 
Sent: Tuesday, October 7, 2014 2:59 PM
To: David L Carlson
Subject: RE: [R] Conditional Data Manipulation -Cumulative Product

Dear David,
This is the dput output but I think the previous email had it more clearly.


> dput(StartSignals)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), X2 = c(0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), X3 = c(0L, 0L, 1L, 0L, 0L, 1L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L), X4 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 1L, 0L, 0L, 0L), X5 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L)), .Names = c("Date", "X1", "X2", "X3", "X4", 
"X5"), class = "data.frame", row.names = c(NA, -13L))
> dput(MainData)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), X1 = c(1.92, 0.67, 1.09, 
1.81, 1.04, 1.69, 1.57, 0.5, 0, 1.31, 0.43, 1.51, 0.26), X2 = c(1.38, 
1.51, 0.09, 1.33, 0.38, 1.12, 1.3, 1.75, 1.26, 1.57, 1.63, 0.78, 
0.34), X3 = c(0.83, 1.21, 0.2, 1.57, 1.72, 0.76, 1.22, 0.27, 
0.59, 1.68, 1.98, 1.63, 0.34), X4 = c(1.25, 0.06, 1.62, 1.68, 
1.98, 1.45, 0.66, 0.09, 0.4, 0.98, 1.46, 0.46, 0.97), X5 = c(1.12, 
1.24, 0.3, 1.41, 1.23, 1.99, 1.75, 1.91, 1.81, 1.79, 0.81, 1.84, 
1.13)), .Names = c("Date", "X1", "X2", "X3", "X4", "X5"), class = "data.frame", 
row.names = c(NA, 
-13L))
> dput(StopSignals)
structure(list(Date = c("1/1/2014", "1/2/2014", "1/3/2014", "1/4/2014", 
"1/5/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014", "1/10/2014", 
"1/11/2014", "1/12/2014", "1/13/2014"), Stop = c(0L, 0L, 1L, 
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)), .Names = c("Date", 
"Stop"), class = "data.frame", row.names = c(NA, -13L))


-Original Message-
From: David L Carlson [mailto:dcarl...@tamu.edu] 
Sent: Tuesday, October 07, 2014 3:13 PM
To: Pooya Lalehzari; R help
Subject: RE: [R] Conditional Data Manipulation -Cumulative Product

You need to use plain text, not html in your email. Your data are scrambled 
(see below). It is better to send your data using the R dput() function:

dput(StartSignals)
dput(MainData)
dput(StopSignals)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Pooya Lalehzari
Sent: Tuesday, October 7, 2014 11:55 AM
To: R help
Subject: [R] Conditional Data Manipulation -Cumulative Product

Hello,
I have three datasets StartSignals, MainData, StopSignals and need to compound 
the data for each variable in MainData over dates that fall between the Start 
and Stop signals. (Stop signals are common and the same to all X1:X5 
variables). Please see sample below:
The one way I was thinking of doing this project was to setup a nested "FOR" 
loop and go through the three data matrices. Is there a more elegant way of 
doing this?
Thank you.

StartSignals:
Date       X1 X2 X3 X4 X5
1/1/2014    0  0  0  0  0
1/2/2014    0  1  0  0  1
1/3/2014    0  0  1  0  0
1/4/2014    0  0  0  0  0
1/5/2014    1  0  0  0  1
1/6/2014    0  0  1  0  0
1/7/2014    0  0  0  1  0
1/8/2014    0  0  0  0  0
1/9/2014    0  0  0  0  0
1/10/2014   0  0  1  1  0
1/11/2014   1  0  0  0  0
1/12/2014   0  1  0  0  1

1/

Re: [R] Conditional Data Manipulation -Cumulative Product

2014-10-07 Thread David L Carlson

You need to use plain text, not html in your email. Your data are scrambled 
(see below). It is better to send your data using the R dput() function:

dput(StartSignals)
dput(MainData)
dput(StopSignals)
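
To show why dput() output travels well in plain-text email, here is a small sketch of a round trip: the printed text is captured, then evaluated again to rebuild the object. The two-row data frame is a stand-in made up from values in this thread.

```r
# A small stand-in data frame
df <- data.frame(Date = c("1/1/2014", "1/3/2014"),
                 Stop = c(0L, 1L),
                 stringsAsFactors = FALSE)

# dput() writes R code that recreates the object; capture it as text
txt <- paste(capture.output(dput(df)), collapse = "\n")
cat(txt, "\n")

# Anyone who receives that text can paste it into R to rebuild the object
df2 <- eval(parse(text = txt))
all.equal(df, df2)
```

A reader of the list only needs to copy the structure(...) text and assign it to a name, which is why it is preferred over a formatted table.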

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Pooya Lalehzari
Sent: Tuesday, October 7, 2014 11:55 AM
To: R help
Subject: [R] Conditional Data Manipulation -Cumulative Product

Hello,
I have three datasets, StartSignals, MainData, and StopSignals, and need to 
compound the data for each variable in MainData over the dates that fall 
between the Start and Stop signals. (The Stop signals are common to all of the 
X1:X5 variables.) Please see the sample below.
One way I was thinking of doing this was to set up a nested "FOR" loop and go 
through the three data matrices. Is there a more elegant way of doing this?
Thank you.

StartSignals:
Date       X1 X2 X3 X4 X5
1/1/2014    0  0  0  0  0
1/2/2014    0  1  0  0  1
1/3/2014    0  0  1  0  0
1/4/2014    0  0  0  0  0
1/5/2014    1  0  0  0  1
1/6/2014    0  0  1  0  0
1/7/2014    0  0  0  1  0
1/8/2014    0  0  0  0  0
1/9/2014    0  0  0  0  0
1/10/2014   0  0  1  1  0
1/11/2014   1  0  0  0  0
1/12/2014   0  1  0  0  1
1/13/2014   0  0  0  0  0

MainData:
Date         X1   X2   X3   X4   X5
1/1/2014   1.92 1.38 0.83 1.25 1.12
1/2/2014   0.67 1.51 1.21 0.06 1.24
1/3/2014   1.09 0.09  0.2 1.62  0.3
1/4/2014   1.81 1.33 1.57 1.68 1.41
1/5/2014   1.04 0.38 1.72 1.98 1.23
1/6/2014   1.69 1.12 0.76 1.45 1.99
1/7/2014   1.57  1.3 1.22 0.66 1.75
1/8/2014    0.5 1.75 0.27 0.09 1.91
1/9/2014      0 1.26 0.59  0.4 1.81
1/10/2014  1.31 1.57 1.68 0.98 1.79
1/11/2014  0.43 1.63 1.98 1.46 0.81
1/12/2014  1.51 0.78 1.63 0.46 1.84
1/13/2014  0.26 0.34 0.34 0.97 1.13

StopSignals:
Date       Stop
1/1/2014      0
1/2/2014      0
1/3/2014      1
1/4/2014      0
1/5/2014      1
1/6/2014      0
1/7/2014      0
1/8/2014      1
1/9/2014      0
1/10/2014     0
1/11/2014     0
1/12/2014     0
1/13/2014     1

ExpectedResult:
Date         X1   X2   X3   X4   X5
1/1/2014      0    0    0    0    0
1/2/2014      0 1.51    0    0 1.24
1/3/2014      0 0.14  0.2    0 0.37
1/4/2014      0    0    0    0    0
1/5/2014   1.04    0    0    0 1.23
1/6/2014      0    0 0.76    0    0
1/7/2014      0    0 0.93 0.66    0
1/8/2014      0    0 0.25 0.06    0
1/9/2014      0    0    0    0    0
1/10/2014     0    0 1.68 0.98    0
1/11/2014  0.43    0 3.33 1.43    0
1/12/2014  0.65 0.78 5.42 0.66 1.84
1/13/2014  0.17 0.27 1.84 0.64 2.08

[[alternative HTML version deleted]]


Re: [R] How to extract table results from survival summary object

2014-10-07 Thread David L Carlson

This will create a data.frame containing the results of the summary(mod) 
object. You can find out what that is using the command ?summary.survfit. You 
have an error in your example since death is not a variable in lung:

> library(survival)
> data(lung)
> mod <- with(lung, survfit(Surv(time, status)~ 1))
> res <- summary(mod)
> str(res)
List of 14
 $ n        : int 228
 $ time     : num [1:139] 5 11 12 13 15 26 30 31 53 54 ...
 $ n.risk   : num [1:139] 228 227 224 223 221 220 219 218 217 215 ...
 $ n.event  : num [1:139] 1 3 1 2 1 1 1 1 2 1 ...
 $ n.censor : num [1:139] 0 0 0 0 0 0 0 0 0 0 ...
 $ surv     : num [1:139] 0.996 0.982 0.978 0.969 0.965 ...
 $ type     : chr "right"
 $ std.err  : num [1:139] 0.00438 0.00869 0.0097 0.01142 0.01219 ...
 $ upper    : num [1:139] 1 1 0.997 0.992 0.989 ...
 $ lower    : num [1:139] 0.987 0.966 0.959 0.947 0.941 ...
 $ conf.type: chr "log"
 $ conf.int : num 0.95
 $ call     : language survfit(formula = Surv(time, status) ~ 1)
 $ table    : Named num [1:7] 228 228 228 165 310 285 363
  ..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
 - attr(*, "class")= chr "summary.survfit"
> # Extract the columns you want
> cols <- lapply(c(2:6, 8:10) , function(x) res[x])
> # Combine the columns into a data frame
> tbl <- do.call(data.frame, cols)
> str(tbl)
'data.frame':   139 obs. of  8 variables:
 $ time    : num  5 11 12 13 15 26 30 31 53 54 ...
 $ n.risk  : num  228 227 224 223 221 220 219 218 217 215 ...
 $ n.event : num  1 3 1 2 1 1 1 1 2 1 ...
 $ n.censor: num  0 0 0 0 0 0 0 0 0 0 ...
 $ surv    : num  0.996 0.982 0.978 0.969 0.965 ...
 $ std.err : num  0.00438 0.00869 0.0097 0.01142 0.01219 ...
 $ upper   : num  1 1 0.997 0.992 0.989 ...
 $ lower   : num  0.987 0.966 0.959 0.947 0.941 ...

Since res is a list containing the columns you want plus other information, we 
need to extract the needed columns from res and then combine those columns into 
a data.frame.
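
Selecting by position (2:6, 8:10) depends on the order of the components in the summary object, which may differ between versions of the package. A sketch of selecting by length instead is shown below. To keep it self-contained it uses a small stand-in list rather than a real summary.survfit object; the stand-in and the selection rule are assumptions for illustration.

```r
# Stand-in for a summary.survfit object: values copied from the str() output
# above (first five time points only); a real object has more components.
res <- list(
  n       = 228L,
  time    = c(5, 11, 12, 13, 15),
  n.risk  = c(228, 227, 224, 223, 221),
  n.event = c(1, 3, 1, 2, 1),
  surv    = c(0.996, 0.982, 0.978, 0.969, 0.965),
  type    = "right"
)

# Keep only the components that have one value per time point
keep <- names(res)[lengths(res) == length(res$time)]
tbl  <- as.data.frame(res[keep])
tbl
```

On a real object it would be safer to list the wanted names explicitly, e.g. res[c("time", "n.risk", "n.event", "surv")], because an unrelated component could happen to have the same length as time.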

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Juan Andres Hernandez
Sent: Tuesday, October 7, 2014 8:41 AM
To: r-help@r-project.org
Subject: [R] How to extract table results from survival summary object

Hi. I need to extract the "matrix" or "data.frame" results from a survival
object.

library(survival)
data(lung)
mod=with(lung, survfit(Surv(time,death)~ 1))
res=summary(mod)

res shows in the console the "matrix" I am looking for, but I can't find the way
to save or assign this table to an object. Does anyone know how to solve this?
Thanks in advance

Juan A. Hernández

[[alternative HTML version deleted]]


Re: [R] Using PCA to filter a series

2014-10-03 Thread David L Carlson

You can reconstruct the data from the first component. Here's an example using 
singular value decomposition on the original data matrix:

> d <- cbind(d1, d2, d3, d4)
> d.svd <- svd(d)
> new <- d.svd$u[,1] * d.svd$d[1]

new is basically your cp1. If we multiply it by each of the loadings, we can 
create reconstructed values based on the first component:

> dnew <- sapply(d.svd$v[,1], function(x) new * x)
> round(head(dnew), 1)
      [,1]  [,2]  [,3]  [,4]
[1,] 119.3 134.1 135.7 134.6
[2,] 104.2 117.2 118.6 117.6
[3,] 109.7 123.3 124.8 123.8
[4,] 109.3 122.9 124.3 123.3
[5,] 105.8 119.0 120.4 119.4
[6,] 111.5 125.4 126.9 125.8
> head(d)
      d1  d2  d3  d4
[1,] 113 138 138 134
[2,] 108 115 120 115
[3,] 105 127 129 120
[4,] 103 127 129 120
[5,] 109 119 120 117
[6,] 115 126 126 123

> diag(cor(d, dnew))
[1] 0.9233742 0.9921703 0.9890085 0.9910287

Since you want a single variable to stand for all four, you could scale new by 
the mean of the loadings:

> newd <- new*mean(d.svd$v[,1])
> head(newd)
[1] 130.9300 114.3972 120.3884 119.9340 116.1588 122.3983
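
The reconstruction above is the rank-1 term of the singular value decomposition, so it can also be written as a single outer product. The sketch below uses only the first six rows of the four series (copied from the head(d) output above) to stay short, so its numbers will not match the full-data output; the variable names rank1 and dnew_check are illustrative.

```r
# First six observations of d1..d4, as printed by head(d) above
d <- cbind(d1 = c(113, 108, 105, 103, 109, 115),
           d2 = c(138, 115, 127, 127, 119, 126),
           d3 = c(138, 120, 129, 129, 120, 126),
           d4 = c(134, 115, 120, 120, 117, 123))

s <- svd(d)

# Rank-1 reconstruction: first singular value times the outer product of the
# first left and right singular vectors
rank1 <- s$d[1] * outer(s$u[, 1], s$v[, 1])

# Same thing written as in the message: scores times each loading in turn
new <- s$u[, 1] * s$d[1]
dnew_check <- sapply(s$v[, 1], function(x) new * x)

# What is left out is exactly the contribution of the remaining components
resid_norm <- sqrt(sum((d - rank1)^2))
```

The signs of s$u[, 1] and s$v[, 1] are arbitrary but always flip together, so the product is unaffected.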

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: Jonathan Thayn [mailto:jth...@ilstu.edu] 
Sent: Thursday, October 2, 2014 11:11 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Using PCA to filter a series

I suppose I could calculate the eigenvectors directly and not worry about 
centering the time-series, since they have essentially the same range to begin 
with:

vec <- eigen(cor(cbind(d1,d2,d3,d4)))$vectors
cp <- cbind(d1,d2,d3,d4)%*%vec
cp1 <- cp[,1]

I guess there is no way to reconstruct the original input data using just the 
first component, though, is there? Not the original data in its entirety, just 
one time-series that is representative of the general pattern. Possibly 
something like the following, but with just the first component:

o <- cp%*%solve(vec)

Thanks for your help. It's been a long time since I've played with PCA.

Jonathan Thayn




On Oct 2, 2014, at 4:59 PM, David L Carlson wrote:

> I think you want to convert your principal component to the same scale as d1, 
> d2, d3, and d4. But the "original space" is a 4-dimensional space in which 
> d1, d2, d3, and d4 are the axes, each with its own mean and standard 
> deviation. Here are a couple of possibilities
> 
> # plot original values for comparison
>> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
> # standardize the pc scores to the grand mean and sd
>> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, d4))
>> lines(new1)
> # Use least squares regression to predict the row means for the original four 
> # variables
>> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
>> lines(new2, col="red")
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> 
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Don McKenzie
> Sent: Thursday, October 2, 2014 4:39 PM
> To: Jonathan Thayn
> Cc: r-help@r-project.org
> Subject: Re: [R] Using PCA to filter a series
> 
> 
> On Oct 2, 2014, at 2:29 PM, Jonathan Thayn  wrote:
> 
>> Hi Don. I would like to "de-rotate" the first component back to its original 
>> state so that it aligns with the original time-series. My goal is to create 
>> a "cleaned", or a "model" time-series from which noise has been removed. 
> 
> Please cc the list with replies. It's considered courtesy plus you'll get 
> more help that way than just from me.
> 
> Your goal sounds almost metaphorical, at least to me.  Your first axis 
> "aligns" with the original time series already in that it captures the 
> dominant variation
> across all four. Beyond that, there are many approaches to signal/noise 
> relations within time-series analysis. I am not a good source of help on 
> these, and you probably need a statistical consult (locally?), which is not 
> the function of this list.
> 
>> 
>> 
>> Jonathan Thayn
>> 
>> 
>> 
>> On Oct 2, 2014, at 2:33 PM, Don McKenzie  wrote:
>> 
>>> 
>>> On Oct 2, 2014, at 12:18 PM, Jonathan Thayn  wrote:
>>> 
>>>> I have four time-series of similar data. I would  like to combine these 
>>>> into a single, clean time-series. I could simply find the mean of each 
>>>> time period, but I think that using principal components analysis should 
>>>> extract the most salient pattern and ignore some of the noise. I can 
>>>> compute components using princomp
>

Re: [R] Using PCA to filter a series

2014-10-02 Thread David L Carlson

I think you want to convert your principal component to the same scale as d1, 
d2, d3, and d4. But the "original space" is a 4-dimensional space in which d1, 
d2, d3, and d4 are the axes, each with its own mean and standard deviation. 
Here are a couple of possibilities

> # plot original values for comparison
> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
> # standardize the pc scores to the grand mean and sd
> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, d4))
> lines(new1)
> # Use least squares regression to predict the row means for the original four 
> # variables
> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
> lines(new2, col="red")
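
A self-contained sketch of the first possibility follows, using only the first ten observations of each series from the original post. The sign check is an addition of this sketch, not part of the message: the sign of a principal component is arbitrary, so the scores may come out upside down relative to the data.

```r
# First ten observations of the four series from the original post
d <- cbind(d1 = c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111),
           d2 = c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119),
           d3 = c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119),
           d4 = c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119))

pca   <- princomp(d)
score <- pca$scores[, 1]

# Assumption of this sketch: orient the component so that it rises and falls
# with the row means of the data
if (cor(score, rowMeans(d)) < 0) score <- -score

# Standardize the scores, then give them the grand mean and sd of the data
new1 <- as.numeric(scale(score)) * sd(as.vector(d)) + mean(d)
round(new1, 1)
```

By construction new1 has the same mean and standard deviation as the pooled data, so it can be drawn on the same axes as the original series.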

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Don McKenzie
Sent: Thursday, October 2, 2014 4:39 PM
To: Jonathan Thayn
Cc: r-help@r-project.org
Subject: Re: [R] Using PCA to filter a series


On Oct 2, 2014, at 2:29 PM, Jonathan Thayn  wrote:

> Hi Don. I would like to "de-rotate" the first component back to its original 
> state so that it aligns with the original time-series. My goal is to create a 
> "cleaned", or a "model" time-series from which noise has been removed. 

Please cc the list with replies. It's considered courtesy plus you'll get more 
help that way than just from me.

Your goal sounds almost metaphorical, at least to me.  Your first axis "aligns" 
with the original time series already in that it captures the dominant variation
across all four. Beyond that, there are many approaches to signal/noise 
relations within time-series analysis. I am not a good source of help on these, 
and you probably need a statistical consult (locally?), which is not the 
function of this list.

> 
> 
> Jonathan Thayn
> 
> 
> 
> On Oct 2, 2014, at 2:33 PM, Don McKenzie  wrote:
> 
>> 
>> On Oct 2, 2014, at 12:18 PM, Jonathan Thayn  wrote:
>> 
>>> I have four time-series of similar data. I would  like to combine these 
>>> into a single, clean time-series. I could simply find the mean of each time 
>>> period, but I think that using principal components analysis should extract 
>>> the most salient pattern and ignore some of the noise. I can compute 
>>> components using princomp
>>> 
>>> 
>>> d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 111, 122, 122, 110, 
>>> 110, 104, 121, 121, 120, 120, 137, 137, 138, 138, 136, 172, 172, 157, 165, 
>>> 173, 173, 174, 174, 119, 167, 167, 144, 170, 173, 173, 169, 155, 116, 101, 
>>> 114, 114, 107, 108, 108, 131, 131, 117, 113)
>>> d2 <- c(138, 115, 127, 127, 119, 126, 126, 124, 124, 119, 119, 120, 120, 
>>> 115, 109, 137, 142, 142, 143, 145, 145, 163, 169, 169, 180, 180, 174, 181, 
>>> 181, 179, 173, 185, 185, 183, 183, 178, 182, 182, 181, 178, 171, 154, 145, 
>>> 147, 147, 124, 124, 120, 128, 141, 141, 138)
>>> d3 <- c(138, 120, 129, 129, 120, 126, 126, 125, 125, 119, 119, 122, 122, 
>>> 115, 109, 141, 144, 144, 148, 149, 149, 163, 172, 172, 183, 183, 180, 181, 
>>> 181, 181, 173, 185, 185, 183, 183, 184, 182, 182, 181, 179, 172, 154, 149, 
>>> 156, 156, 125, 125, 115, 139, 140, 140, 138)
>>> d4 <- c(134, 115, 120, 120, 117, 123, 123, 128, 128, 119, 119, 121, 121, 
>>> 114, 114, 142, 145, 145, 144, 145, 145, 167, 172, 172, 179, 179, 179, 182, 
>>> 182, 182, 182, 182, 184, 184, 182, 184, 183, 183, 181, 179, 172, 149, 149, 
>>> 149, 149, 124, 124, 119, 131, 135, 135, 134)
>>> 
>>> 
>>> pca <- princomp(cbind(d1,d2,d3,d4))
>>> plot(pca$scores[,1])
>>> 
>>> This seems to have created the clean pattern I want, but I would like to 
>>> project the first component back into the original axes? Is there a simple 
>>> way to do that?
>> 
>> Do you mean that you want to scale the scores on Axis 1 to the mean and 
>> range of your raw data?  Or their mean and variance?
>> 
>> See
>> 
>> ?scale
>>> 
>>> 
>>> 
>>> 
>>> Jonathan B. Thayn
>>>  
>>> 
>> 
>> Don McKenzie
>> Research Ecologist
>> Pacific WIldland Fire Sciences Lab
>> US Forest Service
>> 
>> Affiliate P

Re: [R] Converting factor data into Date-time format

2014-09-30 Thread David L Carlson

First, use stringsAsFactors=FALSE with the read.csv() function. That will 
prevent the conversion to factors. Then try to convert date and time to 
datetime objects. 
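
A minimal sketch of those two steps, on a made-up three-row data frame (the column names follow the header described in the question; the values and the month/day/year format are assumptions, since the data were not posted):

```r
# Made-up example rows; with real data this would come from
# read.csv(file, stringsAsFactors = FALSE)
l10 <- data.frame(Craneid = c("A", "A", "B"),
                  Date    = c("9/30/2014", "9/30/2014", "9/30/2014"),
                  Time    = c("8:00:00", "16:00:00", "8:00:00"),
                  stringsAsFactors = FALSE)

# Paste date and time into one string per row, then parse it
datetime <- as.POSIXct(paste(l10$Date, l10$Time),
                       format = "%m/%d/%Y %H:%M:%S",
                       tz = "America/Chicago")

# Two checks worth running before building trajectories:
anyNA(datetime)    # TRUE would mean some strings did not match the format
dup <- duplicated(data.frame(id = l10$Craneid, when = datetime))
any(dup)           # TRUE would mean repeated timestamps within one animal
```

If either check returns TRUE on the real data, that may be related to the "non unique dates for a given burst" error, though this is a guess without seeing the file.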

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of tandi perkins
Sent: Tuesday, September 30, 2014 12:55 PM
To: r-help@r-project.org
Subject: [R] Converting factor data into Date-time format

Hello R help:

I am new to this forum so I apologize in advance for any protocol missteps.  I
have a data set that is comprised of eight birds with GPS; each of which
transmits every day at 8:00 am, 4:00 pm, and midnight for 1 year (although I
have some missing relocations).  I am trying to format my data to be run in
adehabitatLT but I am unsuccessful.  I have a "csv" file with the following
header: "Craneid, Date, Time, Long, Lat, Habitat, BurstID".  R creates factor
levels in all of the data except Lat, Long. I have attempted the following to
correctly format my date and time factors (data=l10):

First attempt:

1. datetime=as.POSIXct(paste(l10$Date, l10$Time), format="%m/%d/%Y %H:%M:%S", "America/Chicago")

2. coord=data.frame((l10$Longitude), (l10$Latitude))

3. test=as.ltraj(coord, datetime, l10$Craneid, burst=l10$ID, typeII=TRUE)

Results: Error in as.ltraj(coord, datetime, l10$Craneid, burst = l10$ID, typeII = TRUE) :
non unique dates for a given burst

I researched this error on the listserv and found that I could have duplicates,
so I checked for duplicates in datetime and the return was NULL (I also checked
for duplicates in Excel as I am in the learning stages in R).  Next I read
a thread posted on the R help in 2012 with a similar problem so I attempted
what was suggested as follows:

1. datetime=as.POSIXct(strptime(as.character(l10$Date, l10$Time), format="%m/%d/%Y %H:%M:%S"))

2. test=as.ltraj(coord, datetime, l10$Craneid, burst=l10$ID, typeII=TRUE)

Results: Same error.

Finally, I have tried:

1. datetime=as.POSIXct(as.character(levels(l10$Date)(l10$Time)), format="%m/%d/%Y %H:%M:%S")[l10$Date][l10$Time]

Results: Error in as.POSIXct(as.character(levels(l10$Date)(l10$Time)), format = "%m/%d/%Y %H:%M:%S") :
attempt to apply non-function

Can someone please explain what I am doing wrong?  My goal is to obtain
trajectories for all birds using each bird as a burst as is detailed in the
adehabitatLT manual and then to create Biased Random Bridges for each bird.
I did not include my data but I can if that will be helpful.

Thank you in advance for your help,

TLP

  
[[alternative HTML version deleted]]


Re: [R] adding rows

2014-09-25 Thread David L Carlson

Another approach

fun <- function(i, dat=x) {
 grp <- rep(1:(nrow(dat)/i), each=i)
 aggregate(dat[1:length(grp),]~grp, FUN=sum)
}

lapply(2:6, fun, dat=TT)
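
A variation on the same grouping idea, sketched with rowsum(), which sums the rows of a matrix within each group. The name block_sums is hypothetical, and the matrix is rebuilt here from the values in the original post (without its column names) so the example runs on its own.

```r
# The 20 x 2 matrix from the original post: 1..20 and 24, 48, ..., 480
TT <- cbind(1:20, 24 * (1:20))

# Hypothetical helper: sum consecutive blocks of i rows, dropping any
# leftover rows that do not fill a complete block
block_sums <- function(dat, i) {
  n_blocks <- nrow(dat) %/% i
  grp <- rep(seq_len(n_blocks), each = i)
  rowsum(dat[seq_along(grp), , drop = FALSE], grp)
}

res <- lapply(2:6, block_sums, dat = TT)
res[[2]]    # blocks of 3 rows: rows 19 and 20 are ignored
```

Using %/% makes the "ignore incomplete blocks" rule explicit instead of relying on the truncation in 1:(nrow(dat)/i).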


-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rui Barradas
Sent: Thursday, September 25, 2014 3:34 PM
To: eliza botto; r-help@r-project.org
Subject: Re: [R] adding rows

Hello,

Try the following.

fun <- function(x, r){
    if(r > 0){
        m <- length(x) %/% r
        y <- numeric(m)
        for(i in seq_len(m)){
            y[i] <- sum(x[((i - 1)*r + 1):(i*r)])
        }
        y
    }else{
        NULL
    }
}

apply(TT, 2, fun, r = 2)
apply(TT, 2, fun, r = 3)
etc


Hope this helps,

Rui Barradas


On 25-09-2014 20:50, eliza botto wrote:
> Dear useRs,
> Here is my data with two columns and 20 rows.
>> dput(TT)
> structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
> 19, 20, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288, 312, 336, 
> 360, 384, 408, 432, 456, 480), .Dim = c(20L, 2L), .Dimnames = list(NULL, 
> c("", "SS")))
> I first of all want to sum up continuously two rows (1 & 2, 3 & 4, 5 & 6 and 
> so on) of each column.
> Then I want to sum up 3 rows as (1-2-3, 4-5-6, ..., 16-17-18) and since the 
> 19th and 20th rows do not make up 3 rows, they should be ignored.
> Similarly with sets of 4 rows, 5 rows and even 6.
> I hope I was clear.
> Thank you so very much in advance,
> Eliza 
>   [[alternative HTML version deleted]]
>
>


Re: [R] Cluster -- Agnes function

2014-09-24 Thread David L Carlson

Read the documentation for cutree(). You will have to decide how many clusters 
you want to use since agnes() provides results for everything from n clusters 
(where n is the number of observations) to 1 cluster.

?cutree
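
A minimal sketch of how that could look, assuming the cluster package is installed. USArrests stands in for the patient data, and k = 3 is an arbitrary choice made only for illustration:

```r
library(cluster)                        # agnes() lives in the cluster package

ag  <- agnes(USArrests)                 # stand-in data; use your own data set
grp <- cutree(as.hclust(ag), k = 3)     # cluster membership for each row
ids <- split(rownames(USArrests), grp)  # row ids belonging to each cluster
lengths(ids)                            # cluster sizes
```

Converting with as.hclust() first keeps the call within what ?cutree documents; ids is then a list with one character vector of row ids per cluster.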

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Sohail Khan
Sent: Wednesday, September 24, 2014 9:14 AM
To: r-help@r-project.org
Subject: [R] Cluster -- Agnes function

Dear All,

I have clustered a patient data set by agnes.

I want to extract information for each cluster, I.E. all row ids
belonging to each cluster.

Thank you.







Re: [R] Copying tables from R to Excel

2014-09-23 Thread David L Carlson

If you looked at the documentation for R2HTML you might have noticed that there 
is no function HTML.matrix. Perhaps your recommendation from an unnamed source 
is out of date? Assuming you loaded the package with library(R2HTML) as Ivan 
suggested, the command would be

HTML( summary(iris), file("clipboard", "w"), append=F )

Which will work just fine as long as you are using the Windows operating 
system. More technically, HTML() is a generic function with methods (156 in 
this case) for many different data types including matrices and tables. 
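
If R2HTML is not available, a plain tab-delimited export is another route that Excel usually pastes or opens cleanly. This is a different technique from HTML(), shown only as a sketch; the temporary file is an assumption for portability, and on Windows the file argument can be "clipboard" instead:

```r
tab <- summary(iris)                    # the same table as in the thread
out <- tempfile(fileext = ".txt")       # on Windows, "clipboard" could be used here

# Tab-separated, unquoted text; empty cells instead of NA
write.table(tab, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "")

readLines(out, n = 2)                   # header line plus the first data row
```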

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ivan Calandra
Sent: Tuesday, September 23, 2014 8:12 AM
To: r-help@r-project.org
Subject: Re: [R] Copying tables from R to Excel

library(R2HTML) ??

On 23/09/14 15:04, Angel Rodriguez wrote:
> Dear Subscribers,
>
> I've found this recommendation to paste an R table to Excel:
>
> HTML.matrix( summary(iris), file("clipboard", "w"), append=F )
> # paste into Excel
>
> After installing R2HTML and writing that command, I get:
>
> Error: could not find function "HTML.matrix"
>
> Any clue?
>
> Thank you very much,
>
> Angel Rodríguez-Laso
>
>   [[alternative HTML version deleted]]
>
>
>


Re: [R] Pseudo R squared for quantile regression with replicates

2014-09-18 Thread David L Carlson

It is hard to say because we do not have enough information. R has 
approximately 6,000 packages and you have not told us which ones you are using. 
You have not told us much about your data and you have not told us where to 
find the query from August 2006. The basic problem is that your "fit" is not 
the same as the "f" in the query. Your fit object is not very complicated. If 
you look at the output from str(fit) you will see that fit is an "atomic" 
vector (note the wording in your error message) with a series of attributes 
that are probably documented in the help pages for the functions you are using. 
There is nothing called resid inside fit. It is likely that the post you are 
looking at refers to the output from rq(...) or perhaps predict(rq(...)), but 
not the output from withReplicates(..., quote(coef(rq(... which is what fit 
is.
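
To see why `$` fails, here is a small base-R sketch. The object below is a hypothetical stand-in for the withReplicates() result, built by hand with only two coefficients; the off-pattern variance value 55.1 is invented for illustration:

```r
# Hypothetical stand-in: a named numeric vector carrying a "var" attribute
fit <- structure(c("(Intercept)" = 713.24, Female = -24.01),
                 var = matrix(c(2839.3, 10.2, 10.2, 55.1), 2,
                              dimnames = list(c("(Intercept)", "Female"),
                                              c("(Intercept)", "Female"))),
                 statistic = "theta")

is.atomic(fit)                       # TRUE: there are no list components for $
msg <- tryCatch(fit$resid, error = function(e) conditionMessage(e))
msg                                  # the "$ operator is invalid ..." message

se <- sqrt(diag(attr(fit, "var")))   # what is stored: coefficients and variances
is.null(attr(fit, "resid"))          # TRUE: residuals are simply not in the object
```

Attributes are reached with attr(), not `$`, and nothing resembling residuals is stored, so they would have to come from an rq() fit itself.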

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Donia Smaali Bouhlila
Sent: Thursday, September 18, 2014 9:54 AM
To: r-help@r-project.org
Subject: [R] Pseudo R squared for quantile regression with replicates

Hi,


I am a new user of R. I intend to run quantile regressions with 
complex survey data using the replicate method. I have run the following 
commands successfully:


mydesign <- svydesign(ids=~IDSCHOOL, strata=~IDSTRATE, data=TUN, nest=TRUE,
                      weights=~TOTWGT)
bootdesign <- as.svrepdesign(mydesign, type="auto", replicates=150)

fit <- withReplicates(bootdesign,
    quote(coef(rq(Math1~Female+Age+calculator+computer+desk+
        dictionary+internet+work+Book2+Book3+Book4+Book5+
        Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,
        tau=0.5, weights=.weights, method="fn"))))




I want to get the pseudo R squared but I failed. I read a query dating from 
August 2006, "[R] Pseudo R for Quant Reg", and the answer to it:


rho <- function(u,tau=.5)u*(tau - (u < 0))
  V <- sum(rho(f$resid, f$tau))


I copied and pasted it, replacing f with fit, and I get this error message:
Error in fit$resid : $ operator is invalid for atomic vectors
I don't know what it means.

The fit object is likely to be quite complicated. I used str() to see 
what it looks like:



str (fit)
Class 'svrepstat'  atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
   ..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3 
...
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
   .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
   .. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39 
7.58 ...
   .. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age" 
"calculator" ...
   ..- attr(*, "statistic")= chr "theta"

How can I retrieve the residuals?? and calculate the pseudo R squared??


Any help please


-- 
Dr. Donia Smaali Bouhlila
Associate-Professor
Department of Economics
Faculté des Sciences Economiques et de Gestion de Tunis


Re: [R] column names to row names

2014-09-17 Thread David L Carlson

Here's another approach using stack():
> y <- data.frame(y)
> E <- with(y, data.frame(year, month, day, 
 stack(data.frame(y), select=4:12)))
> colnames(E)[4:5] <- c("discharge", "station")
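
A self-contained sketch of the same stack() approach. The matrix is rebuilt here from the values in the posted dput; the names ydf and long are assumptions for illustration:

```r
# Rebuild the 4 x 12 example: year, month, day, then stations A to I
y <- cbind(year = 1961, month = 1, day = 1:4,
           matrix(1:36, nrow = 4, dimnames = list(NULL, LETTERS[1:9])))
ydf <- as.data.frame(y)

long <- stack(ydf[4:12])             # columns "values" and "ind"

# The date vectors (length 4) are recycled 9 times to match the 36 stacked rows
E <- data.frame(year = ydf$year, month = ydf$month, day = ydf$day,
                discharge = long$values, station = long$ind)
str(E)
```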

But there are some differences. For my E:
> str(E)
'data.frame':   36 obs. of  5 variables:
 $ year : num  1961 1961 1961 1961 1961 ...
 $ month: num  1 1 1 1 1 1 1 1 1 1 ...
 $ day  : num  1 2 3 4 1 2 3 4 1 2 ...
 $ discharge: num  1 2 3 4 5 6 7 8 9 10 ...
 $ station  : Factor w/ 9 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 3 3 ...

But for your E:

> str(E)
'data.frame':   36 obs. of  5 variables:
 $ year : Factor w/ 1 level "1961": 1 1 1 1 1 1 1 1 1 1 ...
 $ month: num  1 1 1 1 1 1 1 1 1 2 ...
 $ day  : int  1 2 3 4 1 2 3 4 1 2 ...
 $ discharge: Factor w/ 36 levels "1","10","11",..: 1 12 23 31 32 33 34 35 36 2 
...
 $ station  : chr  "A" "A" "A" "A" ...

It seems strange that the discharge and year would be factors and station would 
be character.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of jim holtman
Sent: Wednesday, September 17, 2014 8:26 AM
To: eliza botto
Cc: r-help@r-project.org
Subject: Re: [R] column names to row names

Use the 'tidyr' package:  your 'month' does not match your desired output -

> x <- structure(c(1961, 1961, 1961, 1961, 1, 1, 1, 1, 1, 2, 3
+ , 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
+ , 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
+ , 28, 29, 30, 31, 32, 33, 34, 35, 36)
+ , .Dim = c(4L, 12L)
+ , .Dimnames = list(NULL, c("year", "month", "day", "A", "B", "C"
+ , "D", "E", "F", "G", "H", "I"))
+ )
> xdf <- as.data.frame(x)
> xdf
  year month day A B  C  D  E  F  G  H  I
1 1961     1   1 1 5  9 13 17 21 25 29 33
2 1961     1   2 2 6 10 14 18 22 26 30 34
3 1961     1   3 3 7 11 15 19 23 27 31 35
4 1961     1   4 4 8 12 16 20 24 28 32 36
> require(tidyr)
> require(dplyr)
> xdf %>% gather(station, discharge, -year, -month, -day)
   year month day station discharge
1  1961     1   1       A         1
2  1961     1   2       A         2
3  1961     1   3       A         3
4  1961     1   4       A         4
5  1961     1   1       B         5
6  1961     1   2       B         6
7  1961     1   3       B         7
8  1961     1   4       B         8
9  1961     1   1       C         9
10 1961     1   2       C        10
11 1961     1   3       C        11
12 1961     1   4       C        12
13 1961     1   1       D        13
14 1961     1   2       D        14
15 1961     1   3       D        15
16 1961     1   4       D        16
17 1961     1   1       E        17
18 1961     1   2       E        18
19 1961     1   3       E        19
20 1961     1   4       E        20
21 1961     1   1       F        21
22 1961     1   2       F        22
23 1961     1   3       F        23
24 1961     1   4       F        24
25 1961     1   1       G        25
26 1961     1   2       G        26
27 1961     1   3       G        27
28 1961     1   4       G        28
29 1961     1   1       H        29
30 1961     1   2       H        30
31 1961     1   3       H        31
32 1961     1   4       H        32
33 1961     1   1       I        33
34 1961     1   2       I        34
35 1961     1   3       I        35
36 1961     1   4       I        36
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Sep 17, 2014 at 8:28 AM, eliza botto  wrote:
> Dear useRs,
> I have a data frame "y" running from 1961 to 2010 in the following form 
> (where A, B, C, ..., I are station names and the values under these are 
> "discharge" values).
>> dput(y)
> structure(c(1961, 1961, 1961, 1961, 1, 1, 1, 1, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
> 27, 28, 29, 30, 31, 32, 33, 34, 35, 36), .Dim = c(4L, 12L), .Dimnames = 
> list(NULL, c("year", "month", "day", "A", "B", "C", "D", "E", "F", "G", "H", 
> "I")))
>
> I want it in the following form, "E", where the station names are in a 
> separate column and all discharge values are in one column.
>> dput(E)
>
> structure(list(year = structure(c(1L, 1L, 1L, 1L, 1

Re: [R] chi-square test

2014-09-15 Thread David L Carlson

Rick's question is a good one. It is unlikely that the results will be 
informative, but from a technical standpoint, you can estimate the p value 
using the simulate.p.value=TRUE argument to chisq.test().

> chisq.test(TT, simulate.p.value=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  TT
X-squared = 7919.632, df = NA, p-value = 0.0004998
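
A small reproducible sketch of the same call. The 2 x 3 table of counts is invented for illustration, and B = 2000 simply restates the default number of replicates:

```r
set.seed(1)                                     # the Monte Carlo p-value varies by seed
tab <- matrix(c(12, 3, 0, 5, 1, 9), nrow = 2)   # invented small counts

sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$p.value      # simulated p-value; no approximation warning is issued
sim$parameter    # NA: no degrees of freedom are reported for the simulated test
```

The smallest p-value this can report is 1/(B + 1), which is where the 0.0004998 in the output above comes from with 2000 replicates.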

-----
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Rick Bilonick
Sent: Monday, September 15, 2014 10:18 AM
To: r-help@r-project.org
Subject: Re: [R] chi-square test

On 09/15/2014 10:57 AM, eliza botto wrote:
> Dear useRs of R,
> I have two datasets (TT and SS) and I wanted to see whether my data are 
> uniformly distributed or not. I tested this with a chi-square test and the results 
> are given at the end. Apparently the p-value is highly significant, 
> but I can't interpret the results or why it says "In 
> chisq.test(TT) : Chi-squared approximation may be incorrect".
> ###
>> dput(TT)
> structure(list(clc5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.26, 0.14, 0, 0.44, 
> 0.26, 0, 0, 0, 0, 0, 0, 0.11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.16, 
> 0.56, 0, 1.49, 0, 0.64, 0.79, 0.66, 0, 0, 0.17, 0, 0, 0, 0, 0.56, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0.43, 0.41, 0, 0.5, 0.44, 0, 0, 0, 0, 0.09, 0.46, 0, 
> 0.27, 0.45, 0.15, 0.31, 0.16, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0, 
> 0, 0, 0, 0.06, 0, 0.09, 0.07, 0, 0, 7.89, 0, 0.22, 0.29, 0.33, 0.27, 0, 0.36, 
> 0.41, 0, 0, 0, 0, 0.55, 0.81, 0, 0.09, 0.13, 0.28, 0, 0, 0), quota_massima = 
> c(1167L, 1167L, 4572L, 3179L, 3141L, 585L, 585L, 876L, 876L, 1678L, 2667L, 
> 1369L, 1369L, 1369L, 1381L, 1381L, 1381L, 1381L, 2284L, 410L, 2109L, 2507L, 
> 2579L, 2507L, 1436L, 3234L, 3234L, 3234L, 3234L, 2792L, 2569L, 2569L, 2569L, 
> 1669L, 4743L, 4743L, 4743L, 3403L, 3197L, 3267L, 3583L, 3583L, 3583L, 2584L, 
> 2584L, 2579L, 1241L, 1241L, 4174L, 3006L, 3197L, 2366L, 2618L, 2670L, 4487L, 
> 3196L, 3196L, 2107L, 2107L, 2427L, 1814L, 2622L, 1268L, 1268L, 1268L,
> 3885L, 3885L, 3092L, 3234L, 2625L, 2625L, 3760L, 4743L, 3707L, 3760L, 
> 4743L, 3760L, 3885L, 3760L, 4743L, 2951L, 782L, 2957L, 3343L, 2697L, 2697L, 
> 3915L, 2277L, 1678L, 1678L, 3197L, 2957L, 2957L, 2957L, 4530L, 4530L, 4530L, 
> 2131L, 3618L, 3618L, 3335L, 2512L, 2390L, 1616L, 3526L, 3197L, 3197L, 2625L, 
> 2622L, 3197L, 3197L, 2622L, 2622L, 2622L, 368L, 4572L, 3953L, 863L, 3716L, 
> 3716L, 3716L, 2697L, 2697L, 1358L)), .Names = c("clc5", "quota_massima"), 
> class = "data.frame", row.names = c(NA, -124L))
>
>>   chisq.test(TT)
>  Pearson's Chi-squared test
> data:  TT
> X-squared = 411.5517, df = 123, p-value < 2.2e-16
> Warning message:
> In chisq.test(TT) : Chi-squared approximation may be incorrect
> ###
>> dput(SS)
> structure(list(NDVIanno = c(0.57, 0.536, 0.082, 0.262, 0.209, 0.539, 0.536, 
> 0.543, 0.588, 0.599, 0.397, 0.63, 0.616, 0.644, 0.579, 0.597, 0.617, 0.622, 
> 0.548, 0.528, 0.541, 0.436, 0.509, 0.467, 0.534, 0.412, 0.324, 0.299, 0.41, 
> 0.462, 0.427, 0.456, 0.508, 0.581, 0.242, 0.291, 0.324, 0.28, 0.291, 0.305, 
> 0.365, 0.338, 0.399, 0.516, 0.357, 0.558, 0.605, 0.638, 0.191, 0.377, 0.325, 
> 0.574, 0.458, 0.426, 0.188, 0.412, 0.464, 0.568, 0.582, 0.494, 0.598, 0.451, 
> 0.577, 0.572, 0.602, 0.321, 0.38, 0.413, 0.427, 0.55, 0.437, 0.481, 0.425, 
> 0.234, 0.466, 0.464, 0.491, 0.463, 0.489, 0.435, 0.267, 0.564, 0.256, 0.156, 
> 0.476, 0.498, 0.122, 0.508, 0.582, 0.615, 0.409, 0.356, 0.284, 0.285, 0.444, 
> 0.303, 0.478, 0.557, 0.345, 0.408, 0.347, 0.498, 0.534, 0.576, 0.361, 0.495, 
> 0.502, 0.553, 0.519, 0.504, 0.53, 0.547, 0.559, 0.505, 0.557, 0.377, 0.36, 
> 0.613, 0.452, 0.397, 0.277, 0.42, 0.443, 0.62), delta_z = c(211L, 171L, 925L, 
> 534L, 498L, 50L, 53L, 331L, 135L, 456L, 850L, 288L, 286L, 233L, 342L, 274L,
> 184L, 198L, 312L, 67L, 476L, 676L, 349L, 873L, 65L, 963L, 553L, 474L, 
> 948L, 1082L, 616L, 704L, 814L, 450L, 865L, 987L, 1265L, 720L, 565L, 652L, 
> 941L, 822L, 1239L, 929L, 477L, 361L, 199L, 203L, 642L, 788L, 818L, 450L, 
> 703L, 760L, 711L, 1015L, 1351L, 195L, 511L, 617L, 296L, 604L, 381L, 389L, 
> 287L, 1043L, 1465L, 963L, 1125L, 582L, 662L, 1424L, 1762L, 575L, 1477L, 
> 1364L, 1236L, 1483L, 1201L, 1644L, 498L, 142L, 510L, 482L, 811L, 788L, 466L, 
> 626L, 461L, 350L, 1177L, 826L, 575L, 568L, 916L, 767L, 1017L, 532L, 1047L, 
> 1370L, 902L, 686L, 703L, 440L, 1016L, 1148L, 1089L, 753L, 65

Re: [R] apply block of if statements with menu function

2014-09-15 Thread David L Carlson

I think switch() should work for you here, but it is not clear how much 
flexibility you are trying to have (different tests based on the first 
response; different tests based on first, then second response; different tests 
based on each successive response). 

?switch

For the second question just index the return value:

> let <- letters[1:4]
> let[menu(let)]

1: a
2: b
3: c
4: d

Selection: 3
[1] "c"

Or a bit more polished:

> cat("Choice: ", let[menu(let)], "\n")

1: a
2: b
3: c
4: d

Selection: 4
Choice:  d
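
For the first question, a minimal sketch of how switch() might dispatch on the chosen label. The function name run_choice and the returned strings are placeholders, and the fixed choice of 3 outside an interactive session is an assumption so the code also runs under Rscript:

```r
choices <- c("a", "b", "c", "d")

run_choice <- function(pick) {
  if (pick < 1) return("nothing selected")      # menu() returns 0 on exit
  switch(choices[pick],
         a = "block for a",                     # follow-up menus would go here
         b = "block for b",
         "no block defined for this choice")    # default branch (c and d)
}

# menu() needs an interactive session; fall back to a fixed choice otherwise
pick <- if (interactive()) menu(choices) else 3L
run_choice(pick)
```

Each branch can call its own function containing further menu() calls, so only the questions belonging to the selected option are ever asked.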

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of r...@openmailbox.org
Sent: Monday, September 15, 2014 3:53 AM
To: r-help@r-project.org
Subject: [R] apply block of if statements with menu function

Subscribers,


For a menu:

menu(c('a','b','c','d'))

How to create a function that will apply to specific menu choice 
objects? For example:

object1<-function (menuifchoices) {
menu1<-menu(c('a','b','c','d'))
if (menu1==1)
...
menu1a<-menu...
if (menu1a==1)
...
menu2a<-menu...
if (menu2a==1)
...
menu2
<-menu(c('a','b','c','d'))
if (menu1==2)
...
}

The request action is that a user can select a menu option that will 
activate a series of "multiple choice" questions, results in "menu1" 
being activated without menu2 being activated. If someone could direct 
to the relevant terminology, thank you.

Separate question; for a menu:

menu(c('a','b','c','d'))

1: a
2: b
3: c
4: d

Selection: 1
[1] 1

is it possible to change behaviour so that result of the selection is 
not the integer, but the original menu choice:

Selection: 1
[1] a


Re: [R] mice - undefined columns selected

2014-09-12 Thread David L Carlson

I'm copying the package maintainer who can probably give a more definite 
answer. I'm getting the same error on your data. I can get a subset of your 
data to run, eg:

d.imp <- mice(d[,c(1:2, 5:6)]) works, but
d.imp <- mice(d[,c(3:4, 7:8)]) fails. 

That suggests to me that the problem is with your data. There are some very 
high correlations between variables. Looking at pairwise complete observations, 
C1 has correlations of .998, .999, and .998 with C2, C3, and C4 while M1 has 
correlations of .999, .999, and .999 with M2, M3, and M4. The correlations 
between the C variables and the M variables are also high (consistently greater 
than .80). You really have only two variables C and M. This is probably the 
reason function mice() is failing, but the error message could be more 
informative. Since you are only imputing single values, you might be better off 
with simpler imputation methods. Package VIM has a number of options of which 
nearest neighbor and hot deck might work well with your data.
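
The correlation check can be sketched in base R as follows. The data frame d_toy is simulated for illustration and is not the employment data; the 0.99 cutoff is an arbitrary threshold:

```r
set.seed(42)
base_c <- rnorm(50)

# Three nearly identical variables, with a couple of missing values
d_toy <- data.frame(C1 = base_c,
                    C2 = base_c + rnorm(50, sd = 0.02),
                    C3 = base_c + rnorm(50, sd = 0.02))
d_toy$C2[c(3, 17)] <- NA

r <- cor(d_toy, use = "pairwise.complete.obs")  # pairwise complete observations
round(r, 3)

# Pairs of variables that are close to collinear
which(abs(r) > 0.99 & upper.tri(r), arr.ind = TRUE)
```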

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jeremy Miles
Sent: Thursday, September 11, 2014 7:49 PM
To: r-help
Subject: [R] mice - undefined columns selected

I've got a problem with the mice package that I don't understand.

Here's the code:
library(mice)
d <- read.csv("https://dl.dropboxusercontent.com/u/24381951/employment.csv",
 as.is=TRUE, row.names=1)
d.imp <- mice(data=d, m=1)

Result is:
Error in `[.data.frame`(data, , jj) : undefined columns selected

I hope I'm doing something foolish,

thanks,

Jeremy


Re: [R] Margins to fill matrix

2014-09-11 Thread David L Carlson

You want r2dtable():

> ?r2dtable
> set.seed(42)
> a <- r2dtable(1, seats, mandates)
> addmargins(a[[1]])
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    2    3    1    0    6    2    1    2   17
 [2,]    8    0    1    1   11    0    2    1   24
 [3,]    8    0    5    2    7    1    1    4   28
 [4,]   10    5    3    1    6    3    0    2   30
 [5,]   13    4    1    4    9    0    2    1   34
 [6,]    8    2    2    0   17    3    4    0   36
 [7,]   13    0    2    6    9    2    3    5   40
 [8,]   12    4    4    3   12    3    3    3   44
 [9,]   14    3    3    2   18    0    4    2   46
[10,]   19    2    2    0   17    5    5    0   50
[11,]  107   23   24   19  112   19   25   20  349
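
The same call on the small margins from the follow-up message, as a sketch that also checks the result. The matrix drawn depends on the seed, so it need not match the one shown in that message:

```r
rs <- c(3, 2, 3, 4)                 # required row sums
cs <- c(4, 5, 3)                    # required column sums (same total, 12)

set.seed(42)
m <- r2dtable(1, rs, cs)[[1]]       # one random 4 x 3 table with those margins
m

all(rowSums(m) == rs)               # margins are reproduced exactly
all(colSums(m) == cs)
```

r2dtable() returns a list of tables, hence the [[1]]; asking for more than one table gives several different fills of the same margins.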


---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Stefan Petersson
Sent: Thursday, September 11, 2014 7:13 AM
To: Charles Determan Jr
Cc: r-help@r-project.org
Subject: Re: [R] Margins to fill matrix

I have :

rs <- c(3, 2, 3, 4)
cs <- c(4, 5, 3)

And want:

> matrix
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    1    0    1
[3,]    1    1    1
[4,]    1    2    1

The rowSums of the above matrix are equal to rs and the colSums are
equal to cs. It's sort of a matrix expansion where the margins
are known beforehand...

I hope I make sense.


2014-09-11 14:09 GMT+02:00 Charles Determan Jr :
> Do you have an example of what you would like your output to look like?  It
> is a little difficult to fully understand what you are looking for.  You
> only have 18 values but are looking to fill at 10x8 matrix (i.e. 80 values).
> If you can clarify better we may be better able to help you.
>
> Charles
>
>
> On Thu, Sep 11, 2014 at 3:47 AM, Stefan Petersson  wrote:
>>
>> Hi,
>>
>> I have two vectors of margins. Now I want to create a "fill" matrix that
>> reflects the margins.
>>
>>  seats <- c(17,24,28,30,34,36,40,44,46,50)
>>  mandates <- c(107,23,24,19,112,19,25,20)
>>
>> Both vectors add up to 349. So I want a 10x8 matrix with row sums
>> corresponding to "seats" and column sums corresponding to "mandates".
>>
>
>
>
>
> --
> Dr. Charles Determan, PhD
> Integrated Biosciences


Re: [R] incorrect number of dimensions

2014-09-11 Thread David L Carlson

Look below to see what happens to your formatting when you use html. Don't use 
html.

Why do you use x='df' in defining the function?

df is a data frame with 5 observations and 4 variables.
'df' is a character vector of length 1. Your function is looking for a data 
frame (or matrix) with at least 4 columns.
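
A stripped-down sketch of the difference, without the gtools call. The data values and the function names bad and good are invented for illustration:

```r
df <- data.frame(y1 = c(9, 8, 10, 7, 9),
                 y2 = c(7, 6, 8, 7, 5),
                 y3 = c(8, 9, 7, 8, 10),
                 n  = c(100, 98, 103, 99, 101))

bad  <- function(x = "df", j = 5) x[j, 4]   # default is the string "df"
good <- function(x, j = 5) x[j, 4]          # caller passes the data frame itself

# Indexing a length-1 character vector with [j, 4] reproduces the error
msg <- tryCatch(bad(), error = function(e) conditionMessage(e))
msg

good(df)                                    # row 5, column 4 of the data frame
```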

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Marie-Eve St-Onge
Sent: Thursday, September 11, 2014 10:53 AM
To: r-help@r-project.org
Subject: [R] incorrect number of dimensions

Dear all, I'm trying the following experiment simulation, but I'm receiving 
this error:
> probs()Error in x[j, 4] : incorrect number of dimensions
however, the simulation works fine outside the function statement{}. What am I 
doing wrong?
# Create some fake data and call the function: df <- data.frame(y1 = rpois(5, 
9),y2 = rpois(5, 7), y3 = rpois(5, 8), n = rpois(5, 100)) 
probs = function(x='df', j=5, export=1){  p=gtools::rdirichlet(10, x[j,4] * 
c(x[j,1],x[j,2],x[j,3], 1-x[j,1]-x[j,2]-x[j,3])/100+1 )if(export==1){  
mean(p[,1] > p[,3])} else {  return(p)} }

Eve

  
[[alternative HTML version deleted]]


Re: [R] create new column by replacing multiple unique values in existing column

2014-09-11 Thread David L Carlson

Note that in the data you sent, b is a factor:
> str(dat1)
'data.frame':   15 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10 ...
 $ b: Factor w/ 3 levels "A1","A2","B1": 1 1 1 1 1 2 2 2 2 2 ...

So all you need is
> dat1$new <- as.numeric(dat1$b)
> table(dat1$new)

1 2 3 
5 5 5 
> table(dat1$b)

A1 A2 B1 
 5  5  5 

If b is not a factor in your table, make it one with factor(); see ?factor
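
A short sketch of the character case. Since R 4.0.0, read.table() no longer converts strings to factors by default, so b would arrive as character; the data frame is rebuilt by hand here rather than read from text:

```r
dat1 <- data.frame(a = 1:15,
                   b = rep(c("A1", "A2", "B1"), each = 5),
                   stringsAsFactors = FALSE)   # b is character, not a factor

# factor() sorts the unique values; as.integer() returns their level codes
dat1$new <- as.integer(factor(dat1$b))
table(dat1$b, dat1$new)
```

The codes follow the sorted order of the unique values; supplying levels= to factor() would fix a different order.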
-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of raz
Sent: Thursday, September 11, 2014 10:49 AM
To: r-help@r-project.org
Subject: [R] create new column by replacing multiple unique values in existing 
column

Hi,

I got the following data frame:
 dat1 <- read.table(text="a,b
1,A1
2,A1
3,A1
4,A1
5,A1
6,A2
7,A2
8,A2
9,A2
10,A2
11,B1
12,B1
13,B1
14,B1
15,B1",sep=",",header=T)


I would like to add a new column dat1$new based on column "b" (dat1$b) in
which values will be substituted according to their unique values, e.g. "A1"
will be "1", "A2" will be "2" and so on (this is only part of a large
table). It would be better if I could change all unique values in dat1 to
numbers 1:unique(n). If not, how do I change all values
("A1","A2","B1") to (1,2,3) in a new column?

Thanks a lot,

Raz


-- 
\m/


Re: [R] KDE routines for data that is aggregated

2014-09-09 Thread David L Carlson

If the x and y values are regularly spaced, you could use contour() or persp() 
to plot the densities. If they are not, you can use density(), loess(), gam(), 
kriging, or another function to estimate a smooth surface for the values, then 
evaluate that surface over a regular grid and plot it with contour(), etc.
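
For the regularly spaced case, a minimal sketch with invented counts. The data frame agg and its x, y, n columns mirror the layout described in the question; treating each cell's share of the total as the density assumes unit-sized grid cells:

```r
# Invented aggregated data: one row per (x, y) cell with a count n
agg <- expand.grid(x = 1:5, y = 1:4)
agg$n <- with(agg, round(1000 * exp(-((x - 3)^2 + (y - 2)^2) / 2)))

# Arrange the counts as an x-by-y matrix and rescale to proportions
z    <- tapply(agg$n, list(agg$x, agg$y), sum)
dens <- z / sum(z)

xs <- sort(unique(agg$x))
ys <- sort(unique(agg$y))
contour(x = xs, y = ys, z = dens, xlab = "x", ylab = "y")
```

This uses every count directly, with no sampling; persp(xs, ys, dens) would give the surface view of the same matrix.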

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Saptarshi Guha
Sent: Monday, September 8, 2014 6:57 PM
To: R-help@r-project.org
Subject: [R] KDE routines for data that is aggregated

Hello,
Couldn't think of a better subject line. Rather than a matrix like

x,y
..,..
.,..

I have a matrix like
x,y,n,
..,..,..,
..,..,..

and so on. Also, sum(n) is roughly a few hundred million. The number of rows
is < 1 million.

Are there routines to fit a 2d kde estimate to data provided in this form?
I can sample from the data according to the weights given by 'n', but I am
curious whether there is something that can use all the data when given a
structure of this form.

Regards
Saptarshi


Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-05 Thread David L Carlson

The big difference between the data sets is that many of your rows (16) have 
all missing values. None of mine do. If you run my data and yours, you will see 
that dcast throws a warning "Aggregation function missing: defaulting to 
length" with your data but not with mine. As a result, instead of using the 
value of rank, dcast uses length(rank) which is always 1 except when there are 
multiple missing values when it is the number of missing values. This problem 
will occur whenever there is more than one missing value on a row. The simplest 
way to handle this is to create a function that returns the first value of a 
vector and use that with the fun.aggregate= argument:

> first <- function(x) {x[1]}
> d4<- dcast(d3, row~color, fun.aggregate=first, value.var="rank", fill=0)

The only drawback is that this will not warn you if a category was ranked twice 
except that the NA column will be zero and one of the other columns will be 
zero. The number of missing values is the number of zeroes in your category 
columns (not including row or NA) and the value in NA is the lowest rank that 
was missing.

David C

-Original Message-
From: Simon Kiss [mailto:sjk...@gmail.com] 
Sent: Friday, September 5, 2014 10:22 AM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

HI, of course.

A mini-version of my data set is below, stored in d2. The code I'm 
working on follows.
library(reshape2)
#Create d2
structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 
3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 
1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 
1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank2 = structure(c(6L, 1L, 1L, 
2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 
6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, 
NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank3 = structure(c(1L, 6L, 4L, 
3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 
3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, 
NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank4 = structure(c(7L, 4L, 2L, 
1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 
4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, 
NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank5 = structure(c(2L, 7L, 6L, 
7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 
7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, 
NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank6 = structure(c(4L, 2L, 7L, 
6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 
2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, 
NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank7 = structure(c(5L, 5L, 5L, 
5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 
5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 
NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor")), .Names = c("row", "rank1", "rank2", 
"rank3", "rank4", "rank5", "rank6", "rank7"), row.names = c(NA, 
50L), class = "data.frame")


#This code is a replication of David Carlson's code (below) which works 
splendidly, but does not work on my data-set
#Melt d2: Note, I've used value.name='color' to maximi

Re: [R] calculate Euclidean distances between populations in R with this data structure

2014-09-05 Thread David L Carlson

There may be a specialized package for this in bioconductor, but it seems that 
you could just use aggregate() to calculate the means for each population and 
then use the results of that in dist().

?aggregate
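
A minimal sketch of that suggestion, using only the four genes quoted in the question below. It assumes the column names follow the pattern pop<k>.<sample>, so that the population label can be recovered by dropping the suffix; the object names (expr, pop, samples, pop_means) are made up for illustration.

```r
# Toy data: the four genes and eight samples quoted in the question
expr <- matrix(
  c( 5.38194,  4.06191,  4.88044,  5.60383,  6.23101,  6.53738,  4.80336,  5.86136,
     5.15155,  4.29441,  4.59131,  4.90026,  4.62908,  4.48712,  4.73039,  4.46208,
     4.22396,  4.14451,  4.41465,  3.93179,  4.89638,  4.66109,  4.20918,  4.48107,
    12.1969,  12.4179,  10.9786,  11.7659,  11.405,   11.7594,  11.1757,  11.8128),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7A5", "A1BG", "A1CF", "A26C3"),
                  paste0("pop", rep(1:2, each = 4), ".", rep(1:4, times = 2)))
)

# Assumption: "pop1.3" means sample 3 of population 1
pop <- sub("\\.[0-9]+$", "", colnames(expr))

# aggregate() wants samples in rows, so transpose first
samples <- as.data.frame(t(expr))
pop_means <- aggregate(samples, by = list(pop = pop), FUN = mean)
rownames(pop_means) <- pop_means$pop

# Euclidean distance between the population mean profiles
d <- dist(pop_means[, -1])

# The formula in the question divides the sum of squares by the number of
# genes before taking the square root, which rescales dist() by sqrt(n)
d_scaled <- d / sqrt(nrow(expr))
d_scaled
```

With 12 populations the same code should return a 12 x 12 lower-triangle dist object, as long as the column naming assumption holds.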

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ding, Yuan Chun
Sent: Thursday, September 4, 2014 3:11 PM
To: r-help@R-project.org
Subject: [R] calculate Euclidean distances between populations in R with this 
data structure




I want to calculate Euclidean distance between 12 populations, in each 
population there are 20 samples and each sample is measured for 100 genes 
(these are microarray data; the numbers here are just examples).
The equation I found is:
distance = sqrt{[sum(Average of xi -average of yi)^2] /n }, i=1 to n;
where xi and yi are the expression of gene i over two populations with p and q 
samples (x1, x2,...,xp), (y1, y2,...,yq), n is the number of genes.
part of data are pasted below
row.names  pop1.1   pop1.2   pop1.3   pop1.4   pop2.1   pop2.2   pop2.3   pop2.4
7A5        5.38194  4.06191  4.88044  5.60383  6.23101  6.53738  4.80336  5.86136
A1BG       5.15155  4.29441  4.59131  4.90026  4.62908  4.48712  4.73039  4.46208
A1CF       4.22396  4.14451  4.41465  3.93179  4.89638  4.66109  4.20918  4.48107
A26C3      12.1969  12.4179  10.9786  11.7659  11.405   11.7594  11.1757  11.8128
How might one calculate these distances in R with this data structure?


Thanks,

Ding




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] depth of labels of axis

2014-09-04 Thread David L Carlson

The problem with this approach is that the horizontal positioning of the labels 
is based on the width of the label including the phantom part so that the E's 
are pushed to the left of the tick mark (at least on my Windows machine). But 
it does provide a way of dealing with superscripts as long as the phantom is 
added to each label and hadj= is used to position the label horizontally, eg 
(changing the last label to a superscript for illustration):

lbl <- expression(E[g]~phantom(E[g]), E~phantom(E[g]), E[j]~phantom(E[g]),
   E~phantom(E[g]), E^t~phantom(E[g]))
plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = lbl, hadj=.1)
abline(h=.7, xpd=TRUE, lty=3)

David C

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David Winsemius
Sent: Thursday, September 4, 2014 2:25 PM
To: Jinsong Zhao
Cc: r-help@r-project.org
Subject: Re: [R] depth of labels of axis


On Sep 3, 2014, at 10:05 PM, Jinsong Zhao wrote:

> On 2014/9/3 21:33, Jinsong Zhao wrote:
>> On 2014/9/2 11:50, David L Carlson wrote:
>>> The bottom of the expression is set by the lowest character (which can
>>> even change for subscripted letters with descenders). The solution is
>>> to get axis() to align the tops of the axis labels and move the line
>>> up to reduce the space, e.g.
>>> 
>>> plot(1:5, xaxt = "n")
>>> axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
>>> "E", expression(E[t])), padj=1, mgp=c(3, .1, 0))
>>> # Check alignment
>>> abline(h=.7, xpd=TRUE, lty=3)
>> 
>> Yes. In this situation, padj = 1 is the quick solution. However, if there
>> are also superscripts, then it's hard to align all the labels.
>> 
>> If R provided a mechanism that aligns the label in axis() or text() with
>> the baseline of the character without the super- and/or subscript, that
>> would be terrific.
> 
> it seems that the above wish is on the Graphics TODO lists:
> https://www.stat.auckland.ac.nz/~paul/R/graphicstodos.html
> 
> Allow text adjustment for mathematical annotations which is relative to a 
> text baseline (in addition to the current situation where adjustment is 
> relative to the bounding box).
> 

In many cases adding a phantom argument will correct alignment problems:

plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = c(expression(E[g]), E~phantom(E[g]), 
expression(E[j]),
E~phantom(E[g]), expression(E[t])))

abline(h=.7, xpd=TRUE, lty=3)

Notice that c(expression(.), ...) will coerce all items separated by commas to 
expressions, so you can just put in "native" expressions that are not 
surrounded by the `expression`-function:

c(expression(E[g]), E~phantom(E[g]), expression(E[j])  ) #returns
# expression(E[g], E ~ phantom(E[g]), E[j])

The tilde is actually a function that converts parse-able strings into R 
language objects:

c(expression(E[g]), E~phantom(E[g]), ~E[j])
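
A quick way to inspect what these objects are, with no plotting involved. This is only a sketch for looking at the pieces; the names f and lbl are arbitrary.

```r
# A tilde expression is stored unevaluated as a formula, which is a call
f <- E ~ phantom(E[g])
class(f)      # "formula"
is.call(f)    # TRUE: a language object, nothing was evaluated

# Combining formulas with an expression vector gives one expression vector,
# which is the form axis(labels = ) accepts for plotmath labels
lbl <- c(expression(E[g]), E ~ phantom(E[g]), ~E[j])
is.expression(lbl)
length(lbl)
```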

-- 
David.

>>> 
>>> 
>>> -Original Message-
>>> From: r-help-boun...@r-project.org
>>> [mailto:r-help-boun...@r-project.org] On Behalf Of Jinsong Zhao
>>> Sent: Monday, September 1, 2014 6:41 PM
>>> To: r-help@r-project.org
>>> Subject: [R] depth of labels of axis
>>> 
>>> Hi there,
>>> 
>>> With the following code,
>>> 
>>> plot(1:5, xaxt = "n")
>>> axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
>>> "E", expression(E[t])))
>>> 
>>> you may notice that the "E"s in the axis(1) labels are not at the same
>>> depth, so the row of labels looks like a wave.
>>> 
>>> Is there a way to typeset the labels so that they all have the
>>> same depth?
>>> 
>>> Any suggestions will be really appreciated. Thanks in advance.
>>> 
>>> Best regards,
>>> Jinsong

David Winsemius
Alameda, CA, USA


Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-04 Thread David L Carlson

I think we would need enough of the data you are using to figure out how to 
modify the process. Can you use dput() to send a small data set that fails to 
work?

David C

-Original Message-
From: Simon Kiss [mailto:sjk...@gmail.com] 
Sent: Thursday, September 4, 2014 1:28 PM
To: David L Carlson
Cc: r-help@r-project.org
Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
Frame

Hi David and list:
This is working, except at this command
mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

dcast is using "length" as the default aggregating function, which gives 
inaccurate results. It tells me, for example, how many choices were missing 
values, and it tells me whether a person selected any given option (the value 
is reported as 1).
When I run your reproducible example, it works great, but something 
with the aggregating function is not working properly with my data. 
Any other thoughts?
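
One possible cause, offered as a guess until the data are posted: dcast() falls back to length when some row/color combination occurs more than once in the melted data, which happens as soon as one respondent has more than one NA. A base R sketch for checking this on a made-up melted data frame (toy_melt, key, and n_per_cell are illustrative names, not objects from the thread):

```r
# Made-up melted data: respondent 2 left two ranks blank
toy_melt <- data.frame(
  row   = c(1, 1, 1, 2, 2, 2),
  rank  = c(1, 2, 3, 1, 2, 3),
  color = c("red", "blue", NA, "red", NA, NA),
  stringsAsFactors = FALSE
)

# Count entries per row/color cell; any count above 1 forces aggregation
key <- paste(toy_melt$row, toy_melt$color)
n_per_cell <- table(key)
n_per_cell[n_per_cell > 1]

# Rows of the melted data that belong to a duplicated cell
toy_melt[key %in% names(n_per_cell)[n_per_cell > 1], ]
```

If the check returns nothing for the real data, the cause lies elsewhere.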
Simon
On Aug 18, 2014, at 10:44 AM, David L Carlson  wrote:

> Another approach using reshape2:
> 
>> library(reshape2)
>> # Construct data/ add column of row numbers
>> set.seed(42)
>> mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> +   "green", "yellow", NA), 4)))
>> mydf <- data.frame(rows=1:100, mydf)
>> colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
>> head(mydf)
>   row  rank1  rank2  rank3 rank4
> 1   1   <NA> yellow    red  blue
> 2   2 yellow  green   <NA>   red
> 3   3 yellow  green   blue  <NA>
> 4   4   <NA>   blue yellow green
> 5   5   <NA>    red   blue green
> 6   6   <NA>    red  green  blue
>> # Reshape
>> mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, 
> + variable.name="rank", value.name="color")
>> # Convert rank to numeric
>> mymelt$rank <- as.numeric(mymelt$rank)
>> mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)
>> head(mycast)
>   row blue green red yellow NA
> 1   1    4     0   3      2  1
> 2   2    0     2   4      1  3
> 3   3    3     2   0      1  4
> 4   4    2     4   0      3  1
> 5   5    3     4   2      0  1
> 6   6    4     3   2      0  1
> 
> David C
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of David L Carlson
> Sent: Sunday, August 17, 2014 6:32 PM
> To: Simon Kiss; r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
> Data Frame
> 
> There is probably an easier way to do this, but
> 
>> set.seed(42)
>> mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> +  "green", "yellow", NA), 4)))
>> colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
>> head(mydf)
>    rank1  rank2  rank3 rank4
> 1   <NA> yellow    red  blue
> 2 yellow  green   <NA>   red
> 3 yellow  green   blue  <NA>
> 4   <NA>   blue yellow green
> 5   <NA>    red   blue green
> 6   <NA>    red  green  blue
>> lvls <- levels(mydf$rank1)
>> # convert color factors to numeric
>> for (i in seq_along(mydf)) mydf[,i] <- as.numeric(mydf[,i]) 
>> # stack the columns
>> mydf2 <- stack(mydf)
>> # convert rank factor to numeric
>> mydf2$ind <- as.numeric(mydf2$ind)
>> # add row numbers
>> mydf2 <- data.frame(rows=1:100, mydf2)
>> # Create table
>> mytbl <- xtabs(ind~rows+values, mydf2)
>> # convert to data frame
>> mydf3 <- data.frame(unclass(mytbl))
>> colnames(mydf3) <- lvls
>> head(mydf3)
>   blue green red yellow
> 1    4     0   3      2
> 2    0     2   4      1
> 3    3     2   0      1
> 4    2     4   0      3
> 5    3     4   2      0
> 6    4     3   2      0
> 
> David C
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Simon Kiss
> Sent: Friday, August 15, 2014 3:58 PM
> To: r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
> Data Frame
> 
> 
> Both the suggestions I got work very well, but what I didn't realize is that 
> NA values would cause serious problems.  Where there is a missing value, 
> using the argument na.last=NA to order just returns the order of the 
> factor levels, but excludes the missing values, and I have no idea where 
> those occur, or rather which of those variables were actually missing.  
> Have I explained this problem sufficiently? 
> I didn't think it would cause such a problem so I didn't include it in the 
> or

Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread David L Carlson

Since they all have the same W/U value, it seems likely that the difference is 
how the different versions adjust the standard error for ties. Here are a 
couple of posts addressing the issues of ties:

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and
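
To see where the ties enter, the normal approximation can be worked out by hand for the data in the original question. This sketch follows the usual textbook formulas with a continuity correction; the variable names are mine, and it illustrates the tie adjustment only, not what any of the online calculators actually do.

```r
x <- c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,
       359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,
       359,298,359,359,359,28.6,359,359,128)
n1 <- length(x); n2 <- length(y); n <- n1 + n2

# W (the U statistic for the first sample) from midranks of the pooled data
r <- rank(c(x, y))
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Standard deviation of W under the null, without and with the tie adjustment
ties <- table(r)
sd_plain <- sqrt(n1 * n2 * (n + 1) / 12)
sd_ties  <- sqrt(n1 * n2 / 12 * ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))

# Two-sided p-value with continuity correction for a given standard deviation
p_from <- function(sd) {
  z <- W - n1 * n2 / 2
  2 * pnorm(-abs((z - sign(z) * 0.5) / sd))
}
c(W = W, p_plain = p_from(sd_plain), p_ties = p_from(sd_ties))

# R's normal approximation uses the tie-adjusted standard deviation
wilcox.test(x, y, exact = FALSE)$p.value
```

The same W can therefore lead to different p-values, depending on whether the standard deviation is adjusted for ties.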

David C

From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W 
Bradley Knox
Sent: Wednesday, September 3, 2014 9:20 AM
To: David L Carlson
Cc: Tal Galili; r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online 
calculators

Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for the 
exact and correct parameters. Running with correct=FALSE makes only a tiny 
change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these online 
calculators might come from how ties are handled, but the fact that R and two 
of the online calculators reach the same U/W values seems to indicate that ties 
aren't the issue, since (I believe) the U or W values contain all of the 
information needed to calculate the p-value, assuming the number of samples is 
also known for each condition. (However, it's been a while since I looked into 
how MWU tests work, so maybe now's the time to refresh.) If that's correct, the 
discrepancy seems to be based in what R does with the W value that is identical 
to the U values of two of the online calculators. (I'm also assuming that U and 
W have the same meaning, which seems likely.)

- Brad

W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu

On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson <dcarl...@tamu.edu> wrote:
That does not change the results. The problem is likely to be the way ties are 
handled. The first sample has 25 values of which 23 are identical (359). The 
second sample has 26 values of which 12 are identical (359). The difference 
between the implementations may be a result of the way the ties are ranked. For 
example the R function rank() offers 5 different ways of handling the rank on 
tied observations. With so many ties, that could make a substantial difference.
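
A small illustration of how the choice matters, on a made-up vector with one tied pair. wilcox.test() itself works with midranks (the "average" method), so this only shows how another implementation could end up with different ranks.

```r
# Made-up data with a tie between the second and third values
x <- c(10, 20, 20, 30)

# One column of ranks per ties.method
res <- sapply(c("average", "first", "min", "max"),
              function(m) rank(x, ties.method = m))
res
```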

Package coin has wilcox_test(), which can use Monte Carlo resampling to 
approximate the null distribution.

---------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Tal Galili
Sent: Wednesday, September 3, 2014 5:24 AM
To: W Bradley Knox
Cc: r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online 
calculators

It seems your numbers have ties. What happens if you run wilcox.test with
correct=FALSE? Will the results be the same as the online calculators?

Contact Details:
Contact me: tal.gal...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)

On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox <bradk...@mit.edu> wrote:

> Hi.
>
> I'm taking the long-overdue step of moving from using online calculators to
> compute results for Mann-Whitney U tests to a more streamlined system
> involving R.
>
> However, I'm finding that R computes a different result than the 3 online
> calculators that I've used before (all of which approximately agree). These
> calculators are here:
>
> http://elegans.som.vcu.edu/~leon/stats/utest.cgi
> http://vassarstats.net/utest.html
> http://www.socscistatistics.com/tests/mannwhitney/
>
> An example calculation is
>
>
> wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,359,359,359,359,303,359,359,359),c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,359,298,359,359,359,28.6,359,359,128))
>
> which prints
>
> Wilcoxon rank sum test with continuity correction  data: c(359, 359, 359,
> 359, 359, 359, 335, 359, 359, 359, 359, 359, and c(332, 85, 359, 359, 359,
> 220, 231, 300, 359, 237, 359, 183, 359, 359, 359, 359, 359, 359, 359, 359,
> 359, 303, 359, 359, and 286, 355, 250, 105, 359, 359, 298, 359, 359, 359,
