Re: [R] foreign package read.spss() and NA levels

2021-06-08 Thread Allen, Justin
Hi Uwe,

Thank you so much for your reply. I am sorry to have wasted your time: after 
another look, the error turned out to be caused by how I had created the '.sav' 
file I was using. It was converting one of the factor levels, which wasn't NA, 
to the factor level "NA", so the problem was in the dataset.

Many Thanks,

Justin

Justin Allen
Housing Consultant, BRE<https://bregroup.com/>
T: 07807122647


From: Uwe Ligges 
Sent: 04 June 2021 10:57
To: Allen, Justin ; R-help@r-project.org 

Cc: Whiteley, Jonathon ; Foster, Helen 

Subject: Re: [R] foreign package read.spss() and NA levels



On 27.05.2021 12:27, Allen, Justin wrote:
> Hi All,
>
> Wanted to report what may be a bug or possibly an oversight, but I am unsure, 
> in the "foreign" packages in the read.spss() command, 
> https://cran.r-project.org/web/packages/foreign/index.html. When running the 
> following code,
>
> input <- read.spss("[.sav file location]", to.data.frame = TRUE)
> str(input)
>
> The read.spss() seems to be applying addNA() to factors so NA is being set as 
> a level, and there seems to be no way to get read.spss() to bring factors in 
> without doing this. This seems to be a recent change as read.spss() was not 
> doing this as of a few months ago. None of the arguments in read.spss() seem 
> to also stop this behaviour. I am currently on the most recent version of 
> both R and the package, as of 27/05/21, and am using RStudio Version 1.4.1106.

Within R (I do not use RStudio) and even with the most recent R-devel, I see

(sav <- system.file("files", "electric.sav", package = "foreign"))
dat <- read.spss(file=sav, to.data.frame=TRUE)
table(dat$DAYOFWK)

#  SUNDAY   MONDAY  TUESDAY WEDNSDAY THURSDAY   FRIDAY SATURDAY
#      19       11       19       17       15       13       16

table(dat$DAYOFWK, useNA="always")

#  SUNDAY   MONDAY  TUESDAY WEDNSDAY THURSDAY   FRIDAY SATURDAY     <NA>
#      19       11       19       17       15       13       16      130

So exactly what you expected?

If you rather use

dat <- read.spss(file=sav, to.data.frame=TRUE, use.missings=FALSE)

table(dat$DAYOFWK, useNA="always")

you see the NA values are converted to a factor level called "MISSING".
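
A quick way to tell a genuine NA level (as produced by addNA()) from a level
whose label is the text "NA" is to look at the levels directly. The sketch
below uses only base R and made-up toy factors; f, g and f2 are illustrative
names, not data from the file discussed in this thread.

```r
# Toy factors (hypothetical data, not from the .sav file discussed here)
f <- addNA(factor(c("a", "b", NA)))   # NA added as a real level
g <- factor(c("a", "NA", NA))         # "NA" is an ordinary string label

anyNA(levels(f))        # is there a missing value among the levels?
"NA" %in% levels(g)     # is there a level that is literally the text "NA"?
anyNA(levels(g))

# Going through character and back to factor drops the NA level again,
# because factor() excludes NA from the level set by default
f2 <- factor(as.character(f))
levels(f2)
```

If the second check is the one that fires, the data file itself carries a
label spelled "NA", which matches the diagnosis at the top of this thread.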

If it is different on your end, please try in plain R, tell us the
version of R / foreign and show an example data file where this happens.

Best,
Uwe Ligges



>
> Any thoughts?
>
> Many Thanks,
>
> Justin Allen
>
> p.s. your continued maintenance and additions to R and its packages have been 
> infinitely useful in my work and life and thank for that.
>
> Justin Allen
> Housing Consultant, BRE<https://bregroup.com/>
> T: 07807122647

[R] foreign package read.spss() and NA levels

2021-05-27 Thread Allen, Justin
Hi All,

I would like to report what may be a bug, or possibly an oversight (I am 
unsure which), in the read.spss() function of the "foreign" package, 
https://cran.r-project.org/web/packages/foreign/index.html. When running the 
following code,

input <- read.spss("[.sav file location]", to.data.frame = TRUE)
str(input)

read.spss() seems to be applying addNA() to factors, so NA is being set as a 
level, and there seems to be no way to get read.spss() to bring factors in 
without doing this. This seems to be a recent change, as read.spss() was not 
doing this as of a few months ago. None of the arguments of read.spss() seems 
to stop this behaviour either. I am currently on the most recent version of 
both R and the package, as of 27/05/21, and am using RStudio Version 1.4.1106.

Any thoughts?

Many Thanks,

Justin Allen

p.s. your continued maintenance and additions to R and its packages have been 
infinitely useful in my work and life, and thank you for that.

Justin Allen
Housing Consultant, BRE<https://bregroup.com/>
T: 07807122647


Follow BRE on Twitter: @BRE_Group<http://twitter.com/BRE_Group>

Privileged and confidential information and/or copyright material may be 
contained in this e-mail. If you are not the intended addressee you may not 
copy or deliver it to anyone else or use it in any unauthorised manner. To do 
so is prohibited and may be unlawful. If you have received this e-mail by 
mistake, please advise the sender immediately by return e-mail and destroy all 
copies. Thank you.

Building Research Establishment Ltd, Registered under number 3319324 in England 
and Wales. VAT Registration No GB 689 9499 27 
www.bregroup.com<http://www.bregroup.com>
BRE Global Limited, Registered under number 8961297 in England and Wales. 
www.breglobal.com<http://www.breglobal.com>
Building Research Establishment and BRE Global are subsidiaries of the BRE 
Trust.
BRE Trust is a company limited by guarantee, Registered under number 3282856 in 
England and Wales, and registered as a charity in England (no. 1092193) and in 
Scotland (no. SC039320). www.bretrust.org.uk<http://www.bretrust.org.uk>
Registered Offices: Bucknalls Lane, Garston, Watford, Hertfordshire WD25 9XX - 
Travelling to BRE: see 
www.bregroup.com/contact/directions/<http://www.bregroup.com/contact/directions/>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Opportunities for Developing R Packages (Research-Based, Open-Source)

2019-04-22 Thread Justin Thong
Dear R package community,

I am uncertain whether this is appropriate for this mailing list. Please
let me know. If not, would you be so kind as to point me in a better
direction?

I am a mathematics major with well-developed R experience. I graduated two
years ago and have since been working in business operations at a
cryptocurrency startup. I am rather rusty and I wish to venture back into
statistical research and R package development.

My question is for researchers who are interested in developing tools and
algorithms for their new research, be it in medical statistics, data
visualisation or machine learning: is there a possibility for collaboration?
This would help me extend my experience and possibly open more avenues for
me to enter research.
I am quite aware of statistical concepts and can read research papers (I've
done a research internship in experimental design, linear algebra and data
compression, particle filters and bayes analyses). I do not expect to be
paid and am willing to commit to a project.


Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



[R] igraph problem

2017-09-19 Thread Justin Thong
Run this code (it needs the igraph package):

library(igraph)
tree <- graph_from_literal(1-+2:3, 3-+5, 1-+4)
graph.bfs(tree, root=1, neimode="out", father=TRUE, order=TRUE,
          unreachable=FALSE)

I do not understand why the father values will give NA 1 1 3 1 rather than NA
1 1 1 3

The reason I am doing this is to obtain the values(by vertex names) or some
index of each individual branch in tree. Does anyone have any ideas on how
to do this?
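
One possible reading of the output, offered as a sketch rather than a
confirmed answer: the father vector is indexed by internal vertex id, and the
ids are assumed to follow the order in which the names first appear in the
literal (1, 2, 3, 5, 4), which can be checked with V(tree)$name. Under that
assumption the reported values translate to names as follows. Only base R is
used, and vertex_names, father_ids and father_sorted are illustrative names.

```r
# Vertex names in order of first appearance in the literal 1-+2:3, 3-+5, 1-+4.
# Assumption: igraph numbers vertices in this order (check with V(tree)$name).
vertex_names <- c("1", "2", "3", "5", "4")

# The 'father' vector reported in the post, indexed by internal vertex id
father_ids <- c(NA, 1, 1, 3, 1)

# Translate the ids to names, then sort the result by vertex name
father_by_name <- setNames(vertex_names[father_ids], vertex_names)
father_sorted <- father_by_name[order(as.integer(names(father_by_name)))]
father_sorted
```

Sorted by name, the fathers read NA 1 1 1 3, which is the sequence the post
expected.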

Yours sincerely,
Justin



[R] rmutil parameters for Pareto distribution

2017-08-24 Thread Justin Thong
In https://en.wikipedia.org/wiki/Pareto_distribution, it is clear what the
parameters are for the pareto distribution: *xmin *the scale parameter and
*a* the shape parameter.

I am using rmutil to generate random deviates from a Pareto distribution.
The documentation gives the probability density of the Pareto distribution
as follows:

The Pareto distribution has density

f(y) = s (1 + y/(m (s-1)))^(-s-1)/(m (s-1))

where m is the mean parameter of the distribution and s is the dispersion

Experimenting with the rpareto function from the package, using m as the
scale parameter *xmin* and s as the shape parameter *a*, I found that the
deviates generated are not all larger than *xmin*. This leads me to believe
that m and s are not the scale and shape parameters respectively.

What are m and s? Could they be defined as the mean and variance,
respectively, as shown on the Wikipedia page?
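
Taking the documented density at face value, it can be checked numerically
that f integrates to 1 over y > 0 and has mean m. That would make it a Pareto
type II (Lomax) density starting at 0, with s playing the role of the shape,
rather than the classical Pareto on x >= xmin. The sketch below uses only
base R; dpareto2 and rpareto_classic are hypothetical helper names, and the
values of m, s, xmin and a are arbitrary.

```r
# Density exactly as quoted from the rmutil documentation
dpareto2 <- function(y, m, s) {
  s * (1 + y / (m * (s - 1)))^(-s - 1) / (m * (s - 1))
}

m <- 2; s <- 4
total  <- integrate(dpareto2, 0, Inf, m = m, s = s)$value
mean_y <- integrate(function(y) y * dpareto2(y, m, s), 0, Inf)$value
c(total = total, mean = mean_y)   # compare with 1 and with m

# Classical Pareto(xmin, a) deviates by inverting F(x) = 1 - (xmin / x)^a
rpareto_classic <- function(n, xmin, a) xmin * runif(n)^(-1 / a)
set.seed(1)
x <- rpareto_classic(1000, xmin = 3, a = 2.5)
min(x) >= 3                       # every deviate is at least xmin
```

The inverse-CDF helper is a swapped-in technique for the classical
parameterisation, not something taken from rmutil.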


Yours sincerely,
Justin



[R] Kernel Density Estimation: Generate a sample from Epanechnikov Kernel

2017-03-21 Thread Justin Thong
Below is a way to draw samples from a kernel density estimate of "data" with
a Gaussian kernel.
I really like this solution because it is nice and elegant.

fit <- density(data)
# samples from the kernel density estimate
rnorm(N, sample(data, size = N, replace = TRUE), fit$bw)

I am, however, interested in sampling from a kernel density estimate with
an Epanechnikov kernel:

fit <- density(data, kernel = "epanechnikov")
# Is there a quick way to compute the samples while INCORPORATING THE
# BANDWIDTH of the kernel density estimate?
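
A minimal sketch of one way to do this in base R, under two assumptions that
are worth checking against ?density: that bw is the standard deviation of the
scaled kernel, and that the Epanechnikov kernel on [-1, 1] has variance 1/5
(hence the sqrt(5) factor). The kernel draws use the three-uniform rule
described by Devroye; repanechnikov is a hypothetical helper name and the
toy data stand in for the poster's "data".

```r
# Draw n values from the Epanechnikov kernel on [-1, 1]:
# take three uniforms; if the third has the largest absolute value,
# return the second, otherwise return the third
repanechnikov <- function(n) {
  u1 <- runif(n, -1, 1)
  u2 <- runif(n, -1, 1)
  u3 <- runif(n, -1, 1)
  ifelse(abs(u3) >= abs(u2) & abs(u3) >= abs(u1), u2, u3)
}

set.seed(42)
data <- rnorm(200)   # toy data, stands in for the poster's 'data'
N <- 1000
fit <- density(data, kernel = "epanechnikov")

# Resample the data, then add kernel noise rescaled to standard deviation bw
draws <- sample(data, size = N, replace = TRUE) +
  sqrt(5) * fit$bw * repanechnikov(N)
```

This mirrors the Gaussian one-liner in the post: a resampled data point plus
one draw from the kernel at the fitted bandwidth.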


Yours sincerely,
Justin



[R] Hmisc::latex and the use of \phantom

2016-10-29 Thread Justin Bem
Dear all,



Is it possible to avoid the use of \phantom with the latex() function? When I
run latex(latex=FALSE,…) I get an error message.

When I use Format(big.mark= ‘’ ‘’) the result appears correct in the R console
but not in the LaTeX code. Is there a way to combine numprint with latex()?
col.just= ‘n{#}{#}’ does not work.

Sincerely.




Re: [R] Using apply on a three dimensional matrix and passing multiple arguments to user defined function

2016-09-08 Thread Justin Peter
Dear Luisfo and Jean,

Thank you both for your help and suggestions, which both work.

As Jean mentioned, there is no real speed up using the apply family, which I 
thought there would be, so I will stick with the for loop for clarity.

Luisfo, the reason I used c(1,2) in the apply function (i.e. mask_data <- 
apply(data,c(1,2),mask,y=lsmask)) was because this is what you would do if you 
wanted to sum all the two-dimensional arrays over time (for instance to 
produce an average of a field over a year).

i.e. to get the sum you would do sum_data <- apply(data,c(1,2),sum)

I thought it would extend to using my mask function.

Anyway, thanks again for both of your help.

Cheers,
Justin
--
Justin Peter
Research Fellow
International Centre for Applied Climate Sciences,
University of Southern Queensland
West St, Toowoomba, QLD, 4350
Australia

Email: justin.pe...@usq.edu.au
Ph: +61 (0) 7 4631 1181
Fax: +61 (0) 7 4631 5581
Mob: +61 (0)474 774 107


-Original Message-
From: Luisfo <luisf...@yahoo.es>
To: "Adams, Jean" <jvad...@usgs.gov>, Justin Peter <justin.pe...@usq.edu.au>
CC: r-help@r-project.org
Subject: Re: [R] Using apply on a three dimensional matrix and passing multiple 
arguments to user defined function
Date: Wed, 7 Sep 2016 16:05:48 +0200

Hi,

Jean's example with lapply works fine.

However, if you still want to use apply, I think this works.
One observation first. You were passing c(1,2) as second argument to apply, in 
your code. And that is what makes you have lots of NAs as a result, since your 
function is being applied twice, by rows and columns (first and second 
dimensions) respectively.
Use:
masked_data <- apply(data,3,mask,y=lsmask)
# but now masked_data has dim(nlon*nlat,ntime), so change it
dim(masked_data) <- dim(data)

The apply goes over the third dimension (second parameter '3'), so it takes 
every nlon*nlat matrix as first argument for function mask.
I think it should work.
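
To compare the approaches discussed in this thread side by side, here is a
small self-contained sketch with toy data. The fixed lsmask values and the
names by_loop, by_apply and by_index are illustrative; the third variant, a
single vectorised assignment, is an extra option not proposed in the thread.

```r
nlon <- 2; nlat <- 4; ntime <- 3
data <- array(1:(nlon * nlat * ntime), dim = c(nlon, nlat, ntime))
lsmask <- matrix(c(0, 1, 1, 0, 1, 1, 0, 1), nrow = nlon, ncol = nlat)

# mask one lon x lat slice and return the modified slice
mask <- function(x, y) {
  x[y == 0] <- NA
  x
}

# 1. explicit loop over time
by_loop <- data
for (i in seq_len(ntime)) by_loop[, , i] <- mask(by_loop[, , i], lsmask)

# 2. apply() over the third margin, then restore the dimensions
by_apply <- apply(data, 3, mask, y = lsmask)
dim(by_apply) <- dim(data)

# 3. one vectorised assignment: repeat the mask once per time step
by_index <- data
by_index[rep(lsmask == 0, ntime)] <- NA

all.equal(by_loop, by_apply)
all.equal(by_loop, by_index)
```

The third form relies on the array being stored with the time dimension
varying slowest, so repeating the lon x lat mask ntime times lines up with
the array's elements.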

Regards,

Luisfo Chiroque
PhD Student | PhD Candidate
IMDEA Networks Institute
http://fourier.networks.imdea.org/people/~luis_nunez/


On 09/07/2016 03:17 PM, Adams, Jean wrote:



Justin,

I don't think you can get the apply() function to return an array.  You
could use lapply() instead, and then use simplify2array() to convert the
list of matrices to an array.  Also, in your mask() function you don't need
the which() and you should return the x.  See my example with toy data
below.

# toy data
nlon <- 2
nlat <- 4
ntime <- 3
data <- array(1:(nlon*nlat*ntime), dim=c(nlon, nlat, ntime))
lsmask <- array(sample(0:1, size=nlon*nlat, replace=TRUE),
                dim=c(nlon, nlat))

# newly defined function
mask <- function(x, y) {
  x[y==0] <- NA
  x
}

# doit
data2 <- simplify2array(lapply(1:ntime,
                               function(i) mask(data[, , i], lsmask)))


You may prefer to stick with the for() loop approach (for clarity or
simplicity or ...)  When I ramped up the toy data to much larger
dimensions, the lapply() approach was only slightly faster than the for()
loop approach on my PC.

data3 <- data
for (i in 1:ntime) data3[ , , i] <- mask(data3[ , , i], lsmask)

Jean




On Tue, Sep 6, 2016 at 11:33 PM, Justin Peter <justin.pe...@usq.edu.au>
wrote:




Dear R-user,

I have a three-dimensional matrix of atmospheric data. The first two
dimensions are spatial (lon and lat) and the third is time, such that

dim(data) <- c(nlon,nlat,ntime)

I wish to apply a land sea mask, which is a matrix of "0" and "1" of
dim(nlon,nlat)

dim(lsmask) <- c(nlon,nlat)

I wish to set to NA all of the elements in the two-dimensional array
data[,,i] where the mask is 0, for every i in 1:ntime.

I could do this in a loop:

for (i in 1:ntime){
data[,,i][which(lsmask == 0)] <- NA
}

I would like to do this using apply, but I need to pass two variables to
the function in apply (data and lsmask), where data is a two-dimensional
array.

I tried:

mask <- function(x,y) {x[which(y==0)] <- NA}

masked_data <- apply(data,c(1,2),mask,y=lsmask)

but I get back a vector of dim(nlon,nlat) populated with NA.

Any clues as to what I am missing?

Thanks in advance for your help.

Kind regards,
Justin



--
Justin Peter
Research Fellow
International Centre for Applied Climate Sciences,
University of Southern Queensland
West St, Toowoomba, QLD, 4350
Australia

Email: justin.pe...@usq.edu.au
Ph: +61 (0) 7 4631 1181
Fax: +61 (0) 7 4631 5581
Mob: +61 (0)474 774 107



[R] Using apply on a three dimensional matrix and passing multiple arguments to user defined function

2016-09-07 Thread Justin Peter
Dear R-user,

I have a three-dimensional matrix of atmospheric data. The first two dimensions 
are spatial (lon and lat) and the third is time, such that

dim(data) <- c(nlon,nlat,ntime)

I wish to apply a land sea mask, which is a matrix of "0" and "1" of 
dim(nlon,nlat)

dim(lsmask) <- c(nlon,nlat)

I wish to set to NA all of the elements in the two-dimensional array data[,,i] 
where the mask is 0, for every i in 1:ntime.

I could do this in a loop:

for (i in 1:ntime){
data[,,i][which(lsmask == 0)] <- NA
}

I would like to do this using apply, but I need to pass two variables to the 
function in apply (data and lsmask), where data is a two-dimensional array.

I tried:

mask <- function(x,y) {x[which(y==0)] <- NA}

masked_data <- apply(data,c(1,2),mask,y=lsmask)

but I get back a vector of dim(nlon,nlat) populated with NA.
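
One likely contributor to the all-NA result, shown here as a base R sketch:
the last expression in mask() is an assignment, so the function returns the
assigned value (NA) rather than the modified x. In addition, with margin
c(1,2) apply() hands mask() one time series per (lon, lat) cell instead of a
lon x lat slice. The names mask_bad and mask_good and the toy vectors are
illustrative only.

```r
# As posted: the function's value is the value of the assignment, i.e. NA
mask_bad <- function(x, y) { x[which(y == 0)] <- NA }

# Returning x explicitly gives back the masked data
mask_good <- function(x, y) { x[which(y == 0)] <- NA; x }

x <- 1:4
y <- c(0, 1, 1, 0)
print(mask_bad(x, y))    # a single NA
mask_good(x, y)          # x with positions 1 and 4 set to NA
```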

Any clues as to what I am missing?

Thanks in advance for your help.

Kind regards,
Justin



--
Justin Peter
Research Fellow
International Centre for Applied Climate Sciences,
University of Southern Queensland
West St, Toowoomba, QLD, 4350
Australia

Email: justin.pe...@usq.edu.au
Ph: +61 (0) 7 4631 1181
Fax: +61 (0) 7 4631 5581
Mob: +61 (0)474 774 107




_
This email (including any attached files) is confidential and is for the 
intended recipient(s) only. If you received this email by mistake, please, as a 
courtesy, tell the sender, then delete this email.

The views and opinions are the originator's and do not necessarily reflect 
those of the University of Southern Queensland. Although all reasonable 
precautions were taken to ensure that this email contained no viruses at the 
time it was sent we accept no liability for any losses arising from its receipt.

The University of Southern Queensland is a registered provider of education 
with the Australian Government.
(CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )




Re: [R] Estimated Effects Not Balanced

2016-08-23 Thread Justin Thong
Hi,

Thanks Richard,

That was me playing with too many examples and having too many variables
just lying around. Thanks for the tip though.

On 22 August 2016 at 23:32, Bert Gunter <bgunter.4...@gmail.com> wrote:

> Thanks, Rich. I didn't notice that!
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Aug 22, 2016 at 1:43 PM, Richard M. Heiberger <r...@temple.edu>
> wrote:
> > The problem is that you have 12 observations and 1+2+10=13 degrees of freedom.
> > There should be 1 + 2 + 8 = 11 degrees of freedom.
> > Probably one of your variables is masked by something else in your workspace.
> > Protect yourself by using a data.frame
> >
> >> tmp <- data.frame(A=factor(c(1,1,1,1,1,1,2,2,2,2,2,2)),
> > + B=factor(c(1,1,2,2,3,3,1,1,2,2,3,3)),
> > + y=rnorm(12))
> >> mod <- aov(y ~ A+B, data=tmp)
> >> summary(mod)
> >             Df Sum Sq Mean Sq F value Pr(>F)
> > A            1  1.553   1.553   1.334  0.281
> > B            2  3.158   1.579   1.357  0.311
> > Residuals    8  9.311   1.164
> >
> > On Mon, Aug 22, 2016 at 11:15 AM, Justin Thong <justinthon...@gmail.com>
> wrote:
> >> Something does not make sense in R. It has to do with the question of
> >> balance and unbalance.
> >>
> >> *A<-factor(c(1,1,1,1,1,1,2,2,2,2,2,2))*
> >> *B<-factor(c(1,1,2,2,3,3,1,1,2,2,3,3))*
> >> *y<-rnorm(12)*
> >> *mod<-aov(y~A+B)*
> >>
> >> I was under the impression that the design is balanced, i.e. order does not
> >> affect the sums of squares. However, when I compute the anova, R reports
> >> that the estimated effects may be unbalanced. I thought that when all
> >> combinations of levels of A and B have equal replications then the design
> >> is called balanced. But R seems to think that when not all levels of A and
> >> levels of B have equal replication, then the "Estimated Effects are
> >> unbalanced". Is this the same as the design being unbalanced? Because
> >> for the example below, where the message occurred, the order does not matter
> >> (which makes me think that the design is balanced).
> >>
> >>
> >> *Call:*
> >> *   aov(formula = y ~ A + B)*
> >>
> >> *Terms:*
> >> *                       A         B Residuals*
> >> *Sum of Squares  0.872572  0.025604 16.805706*
> >> *Deg. of Freedom        1         2        10*
> >>
> >> *Residual standard error: 1.296368*
> >> *Estimated effects may be unbalanced*
> >> --
> >> Yours sincerely,
> >> Justin
> >>



-- 
Yours sincerely,
Justin



[R] Estimated Effects Not Balanced

2016-08-22 Thread Justin Thong
Something does not make sense in R. It has to do with the question of
balance and unbalance.

*A<-factor(c(1,1,1,1,1,1,2,2,2,2,2,2))*
*B<-factor(c(1,1,2,2,3,3,1,1,2,2,3,3))*
*y<-rnorm(12)*
*mod<-aov(y~A+B)*

I was under the impression that the design is balanced, i.e. order does not
affect the sums of squares. However, when I compute the anova, R reports
that the estimated effects may be unbalanced. I thought that when all
combinations of levels of A and B have equal replications then the design
is called balanced. But R seems to think that when not all levels of A and
levels of B have equal replication, then the "Estimated Effects are
unbalanced". Is this the same as the design being unbalanced? Because
for the example below, where the message occurred, the order does not matter
(which makes me think that the design is balanced).


*Call:*
*   aov(formula = y ~ A + B)*

*Terms:*
*                       A         B Residuals*
*Sum of Squares  0.872572  0.025604 16.805706*
*Deg. of Freedom        1         2        10*

*Residual standard error: 1.296368*
*Estimated effects may be unbalanced*
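
A quick sanity check for output like the above, as a hedged sketch: with 12
observations, an intercept, one degree of freedom for A and two for B, the
residual degrees of freedom should be 8, so a value of 10 suggests that aov()
picked up a different variable from the workspace. Keeping everything in a
data frame avoids that. The name tmp and the seed are arbitrary.

```r
set.seed(1)
tmp <- data.frame(A = factor(rep(1:2, each = 6)),
                  B = factor(rep(rep(1:3, each = 2), times = 2)),
                  y = rnorm(12))

table(tmp$A, tmp$B)                         # observations per A x B cell
reps <- replications(y ~ A + B, data = tmp) # replicates per level of each term
reps

mod <- aov(y ~ A + B, data = tmp)
df.residual(mod)                            # 12 - 1 (intercept) - 1 (A) - 2 (B)
```

If replications() returns plain numbers for every term, the design is
balanced with respect to those terms.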
-- 
Yours sincerely,
Justin



[R] Intercept in Model Matrix (Parameters not what I expected)

2016-08-21 Thread Justin Thong
I have something which has been bugging me and I have even asked this on
cross validated but I did not get a response.  Let's construct a simple
example. Below is the code.

A<-gl(2,4) #factor of 2 levels
B<-gl(4,2) #factor of 4 levels
y<-rnorm(8) #response (any numeric vector of length 8 will do)
df<-data.frame(y,A,B)

As you can see, B is nested within A.
The peculiar result I am interested in is the output of the model matrix when
I fit a nested model. *How does R decide what is included inside the
intercept?* Since we are using dummy coding, the coefficients of the model
are interpreted as the difference between a particular level and the
reference level (the intercept) for a single-factor model. I understand that
for model ~A, A1 becomes the intercept and that for model ~A+B, A1 and B1
(both) become the intercept.

*I do not get why, when we use a nested model, A1:B2 appears as a column
inside the model matrix. Why isn't the first parameter of the interaction
subspace A1:B1 or A2:B1?* I think I am missing the concept. I think the
intercept is A1. *Hence, why do we not compare the levels of A1:B1 and
A1 (intercept), or A2:B1 and A1 (intercept)?*

#nested model
> mod<-aov(y~A+A:B)
> model.matrix(mod)
  (Intercept) A2 A1:B2 A2:B2 A1:B3 A2:B3 A1:B4 A2:B4
1           1  0     0     0     0     0     0     0
2           1  0     0     0     0     0     0     0
3           1  0     1     0     0     0     0     0
4           1  0     1     0     0     0     0     0
5           1  1     0     0     0     1     0     0
6           1  1     0     0     0     1     0     0
7           1  1     0     0     0     0     0     1
8           1  1     0     0     0     0     0     1
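
A sketch that may help with reading this matrix. As far as I understand R's
coding rule, in the term A:B the factor B is coded by treatment contrasts
(dropping B1) because A already appears as a main effect, while A is coded by
a full set of indicator columns because B has no main effect in the model.
Under that reading B1 is the reference within each level of A: the cell A1:B1
is absorbed into the intercept and A2:B1 into the A2 column. The code below
only inspects the matrix; mm is an illustrative name.

```r
A <- gl(2, 4)   # levels 1 1 1 1 2 2 2 2
B <- gl(4, 2)   # levels 1 1 2 2 3 3 4 4
mm <- model.matrix(~ A + A:B)

colnames(mm)

# Columns for A x B cells that never occur in the data are identically zero
names(which(colSums(mm) == 0))

# Only four A x B cells are observed, so the matrix cannot have rank above 4
qr(mm)$rank
```

The zero columns and the reduced rank are why several coefficients of such a
fit come back as NA.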


-- 
Yours sincerely,
Justin



[R] RJDBC - Java connection to Oracle database crashing on execution

2016-08-19 Thread Justin Holder
Hi,

I have a script that starts with some code to open a connection to an Oracle 
database, however the code is crashing R/RStudio as soon as it runs. The code 
does run successfully on another machine.

The script opens by loading the required RJDBC package:

library("RJDBC", lib.loc="C:/Program Files/R/R-3.3.1/library")

After, this I run the code below that points to the required ojdbc7.jar file, 
that should start the connection, however this crashes R, closing it down a few 
seconds later:

drv = JDBC("oracle.jdbc.OracleDriver",
   classPath="C:/Program Files/R/ojdbc7.jar", " ")

The same thing happens in RStudio: there is no error message, the program 
simply crashes stating that "R encountered a fatal error. The session was 
terminated".

Again, the code executes exactly as written on another machine. I have the 
latest versions of all required software/packages, so I'm not sure what is 
causing the crash.


Thanks.



[R] What is "args" in this function?

2016-08-02 Thread Justin Thong
Hi again I need help

*R-code*
debug(model.matrix)
model.matrix(~S)

*model.matrix code*
ans <- .External2(C_modelmatrix, t, data) # t = terms(object), data = "data frame of object"

*modelframe C-code*
SEXP modelframe(SEXP call, SEXP op, SEXP args, SEXP rho)
{
    SEXP terms, data, names, variables, varnames, dots, dotnames, na_action;
    SEXP ans, row_names, subset, tmp;
    char buf[256];
    int i, j, nr, nc;
    int nvars, ndots, nactualdots;
    const void *vmax = vmaxget();

    args = CDR(args);
    terms = CAR(args); args = CDR(args);
    row_names = CAR(args); args = CDR(args);
    variables = CAR(args); args = CDR(args);
    varnames = CAR(args); args = CDR(args);
    dots = CAR(args); args = CDR(args);
    dotnames = CAR(args); args = CDR(args);
    subset = CAR(args); args = CDR(args);
    na_action = CAR(args);

. . . .

I am sorry, I have virtually no experience in C.
Can someone explain to me what "args" is at the point when it enters the
function? I know CAR points to the first element of an object, and CDR
points to the complement of the first element of an object.

Does "args" represent the list of t and data?
or
Does "args" represent the third argument in .External2, which is data?
or
something else

I am guessing this whole process of playing with CAR and CDR is just a way of
extracting variables from "args" until everything in "args" has been
assigned.

For instance, if args=(1,2,3,4,5,6) then below correspond in square
brackets

  args = CDR(args); [(1,2,3,4,5,6)]
  terms = CAR(args) ;[(1)] args = CDR(args);[(2,3,4,5,6)]
row_names = CAR(args);[(2)] args = CDR(args);[(3,4,5,6)]
variables = CAR(args);[(3)] args = CDR(args);[(4,5,6)]
varnames = CAR(args);[(4)] args = CDR(args);[(5,6)]
   etc

Is this correct?
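
The walk through "args" can be mimicked in R with an ordinary list, which may
make the bookkeeping easier to follow. This is only an emulation: car and cdr
are illustrative helpers, and the sketch assumes (as with .External) that the
first element of the argument list is the name of the routine being called,
which would be why the C code starts with a bare args = CDR(args).

```r
# CAR/CDR emulated on an R list; 'car' and 'cdr' are illustrative helpers
car <- function(x) x[[1]]   # first element
cdr <- function(x) x[-1]    # everything after the first element

# Assumption: element 1 is the routine itself, the rest are its arguments
arglist <- list("routine", 1, 2, 3, 4)

arglist   <- cdr(arglist)                          # skip the routine: (1, 2, 3, 4)
terms_val <- car(arglist); arglist <- cdr(arglist) # terms_val = 1, rest (2, 3, 4)
row_names <- car(arglist); arglist <- cdr(arglist) # row_names = 2, rest (3, 4)
```

Under that assumption the trace in the post is right in spirit, except that
the first CDR already drops one element instead of leaving the list intact.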

I am sorry if I am asking too many questions on C. Please advise if I am
posting inappropriately.



-- 
Yours sincerely,
Justin



[R] Ways to understand C code (like debug function)

2016-08-01 Thread Justin Thong
Hi

I need some advice. Note: I do not know anything from C apart from my 2
days of research.

I am currently trying to make meaning of the modelmatrix function (written
in C) and called from R function model.matrix() via .External2.

In trying to understand the R source code of model.matrix(), I have been
reasonably successful thanks to the debug command. This command was good
because I was able to check line-by-line what the code was doing and obtain
an output within my R console. Furthermore, checking the values of each of
my variables while sequentially moving through the lines was also very
useful. However, looking at the R source code alone is insufficient for
understanding most of the computation. I have to look within the C code.
In particular, within model.matrix(), a .External2 call is made to a C
function named modelmatrix. I downloaded the source from the website and
can view the function modelmatrix (in model.c) in a text editor. I am now
looking for a way to play with the code so I understand what's going on, and I
don't know what the best way to do this is.

*I was wondering whether there is an equivalent of the debug function
for checking C code line by line, i.e. each line of code is typed in, and an
output is obtained.* I know the package "inline" allows you to build C
functions and use them in R. But I can't find anything which does what I
want. *If this is not possible, is there a good, easy alternative way to
run through and understand the commands in C that anyone knows about?*

-- 
Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



Re: [R] Reference for aov()

2016-07-27 Thread Justin Thong
Hi Peter,


Thank you for your good answer. I am sorry for the late reply.

*An orthogonalized model matrix generates a decomposition of the model space
into orthogonal subspaces corresponding to the terms of the model.
Projections onto each of the subspaces are easily worked out.  E.g., for a
two-way analysis (Y~row+col+row:col) you can decompose the model effect as
a row effect, a column effect, and an interaction effect. This allows
straightforward calculation of the sums of squares of the ANOVA table. As
you are probably well aware, if the design is unbalanced, the results will
depend on the order of terms -- col + row + col:row gives a different
result.*

It may be a stupid question, but how are the projections onto each subspace
easily worked out, and how do the sums of squares follow from them? Does it
matter that certain parameters of the model are not estimated? R appears to
just give a sum of squares despite some of the parameters being
non-estimable.
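
One way to see how the sums of squares fall out of the orthogonalization is
to do it by hand on a small example. The sketch below uses made-up data and
object names; it mirrors the idea described in the quoted reply and is not
claimed to be the exact code path inside aov().

```r
set.seed(1)
# Made-up balanced two-way layout, 4 replicates per cell
d <- expand.grid(row = factor(1:3), col = factor(1:2), rep = 1:4)
d$y <- rnorm(nrow(d))

X    <- model.matrix(~ row * col, d)
asgn <- attr(X, "assign")          # which term each column belongs to
qrX  <- qr(X)                      # base R default: LINPACK-based QR
eff  <- qr.qty(qrX, d$y)           # Q'y: y expressed in the orthogonal basis

p    <- qrX$rank
kept <- asgn[qrX$pivot[seq_len(p)]]            # terms owning estimable columns
ss   <- tapply(eff[seq_len(p)]^2, kept, sum)   # one sum of squares per term
rss  <- sum(eff[-seq_len(p)]^2)                # what is left over

fit <- lm(y ~ row * col, d)
cbind(from_qr = c(ss[-1], rss), from_anova = anova(fit)[["Sum Sq"]])
```

The squared entries of Q'y are grouped by the term their column belongs to,
which is why the result depends on the order of the terms. Columns that qr()
finds to be linearly dependent are pivoted to the end, beyond the first
`rank` positions, so they are never assigned a sum of squares of their own.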

Thank you





On 14 July 2016 at 09:50, peter dalgaard <pda...@gmail.com> wrote:

> I am not aware of a detailed documentation of this beyond the actual
> source code.
> However, the principles are fairly straightforward, except that the rules
> for constructing the design matrix from the model formula can be a bit
> arcane at times.
>
> The two main tools are the design matrix constructor (model.matrix) and a
> Gram-Schmidt type orthogonalization of its columns (the latter is called a
> QR decomposition in R, which it is, but there are several algorithms for
> QR, and the linear models codes depend on the QR algorithm being based on
> orthogonalization - so LINPACK works and LAPACK doesn't).
>
> An orthogonalized model matrix generates a decomposition of the model space
> into orthogonal subspaces corresponding to the terms of the model.
> Projections onto each of the subspaces are easily worked out.  E.g., for a
> two-way analysis (Y~row+col+row:col) you can decompose the model effect as
> a row effect, a column effect, and an interaction effect. This allows
> straightforward calculation of the sums of squares of the ANOVA table. As
> you are probably well aware, if the design is unbalanced, the results will
> depend on the order of terms -- col + row + col:row gives a different
> result.
>
> What aov() does is that it first decomposes the observations according to
> the Error() term, forming the error strata, then fits the systematic part
> of the model to each stratum in turn. In the nice cases, each term of the
> model will be estimable in exactly one stratum, and part of the aov() logic
> is to detect and remove unestimable terms. E.g., if you have a balanced two
> way layout, say individual x treatment, the variable gender is a subfactor
> of individual, so Y ~ gender * treatment + Error(individual/treatment), the
> gender effect is estimated in the individual stratum, whereas treatment and
> gender:treatment are estimated in the individual:treatment stratum.
>
> It should be noted that it is very hard to interpret the results of aov()
> unless the Error() part of the model corresponds to a balanced experimental
> design. Or put more sharply: The model implied by the decomposition into
> error strata becomes nonsensical otherwise. If you do have a balanced
> design, the error strata reduce to simple combinations of means and
> observation, so the aov() algorithm is quite inefficient, but to my
> knowledge nobody has bothered to try and do better.
>
> -pd
>
> > On 13 Jul 2016, at 18:18 , Justin Thong <justinthon...@gmail.com> wrote:
> >
> > Hi
> >
> > *I have been looking for a reference to explain how R uses the aov
> > command(at a deeper level)*. More specifically, how R reads the formulae
> > and R computes the sums of squares. I am not interested in understanding
> > what the difference of Type 1,2,3 sum of squares are. I am more
> interested
> > in finding out about how R computes ~x1:x2:x3  or how R computes ~A:x1
> > emphasizing sequential nature of the way it computes, and models even
> more
> > complicated than this.
> >
> > Yours sincerely,
> > Justin
> >
> > *I check my email at 9AM and 4PM everyday*
> > *If you have an EMERGENCY, contact me at +447938674419(UK) or
> > +60125056192(Malaysia)*
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics

[R] Linear Dependance of Model Matrix and How Fitted/ Sums of Squares Follow

2016-07-26 Thread Justin Thong
  0
116  4  0  0  4  4  0
117  4  0  0  4  0  4
118  4  0  0  4  0  4
119  4  0  0  4  0  4
120  4  0  0  4  0  4
121  4  0  0  0  4  4
122  4  0  0  0  4  4
123  4  0  0  0  4  4
124  4  0  0  0  4  4
125  0  4  4  4  0  0
126  0  4  4  4  0  0
127  0  4  4  4  0  0
128  0  4  4  4  0  0
129  0  4  4  0  4  0
130  0  4  4  0  4  0
131  0  4  4  0  4  0
132  0  4  4  0  4  0
133  0  4  4  0  0  4
134  0  4  4  0  0  4
135  0  4  4  0  0  4
136  0  4  4  0  0  4
137  0  4  0  4  4  0
138  0  4  0  4  4  0
139  0  4  0  4  4  0
140  0  4  0  4  4  0
141  0  4  0  4  0  4
142  0  4  0  4  0  4
143  0  4  0  4  0  4
144  0  4  0  4  0  4
145  0  4  0  0  4  4
146  0  4  0  0  4  4
147  0  4  0  0  4  4
148  0  4  0  0  4  4
149  0  0  4  4  4  0
150  0  0  4  4  4  0
151  0  0  4  4  4  0
152  0  0  4  4  4  0
153  0  0  4  4  0  4
154  0  0  4  4  0  4
155  0  0  4  4  0  4
156  0  0  4  4  0  4
157  0  0  4  0  4  4
158  0  0  4  0  4  4
159  0  0  4  0  4  4
160  0  0  4  0  4  4
161  0  0  0  4  4  4
162  0  0  0  4  4  4
163  0  0  0  4  4  4
164  0  0  0  4  4  4

F <- factor(c(rep(1,3),rep(2,3)))
G <- factor(c(rep(1,2),rep(2,2),rep(3,2)))
H <- F <- factor(c(rep(1,3),rep(2,3)))
y <- rnorm(6,2)
test3 <- aov(y~F*G)

model.matrix(test3)

  (Intercept) F2 G2 G3 F2:G2 F2:G3
1           1  0  0  0     0     0
2           1  0  0  0     0     0
3           1  0  1  0     0     0
4           1  1  1  0     1     0
5           1  1  0  1     0     1
6           1  1  0  1     0     1
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$F
[1] "contr.treatment"

attr(,"contrasts")$G
[1] "contr.treatment"

alias(test3)

Model :
y ~ F * G

Complete :
  (Intercept) F2 G2 G3
F2:G2  0   1  0 -1
F2:G3  0   0  0  1

summary(test3)

            Df Sum Sq Mean Sq F value Pr(>F)
F            1 0.0479  0.0479   0.059  0.830
G            2 0.9762  0.4881   0.604  0.624
Residuals    2 1.6175  0.8087
-- 
Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419
<%2B447938674419>(UK) or +60125056192 <%2B60125056192>(Malaysia)*



[R] Soft Question: Where to find this reference.

2016-07-25 Thread Justin Thong
I notice a lot of R documentation refers to the reference below. I can't
seem to find it anywhere.
Does anyone have a link to where I can either view it or buy it?


*Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992) Analysis of
variance; designed experiments*

-- 
Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



Re: [R] Missing rows anova

2016-07-20 Thread Justin Thong
Hi Michael,

Thank you for the reply.

I am sorry I forgot to print out the anova table to make my question clear.

             Df   Sum Sq   Mean Sq F value   Pr(>F)
S             2 0.000019 9.630e-06   0.818    0.444
x1            1 0.000256 2.560e-04  21.751 9.44e-06 ***
ID           47 0.003524 7.498e-05   6.370 3.35e-15 ***
Residuals   102 0.001201 1.177e-05
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

There is *no* unique value of ID for each combination of S and x1. For
example, S=1 and x1=0 can correspond to B, C, D, E or F.
Perhaps you mean that each combination of S and x1 has its own, different
set of ID values. If that is the case, I think it makes sense.

*What I think:*
aov() fits the first-order terms (formula terms with no interaction) before
it fits any second-order term (a formula term with one interaction), and so
on.

First Order--> Second Order--> Third Order--> etc

Therefore the formula actually fitted is not S+x1+S:x1+ID but S+x1+ID+S:x1.
ID is thus fitted before S:x1, and since ID is a more refined factor than
S:x1, S:x1 is already included in the fit of ID, so R recognizes the linear
dependence and excludes the term S:x1.
In other words, S:x1 is linearly dependent on ID, and so the row S:x1
disappears because it is accounted for within ID.

Does this make sense?
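
The reasoning above can be checked on a cut-down design. This is a sketch
with made-up data (two IDs per level of S, two rows per ID) that keeps the
same structure as the model frame below: each ID fixes both S and x1.

```r
# Cut-down, made-up version of the design
d <- data.frame(
  S  = factor(rep(c(1, 1, 2, 2, 3, 3), each = 2)),
  x1 = rep(c(12, 0, 6, 0, 4, 0), each = 2),
  ID = factor(rep(c("A", "B", "AB", "BC", "ABC", "BCD"), each = 2))
)

attr(terms(~ S * x1 + ID), "term.labels")   # order in which terms are fitted

X <- model.matrix(~ S * x1 + ID, d)
q <- qr(X)
c(columns = ncol(X), rank = q$rank)         # more columns than rank

# Terms that still own an estimable column after pivoting
# (0 = intercept, 1 = S, 2 = x1, 3 = ID, 4 = S:x1)
kept_terms <- unique(attr(X, "assign")[q$pivot[seq_len(q$rank)]])
kept_terms
```

Because the interaction is moved behind ID, all of its columns are already
spanned by the earlier ones by the time they are reached, so no column of
S:x1 remains to be estimated.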








On 19 July 2016 at 16:19, Michael Dewey <li...@dewey.myzen.co.uk> wrote:

> Presumably it disappears because there is a unique value of ID for each
> combination of S*x1 so they are indistinguishable.
>
>
> On 19/07/2016 12:53, Justin Thong wrote:
>
>> Why does the S:x1 column disappear (presumably S:x1 goes into ID but I
>> dont
>> know why)? S is a factor, x1 is a covariate and ID is a factor.
>>
>> rich.side<-aov(y~S*x1+ID)
>> summary(rich.side)
>>
>> Below is the model frame
>>
>> model.frame(~S*x1+ID)
>>
>> S x1  ID
>> 1   1 12   A
>> 2   1 12   A
>> 3   1 12   A
>> 4   1 12   A
>> 5   1  0   B
>> 6   1  0   B
>> 7   1  0   B
>> 8   1  0   B
>> 9   1  0   C
>> 10  1  0   C
>> 11  1  0   C
>> 12  1  0   C
>> 13  1  0   D
>> 14  1  0   D
>> 15  1  0   D
>> 16  1  0   D
>> 17  1  0   E
>> 18  1  0   E
>> 19  1  0   E
>> 20  1  0   E
>> 21  1  0   F
>> 22  1  0   F
>> 23  1  0   F
>> 24  1  0   F
>> 25  2  6  AB
>> 26  2  6  AB
>> 27  2  6  AB
>> 28  2  6  AB
>> 29  2  6  AC
>> 30  2  6  AC
>> 31  2  6  AC
>> 32  2  6  AC
>> 33  2  6  AD
>> 34  2  6  AD
>> 35  2  6  AD
>> 36  2  6  AD
>> 37  2  6  AE
>> 38  2  6  AE
>> 39  2  6  AE
>> 40  2  6  AE
>> 41  2  6  AF
>> 42  2  6  AF
>> 43  2  6  AF
>> 44  2  6  AF
>> 45  2  0  BC
>> 46  2  0  BC
>> 47  2  0  BC
>> 48  2  0  BC
>> 49  2  0  BD
>> 50  2  0  BD
>> 51  2  0  BD
>> 52  2  0  BD
>> 53  2  0  BE
>> 54  2  0  BE
>> 55  2  0  BE
>> 56  2  0  BE
>> 57  2  0  BF
>> 58  2  0  BF
>> 59  2  0  BF
>> 60  2  0  BF
>> 61  2  0  CD
>> 62  2  0  CD
>> 63  2  0  CD
>> 64  2  0  CD
>> 65  2  0  CE
>> 66  2  0  CE
>> 67  2  0  CE
>> 68  2  0  CE
>> 69  2  0  CF
>> 70  2  0  CF
>> 71  2  0  CF
>> 72  2  0  CF
>> 73  2  0  DE
>> 74  2  0  DE
>> 75  2  0  DE
>> 76  2  0  DE
>> 77  2  0  DF
>> 78  2  0  DF
>> 79  2  0  DF
>> 80  2  0  DF
>> 81  2  0  EF
>> 82  2  0  EF
>> 83  2  0  EF
>> 84  2  0  EF
>> 85  3  4 ABC
>> 86  3  4 ABC
>> 87  3  4 ABC
>> 88  3  4 ABC
>> 89  3  4 ABD
>> 90  3  4 ABD
>> 91  3  4 ABD
>> 92  3  4 ABD
>> 93  3  4 ABE
>> 94  3  4 ABE
>> 95  3  4 ABE
>> 96  3  4 ABE
>> 97  3  4 ABF
>> 98  3  4 ABF
>> 99  3  4 ABF
>> 100 3  4 ABF
>> 101 3  4 ACD
>> 102 3  4 ACD
>> 103 3  4 ACD
>> 104 3  4 ACD
>> 105 3  4 ACE
>> 106 3  4 ACE
>> 107 3  4 ACE
>> 108 3  4 ACE
>> 

[R] Missing rows anova

2016-07-19 Thread Justin Thong
Why does the S:x1 column disappear (presumably S:x1 goes into ID but I dont
know why)? S is a factor, x1 is a covariate and ID is a factor.

rich.side<-aov(y~S*x1+ID)
summary(rich.side)

Below is the model frame

model.frame(~S*x1+ID)

S x1  ID
1   1 12   A
2   1 12   A
3   1 12   A
4   1 12   A
5   1  0   B
6   1  0   B
7   1  0   B
8   1  0   B
9   1  0   C
10  1  0   C
11  1  0   C
12  1  0   C
13  1  0   D
14  1  0   D
15  1  0   D
16  1  0   D
17  1  0   E
18  1  0   E
19  1  0   E
20  1  0   E
21  1  0   F
22  1  0   F
23  1  0   F
24  1  0   F
25  2  6  AB
26  2  6  AB
27  2  6  AB
28  2  6  AB
29  2  6  AC
30  2  6  AC
31  2  6  AC
32  2  6  AC
33  2  6  AD
34  2  6  AD
35  2  6  AD
36  2  6  AD
37  2  6  AE
38  2  6  AE
39  2  6  AE
40  2  6  AE
41  2  6  AF
42  2  6  AF
43  2  6  AF
44  2  6  AF
45  2  0  BC
46  2  0  BC
47  2  0  BC
48  2  0  BC
49  2  0  BD
50  2  0  BD
51  2  0  BD
52  2  0  BD
53  2  0  BE
54  2  0  BE
55  2  0  BE
56  2  0  BE
57  2  0  BF
58  2  0  BF
59  2  0  BF
60  2  0  BF
61  2  0  CD
62  2  0  CD
63  2  0  CD
64  2  0  CD
65  2  0  CE
66  2  0  CE
67  2  0  CE
68  2  0  CE
69  2  0  CF
70  2  0  CF
71  2  0  CF
72  2  0  CF
73  2  0  DE
74  2  0  DE
75  2  0  DE
76  2  0  DE
77  2  0  DF
78  2  0  DF
79  2  0  DF
80  2  0  DF
81  2  0  EF
82  2  0  EF
83  2  0  EF
84  2  0  EF
85  3  4 ABC
86  3  4 ABC
87  3  4 ABC
88  3  4 ABC
89  3  4 ABD
90  3  4 ABD
91  3  4 ABD
92  3  4 ABD
93  3  4 ABE
94  3  4 ABE
95  3  4 ABE
96  3  4 ABE
97  3  4 ABF
98  3  4 ABF
99  3  4 ABF
100 3  4 ABF
101 3  4 ACD
102 3  4 ACD
103 3  4 ACD
104 3  4 ACD
105 3  4 ACE
106 3  4 ACE
107 3  4 ACE
108 3  4 ACE
109 3  4 ACF
110 3  4 ACF
111 3  4 ACF
112 3  4 ACF
113 3  4 ADE
114 3  4 ADE
115 3  4 ADE
116 3  4 ADE
117 3  4 ADF
118 3  4 ADF
119 3  4 ADF
120 3  4 ADF
121 3  4 AEF
122 3  4 AEF
123 3  4 AEF
124 3  4 AEF
125 3  0 BCD
126 3  0 BCD
127 3  0 BCD
128 3  0 BCD
129 3  0 BCE
130 3  0 BCE
131 3  0 BCE
132 3  0 BCE
133 3  0 BCF
134 3  0 BCF
135 3  0 BCF
136 3  0 BCF
137 3  0 BDE
138 3  0 BDE
139 3  0 BDE
140 3  0 BDE
141 3  0 BDF
142 3  0 BDF
143 3  0 BDF
144 3  0 BDF
145 3  0 BEF
146 3  0 BEF
147 3  0 BEF
148 3  0 BEF
149 3  0 CDE
150 3  0 CDE
151 3  0 CDE
152 3  0 CDE
153 3  0 CDF
154 3  0 CDF
155 3  0 CDF
156 3  0 CDF
157 3  0 CEF
158 3  0 CEF
159 3  0 CEF
160 3  0 CEF
161 3  0 DEF
162 3  0 DEF
163 3  0 DEF
164 3  0 DEF

-- 
Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



[R] Reference for aov()

2016-07-13 Thread Justin Thong
Hi

*I have been looking for a reference to explain how R uses the aov
command(at a deeper level)*. More specifically, how R reads the formulae
and R computes the sums of squares. I am not interested in understanding
what the difference of Type 1,2,3 sum of squares are. I am more interested
in finding out about how R computes ~x1:x2:x3  or how R computes ~A:x1
emphasizing sequential nature of the way it computes, and models even more
complicated than this.

Yours sincerely,
Justin

*I check my email at 9AM and 4PM everyday*
*If you have an EMERGENCY, contact me at +447938674419(UK) or
+60125056192(Malaysia)*



[R] General copula model with heterogeneous marginals

2015-11-30 Thread Justin Balthrop


	I am looking to model the sum of a number of random variables with  
arbitrary gamma distributions and an empirical dependence structure  
that I obtain from data. Basically I observe all of the individual  
pieces but I want to model their sum, as opposed to many copula  
questions which observe a single outcome of a multivariate process and  
seek to fit possible marginal and covariance structure.

It has been years since I coded in R, but this is what I have thus far:

library(copula)
library(scatterplot3d)
library(psych)
set.seed(1)
myCop <- normalCopula(
  param = c(.1,.1,.1,.1,.1,.2,.2,.2,.2,.2,.2,.2,.4,.4,.4,.4,.4,.5,.5,.5,.5),
  dim = 7, dispstr = "un")
myMvd <- mvdc(copula = myCop, margins = rep("gamma", 7),
              paramMargins = list(list(shape = 3, scale = 4),
                                  list(shape = 2, scale = 5),
                                  list(shape = 2, scale = 5),
                                  list(shape = 2, scale = 5),
                                  list(shape = 2, scale = 5),
                                  list(shape = 3, scale = 5),
                                  list(shape = 3, scale = 5)))

simulation <- rMvdc(2, myMvd)

colnames(simulation) <- c("P1","P2","P3","P4","P5","P6","P7")

total <- simulation[,1] + simulation[,2] + simulation[,3] + simulation[,4] +
  simulation[,5] + simulation[,6] + simulation[,7]


As you can see, I have forced 7 gamma distributions with a placeholder  
covariance matrix input. The problem is that I am looking to  
generalize this to the order of ~150 different marginals with  
potentially differing distributions and parameters.

Ultimately I will have the following input:
•   matrix of 150 marginal distributions with family and parameters
•   150x150 covariance matrix
And what I need to produce is the following:
An empirical CDF/PDF of the sum of realizations from 5-10 of the  
underlying marginal distributions. To be more clear, assume each  
marginal distribution is a person's response to a treatment, and I  
need to calculate the cumulative treatment effect for a sub-group of  
the population of 150. So, I will have a vector of 0s and 1s to  
identify which members of the population are grouped together for a  
trial. Then I will have a separate vector for the next group. Each  
group vector will have dim=150 but have between 5 and 10 1s with the  
rest 0s. I need a different empirical CDF for each vector.

Any help?
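
The last step described above (one empirical CDF per 0/1 group vector) can
be sketched in base R. This is only an illustration under stated
assumptions: the matrix sims is a placeholder filled with independent gamma
draws, standing in for whatever the full 150-dimensional model would
return, and n_sim, sims, group, total and Fhat are made-up names.

```r
set.seed(1)
n_sim    <- 1000   # number of simulated draws (assumed)
n_margin <- 150

# Placeholder draws: independent gammas standing in for the rows that
# the fitted copula model would return for all 150 marginals
sims <- matrix(rgamma(n_sim * n_margin, shape = 2, scale = 5), nrow = n_sim)

# One trial group: a 0/1 vector of length 150 with 7 members switched on
group <- integer(n_margin)
group[sample(n_margin, 7)] <- 1L

total <- as.vector(sims %*% group)   # sum over the selected members, per draw
Fhat  <- ecdf(total)                 # empirical CDF of the group total

Fhat(quantile(total, 0.9))           # evaluate the ECDF at a chosen point
```

For several groups, the indicator vectors could be stacked as the columns
of a 150 x k matrix G, so that sims %*% G gives one column of totals per
group and ecdf() can be applied column by column.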


[R] Limiting state probability for Markov chain

2015-04-28 Thread Justin USHIZE RUTIKANGA
Dear All,

I am trying to determine the limiting state probability.
my_fun <- function(A, b) {
  for (j in 1:3) {
    x <- A
    while (sum(x[j, ]) == 1) {
      x <- x %*% x
      print(x)
      if (b %*% x == b) {
        break
      }
    }
  }
}
A <- rbind(c(.5,.3,.2), c(.3,.3,.4), c(.1,.5,.4))
b <- matrix(data=c(1,0,0), nrow=1, ncol=3, byrow=FALSE)
my_fun(A,b)

I got the following warnings:
1: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
2: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
3: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
4: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
5: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
6: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
7: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
8: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
9: In if (b %*% x == b) { :
  the condition has length > 1 and only the first element will be used
Your help will be appreciated.

Best Regard
Ushize Rutikanga Justin
Student at African Institute for Mathematical Sciences (AIMS) South Africa
E-mail:justinush...@aims.ac.za
Tel:+27717029144
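
The warning arises because b %*% x == b is a 1 x 3 logical matrix, while
if() expects a single TRUE or FALSE; exact equality of floating-point
values is also unlikely to hold. A possible rewrite, offered only as a
sketch: limiting_dist, tol and max_iter are made-up names, and the stopping
rule (successive iterates closer than a tolerance) is one choice among
several.

```r
# Iterate start %*% P until successive distributions agree within tol
limiting_dist <- function(P, start, tol = 1e-10, max_iter = 1000) {
  stopifnot(all(abs(rowSums(P) - 1) < 1e-8))   # rows must sum to 1
  cur <- start
  for (i in seq_len(max_iter)) {
    nxt <- cur %*% P
    # a single logical value, so if() is well defined
    if (max(abs(nxt - cur)) < tol) return(drop(nxt))
    cur <- nxt
  }
  warning("no convergence within max_iter iterations")
  drop(cur)
}

A <- rbind(c(.5, .3, .2), c(.3, .3, .4), c(.1, .5, .4))
b <- matrix(c(1, 0, 0), nrow = 1)

pi_hat <- limiting_dist(A, b)
pi_hat
pi_hat %*% A   # should reproduce pi_hat up to the tolerance
```

Comparing with a tolerance (or with isTRUE(all.equal(...))) reduces the
test to one logical value, which is what if() requires.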



[R] Awaiting feedback/moderation (Re: Non-convergence in boot.stepAIC function with a logit model)

2014-05-26 Thread Justin Michell
Hi

I posted a question to this list, and received an email indicating that it is 
awaiting moderation. However I have not received any feedback yet. If the 
question is not appropriate, I’d like to know where or how I should post 
differently, if possible.

Thanks and Regards
Justin Michell


[R] Non-convergence in boot.stepAIC function with a logit model

2014-05-22 Thread Justin Michell
Hi all

I am getting a warning when I try to perform a bootstrap selection procedure on 
variables (using the boot.stepAIC function in the bootStepAIC package). I had 
previously established which variables were collinear and kept the one which 
had the lowest AIC following univariate regression on each predictor. I obtain 
a candidate list of variables that are not correlated at the end of this 
procedure. I then revisit those variables that were excluded at each step using 
bootstrapping. I have referred to other questions (such as 
http://stackoverflow.com/questions/8596160/why-am-i-getting-algorithm-did-not-converge-and-fitted-prob-numerically-0-or)
and I see that this is a common problem with logit models, but convergence 
fails only in a bootstrapping context. 

For the first set of previously excluded variables, I added each variable 
individually to the candidate list and then performed bootstrapping, and 
then added more than one variable to see if the algorithm would converge. 
Sometimes it did, other times not (as indicated by ‘dc’). I suppose the presence 
of multicollinearity affects this process?

From the next group of excluded variables on, I only really considered adding 
variables separately, one at a time, and then checked whether there was an 
improvement in AIC. If one of the previously excluded variables is in the 
candidate list, then I take that variable out, add the previously excluded 
one and see if there is an improvement in AIC. 

From this reasoning I end up adding two new variables to the list. They are 
not correlated with any of the variables in the candidate list, nor are they 
correlated with each other.

My question is: is this a valid way to come up with my best set of predictors? 
Is there a way I can monitor more closely what is going on, i.e. whether 
multicollinearity is mathematically causing the algorithm not to converge 
for some variables? 
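
One way to monitor individual fits more closely is to record the warnings
and the convergence flag of each glm() call. The helper below is a
hypothetical sketch (fit_checked and the toy data are made up); it does not
rely on anything inside boot.stepAIC and only uses base R.

```r
# Hypothetical helper: fit a logit model and keep the warnings it raises
fit_checked <- function(formula, data) {
  msgs <- character()
  fit <- withCallingHandlers(
    glm(formula, data = data, family = binomial),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))   # remember the message
      invokeRestart("muffleWarning")          # and carry on fitting
    }
  )
  list(fit = fit, converged = fit$converged, warnings = msgs)
}

# Made-up, perfectly separated data: a typical trigger for these warnings
toy <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))
res <- fit_checked(y ~ x, toy)
res$converged
res$warnings
```

Applied to refits on resampled rows of the data, the same helper would show
which resamples and which variable sets trigger the warnings, and whether
very large coefficients or standard errors (a sign of separation or near
collinearity) accompany them.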

Here is my workflow using the boot.stepAIC function in the forward stepwise 
direction (the forward direction seems to be more robust w.r.t. convergence):  

(if reproducible code is required I can happily provide it - via dropbox for 
the data)

# kept altitude (15454.23) (but not in candidate list) and excluded:
# meanTemp (14422.72), minTemp (14435.72), bio1 (14767.88), bio6 (didn't
# converge (dc)), bio8 (15050.46), bio10 (14285.46), bio11 (14655.82), bio18
# (15445.24), bio10+bio11 (dc), bio10+bio11+bio18 (dc), bio10+bio18 (dc),
# meanTemp + bio10+bio11 (14160.33), minTemp + bio10+bio11 (14204.41),
# meanTemp + minTemp + bio10+bio11 (14135.49),
# meanTemp+minTemp + bio10+bio11+bio1 (dc), bio10+bio11+bio1 (dc)

fit.1 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI
             + bio5 + bio9 + bio10, weights = Examind, data = spatialVars,
             family = binomial)

bootGLM.1 <- boot.stepAIC(fit.1, spatialVars, direction = "backward",
                          alpha = 0.05, B = 1000)  # 15445.24

# add bio10 (lowest AIC - 14285.46)
# (backward direction dc)


# kept bio2 and excluded: bio7 (dc), -bio2 + bio7 (14710.04),
# -bio2 - bio10 + bio7 (15676.62)
fit.2 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI
             + bio5 + bio9 + bio10, weights = Examind, data = spatialVars,
             family = binomial)  # 15302.59

bootGLM.2 <- boot.stepAIC(fit.2, spatialVars, direction = "forward",
                          alpha = 0.05, B = 1000)

# keep bio2 in candidate list (+bio10)

# kept bio5 and excluded: altitude (dc), -bio5 + altitude (15659.26),
# -bio5 + maxTemp (15637.91)
fit.3 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI
             + bio5 + bio9 + bio10, weights = Examind, data = spatialVars,
             family = binomial)

bootGLM.3 <- boot.stepAIC(fit.3, spatialVars, direction = "forward",
                          alpha = 0.05, B = 1000)

# keep bio5 in candidate list (+bio10)

# kept bio17 (not in candidate list) (bio17 (14178.88)) and excluded: bio12
# (dc), bio14 (14168.77), bio16 (14287.42), bio19 (14248.65), rain (14287.45),
# bio17+bio12 (14162.3)
fit.4 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI
             + bio5 + bio9 + bio10 + rain, weights = Examind,
             data = spatialVars, family = binomial)

bootGLM.4 <- boot.stepAIC(fit.4, spatialVars, direction = "forward",
                          alpha = 0.05, B = 1000)

# add bio14 to candidate list (+bio10)

# kept bio15 (14168.77) (not included in candidate list) and excluded: bio17
# (14161.03)
fit.5 <- glm(Pos/Examind ~ bio13 + bio15 + bio2 + bio3 + DstTClW + bio4 + NDVI
             + bio5 + bio9 + bio10 + bio14, weights = Examind,
             data = spatialVars, family = binomial)

bootGLM.5 <- boot.stepAIC(fit.5, spatialVars, direction = "forward",
                          alpha = 0.05, B = 1000)

Thanks very much (for any help, advice or thoughts)
Justin 




Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread Justin Delahunty
Hi Petr,

Thanks for the quick response. Unfortunately I cannot share the data I am
working with, however please find attached a suitable R workspace with
generated data. It has the appropriate variable names, only the data has
been changed.

The last function in the list (init.dfs()) I call to subset the overall data
set into the three waves, then conduct the factor analysis on each (1 factor
CFA); it's just in a function to ease re-typing in a new workspace.


Thanks,

Justin

-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz] 
Sent: Friday, 26 July 2013 7:35 PM
To: Justin Delahunty; r-help@r-project.org
Subject: RE: [R] Maintaining data order in factanal with missing data

Hi

You provided functions, so far so good. But without data it would be quite
difficult to understand what the functions do and where could be the issue.

I suspect combination of complete cases selection together with subset and
factor behaviour. But I can be completely out of target too.

Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- 
 project.org] On Behalf Of s00123...@myacu.edu.au
 Sent: Friday, July 26, 2013 9:35 AM
 To: r-help@r-project.org
 Subject: [R] Maintaining data order in factanal with missing data
 
 Hi,
 
 
 
 I'm new to R, so sorry if this is a simple answer. I'm currently 
 trying to collapse some ordinal variables into a composite; the 
 program ideally should take a data frame as input, perform a factor 
 analysis, compute factor scores, sds, etc., and return the rescaled 
 scores and loadings. The difficulty I'm having is that my data set 
 contains a number of NA, which I am excluding from the analysis using 
 complete.cases(), and thus the incomplete cases are skipped. These 
 functions are for a longitudinal data set with repeated waves of data, 
 so the final rescaled scores from each wave need to be saved as 
 variables grouped by a unique ID (DMID). The functions I'm trying to 
 implement are as follows:
 
 
 
 weighted.sd <- function(x, w) {
     sum.w  <- sum(w)
     sum.w2 <- sum(w^2)
     mean.w <- sum(x * w) / sum(w)
     x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2))
     return(x.sd.w)
 }

 re.scale <- function(f.scores, raw.data, loadings) {
     fz.scores <- (f.scores + mean(f.scores)) / (sd(f.scores))
     means <- apply(raw.data, 1, weighted.mean, w = loadings)
     sds <- apply(raw.data, 1, weighted.sd, w = loadings)
     grand.mean <- mean(means)
     grand.sd <- mean(sds)
     final.scores <- ((fz.scores * grand.sd) + grand.mean)
     return(final.scores)
 }

 get.scores <- function(data) {
     fact <- factanal(data[complete.cases(data), ], factors = 1,
                      scores = "regression")
     f.scores <- fact$scores[, 1]
     f.loads <- fact$loadings[, 1]
     rescaled.scores <- re.scale(f.scores, data[complete.cases(data), ],
                                 f.loads)
     output.list <- list(rescaled.scores, f.loads)
     names(output.list) <- c("rescaled.scores", "factor.loadings")
     return(output.list)
 }

 init.dfs <- function() {
     ab.1.df <- subset(ab.df, , select = c(dmid, g5oab2:g5ovb1))
     ab.2.df <- subset(ab.df, , select = c(dmid, w2oab3:w2ovb1))
     ab.3.df <- subset(ab.df, , select = c(dmid, w3oab3, w3oab4, w3oab7,
                                           w3oab8, w3ovb1))

     ab.1.fa <- get.scores(ab.1.df[-1])
     ab.2.fa <- get.scores(ab.2.df[-1])
     ab.3.fa <- get.scores(ab.3.df[-1])
 }
 
 
 
 Thanks for your help,
 
 
 
 Justin
 
 


Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread Justin Delahunty
Hi Petr,

So sorry, I accidentally attached the complete data set rather than the one
with missing values. I've attached the correct file to this email. RE:
init.dfs() being local, I hadn't even thought of that. I've been away from
OOP for close to 15 years now, so it might be time to revise!

The problem I have is that with missing values the list of factor scores
returned (ab.w1.fa$factor.scores) does not map onto the originating data
frame (ab.w1.df) as it no longer includes the cases which had missing
values. So while the original data set for ab.w1.df contains 154 ordered
cases, the factor analysis contains only 150.

I am seeking a way to map the values derived from the factor analysis
(ab.w1.fa$factor.scores) back to their original ordered position, so that
these factor score variables may be merged back into the master data frame
(ab.df). A unique ID for each case is available ($dmid) which I had thought
to use when merging the new variables, however I don't know how to implement
this.
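
A possible way to do this, sketched on simulated data (the data frame wave,
its columns v1 to v5 and the objects ok, fa, scores_df and master are all
made up for illustration; only dmid follows the naming above). The idea is
to keep the complete.cases() index and use it to put the scores back in
place, or to carry the ID along and merge on it.

```r
set.seed(42)
n <- 154
f <- rnorm(n)                                   # one common factor
wave <- data.frame(dmid = seq_len(n),
                   v1 = f + rnorm(n), v2 = f + rnorm(n), v3 = f + rnorm(n),
                   v4 = f + rnorm(n), v5 = f + rnorm(n))
wave$v2[c(3, 50, 77, 120)] <- NA                # four incomplete cases

items <- wave[-1]                               # drop the ID column
ok    <- complete.cases(items)                  # TRUE for the 150 usable rows
fa    <- factanal(items[ok, ], factors = 1, scores = "regression")

# Option 1: write the scores back by position, NA where a case was dropped
wave$score <- NA_real_
wave$score[ok] <- fa$scores[, 1]

# Option 2: carry the ID along and merge on it
scores_df <- data.frame(dmid = wave$dmid[ok], score2 = fa$scores[, 1])
master    <- merge(wave, scores_df, by = "dmid", all.x = TRUE)

head(master[c("dmid", "score", "score2")])
```

Either way the result has one row per original case, with NA for the cases
that were excluded from the factor analysis, so it can be merged into the
master data frame by dmid.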


Thanks for your help,

Justin


-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz] 
Sent: Friday, 26 July 2013 10:59 PM
To: Justin Delahunty; Justin Delahunty; r-help@r-project.org
Subject: RE: [R] Maintaining data order in factanal with missing data

Hi

Well, the function init.dfs does nothing useful: the data frames created inside
it do not propagate to the global environment, and the function returns
nothing.

The last line (when used outside a function) gives warnings, but there is no
sign of an error.
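
The scoping point can be shown with a small sketch. The function name init_dfs, the wave column, and the toy master data frame are hypothetical stand-ins rather than the poster's code; the idea is only that a function has to return what it builds for the caller to see it.

```r
# Hypothetical master data frame with a 'wave' indicator (not the poster's data)
master <- data.frame(dmid  = rep(1:3, times = 2),
                     wave  = rep(1:2, each = 3),
                     score = c(1.2, 3.4, 2.2, 4.1, 0.7, 2.9))

# Objects created inside a function are local to it, so return them explicitly
init_dfs <- function(df) {
  list(w1 = df[df$wave == 1, ],
       w2 = df[df$wave == 2, ])
}

waves <- init_dfs(master)   # the caller now holds both subsets
nrow(waves$w1)
nrow(waves$w2)
```

Returning a list keeps the subsets together and avoids assigning into the global environment from inside the function.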

When 

> head(ab.1.df)
  dmid   g5oab2      g53      g54      g55   g5ovb1
1    1 1.418932 1.805227 2.791152 3.624116 3.425586
2    2 2.293907 1.187830 1.611237 1.748526 3.816533
3    3 2.836536 2.679523 1.279639 2.674986 2.452395
4    4 1.872259 3.278359 1.785872 2.458315 1.146480
5    5 1.467195 1.180747 3.564127 3.007682 2.109506
6    6 3.098512 3.151974 3.969379 3.750571 1.497358
> head(ab.2.df)
  dmid   w2oab3      w22      w23      w24   w2ovb1
1    1 4.831362 5.522764 7.809366 6.969172 7.398385
2    2 6.706346 4.101742 1.434697 5.266775 5.357641
3    3 3.653806 2.666885 1.209326 5.125556 4.963374
4    4 7.221255 7.649152 6.540398 6.648506 2.576081
5    5 1.848023 5.044314 2.761881 3.307220 1.454234
6    6 7.606429 4.911766 2.034813 2.638573 2.818834
> head(ab.3.df)
  dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
1    1 5.835609 6.108220 6.587721 2.451461 2.785467
2    2 4.973198 1.196815 6.388056 1.110877 4.226463
3    3 3.800367 6.697287 5.235345 6.666829 6.319073
4    4 1.093141     1.43 2.269252 3.194978 4.916342
5    5 1.975060 7.204516 4.825435 1.775874 3.484027
6    6 3.273361 2.243805 5.326547 5.720892 6.118723

> str(ab.1.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
  ..- attr(*, "names")= chr [1:5] "g5oab2" "g53" "g54" "g55" ...
> str(ab.2.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
  ..- attr(*, "names")= chr [1:5] "w2oab3" "w22" "w23" "w24" ...
> str(ab.3.fa)
List of 2
 $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
 $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
  ..- attr(*, "names")= chr [1:5] "w3oab3" "w3oab4" "w3oab7" "w3oab8" ...

Anyway I have no idea what you consider wrong?

Regards
Petr



 -Original Message-
 From: Justin Delahunty [mailto:a...@genius.net.au]
 Sent: Friday, July 26, 2013 2:22 PM
 To: PIKAL Petr; 'Justin Delahunty'; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi Petr,
 
 Thanks for the quick response. Unfortunately I cannot share the data I 
 am working with, however please find attached a suitable R workspace 
 with generated data. It has the appropriate variable names, only the 
 data has been changed.
 
 The last function in the list (init.dfs()) I call to subset the 
 overall data set into the three waves, then conduct the factor 
 analysis on each
 (1 factor CFA); it's just in a function to ease re-typing in a new 
 workspace.
 
 
 Thanks,
 
 Justin
 
 -Original Message-
 From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
 Sent: Friday, 26 July 2013 7:35 PM
 To: Justin Delahunty; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi
 
 You provided functions, so far so good. But without data it would be 
 quite difficult to understand what the functions do and where could be 
 the issue.
 
 I suspect combination of complete cases selection together with subset 
 and factor behaviour. But I can be completely out of target too.
 
 Petr
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- 
  project.org

[R] R CMD check package mypkg-Ex.R failed

2013-06-01 Thread Justin
prepare_Rd: f.Rd:25-31: Dropping empty section \value
prepare_Rd: f.Rd:38-40: Dropping empty section \note
prepare_Rd: f.Rd:35-37: Dropping empty section \author
prepare_Rd: f.Rd:32-34: Dropping empty section \references
prepare_Rd: f.Rd:44-46: Dropping empty section \seealso
checkRd: (5) f.Rd:0-60: Must have a
 \description
prepare_Rd: mypkg-package.Rd:33: All text must be in a section
prepare_Rd: mypkg-package.Rd:34: All text must be in a section
* checking Rd metadata ... OK
* checking Rd cross-references ... WARNING
Unknown package(s) ‘pkg’ in Rd xrefs
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... NOTE
Warning: parse error in file ‘mypkg-Ex.R’:
11: unexpected symbol
42: 
43: ~~ simple examples
              ^
* checking examples ... ERROR
Running examples in ‘mypkg-Ex.R’ failed
The error most likely occurred in:

 ### Name: mypkg-package
 ### Title: What the package does (short line) ~~ package title ~~
 ### Aliases:
 mypkg-package mypkg
 ### Keywords: package
 
 ### ** Examples
 
 ~~ simple examples of the most important functions ~~
Error: unexpected symbol in ~~ simple examples
Execution halted


My question: How to correct the above error?

Thanks,

Justin
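
The log points at the \examples section of mypkg-package.Rd, which still contains template placeholder text of the kind that package.skeleton() and promptPackage() write. R CMD check extracts every \examples section into mypkg-Ex.R and runs it as R code, so anything left there must parse. The sketch below reproduces the failure outside the check; the replacement example code is a made-up stand-in for a real call into the package.

```r
# The placeholder line from the Rd template is not valid R code
placeholder <- "~~ simple examples of the most important functions ~~"
res <- tryCatch(parse(text = placeholder),
                error = function(e) conditionMessage(e))
is.character(res)   # TRUE when parsing failed and the message was captured

# Any syntactically valid code is accepted; replace the placeholder with
# real calls to the package's functions (hypothetical stand-in shown here)
fixed <- "x <- 1:3\nsum(x)"
exprs <- parse(text = fixed)
length(exprs)
```

The other messages in the log ("Must have a \description", "All text must be in a section", "Dropping empty section") suggest that the remaining template sections in f.Rd and mypkg-package.Rd also need to be filled in or deleted.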


[R] Getting a vector result from a computed index

2013-03-22 Thread Justin Long
Greetings,

I confess to being new to R, which I am exploring for some simulation
modeling purposes. I am a long time programmer in PHP.

I am having some trouble getting over the steep learning curve with
some very basic things which are probably just little a-ha things in
R. I have hunted and hunted through the manual and cannot figure this
one out, so I am appealing for help.

I have the following program:

results <- replicate(1,0)

for (year in 2000:2050) {
print(year);
for (i in 1:1) {
x=results[i];
if (x == 0) {
prev = results[i-1];
next = results[i+1];
prob=0.1;
if (i=1) {
if (prev==1) { prob=prob+0.4; }
}
if (i1) {
if (next==1) { prob=prob+0.4; }
}
y=runif(1,0,1);
if (y < prob) { x=1; }
results[i]=x;
}
}
}

No matter how I try this (and I've tried a number of variations), I
get an error similar to

Error in next = results[i + 1] : invalid (NULL) left side of assignment

I need to be able to loop through the results vector and compare the
previous/next items in the list to the current one. Is that possible?
What am I missing?

Cordially,
Justin Long
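
Two things in the program above trip up R. First, next is a reserved word (it skips to the next loop iteration), so it cannot be used as a variable name. Second, R vectors are 1-based, so results[0] has length zero and cannot be tested in an if(). The following is a minimal sketch of the same neighbour-dependent update; the vector length n, the fixed seed, and the names prv and nxt are assumptions chosen for illustration.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
n <- 10                          # assumption: vector length for illustration
results <- rep(0, n)

for (year in 2000:2050) {
  for (i in seq_along(results)) {
    if (results[i] == 0) {
      # Treat a missing neighbour at either end as 0
      prv <- if (i > 1) results[i - 1] else 0
      nxt <- if (i < n) results[i + 1] else 0   # 'nxt', since 'next' is reserved

      prob <- 0.1
      if (prv == 1) prob <- prob + 0.4
      if (nxt == 1) prob <- prob + 0.4

      if (runif(1) < prob) results[i] <- 1
    }
  }
}
results
```

Guarding the two ends explicitly keeps every comparison a single TRUE or FALSE, which is what if() requires.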



[R] two phases sampling

2013-01-10 Thread justin bem
Dear all,

I have a question about estimation in two-phase sampling.
I have a first sample of households, S1, drawn under a complex sampling design; a second
sample, S2, is drawn from S1.

From S2, I compute an estimator y2; for households of S1 not in S2 I set y2=0.
I have an estimator y1 on S1.
My indicator is y=y1+y2, so the variance of y is v(y)=v(y1)+v(y2)+2cov(y1,y2).
Sampling theory indicates how to compute v(y1) and v(y2), but how can I compute
cov(y1,y2)?

Can the survey package help me for that ?
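
As a quick check of the algebra, the variance of a sum contains the covariance twice: v(y1+y2) = v(y1) + v(y2) + 2cov(y1,y2). The sketch below verifies this identity on simulated element-level values; it uses plain sample variances, not design-based estimators, and the data are invented for illustration. For design-based two-phase estimation, the survey package documents a twophase() function that may be worth a look.

```r
set.seed(3)
y1 <- rnorm(100)
y2 <- 0.5 * y1 + rnorm(100)     # correlated with y1 by construction

lhs <- var(y1 + y2)
rhs <- var(y1) + var(y2) + 2 * cov(y1, y2)
all.equal(lhs, rhs)
```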


 
Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774


Re: [R] How do anova() and Anova(type=III) handle incomplete designs?

2012-06-18 Thread Justin Montemarano
Thanks for your response, John.  That was helpful.

I was using Type III from Anova() as a comparison to some results I had
obtained in JMP, which I've since lost access to (hence the move to R), and I was
confused by the error.  Given that I do have a continuous covariate, the
analyses are not likely comparable, considering your response.

I am still confused about interpretation of interactions within an anova()
with an incomplete design, as mine is.  Is the interaction term still
informative?
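
A small base-R sketch may help show what lm() and anova() do with an empty cell. The data are simulated; only the factor names and levels are borrowed from the thread. With one cell missing, one interaction coefficient cannot be estimated (it is reported as NA, i.e. aliased), and the sequential table from anova() simply gives the interaction fewer degrees of freedom.

```r
set.seed(1)
d <- expand.grid(leaf.species = c("map", "syc", "oak"),
                 cond.time    = c("30", "150"),
                 rep          = 1:10)
# Drop the map-150 cell to mimic the incomplete design
d <- subset(d, !(leaf.species == "map" & cond.time == "150"))
d$y <- rnorm(nrow(d))            # simulated response

fit <- lm(y ~ leaf.species * cond.time, data = d)

coef(fit)            # one interaction coefficient is NA (aliased)
tab <- anova(fit)
tab                  # interaction row has 1 df rather than 2
```

In this layout the single remaining interaction degree of freedom compares the cond.time effect between the two species that were observed at both times; it says nothing about the species with the empty cell.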

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Sat, Jun 16, 2012 at 9:20 PM, John Fox j...@mcmaster.ca wrote:

 Dear Justin,

 anova() and Anova() are entirely different functions; the former is part
 of the standard R distribution and the second part of the car package. By
 default, Anova() produces an error for type-III tests conducted on
 rank-deficient models because the hypotheses tested aren't generally
 sensible.

 From ?Anova:

 singular.ok
 defaults to TRUE for type-II tests, and FALSE for type-III tests (where
 the tests for models with aliased coefficients will not be
 straightforwardly interpretable); if FALSE, a model with aliased
 coefficients produces an error.

 and

 The designations type-II and type-III are borrowed from SAS, but the
 definitions used here do not correspond precisely to those employed by SAS.
 Type-II tests are calculated according to the principle of marginality,
 testing each term after all others, except ignoring the term's higher-order
 relatives; so-called type-III tests violate marginality, testing each term
 in the model after all of the others. This definition of Type-II tests
 corresponds to the tests produced by SAS for analysis-of-variance models,
 where all of the predictors are factors, but not more generally (i.e., when
 there are quantitative predictors). Be very careful in formulating the
 model for type-III tests, or the hypotheses tested will not make sense.

 I hope this helps,
  John

 
 John Fox
 Sen. William McMaster Prof. of Social Statistics
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/



[R] How do anova() and Anova(type=III) handle incomplete designs?

2012-06-15 Thread Justin Montemarano
Hello all:

I am confused about the output from a lm() model with an incomplete 
design/missing level.

I have two categorical predictors and a continuous covariate (day) that 
I am using to model larval mass (l.mass):

leaf.species has three levels - map, syc, and oak

cond.time has two levels - 30 and 150.

There are no response values for Map-150, so that entire, two-way, level 
is missing.

When running anova() on the model with Type I SS, the full factorial 
design does not return errors; however, using package:car Anova() and 
Type III SS, I receive a singularity error unless I use the argument 
'singular.ok = T' (it defaults to F).

So, why don't I receive an error with anova() when I do with Anova(type 
= III)?  How do anova() and Anova() handle incomplete designs, and how 
can interactions of variables with missing levels be interpreted?

I realize these are fairly broad questions, but any insight would be 
helpful. Thanks, all.

Below is code to illustrate my question(s):

> lmMass <- lm(log(l.mass) ~ day*leaf.species + cond.time, data = growth.data) #lm() without cond.time interactions
> lmMassInt <- lm(log(l.mass) ~ day*leaf.species*cond.time, data = growth.data) #lm() with cond.time interactions
> anova(lmMass); anova(lmMassInt) #ANOVA summary of both models with Type I SS
Analysis of Variance Table

Response: log(l.mass)
                  Df  Sum Sq Mean Sq F value    Pr(>F)
day                1  51.373  51.373 75.7451 2.073e-15
leaf.species       2   0.340   0.170  0.2506    0.7786
cond.time          1   0.161   0.161  0.2369    0.6271
day:leaf.species   2   1.296   0.648  0.9551    0.3867
Residuals        179 121.404   0.678
Analysis of Variance Table

Response: log(l.mass)
                            Df  Sum Sq Mean Sq F value    Pr(>F)
day                          1  51.373  51.373 76.5651 1.693e-15
leaf.species                 2   0.340   0.170  0.2533   0.77654
cond.time                    1   0.161   0.161  0.2394   0.62523
day:leaf.species             2   1.296   0.648  0.9655   0.38281
day:cond.time                1   0.080   0.080  0.1198   0.72965
leaf.species:cond.time       1   1.318   1.318  1.9642   0.16282
day:leaf.species:cond.time   1   1.915   1.915  2.8539   0.09293
Residuals                  176 118.091   0.671
> Anova(lmMass, type = 'III'); Anova(lmMassInt, type = 'III') #ANOVA summary of both models with Type III SS
Anova Table (Type III tests)

Response: log(l.mass)
                  Sum Sq  Df F value   Pr(>F)
(Intercept)       39.789   1 58.6653 1.13e-12
day                3.278   1  4.8336  0.02919
leaf.species       0.934   2  0.6888  0.50352
cond.time          0.168   1  0.2472  0.61968
day:leaf.species   1.296   2  0.9551  0.38672
Residuals        121.404 179
Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) :
   there are aliased coefficients in the model
> Anova(lmMassInt, type = 'III', singular.ok = T) #Given the error in Anova() above, set singular.ok = T
Anova Table (Type III tests)

Response: log(l.mass)
                            Sum Sq  Df F value    Pr(>F)
(Intercept)                 39.789   1 59.3004 9.402e-13
day                          3.278   1  4.8860   0.02837
leaf.species                 1.356   2  1.0103   0.36623
cond.time                    0.124   1  0.1843   0.66822
day:leaf.species             2.783   2  2.0738   0.12877
day:cond.time                0.805   1  1.1994   0.27493
leaf.species:cond.time       0.568   1  0.8462   0.35888
day:leaf.species:cond.time   1.915   1  2.8539   0.09293
Residuals                  118.091 176
 



-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


[R] Violation of sample independence in Pearson's product-moment correlation

2012-06-01 Thread Justin Montemarano
Hi all:

There was a concern raised by reviewers of a manuscript of mine over the
proper execution of a Pearson's correlation. In brief, this was undertaken
in order to determine the relationship between the extent of wheel running
(y axis) and ethanol intake (x axis) across three, separate 10 day periods
in 7 animals.

In the paper, the correlational plots for each 10 day-period had 70 data
points: One point for each day and each animal across 10 days of
experimentation. The reviewers, however, appropriately pointed out that
this is a violation of the assumption of sample independence for Pearson's
test, and I should have had only 7 points, which would reflect the means of
my two variables for each individual animal across 10 days. Is this
appropriate or is there a means of accounting for repeated sampling with a
correlation test?
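
The reviewers' suggestion (one point per animal) can be sketched in base R as follows. The data are simulated and the column names animal, day, ethanol, and running are assumptions for illustration. Averaging within animal gives 7 independent points, so the test has 5 degrees of freedom instead of 68.

```r
set.seed(7)
d <- data.frame(animal = rep(1:7, each = 10),
                day    = rep(1:10, times = 7))
d$ethanol <- rnorm(70, mean = 5)               # simulated intake
d$running <- 2 * d$ethanol + rnorm(70)         # simulated wheel running

# One mean per animal for each variable
means <- aggregate(cbind(ethanol, running) ~ animal, data = d, FUN = mean)

ct <- cor.test(means$ethanol, means$running)   # Pearson by default
ct$parameter                                   # degrees of freedom, n - 2
```

Averaging discards the within-animal information. If the day-to-day relationship is of interest, approaches that model the repeated measures explicitly (for example a mixed-effects model with a random effect for animal) are the usual alternative, though which one fits depends on the design.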

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com



[R] [R-pkgs] ANN: bigml package for R bigml_0.1.tar.gz

2012-05-09 Thread Justin Donaldson
The bigml package is an R wrapper for the BigML API:
https://bigml.com/developers

It contains straightforward methods for most of the relevant API end
points, as well as some fancier methods that translate R data frames
directly into datasets appropriate for BigML.

Excerpt from an upcoming blog post, which describes the package in more
detail:

Today BigML releases the bigml package for R.  R is already well known for
 its capabilities in statistics and data analysis, and we use it internally
 for a number of different day-to-day tasks.  The bigml package enables the
 R community to easily take advantage of our highly scalable cloud based
 machine learning infrastructure, while using familiar R data structures and
 workflows.



Apologies for sending this e-mail a bit late.

Best,
-Justin

-- 
blog: http://www.scwn.net
aim: iujjd
twitter: jjdonald

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] Deleting observations from baseline that don't appear in follow up

2012-04-27 Thread justin jarvis
Hello all,
I'm almost embarrassed to post this, it seems so easy.  Suppose I have a
baseline and follow up survey but some people are missing in the follow up:

> baseline <- data.frame(id=c(3,5,7,9,12), data= runif(5))
> follow.up <- data.frame(id=c(3,7,9,12), data= runif(4))
> baseline
  id       data
1  3 0.66771988
2  5 0.28794744
3  7 0.01892821
4  9 0.64863175
5 12 0.86485882
> follow.up
  id      data
1  3 0.8237210
2  7 0.8140544
3  9 0.8803674
4 12 0.8031520

Here, in follow up we are missing person #5.  I need to delete him from the
baseline, so that I have an equal number of rows once again.  Obviously

baseline <- baseline[-2,] won't cut it here, since in my data set I have
thousands of people.

Thanks in advance

Justin
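
A minimal sketch of one way to do this, matching on the id column with %in% so that no row positions need to be known in advance. The object name baseline.matched is an assumption for illustration.

```r
baseline  <- data.frame(id = c(3, 5, 7, 9, 12), data = runif(5))
follow.up <- data.frame(id = c(3, 7, 9, 12),    data = runif(4))

# Keep only baseline rows whose id also appears in the follow up
baseline.matched <- baseline[baseline$id %in% follow.up$id, ]

baseline.matched$id
nrow(baseline.matched) == nrow(follow.up)
```

If the two waves should end up side by side, merge(baseline, follow.up, by = "id") does the matching and the join in one step.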



[R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
I need to run a bash command, but when you call system() the default shell
is sh (see my sessionInfo below).
I found the shell command (
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/base/html/shell.html)
but it seems to have disappeared in current versions of R?
I am running all this from R CMD BATCH  with system calls to other R
scripts.

For a little more info, I'm generating sphinx documents (a python
documentation library) through R and need to use a python virtual
environment.
So I need to call system('source bin/activate'), but source isn't a
recognized command in the sh shell...


Any help is appreciated,

Justin
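
For context, shell() exists only in Windows builds of R, which is why it is missing on Linux. A sketch of one workaround is to start bash explicitly and hand it the whole command line, so that source and whatever depends on it run in the same shell process. The helper name run_in_bash is an assumption, and the echo command is a harmless stand-in for something like "source bin/activate && <command that needs the environment>".

```r
# Run a command line through bash instead of the default sh.
# shQuote() protects the command so that it reaches bash as one argument.
run_in_bash <- function(cmd) {
  system2("bash", c("-c", shQuote(cmd)), stdout = TRUE)
}

# $BASH_VERSION is only set by bash, so this shows which shell ran the command
out <- if (nzchar(Sys.which("bash"))) {
  run_in_bash("echo $BASH_VERSION")
} else {
  NA_character_                    # bash not found on this machine
}
out
```

Each system() or system2() call starts a fresh shell, so activating the virtual environment in one call and running the Python tool in another would not carry the environment over.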

 sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8LC_PAPER=C
  LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils datasets  stats grid  methods
base

other attached packages:
[1] ggplot2_0.9.0  reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4digest_0.5.1   MASS_7.3-16
 memoise_0.1munsell_0.3
 [7] proto_0.3-9.2  RColorBrewer_1.0-5 scales_0.2.0   stringr_0.6
 tools_2.15.0



Re: [R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
Thanks Jeff, but I'm running a python program that expects certain
functionality that bash provides and sh doesn't...  I can just stop using
github checkouts and use system packages though and fix this.

I'm mostly wondering where the shell command went in base R... it sounds
like it completely solves this issue but doesn't exist in my R




On Mon, Apr 16, 2012 at 10:58 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:

 You could make a hash bang bash script that sources the file and then
 proceeds to do whatever you want. Bourne shell should have no problems
 invoking another shell.
 ---
 Jeff Newmiller
 DCN: jdnew...@dcn.davis.ca.us
 Research Engineer (Solar/Batteries/Software/Embedded Controllers)
 ---
 Sent from my phone. Please excuse my brevity.





Re: [R] A little exercise in R!

2012-04-14 Thread Justin Haynes
Since I thought this was a cool question, I posted it to StackOverflow.
 Vincent Zookynd's  answer is amazing and really exercises the power of R.


http://stackoverflow.com/questions/10150161/ordering-117-by-perfect-square-pairs/10150797#10150797



On Fri, Apr 13, 2012 at 10:06 PM, Bert Gunter gunter.ber...@gene.com wrote:

 ... and a moment's more consideration immediately shows it cannot be
 done for n = 18, since 16,17, and 18 cannot all be at an end.

 -- Bert

 On Fri, Apr 13, 2012 at 9:59 PM, Bert Gunter bgun...@gene.com wrote:
  Folks:
 
  IMHO this is exactly the **wrong** way to go about this. These are
  mathematical exercises that should employ mathematical thinking, not
  brute force checking of cases.
 
  Consider, for example, the 1 to 17 sequence given by Ted. Then 17
  **must** be one end of the sequence and 16 the other. (Why?) Hence,
  starting from the 17 end, the values ** must** be 17  8 1 ...
  Proceeding in this way, it takes only a couple of minutes to solve.
 
  The more interesting point which I think the question was really
  about, is can this always be done? I haven't given this any thought,
  but there may be an easy proof or counterexample. If the answer to
  this latter is no, then perhaps even more interesting is to
  characterize the set of numbers where it can/cannot be done.
 
  But this is all way off topic, no?
 
  Cheers,
  Bert
 
 
 
  On Fri, Apr 13, 2012 at 6:26 PM, Philippe Grosjean
  phgrosj...@sciviews.org wrote:
  Hi all,
 
  I got another solution, and it would apply probably for the ugliest one
 :-(
  I made it general enough so that it works for any series from 1 to n (n
 not
  too large, please... tested up to 30).
 
  Hint for a better algorithm: inspect the object 'friends' in my code:
 there
  is a nice pattern appearing there!!!
 
  Best,
 
  Philippe
 
  ..¡}))
   ) ) ) ) )
  ( ( ( ( (Prof. Philippe Grosjean
   ) ) ) ) )
  ( ( ( ( (Numerical Ecology of Aquatic Systems
   ) ) ) ) )   Mons University, Belgium
  ( ( ( ( (
  ..
 
  findSerie <- function (n, tmax = 500) {
    ## Check arguments
    n <- as.integer(n)
    if (length(n) != 1 || is.na(n) || n < 1)
      stop("'n' must be a single positive integer")

    tmax <- as.integer(tmax)
    if (length(tmax) != 1 || is.na(tmax) || tmax < 1)
      stop("'tmax' must be a single positive integer")

    ## Suite of our numbers to be sorted
    nbrs <- 1:n

    ## Trivial cases: only one or two numbers
    if (n == 1) return(1)
    if (n == 2) stop("The pair does not sum to a square number")

    ## Compute all possible pairs
    omat <- outer(rep(1, n), nbrs)
    ## Which pairs sum to a square number?
    friends <- sqrt(omat + nbrs) %% 1 < .Machine$double.eps
    diag(friends) <- FALSE # Eliminate pairs of same numbers

    ## Get a list of possible neighbours
    neigb <- apply(friends, 1, function(x) nbrs[x])

    ## Nbr of neighbours for each number
    nf <- sapply(neigb, length)

    ## Are there numbers without neighbours?
    ## then, problem impossible to solve..
    if (any(!nf))
      stop("Impossible to solve:\n",
        paste(nbrs[!nf], collapse = ", "),
        " sum to square with nobody else!")

    ## Are there numbers that can have only one neighbour?
    ## Must be placed at one extreme
    toEnds <- nbrs[nf == 1]
    ## I must have two of them maximum!
    l <- length(toEnds)
    if (l > 2)
      stop("Impossible to solve:\n",
        "More than two numbers form only one pair:\n",
        paste(toEnds, collapse = ", "))

    ## The other numbers can appear in the middle of the suite
    inMiddle <- nbrs[!nbrs %in% toEnds]

    generateSerie <- function (neigb, toEnds, inMiddle) {
      ## Allow to generate serie by picking candidates randomly
      if (length(toEnds) > 1) toEnds <- sample(toEnds)
      if (length(inMiddle) > 1) inMiddle <- sample(inMiddle)

      ## Choose a number to start with
      res <- rep(NA, n)

      ## Three cases: 0, 1, or 2 numbers that must be at an extreme
      ## Following code works in all cases
      res[1] <- toEnds[1]
      res[n] <- toEnds[2]

      ## List of already taken numbers
      taken <- toEnds

      ## Is there one number in res[1]? Otherwise, fill it now...
      if (is.na(res[1])) {
        taken <- inMiddle[1]
        res[1] <- taken
      }

      ## For each number in the middle, choose one acceptable neighbour
      for (ii in 2:(n-1)) {
        prev <- res[ii - 1]
        allpossible <- neigb[[prev]]
        candidate <- allpossible[!(allpossible %in% taken)]
        if (!length(candidate)) break # We fail to construct the serie
        ## Take randomly one possible candidate
        if (length(candidate) > 1) take <- sample(candidate, 1) else
          take <- candidate
        res[ii] <- take
        taken <- c(taken, take)
      }

      ## If we manage to go to the end, check last pair...
      if (length(taken) == (n - 1)) {
        take <- nbrs[!(nbrs %in% taken)]
        res[n] <- take
   taken 

Re: [R] A little exercise in R!

2012-04-13 Thread Justin Haynes
I thought this was kinda cool!  Here's my solution; it's not robust, and
probably not efficient.

I'd love to hear improvements or other solutions!

Justin


sq.test <- function(a, b) {
  ## test for number pairs that sum to squares.
  sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}

ok.pairs <- function(n, vec) {
  ## given n as a member of vec,
  ## which other members of vec satisfy sq.test
  vec <- vec[vec!=n]
  vec[sapply(vec, sq.test, b=n)]
}

grow.seq <- function(y) {
  ## given a starting point (y) and a pairs list (pl, read from the
  ## global environment), grow the squaring sequence.
  ly <- length(y)
  if(ly == y[1]) return(y)

  ## this line is the one that breaks down on other number sets...
  y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
  y <- grow.seq(y)

  return(y)
}


## start vector
x <- 1:17

## get list of possible pairs
pl <- lapply(x, ok.pairs, vec=x)

## pick start at max since few combinations there.
y <- max(x)
grow.seq(y)
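
As a possible generalisation of the greedy step above, a depth-first search with backtracking does not depend on always taking the largest remaining neighbour. This is a sketch under that swapped-in technique, not the approach of any poster in the thread; the function name square_path and its helpers are invented for illustration. It returns one valid ordering of 1..n, or NULL when none exists.

```r
square_path <- function(n) {
  is_square <- function(s) {
    r <- round(sqrt(s))
    r * r == s
  }

  # neighbours[[i]]: every j != i in 1..n such that i + j is a perfect square
  neighbours <- lapply(seq_len(n), function(i) {
    j <- setdiff(seq_len(n), i)
    j[is_square(i + j)]
  })

  # Extend a partial path by one unused neighbour at a time; back up on dead ends
  extend <- function(path) {
    if (length(path) == n) return(path)
    last <- path[length(path)]
    for (cand in setdiff(neighbours[[last]], path)) {
      found <- extend(c(path, cand))
      if (!is.null(found)) return(found)
    }
    NULL
  }

  # Try every possible starting number
  for (start in seq_len(n)) {
    found <- extend(start)
    if (!is.null(found)) return(found)
  }
  NULL
}

p <- square_path(17)
p
head(p, -1) + tail(p, -1)   # sums of adjacent pairs, each a perfect square
q <- square_path(18)        # NULL: no arrangement exists for 1..18
```

The search is exhaustive, so it is only practical for small n; the neighbour graph is sparse enough that n around 17 or 18 finishes quickly.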



On Fri, Apr 13, 2012 at 2:34 PM, Ted Harding ted.hard...@wlandres.net wrote:

 Greetings all!
 A recent news item got me thinking that a problem stated
 therein could provide a teasing little exercise in R
 programming.

 http://www.bbc.co.uk/news/uk-england-cambridgeshire-17680326

  Cambridge University hosts first European 'maths Olympiad'
  for girls

  The first European girls-only mathematical Olympiad
  competition is being hosted by Cambridge University.
  [...]
  Olympiad co-director, Dr Ceri Fiddes, said competition questions
  encouraged clever thinking rather than regurgitating a taught
  syllabus.
  [...]
  A lot of Olympiad questions in the competition are about
  proving things, Dr Fiddes said.

  If you have a puzzle, it's not good enough to give one answer.
  You have to prove that it's the only possible answer.
  [...]
  In the Olympiad it's about starting with a problem that anybody
  could understand, then coming up with that clever idea that
  enables you to solve it, she said.

  For example, take the numbers one up to 17.

  Can you write them out in a line so that every pair of numbers
  that are next to each other, adds up to give a square number?

 Well, that's the challenge: Write (from scratch) an R program
 that solves this problem. And make it neat.

 NOTE: If there should happen to be some R package that can solve
 this kind of problem already, without you having to think much,
 then its use is illegitimate! (I.e. will be deemed regurgitation).

 Over to you.

 With best wishes,
 Ted.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 13-Apr-2012  Time: 22:33:43
 This message was sent by XFMail





Re: [R] Remove carriage return in writing tab-delimited file.

2012-04-04 Thread Justin Haynes
take a look at ?paste

paste(yourmatrix, sep='\t', collapse='')
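
A caveat on the call above: paste() applied to a whole matrix works element by element in column-major order, so sep= does not join the columns within a row. A minimal sketch that pastes each row first and then glues the rows together, assuming a small character matrix like the one in the question (the name yourmatrix is a stand-in):

```r
# stand-in for the poster's matrix, filled row by row
yourmatrix <- matrix(c("FARY1004", "1", "2", "3",
                       "FARY2067", "2", "3", "1",
                       "FARY4587", "2", "2", "2"),
                     nrow = 3, byrow = TRUE)

# join the cells of each row with tabs, then join the rows with nothing
rows <- apply(yourmatrix, 1, paste, collapse = "\t")
one.line <- paste(rows, collapse = "")

# writeLines(one.line, "out.txt") would then write it as a single line
```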

On Wed, Apr 4, 2012 at 2:58 PM, kickout plant.breeding.cr...@gmail.com wrote:
 Having problems with the write.table function. I can write a tab-delimited
 file just fine, but for each line in my matrix it inserts a carriage return
 when I don't want it to.

 For example my matrix might be:

 ID V1 V2 V3
 FARY1004 1 2 3
 FARY2067 2 3 1
 FARY4587 2 2 2

 And I want the written File to be:

 FARY1004     1     2     3FARY2067     2     3     1FARY4587     2     2
 2

 TIA



Re: [R] sampling rows from a list

2012-04-02 Thread Justin Haynes
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=T),
  matrix(10:15, nrow=2, byrow=T),
  matrix(16:30, nrow=5, byrow=T))

## get the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## subset the list into random samples of length n
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n),], n=n)
## this  structure is still a list though...

## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

I'm not totally sure what structure you want in the last step,
so if I missed, I apologize...
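
One edge case worth guarding against: if the shortest matrix had a single row, x[i, ] would drop to a plain vector and the array dimensions would no longer line up. A sketch of the same two steps with drop = FALSE, using the toy data from the question (set.seed() is only there to make the draw repeatable):

```r
set.seed(1)

mydata <- list(matrix(1:9,   nrow = 3, byrow = TRUE),
               matrix(10:15, nrow = 2, byrow = TRUE),
               matrix(16:30, nrow = 5, byrow = TRUE))

# number of rows in the shortest matrix
n <- min(vapply(mydata, nrow, integer(1)))

# sample n rows from each matrix, always keeping matrix shape
out <- lapply(mydata, function(x) x[sample(nrow(x), n), , drop = FALSE])

# stack the sampled matrices along a third dimension
out.array <- array(unlist(out), dim = c(n, ncol(out[[1]]), length(out)))
dim(out.array)
```

Each slice out.array[, , k] then holds the rows sampled from mydata[[k]].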

Hope that helps,

Justin


On Mon, Apr 2, 2012 at 11:24 AM, Bcampbell99 briand.campb...@ec.gc.ca wrote:
 Hi:

 I'm sure this seems like a rudimentary question, but I am not well versed
 with R syntax for lists.  I have a ragged array from which I've removed
 records (entire rows) with missing data.  The functions I used to remove the
 missing cases resulted in the generation of an R list class object, that
 looks something like this;

 mydata
 [[1]]
     [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    8    9

 [[2]]
     [,1] [,2] [,3]
 [1,]   10   11   12
 [2,]   13   14   15

 [[3]]
     [,1] [,2] [,3]
 [1,]   16   17   18
 [2,]   19   20   21
 [3,]   22   23   24
 [4,]   25   26   27
 [5,]   28   29   30

 Part 1
 What I would like to do is draw an equal number of random row samples
 from [[1]], [[2]] and [[3]] (to preserve the structure of [,1], [,2], [,3]).

 Part 2
 Then I would like to coerce the list object into something like an array.

 Help scripting out part 1 or 2 would be much appreciated.

 Brian Campbell






Re: [R] list assignment syntax?

2012-03-30 Thread Justin Haynes
You can also take a look at

http://stackoverflow.com/questions/7519790/assign-multiple-new-variables-in-a-single-line-in-r

which has some additional solutions.



On Fri, Mar 30, 2012 at 4:49 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2012-03-30 15:40, ivo welch wrote:

 Dear R wizards:  is there a clean way to assign to elements in a list?
  what I would like to do, in pseudo R+perl notation is

  f <- function(a,b) list(a+b,a-b)
  (c,d) <- f(1,2)

 and have c be assigned 1+2 and d be assigned 1-2.  right now, I use the
 clunky

   x <- f(1,2)
   c <- x[[1]]
   d <- x[[2]]
   rm(x)

 which seems awful.  is there a nicer syntax?

 regards, /iaw
 
 Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)


 I must be missing something. Why not just assign to a
 vector instead of a list?

  f <- function(a,b) c(a+b,a-b)

 If it's imperative that f return a list, then you
 could use

  (c, d) <- unlist(f(a, b))

 to get vector (c, d).
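
Note that (c, d) <- ... is pseudo-notation and will not parse in R. One base-R way to get a similar effect, sketched here with made-up variable names sum.ab and diff.ab, is to name the list elements and copy them into the current environment with list2env():

```r
f <- function(a, b) list(a + b, a - b)

# name the returned elements, then create one variable per name
vals <- setNames(f(1, 2), c("sum.ab", "diff.ab"))
list2env(vals, envir = environment())

sum.ab    # 1 + 2
diff.ab   # 1 - 2
```

This avoids the temporary x and the rm() call, at the cost of creating variables by name, which can be harder to follow in longer scripts.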

 Peter Ehlers




Re: [R] scanning data into r

2012-03-28 Thread Justin Haynes
What have you tried?

What type of file are you trying to import from?

What do you want your data to look like in R?

take a look at ?read.table and ?readLines
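
As a rough sketch of the ?readLines route, assuming the file looks like the sample in the question: read every line as text, then keep only the leading run of digits. The vector below stands in for readLines("yourfile.txt"), where the file name is hypothetical.

```r
# stand-in for: lines <- readLines("yourfile.txt")
lines <- c("21  TEST DATA", "32  year:2012", "33", "34", " 5", "36")

# skip leading blanks, capture the first run of digits, discard the rest
nums <- as.numeric(sub("^[[:space:]]*([0-9]+).*$", "\\1", lines))
nums
```

Lines that do not start with a number would come through unchanged and turn into NA (with a warning) in as.numeric(), which makes them easy to spot.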


On Wed, Mar 28, 2012 at 11:23 AM, joel.green joel.gr...@live.co.uk wrote:

 Hey

 I am having trouble importing data into R, my data field looks like this

 21  TEST DATA
 32  year:2012
 33
 34
  5
 36

 I require the number at the start of each line; the text is not
 needed. I am struggling to get R to import the data without changing the
 file itself.

 How do I import the data? I have tried using comment.char= , however this
 didn't work. Any help would be much appreciated, thanks.





Re: [R] Why does this work? plyr within-subset normalization

2012-03-28 Thread Justin Haynes
To those without access to nabble, the code in reference is:

relative <- ddply(ranktable, .(Timestamp), function(x)
data.frame(relative = x[,5]/max(x[,5])))


I may be misunderstanding your question, but:

ddply splits your data.frame, ranktable, by the column Timestamp into
many smaller data.frames, one for each unique Timestamp value.

Those new small data.frames are sent one at a time to the function you
specify.
So, when you call max(x[,5]) you're taking the max of the data.frame
sent to the function rather than the max of the larger ranktable
data.frame.
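
The same split, apply, combine idea can be sketched in base R, which may make it clearer that max() only ever sees one piece at a time. The toy ranktable and its value column are made up for this illustration:

```r
ranktable <- data.frame(Timestamp = c("t1", "t1", "t2", "t2"),
                        value     = c(2, 4, 5, 10))

# split: one small data.frame per unique Timestamp
pieces <- split(ranktable, ranktable$Timestamp)

# apply: inside the function, x is a single piece, so max() is per group
scaled <- lapply(pieces, function(x) {
  data.frame(Timestamp = x$Timestamp, relative = x$value / max(x$value))
})

# combine: stack the pieces back into one data.frame
relative <- do.call(rbind, scaled)
relative
```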




On Wed, Mar 28, 2012 at 10:18 AM, z2.0 zack.abraham...@gmail.com wrote:

 Working code that normalize each row's value against the subset's maximum.



 Does the invocation of max() somehow instruct R to 'step back' and evaluate
 the subset?

 Thanks, Zack



Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
In most regexes the caret ( ^ ) signifies the start of a line and the
dollar sign ( $ ) signifies the end.

gsub('^S S', 'S', a)

gsub('^S S', 'S', '3421 BIGS St')

you can use logical or inside your pattern too:

gsub('^S S|S S$| S S ', 'S', a)

the  S S  condition is difficult.

gsub('^S S|S S$| S S ', 'S', 'foo S S bar')

gives the wrong output. as does:

gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
gsub('^S S | S S$| S S ', ' S ', a)


so you might have to catch that with a second gsub.

gsub(' S S ', ' S ', 'foo S S bar')


On Wed, Mar 28, 2012 at 12:32 PM, Markus Weisner r...@themarkus.com wrote:
 trying to switch out addresses that have double directions, such as the
 following example:

 a = "S S Main St  Interstate 95"

 a = gsub(pattern="S S ", replacement="S ", a)


 … the problem is that I don't want to affect instances where this might be
 a correct address such as the following:


 3421 BIGS St


 what I want to say is switch out only if this is either of the following
 situations


 [beginning of char]S S

  S S 

 S S[end of char]


 Is there anyway of making gsub or a similar function make the replacements
 I want?  Thanks in advance for your help.


 ~Markus

        [[alternative HTML version deleted]]




Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
wow!  and here I thought I was starting to know most things about regexes...

On Wed, Mar 28, 2012 at 1:34 PM, William Dunlap wdun...@tibco.com wrote:
 You can use the \< and \> patterns (backslashing the backslashes) to
 mean start and end of word, respectively.  E.g.,

   > addresses <- c("S S Main St  Interstate 95", "3421 BIGS St")
   > gsub("\\<S S\\>", "S", addresses)
  [1] "S Main St  Interstate 95" "3421 BIGS St"
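
A similar effect can be had with the \b word-boundary escape when perl = TRUE is set. The sketch below checks the three situations from the original question (start, middle, end of string) plus the address that must stay untouched; the test addresses are made up:

```r
addresses <- c("S S Main St",    # doubled direction at the start
               "3421 BIGS St",   # must not change
               "Main St S S",    # doubled direction at the end
               "foo S S bar")    # doubled direction in the middle

# \\b matches only at a word boundary, so the S in BIGS and St is left alone
fixed <- gsub("\\bS S\\b", "S", addresses, perl = TRUE)
fixed
```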

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com




Re: [R] Convert day of year back into a date format.

2012-03-27 Thread Justin Haynes
There may very well be a better solution, but this works.

format(strptime(dayofyear, format="%j"), format="%m-%d")
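
One thing to watch: as far as I know, strptime() with only "%j" fills in the current year, so day numbers after the end of February can land on a different date depending on whether this year is a leap year. A sketch that ties the conversion to the year of the data instead (2008, as in the question; the day numbers are made up):

```r
dayofyear <- c(1, 32, 60)

# day 1 is January 1st, so subtract 1 before adding to the origin
dates <- as.Date(dayofyear - 1, origin = "2008-01-01")

# month-day labels, e.g. for axis ticks
labels <- format(dates, "%m-%d")
labels
```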

On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello,

 I am having trouble figuring out how to convert a Day of Year integer
 back into a Date format. For example I have the following:

 date <-
 c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',

 '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',

 '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')

 ## this is then converted into a number corresponding to the day of
 the year like so:

 dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1

 ## Now my question is how do I get back to a date format (obviously
 omitting the year).
 ## The end result is that I'd like to be able to have axis labels as
 something like Month-Day or just Month
 ## instead of just an integers which isn't always intuitive for people
 but I can't seem to figure out how to tell R
 ## to recognize an integer as a date.

 Any suggestions?

 Many thanks in advance!

 Sam



[R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Hi all:

I'm attempting to create a faceted plot with ggplot2 and I'm having issues
with a factor's order that is used to define the facet_grid().

The factor (named total.density) has three levels - 8, 16, and 32 - and I
would like them presented in that order.  Running
order(levels(total.density)) yields the incorrect order of the facet grid -
2 3 1, corresponding with 16, 32, and 8.

I have attempted correcting the order with the following solutions (of
course, not run at once):

#total.density <- relevel(total.density, '8')
#total.density <- as.numeric(levels(total.density)[total.density])
#total.density <- factor(total.density, levels = c('8','16','32'))
#total.density <- factor(total.density, levels =
levels(total.density)[c(3,1,2)])
#library(gregmisc)
#total.density <- reorder.factor(total.density, c('8', '16', '32'),
order = T)

The data are as follows:

total.density <-
c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)

I'm running R 2.14.2 with all packages up-to-date as of 21.3.2012.

Any help would be greatly appreciated.

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Actually, I've tried that too, Sarah.

The test is to run order(levels(total.density)), which I need to be 1 2 3,
not 2 3 1, and your solution still gives me 2 3 1.
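
A note on that test: levels() returns character strings, and order() sorts strings as text, so "16" and "32" come before "8" no matter how the levels are arranged. The small sketch below, using the data from the first post, shows that 2 3 1 is what order() reports even when the levels are already in the desired order:

```r
total.density <- factor(rep(c(8, 16, 32), each = 24), levels = c(8, 16, 32))

# the level order itself is what plotting functions go by
lv <- levels(total.density)

# text sort of "8", "16", "32": says nothing about the level order
text.order <- order(lv)

# numeric sort of the same labels
num.order <- order(as.numeric(lv))
```

So levels(total.density) is the thing to inspect, not order(levels(total.density)).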

I also don't know how to reply to this thread with the previous message
below...
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
I think I understand, but I believe my original interest is in the order of
levels(total.density), since ggplot appears to be using that to order the
facets.  Thus, I'm still getting three graphs, ordered (and displayed as)
16 to 32 to 8, rather than the more intuitive, 8 to 16 to 32.  I'm sorry if
I wasn't clear and/or I've missed your message.
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Ista:

Your attached code did work for me; moreover, the facets were presented in
the desired order with facet_wrap() and facet_grid(), which is what I'm
using because I have a second factor used in facet_grid().

Still, my plots with total.density as a facet are coming out in 16, 32, 8,
and I'm not seeing why.  Below is my plot code -

ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y = per.remain))
 + facet_grid(total.density ~ prop.ec) +
 #add point and error bar data
 theme_set(theme_bw()) +
 geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
 per.remain + se), width = 3) +
 #add predicted model data
 geom_line(data = se.predict.data[se.predict.data$plant.sp == 'EC',],
 aes(x = x.values, y = predicted.values), colour = c('red')) +
 geom_line(data = dc.predict.data[dc.predict.data$plant.sp == 'EC',],
 aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
 c('dashed')) +

 xlab('Day') + ylab('Percent Mass Remaining') + opts(panel.grid.major =
 theme_blank(), panel.grid.minor = theme_blank())

Is there anything odd about it that might be producing the odd ordering
problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.
-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Wed, Mar 21, 2012 at 11:42 AM, Ista Zahn istaz...@gmail.com wrote:

 Hi Justin,

 this gives the correct order (8, 16, 32) on my machine:

 total.density <-

 c(8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
 total.density <- factor(total.density, levels=c(8, 16, 32), ordered=TRUE)
 str(total.density)

 order(levels(total.density))

 dat <- data.frame(td = total.density, v1 = rnorm(1:length(total.density)))

 ggplot(dat, aes(x = v1)) +
  geom_density() +
  facet_wrap(~td)

 Does it work for you? If yes, then you need to tell us what you're
 doing that is different from this example. If no, please give us the
 output of sessionInfo().

 best,
 Ista



Re: [R] Unable to specify order of a factor

2012-03-21 Thread Justin Montemarano
Hi all:

I've got it... it appears that total.density was also defined in two
separate data frames (se.predict.data and dc.predict.data) with levels
order 16, 32, 8. Using relevel(), I moved 8 to the first position and it's
solved the plotting problem.
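
For anyone hitting the same thing, here is a sketch of the situation with made-up stand-in data frames (main.data and predict.data are hypothetical names): a factor built without explicit levels gets them in text-sort order, and relevel() moves the chosen level to the front so every data frame used in the plot agrees.

```r
lvls <- c("8", "16", "32")

# levels given explicitly: 8, 16, 32
main.data <- data.frame(total.density = factor(c("8", "16", "32"), levels = lvls))

# levels left to the default text sort: 16, 32, 8
predict.data <- data.frame(total.density = factor(c("8", "16", "32")))
default.levels <- levels(predict.data$total.density)

# move "8" to the first position
predict.data$total.density <- relevel(predict.data$total.density, ref = "8")
levels(predict.data$total.density)
```

Checking levels() on the faceting variable in every data frame passed to a layer is a quick way to catch this.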

Ista's 'minimal' reproducible code request prompted me to discover my
error; thanks all.

-
Justin Montemarano
Graduate Student
Kent State University - Biological Sciences

http://www.montegraphia.com


On Wed, Mar 21, 2012 at 12:42 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 You'll also want to use dput() to send us an exact encoding of your
 data when making that reproducible example: there might be something
 subtle at play here that print methods won't show.

 Michael

 On Wed, Mar 21, 2012 at 12:28 PM, Ista Zahn istaz...@gmail.com wrote:
  On Wed, Mar 21, 2012 at 12:00 PM, Justin Montemarano jmont...@kent.edu
 wrote:
  Ista:
 
  Your attached code did work for me; moreover, the facets were presented
 in
  the desired order with facet_wrap() and facet_grid(), which is what I'm
  using because I have a second factor used in facet_grid().
 
  Still, my plots with total.density as a facet are coming out in 16, 32,
 8,
  and I'm not seeing why.  Below is my plot code -
 
  ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
  per.remain)) + facet_grid(total.density ~ prop.ec) +
  #add point and error bar data
  theme_set(theme_bw()) +
  geom_point() + geom_errorbar(aes(ymin = per.remain - se, ymax =
  per.remain + se), width = 3) +
  #add predicted model data
  geom_line(data = se.predict.data[se.predict.data$plant.sp ==
 'EC',],
  aes(x = x.values, y = predicted.values), colour = c('red')) +
  geom_line(data = dc.predict.data[dc.predict.data$plant.sp ==
 'EC',],
  aes(x = x.values, y = predicted.values), colour = c('blue'), linetype =
  c('dashed')) +
 
  xlab('Day') + ylab('Percent Mass Remaining') +
 opts(panel.grid.major =
  theme_blank(), panel.grid.minor = theme_blank())
 
  Is there anything odd about it that might be producing the odd ordering
  problem?  FYI, avoiding subsetting ag.tab doesn't do the trick.
 
  I don't know. Please create a minimal example that isolates the
  problem. You can start with
 
  levels(ag.tab$total.density)
 
  ggplot(ag.tab[ag.tab$plant.sp == 'EC',], aes(x = days.out, y =
 per.remain)) +
 facet_grid(total.density ~ prop.ec) +
 geom_point()
 
  Best,
  Ista
 


Re: [R] Remove a word from a character vector value XXXX

2012-03-07 Thread Justin Haynes
Hadley's package stringr is wonderful for all things string.

library(stringr)

?str_trim

and

?str_replace are what you want.  (the base R equivalent of these two
would be ?gsub and some regular expressions)

str_trim(str_replace(d5.Region, 'Average', ''))

should do the trick.
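
For reference, a base R sketch of the same idea, assuming a version of R recent enough to have trimws(); the region vector is a made-up sample of the values in the question:

```r
region <- c("Central Average", "Metro East Average", "  Northwest Average ")

# drop the literal word, then strip leading and trailing blanks
cleaned <- trimws(sub("Average", "", region, fixed = TRUE))
cleaned
```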

hope that helps,
Justin


On Wed, Mar 7, 2012 at 8:03 AM, Dan Abner dan.abne...@gmail.com wrote:
 Hi everyone,

 What is the easiest way to remove the word Average and strip leading
 and trailing blanks from the character vector (d5.Region) below?

 .nrow.d5.           d5.Region
 1            1     Central Average
 2            2     Coastal Average
 3            3        East Average
 4            4  Metro East Average
 5            5 Metro North Average
 6            6 Metro South Average
 7            7  Metro West Average
 8            8   Northeast Average
 9            9   Northwest Average


 Thanks!

 Dan



Re: [R] logical to vector?

2012-03-07 Thread Justin Haynes
?as.numeric

> as.numeric(c(TRUE, FALSE))
[1] 1 0
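
If what is needed is the column positions (something like 1:3) rather than 1/0 values, which() may be closer to the mark. A small sketch with the logical vector from the question (in.group, group1 and group2 are made-up names):

```r
in.group <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

group1 <- which(in.group)     # positions of the TRUE entries
group2 <- which(!in.group)    # positions of the FALSE entries
```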


On Wed, Mar 7, 2012 at 8:02 AM, Ed Siefker ebs15...@gmail.com wrote:
 I am trying to use the coXpress function from
 the coXpress package.  This function requires
 numerical vectors indicating which columns
 are in which group.

 The problem is, I can only figure out how
 to get a logical structure, not a numerical one.
 In other words, coXpress wants something like:
 1:3

  I have something like:
 TRUE TRUE TRUE FALSE FALSE

 Can I convert one into the other easily?



Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Take a look at:
http://cran.r-project.org/web/views/Spatial.html

But I've always just parsed the string...

This is from the last time I did this; it's not quite the same, but you
can see the similarities.


## if data is presented as 43°02'46.60059" N need to split on the °
## symbol, ' and "
library(stringr)  # provides str_split()

to.decimal <- function(vec){
  # convert all symbols to _
  vec <- gsub('°','_',vec)
  vec <- gsub('\'','_',vec)
  vec <- gsub('\"','_',vec)

  split <- str_split(vec,'_')
  deg <- as.numeric(sapply(split,'[',1))
  min <- as.numeric(sapply(split,'[',2))
  sec <- as.numeric(sapply(split,'[',3))

  deg <- deg + min/60 + sec/3600
  return(deg)
}


On Wed, Mar 7, 2012 at 8:28 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to ask you if R has a library that can work with different GPS 
 formats

 For example
 I have a string of this format

 N50° 47.513 E006° 03.985
 and I would like to convert to GPS decimal format.

 that means for example converting the part N50° 47.513
 to 50 + 47/60 + 513/3600.

 Is it possible to do that with R?
 What is the name of such a library?

 I would like to thank you in advance for your help

 B.R
 Alex

        [[alternative HTML version deleted]]




Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Wow... that is WAY better!

Thanks Gabor!

On Wed, Mar 7, 2012 at 8:51 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Wed, Mar 7, 2012 at 11:28 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to ask you if R has a library that can work with different GPS 
 formats

 For example
 I have a string of this format

 N50° 47.513 E006° 03.985
 and I would like to convert to GPS decimal format.

 that means for example converting the part N50° 47.513
 to 50 + 47/60 + 513/3600.

 Is it possible to do that with R?
 What is the name of such a library?


 Use strapply to extract the digits and convert them to numeric
 followed by matrix multiplication to apply the formula:

 library(gsubfn)
 x <- "N50° 47.513"

 c(1, 1/60, 1/3600) %*% strapply(x, "\\d+", as.numeric, simplify = TRUE)
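
The same formula can be sketched in base R with gregexpr() and regmatches(). It follows the conversion exactly as stated in the question (50 + 47/60 + 513/3600); whether 47.513 is really meant as decimal minutes is an open assumption, so that reading is shown as well.

```r
x <- "N50\u00b0 47.513"    # \u00b0 is the degree sign

# pull out every run of digits: 50, 47, 513
parts <- as.numeric(regmatches(x, gregexpr("[0-9]+", x))[[1]])

# the formula as given in the question
decimal <- sum(parts * c(1, 1/60, 1/3600))

# alternative reading: 47.513 taken as decimal minutes
decimal.minutes <- parts[1] + as.numeric(paste0(parts[2], ".", parts[3])) / 60
```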


 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com



Re: [R] regular expression

2012-02-29 Thread Justin Haynes
gsub('.+; (.+);.+','\\1',x)

or if you just want the value out:

gsub('.+; Surv\\(months\\): ([0-9]+);.+','\\1',x)

You can also look at strsplit:
> strsplit(x,';')
[[1]]
[1] "99-625: Cell type: S"        " Surv(months): 21"
[3] " STATUS(0=alive, 1=dead): 1"

> lapply(strsplit(x,';'),'[',2)
[[1]]
[1] " Surv(months): 21"

But I would follow David's second suggestion and just read them in with
sep=';' instead.
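
To store the months as a numeric variable for every row at once, something along these lines should work. It is a sketch that assumes each line contains exactly one "Surv(months): <number>" field; the vector x below is rebuilt from the two example lines in the question:

```r
x <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
       "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Keep only the digits that follow "Surv(months):" and convert them to numbers.
months <- as.numeric(sub(".*Surv\\(months\\): *([0-9]+).*", "\\1", x))
months
```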


Justin

On Wed, Feb 29, 2012 at 11:24 AM, Fred G bayespoker...@gmail.com wrote:

 Computer Friends,

 with the following example lines:

 [107] 98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1

 [108] 99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1

 i want to be able to isolate the number of months of survival for each row.

 is there a regular expression that can find the first instance of a ;,
 delete everything in front of it-- and find the second instance of an ;
 and delete everything behind it? in python there is a function line.find(),
 would be grateful to hear the R equiv; or, any other better alternatives to
 get the number of months of survival stored as a variable.

 Much Thank You!



Re: [R] Problem building up ggplot graph in a loop.

2012-02-16 Thread Justin Haynes
ggplot is looking for thisData as a column of coffs.  The most
'ggplotesque' way of doing this would be:

# melt your data to a long format (melt() comes from the reshape/reshape2 package):
coffs.melt <- melt(coffs, id.vars = 'levels')

# plot using colour aes parameter:
ggplot(coffs.melt, aes(x=levels, y=value, colour=variable)) + geom_line() +
ylab('Total Chargeoffs')

This is untested since there is no sample data!

Justin


On Thu, Feb 16, 2012 at 2:50 PM, Keith Weintraub kw1...@gmail.com wrote:

 Folks,
  I want to automate some graphing using ggplot.

 Here is my code
 graphChargeOffs2<-function(coffs) {
  ggplot(coffs, aes(levels))
  dataNames<-names(coffs)[!names(coffs) == "levels"]
  for(i in dataNames) {
    thisData<-coffs[[i]]
    last_plot() + geom_line(aes(y = thisData, colour = i))
  }
  last_plot() + ylab("Total Chargeoffs")
 }

 coffs is a data.frame.

 I get the following error:
 Error in eval(expr, envir, enclos) : object 'thisData' not found

 As little as I know about environments in R I am pretty sure that the
 geom_line in the loop is not able to see the thisData variable.

 Any help you could provide would be appreciated. I would be surprised if
 there wasn't a way to pass the data into the geom_line function without
 using environments. Of course I have been wrong once or twice in the past.
 :)

 Note that geom_line also can't see the input variable coffs.

 Thanks for any and all help






Re: [R] Change dataframe-structure

2012-02-13 Thread Justin Haynes
There is probably a more elegant way, but:

> df <- data.frame(p1=c(1,2,1),p2=c(3,3,2),p3=c(2,1,3),p4=c(5,6,4),p5=c(4,4,6),p6=c(6,5,5))
> as.data.frame(t(apply(df,1,function(x) names(x)[match(1:6,x)])))
  V1 V2 V3 V4 V5 V6
1 p1 p3 p2 p5 p4 p6
2 p3 p1 p2 p5 p6 p4
3 p1 p2 p3 p4 p6 p5
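
An equivalent sketch uses order() instead of match(). It assumes every row is a complete ranking (each of 1..6 appears exactly once); the name by_rank is just illustrative:

```r
df <- data.frame(p1 = c(1, 2, 1), p2 = c(3, 3, 2), p3 = c(2, 1, 3),
                 p4 = c(5, 6, 4), p5 = c(4, 4, 6), p6 = c(6, 5, 5))

# For each row, order() returns the column positions sorted by rank,
# so indexing the column names with it lists the problems from rank 1 to 6.
by_rank <- t(apply(df, 1, function(x) names(x)[order(x)]))
colnames(by_rank) <- seq_len(ncol(by_rank))
by_rank <- as.data.frame(by_rank, stringsAsFactors = FALSE)
by_rank
```

With ties or missing ranks the two approaches differ: match() leaves NA for a rank that does not occur, while order() still returns every column once.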



On Mon, Feb 13, 2012 at 2:07 PM, David Studer stude...@gmail.com wrote:

 Hello everybody,

 I have the following problem and have no idea how to solve it:

 In my dataframe I have six columns representing six societal problems (p1,
 p2, ..., p6).
 The values are ranks between 1 (worst problem) and 6 (best problem)


 p1 p2 p3  p4 p5 p6
 1   3   2   5   4   6
 2   3   1   6   4   5
 1   2   3   4   6   5

 but I'd like the dataframe the other way round:
 1   2   3   4   5   6
 p1  p3  p2  p4  p4  p6
 p3  p1  p2  p5  p6  p4
 p1  p2  p3  p4  p6  p5

 Can anyone help?

 Thanks!



Re: [R] debug in a loop

2012-02-10 Thread Justin Haynes
You can add

if(is.na(tab[i])) browser()

or

if(is.na(tab[i])) break

see inline
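
As a self-contained sketch of the conditional stop (using the example vector from the question; first_na and stopped_at are illustrative names):

```r
tab <- c(1:300)
tab[250] <- NA

# Position of the first NA without looping (NA if there is none).
first_na <- which(is.na(tab))[1]

# Loop variant: stop at the first NA instead of stepping through every iteration.
stopped_at <- NA
for (i in seq_len(length(tab) - 1)) {
  if (is.na(tab[i])) {
    stopped_at <- i
    # browser() could be called here instead of break to inspect the state.
    break
  }
  tab[i] <- tab[i] + tab[i + 1]
}
stopped_at
```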

On Fri, Feb 10, 2012 at 7:22 AM, ikuzar raz...@hotmail.fr wrote:

 Hi,

 I'd like to debug in a loop (using debug() and browser() etc. but not
 print()). I am looking for the first occurrence of NA.
 For instance:

 tab = c(1:300)
 tab[250] = NA
 len = length(tab)
 for (i in 1:len){
   if(i != len){

   if(is.na(tab[i])) browser()

 tab[i] = tab[i]+tab[i+1]
   }
 }

 I do not want to do Browse[2]> n for each step ... I'd like to declare a
 browser() in the loop with a condition. But how do I write "stop running
 when you encounter NA"?

 Thanks for your help

 --
 View this message in context:
 http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory allocation problem (again!)

2012-02-08 Thread Justin Haynes
32 bit windows has a memory limit of 2GB.  Upgrading to a computer that's
less than 10 years old is the best path.

But short of that, if you're just generating random data, why not do it in
two or more pieces and combine them later?

mat.1 <- matrix(rnorm(50000*2000),nrow=50000)
mat.2 <- matrix(rnorm(50000*2000),nrow=50000)
mat.3 <- matrix(rnorm(50000*2000),nrow=50000)

mat.1.sums <- rowSums(mat.1)
mat.2.sums <- rowSums(mat.2)
mat.3.sums <- rowSums(mat.3)

mat.sums <- c(mat.1.sums,mat.2.sums,mat.3.sums)
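
If the pieces are meant to be column blocks of one matrix, the row sums of the full matrix are the elementwise sum of the per-block row sums, and only one block has to be in memory at a time. A rough sketch with a hypothetical helper, chunked_row_sums(), run here on small sizes:

```r
# Hypothetical helper: row sums of an n_rows x n_cols matrix of N(0,1) draws,
# generated chunk_cols columns at a time so the full matrix never exists in memory.
chunked_row_sums <- function(n_rows, n_cols, chunk_cols) {
  total <- numeric(n_rows)
  done <- 0
  while (done < n_cols) {
    k <- min(chunk_cols, n_cols - done)
    block <- matrix(rnorm(n_rows * k), nrow = n_rows)
    total <- total + rowSums(block)   # accumulate, do not concatenate
    done <- done + k
  }
  total
}

set.seed(1)
sums <- chunked_row_sums(n_rows = 1000, n_cols = 60, chunk_cols = 25)
length(sums)
```

For the sizes in the question the call would be chunked_row_sums(50000, 6000, chunk_cols) with a chunk_cols small enough for the machine; the right value is a guess that depends on available memory.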



On Wed, Feb 8, 2012 at 8:37 AM, Christofer Bogaso 
bogaso.christo...@gmail.com wrote:

 Dear all, I know this problem was discussed many times in forum, however
 unfortunately I could not find any way out for my own problem. Here I am
 having Memory allocation problem while generating a lot of random number.
 Here is my description:

 > rnorm(50000*6000)
 Error: cannot allocate vector of size 2.2 Gb
 In addition: Warning messages:
 1: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 > memory.size(TRUE)
 [1] 15.75
 > rnorm(50000*6000)
 Error: cannot allocate vector of size 2.2 Gb
 In addition: Warning messages:
 1: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)

 And the Session info is here:

  sessionInfo()
 R version 2.14.0 (2011-10-31)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252
 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] graphics  grDevices utils datasets  grid  stats methods
 base

 other attached packages:
 [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6  zoo_1.7-6

 loaded via a namespace (and not attached):
 [1] lattice_0.20-0

 I am using Windows 7 (home version) with 4 GB of RAM (2.16GB is usable as
 my
 computer reports). So in my case, is it not possible to generate a random
 vector with such length? Note that generating such vector is my primary
 job.
 Later I need to do something on that vector. Those Job includes:
 1. Create a matrix with 50,000 rows.
 2. Get the row sum
 3. then report some metrics on that sum values (min. 50,000 elements must
 be
 there).

 Can somebody help me with some real solution/suggesting?

 Thanks and regards,



[R] Force printing of excluded axis annotations

2012-02-08 Thread Justin Fincher
Howdy,
   This should be simple, but I am finding that I can't find a simple
solution.  I have a plot to which I am manually adding the annotations
to the y-axis with this command:

axis(2, 
c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),labels=c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),cex.axis=8)

The issue is that, apparently, R doesn't think that the -1 can fit,
even though there is most certainly enough space.  Is there a way to
force R to print all the annotations I give it, regardless of
proximity or to reduce the space it believes it needs? Thank you.

- Fincher



Re: [R] Help need

2012-02-07 Thread Justin Haynes
Instead of a for loop, why not use the vectorization inherent in R?

sigmasqaured <- 1
i <- complex(real = 0, imaginary =1)
f <- seq(0,0.5,0.1)
spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)

> spectrum
[1] 9.632720e+00 1.411130e+03 2.947753e+00 6.479994e-02 1.295175e-02
[6] 8.042731e-03
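
If a loop is still preferred, the usual pattern is to preallocate the result and fill one element per iteration. A sketch; spec_at(), freqs and the spectrum_* names are illustrative, the expression itself is the one from the thread:

```r
sigmasquared <- 1
i <- complex(real = 0, imaginary = 1)
freqs <- seq(0, 0.5, 0.1)

# Spectrum at frequency f (works for a single value or a whole vector).
spec_at <- function(f) {
  sigmasquared / abs(1 - 2.7607 * exp(2 * pi * i * f) + 3.8106 * exp(4 * pi * i * f) -
                       2.6535 * exp(6 * pi * i * f) + 0.9258 * exp(8 * pi * i * f))^2
}

# Loop version: preallocate, then fill one element per iteration.
spectrum_loop <- numeric(length(freqs))
for (k in seq_along(freqs)) {
  spectrum_loop[k] <- spec_at(freqs[k])
}

# Equivalent without an explicit loop.
spectrum_sapply <- sapply(freqs, spec_at)
spectrum_vec <- spec_at(freqs)
```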


On Tue, Feb 7, 2012 at 1:08 PM, Jaymin Shah jayminsh...@live.com wrote:

 I have made a for loop to try and output values which I have named
 spectrum.  However, I cannot seem to get the answers to come out as a
 vector, which is what I need. They come out as separate values which I am
 then unable to join together. Thank you

 for(f in seq(0,0.5,0.1)) {
   sigmasqaured <- 1
   i = complex(real = 0, imaginary = 1)
   spectrum <-
  (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
   print(spectrum)
 }


Re: [R] I bet apply has a solution

2012-02-06 Thread Justin Haynes
How about:

> apply(Data..,1, function(vec) !all(vec==vec[1]))
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
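
A vectorized variant that also attaches the result as the DidChange column asked for. This is a sketch; it assumes all columns are comparable with the first one and contain no NA:

```r
Data.. <- data.frame(a = rep(1, 10), b = c(rep(1, 9), 2), c = c(rep(1, 8), 2, 2))

# Compare every column with the first one; a row "changed" if any column differs.
Data..$DidChange <- rowSums(Data.. != Data..[[1]]) > 0
Data..
```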



On Mon, Feb 6, 2012 at 10:34 AM, LCOG1 jr...@lcog.org wrote:

 Hi all
 For the data below, I would like to return a logical value indicating
 differences in the data.

 #Create data
 Data..<-data.frame(a=rep(1,10),b=c(rep(1,9),2),c=c(rep(1,8),2,2))

   a b c
 1  1 1 1
 2  1 1 1
 3  1 1 1
 4  1 1 1
 5  1 1 1
 6  1 1 1
 7  1 1 1
 8  1 1 1
 9  1 1 2
 10 1 2 2


 So what I want is to return logical value telling me if all the values are
 the same.  So the result would be a b c DidChange
 1  1 1 1 FALSE
 2  1 1 1 FALSE
 3  1 1 1 FALSE
 4  1 1 1 FALSE
 5  1 1 1 FALSE
 6  1 1 1 FALSE
 7  1 1 1 FALSE
 8  1 1 1 FALSE
 9  1 1 2  TRUE
 10 1 2 2  TRUE

 I bet apply could handle this elegantly but that family of functions is
 still not 100% intuitive to me.  Thoughts.  Thanks everyone

 Cheers,
  Josh


 --
 View this message in context:
 http://r.789695.n4.nabble.com/I-bet-apply-has-a-solution-tp4362294p4362294.html
 Sent from the R help mailing list archive at Nabble.com.



[R] Summary.formula question

2012-02-02 Thread justin bem
Dear all,

Before my question, I wish all of you my very best wishes for 2012.
I'm using summary.formula to make a table. I have something like this:

s1<-summary(fdh~cup5+cup6+schef+cpro1+stratify(id2),data=dat,na.include=F)
The output gives the marginal row named "overall", but is it possible to add a
marginal column?
 Sincerely


 
Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774


[R] Calculate the natural log of cdf between 2 intervals

2012-02-02 Thread justin jarvis
Hello all,
I was wondering if there is an R function to do the following:

[*] log(pnorm(x)-pnorm(y)), where x>y.

I don't want all the area under the natural log of the normal pdf less than
x, I only want the area between y and x.

I am aware of the ability to specify log.p=TRUE, which gives me the log of
the probability that X<=x.  This does not help me, because the following
code:
pnorm(x, log.p=TRUE)-pnorm(y,log.p=TRUE) is not the same as [*]
mathematically.

I cannot use [*] because some of my x's are far less than the mean, more
than 10 sd.  This causes me to take the log(0) which is an error.  Thus, I
need to stay in the log scale, since, for z less than 10 sd below the mean,

log(pnorm(z)) is an error, and
pnorm(z,log.p=TRUE) is stable even though theoretically they are equivalent.
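
One way to stay on the log scale is the identity log(a - b) = log(a) + log(1 - b/a), applied to the two log-probabilities. A sketch with a hypothetical helper, log_pnorm_diff(); it is aimed at the lower tail described above and is not a base R function:

```r
# log(pnorm(x) - pnorm(y)) for x > y, computed without leaving the log scale.
log_pnorm_diff <- function(x, y) {
  lx <- pnorm(x, log.p = TRUE)
  ly <- pnorm(y, log.p = TRUE)
  # log(a - b) = log(a) + log(1 - b/a) = lx + log1p(-exp(ly - lx))
  lx + log1p(-exp(ly - lx))
}

log_pnorm_diff(1, 0)       # agrees with log(pnorm(1) - pnorm(0))
log_pnorm_diff(-40, -41)   # finite, whereas log(pnorm(-40) - pnorm(-41)) is -Inf
```

When x and y are very close the log1p() term loses precision, and for intervals far in the upper tail it is probably better to work with lower.tail = FALSE.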

Thanks for your time

Justin



Re: [R] Select elements from text

2012-01-24 Thread Justin Haynes
How about using read.table(... , sep=" ").

That would give you a vector of single words.  Then

grepl("\\[[9-z]+\\]",x)

will return a boolean vector


> x<-c('test','[bracket]','hi]','[blah','foo','[bar]')
> grepl('\\[[9-z]+\\]',x)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE
> x[grepl('\\[[9-z]+\\]',x)]
[1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all in case of
([citation]) instances for example.
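
Because the bracketed parts in the question span several words, it may be easier to work on the whole text rather than on single words. A sketch using regmatches(); the text is taken from the question, and it assumes brackets are not nested:

```r
txt <- paste(
  "According to them [see also Smith, Jones and Carroll 2002],",
  "categories emerge as audience members recognize dissimilarities among groups",
  "of consumers and label them as members of a common set [Nicol 2000]."
)

# Each match is "[", then one or more characters that are not "]", then "]".
hits <- regmatches(txt, gregexpr("\\[[^\\]]+\\]", txt, perl = TRUE))[[1]]

# Drop the surrounding brackets.
refs <- gsub("^\\[|\\]$", "", hits)
refs
```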

Justin

On Tue, Jan 24, 2012 at 6:52 AM, mdvaan mathijsdev...@gmail.com wrote:

 Hi,

 I have a series of MS word files and each file contains plain text. From
 these texts I would like to extract only those elements (read: words) that
 are between square brackets. Example of a text:

 Most fundamentally, it has led to an effort to clarify the organizational
 form concept. According to them [see also Smith, Jones and Carroll 2002],
 categories emerge as audience members recognize dissimilarities among
 groups
 of consumers and label them as members of a common set [Nicol 2000].

 Now I would like to get the following selection:

 see also Smith, Jones and Carroll 2002
 Nicol 2000

 Any ideas on how to do this? What would be the best way to import the text
 in R? The entire text as an element in a dataframe? Thank you very much!

 Best,

 Mathijs


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] drop columns whose rows are all 0

2012-01-24 Thread Justin Haynes
> dataset<-data.frame(a=1:10,b=c(0,0,0,1,0,0,0,0,1,0),c=rep(0,10))
> apply(dataset,2,function(x) all(x==0))
    a     b     c
FALSE FALSE  TRUE

> dataset[,!apply(dataset,2,function(x) all(x==0))]
    a b
1   1 0
2   2 0
3   3 0
4   4 1
5   5 0
6   6 0
7   7 0
8   8 0
9   9 1
10 10 0
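
The same selection can be written with colSums(). A sketch, assuming all columns are numeric and free of NA; keep and useful are illustrative names:

```r
dataset <- data.frame(a = 1:10, b = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), c = rep(0, 10))

# A column is kept if at least one of its entries is non-zero.
keep <- colSums(dataset != 0) > 0
useful <- dataset[, keep, drop = FALSE]   # drop = FALSE keeps a data frame even for one column
names(useful)
```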




On Tue, Jan 24, 2012 at 8:14 AM, Francisco franciscororol...@google.com wrote:

 Hello,
 I have a dataset with 40 variables, some of them are always 0 (each row).
 I would like to make a subset containing only the columns which values are
 not all 0, but I don't know how to do it.

 I tried:

 for(cut_column in 1:40) {

 if(sum(dataset[,cut_column])!=0) {
 columns_useful<-c(columns_useful,dataset[cut_column])

 }
 }

 sorted_dataset<-subset(dataset, select=columns_useful)

 But it doesn't work.
 Thank you

 Francisco



Re: [R] How can I access information stored after I run a command in R?

2012-01-23 Thread Justin Haynes
?str tells you about the object.

str(MAX3(a,'asy',1))

from that you can see the names of the various parts including p.value.

foo <- MAX3(a,'asy',1)$p.value
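
The same pattern on a test that ships with base R, including writing the value to a file. This is a sketch: chisq.test() merely stands in for MAX3(), on the assumption that both return a list with a p.value component, and the output goes to a temporary file rather than /home/foo.txt:

```r
ca <- c(139, 249, 112)
co <- c(136, 244, 120)
a  <- rbind(ca, co)

# chisq.test() is a stand-in; its result is an "htest" list.
res <- chisq.test(a)
str(res, max.level = 1)   # lists the components, including p.value
names(res)

pval <- res$p.value

# Write the extracted value to a file (a temporary file in this sketch).
out <- tempfile(fileext = ".txt")
write.table(data.frame(p.value = pval), file = out, row.names = FALSE, quote = FALSE)
read.table(out, header = TRUE)
```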



On Mon, Jan 23, 2012 at 9:32 AM, Tiago V. Pereira
tiago.pere...@mbe.bio.br wrote:

 Dear all,

 Supposed I run the following command:

 ###
 #install.packages("Rassoc", dependencies=TRUE)
 library(Rassoc)
 ca=c(139,249,112)

 co=c(136,244,120)

 a=rbind(ca,co)

 MAX3(a,"asy",1)
 ##

 I get:

The MAX3 test using the asy method

 data:  a
 statistic = 0.5993, p-value = 0.7933


 How can one save the result 0.7933 into a file?

 say:

 foo <- 0.7933

 write.table(foo, file ="/home/foo.txt", sep = " ",
 row.names=FALSE,col.names=TRUE, quote=FALSE, qmethod = "double")


 However, instead of typing the value above, I would like to replace it by
 the macro (scalar, local) that has the accurate p-value.

 thanks in advance for your help.

 Tiago



Re: [R] colored outliers

2012-01-20 Thread Justin Haynes
TOC_NI<-read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv",
sep=";", dec=",", encoding="UTF-8")
circ<-TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)  ## this line is coloring
## all points because you're still using TOC_NI

points(NI~TOC,data=circ,col='red',pch=1,size=3)  ## now we're only plotting
## the four points in circ.
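
Since the CSV file is not available, here is a self-contained sketch of the same idea on synthetic data. The data frame dat and the name top4 are made up for illustration, and cex is used for the symbol size because it is the base-graphics parameter for that:

```r
set.seed(42)
# Synthetic stand-in for TOC_NI; the real file is not available here.
dat <- data.frame(TOC = runif(50, 0, 450))
dat$NI <- 0.1 * dat$TOC + rnorm(50, sd = 5)

# Rows holding the four largest NI values.
top4 <- dat[order(dat$NI, decreasing = TRUE)[1:4], ]

plot(NI ~ TOC, data = dat, col = "blue", pch = 16, xlim = c(0, 450))
abline(lm(NI ~ TOC, data = dat), col = "red", lwd = 3)
# Open red circles around the four selected points only.
points(NI ~ TOC, data = top4, col = "red", pch = 1, cex = 2)
```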


Sorry for the confusion.  However, in the future please provide a
reproducible data set along with your question so we can more easily help.

Justin


On Fri, Jan 20, 2012 at 5:49 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Dear Petr and Justin,
 my problem is that I only want to have the 4 highest values for Ni as a
 red point or with a red circle. The other points should not be modified.
 In your proposals all points always get a red circle or a red point, not
 only the 4 highest Ni values!
 I hope you can understand me!
 Thanks for your help!
 GeO


 --
 View this message in context:
 http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4313278.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Stacked barchart in ggplot (or other library)

2012-01-20 Thread Justin Haynes
To use ggplot:


dat<-data.frame(num=1:3,usage=c(4,2,5),cap=c(10,20,10),diff=c(6,18,5))
dat.melt<-melt(dat,id.var=c('num','cap'))
ggplot(dat.melt)+geom_bar(aes(x=num,y=value,fill=variable),stat='identity')



On Fri, Jan 20, 2012 at 12:30 PM, Jean V Adams jvad...@usgs.gov wrote:

 Bart6114 wrote on 01/20/2012 08:54:39 AM:

  Hey,
 
  I want to create a stacked barchart in R for the following dataset
  (http://pastebin.com/pyHUNgr2):
 
  #   usage   capacity   diff
  1   4   10  6
  2   2   20  18
  3   5   10  5
 
  The stacked barchart should, in one plot show each line of the dataset
 as a
  stacked bar using data from 'usage' and 'diff' to create the stacked
 bar.
 
  I can't find a good example of how to do this on the ggplot2 site.
 
  Thanks in advance!


 See the help on barplot:
 ?barplot

 For example:

 df <- data.frame(usage=c(4, 2, 5), capacity=c(10, 20, 10), diff=c(6, 18,
 5))
 barplot(t(as.matrix(df[, 1:2])))

 Jean


Re: [R] Establishing groups using something other than ifelse()

2012-01-19 Thread Justin Haynes
How about

levels(df$z)[grep('A',levels(df$z))] <- 'A'
levels(df$z)[grep('B',levels(df$z))] <- 'B'
levels(df$z)[grep('C',levels(df$z))] <- 'C'

Does that do what you're wanting?
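
Relabelling the levels of df$z overwrites the sub-groups. If z should be kept and the higher-level group added as a new column, a sketch like the following may help; it assumes the big group is simply the sub-group label without its trailing digits:

```r
z <- factor(c("A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2", "C1", "C", "C2", "C2"))
df <- data.frame(x = seq_along(z), z = z)

# Strip trailing digits from the sub-group label to get the higher-level group.
df$Big.Group <- factor(sub("[0-9]+$", "", as.character(df$z)))

table(df$z, df$Big.Group)
```

For groupings that do not follow a naming pattern, a small lookup table merged onto df is a common alternative.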


On Thu, Jan 19, 2012 at 3:05 PM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello all,

 This is one of those Is there a better way to do this questions. Say
 I have a dataframe (df) with a grouping variable (z). This is my base
 data. Now I know that there is a higher order level of grouping that
 exist for my group variable. So what I want to do is create a new
 column that express that higher order level of grouping based on
 values in the sub-group (z  in this case). In the past I have used
 ifelse() but this tends to get fairly redundant and messy with a large
 amount of sub-groupings (z). I've created a sample dataset below. Can
 anyone recommend a better way of achieving what I am currently
 achieving with ifelse()? A long series of ifelse statements makes me
 think that there is something better for this.

 ## Dataframe creation
 df <- data.frame(x=runif(36, 0, 120),
   y=runif(36, 0, 120),

 z=factor(c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C","C2","C2"))
   )

 ## Current method is grouping
 df$Big.Group <- with(df, ifelse(df$z=="A1","A", ifelse(df$z=="A2","A",
 ifelse(df$z=="B1", "B", ifelse(df$z=="B2", "B", "C")))))


 So any suggestions? Thanks in advance!

 Sam



[R] png output on a server?

2012-01-18 Thread Justin Haynes
I've got R running on a gentoo server that doesn't have X11 installed.  It's
a custom build to keep those dependencies at bay!  However, some of my
scripts use the base png() function and ggplot2. But, png uses X11.

A google search suggests using the Cairo package, which works... but
changes the fonts (specifically the size of the font).  Adjusting the
pointsize doesn't seem to have much effect.

Aside from tuning the CairoPNG function to make my graphs look right, has
anyone found a good way to avoid the X11 dependency but still use the base
png function?

If anyone has experience with CairoPNG and making it look like the base png
function, I'd love to hear what you've learned!


Thanks,

Justin


> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
   FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE
  libxml     fifo   cledit    iconv      NLS  profmem    cairo
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE    FALSE


 sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8LC_PAPER=C
  LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  grid  methods
base

other attached packages:
[1] Cairo_1.5-1   ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.7.1

loaded via a namespace (and not attached):
[1] tools_2.14.1




Re: [R] Points inside a polygon

2012-01-12 Thread Justin Haynes

On Wed 11 Jan 2012 08:28:03 PM PST, Hasan Diwan wrote:

I have a list of bounds for a series of polygons. I do understand the
formula to determine whether point i is within polygon X (X[x1] < i[x]
&& X[x2] > i[x] && X[y1] < i[y] && X[y2] > i[y]), and I can apply this
throughout the dataset. However, this naive algorithm doesn't scale
very well. The data set contains 10,000 points consisting of (n,e)
pairs where I'm interested in which are inside polygons denoted by
vertices (V[x1]/V[y1],V[x2],V[y2]). Is there a shortcut to accomplish
this goal? Many thanks!  -- H



Check out the splancs package, particularly the inout function.
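
If the "polygons" really are axis-aligned boxes given by two corners, the test can also be vectorized in base R. A sketch on made-up data; pts, boxes and inside are illustrative names, and general polygons need a proper point-in-polygon routine such as inout:

```r
set.seed(1)
pts <- data.frame(x = runif(10000), y = runif(10000))

# Hypothetical boxes: one row per box, given as lower-left and upper-right corners.
boxes <- data.frame(x1 = c(0.0, 0.5), x2 = c(0.25, 1.0),
                    y1 = c(0.0, 0.5), y2 = c(0.25, 1.0))

# inside[i, j] is TRUE when point i lies strictly inside box j.
inside <- outer(pts$x, boxes$x1, ">") & outer(pts$x, boxes$x2, "<") &
          outer(pts$y, boxes$y1, ">") & outer(pts$y, boxes$y2, "<")

colSums(inside)   # number of points per box
```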

Justin



Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes

On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

Hi
I have a data frame in the following form. There are two groups and for
each 'width' relative frequency for group1 and group2 is given. How to plot
this in R using ggplot or other package.


  Width   relativeFrequency1   relativeFrequency2
1   100 0.0006388783 0.02265428
2   200 0.0022677303 0.02948625
3   300 0.0061182673 0.01739936
4   400 0.0152237225 0.02569902
5   500 0.0300215262 0.03639880
6   600 0.0597610250 0.07717765


Thanks



Not sure exactly what you're looking for but...


dat<-data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
dat.melt<-melt(dat,id.var='width')
ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+geom_bar(stat='identity',position='dodge')




Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes
ggplot(dat.melt,aes(x=width,y=value,fill=variable,colour=variable))+geom_density(stat='identity',alpha=0.5)

the fill and colour variables can be removed if you want.

or

ggplot(dat.melt,aes(x=width,y=value,fill=variable))+geom_density(stat='identity',alpha=0.5)+facet_wrap(~variable,ncol=1)

same with this version.



On Thu, Jan 12, 2012 at 9:35 AM, Mary Kindall mary.kind...@gmail.com wrote:

 Hi this is exactly what i am looking for but I do not like to draw as
 histogram instead I want two separate plot for this data.  Something like
 the ones shown in the following link. Please disregard the legends of the
 following fig.


 http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png

 http://had.co.nz/ggplot2/graphics/90983232ced45a93d9fbbe40afffd69a.png

 Thanks

 On Thu, Jan 12, 2012 at 12:13 PM, Justin Haynes jto...@gmail.com wrote:

 On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

 Hi
 I have a data frame in the following form. There are two groups and for
 each 'width' relative frequency for group1 and group2 is given. How to
 plot
 this in R using ggplot or other package.


  Width   relativeFrequency1   relativeFrequency2
 1   100 0.0006388783 0.02265428
 2   200 0.0022677303 0.02948625
 3   300 0.0061182673 0.01739936
 4   400 0.0152237225 0.02569902
 5   500 0.0300215262 0.03639880
 6   600 0.0597610250 0.07717765


 Thanks


 not sure exactly what you're looking for but...

 dat<-data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
 dat.melt<-melt(dat,id.var='width')
 ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+
 geom_bar(stat='identity',position='dodge')






 --
 -
 Mary Kindall
 Yorktown Heights, NY
 USA





Re: [R] Add color to Boxplot by value

2012-01-12 Thread Justin Haynes
How about:

dat<-data.frame(val=rnorm(100,12,10),x=letters[1:4])
col.val<-ddply(dat,.(x),summarise,mean(val))
col.val$breaks<-cut(col.val$..1,c(0,9,15,Inf))
dat.merge<-merge(dat,col.val)
ggplot(dat.merge,aes(x=x,y=val,colour=breaks))+geom_boxplot()+scale_color_manual(values=c('green','yellow','red'))
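
The same idea in base graphics, as a sketch. It assumes the rule is "mean above 15 is green, mean below 9 is red, anything else yellow"; the data are random, as in the example above, and grp_mean and grp_col are illustrative names:

```r
set.seed(1)
dat <- data.frame(val = rnorm(100, 12, 10), x = letters[1:4])

# Mean per group, then one colour per group according to the assumed rule.
grp_mean <- tapply(dat$val, dat$x, mean)
grp_col  <- ifelse(grp_mean > 15, "green", ifelse(grp_mean < 9, "red", "yellow"))

# boxplot() draws the groups in sorted order, which matches the order used by tapply().
boxplot(val ~ x, data = dat, col = grp_col)
```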


On Thu, Jan 12, 2012 at 7:45 AM, KWyshak kwys...@illumina.com wrote:

 I have a boxplot of Production run rates per 10 minute intervals and I
 would
 like to color code them by the average (i.e. 15ppm = green, 9ppm = red,
 everything else yellow).

 Is there a way to do this?

 http://r.789695.n4.nabble.com/file/n4289381/RunRateBoxWhisker.png

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Add-color-to-Boxplot-by-value-tp4289381p4289381.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
# find top 4 points
circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]

# add them to your plot!
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)



Justin

On Tue, Jan 10, 2012 at 7:11 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Hi @ all,
 I have a question about how to mark significant outliers in R.
 This is my very simple script to plot a regression:

 TOC_NI <- read.csv2("C:/Users/XYZ/Desktop/Master/Daten/Statistik/TOC-NI.csv",
 sep=";", dec=",", encoding="UTF-8")
 plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
 abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
 summary(lm(NI~TOC,data=TOC_NI))

 The result is the following pic:
 http://r.789695.n4.nabble.com/file/n4282207/nickel_TOC_5f.png
 nickel_TOC_5f.png

 Now I want to make small red circles around the four highest values of Ni.
 Does anyone have an idea how to do that?
 Thanks a lot!

 Best Regards
 Geophagus




 --
 View this message in context:
 http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4282207.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
woops! see inline.


Hope that helps, and enjoy R.


Justin

On Tue, Jan 10, 2012 at 8:40 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Hi Justin,
 thanks a lot for your quick answer.
 If I use your code, all points become red.
 How do you pass the four sorted, separated values to the points()
 call?
 The variable in your script is called circ, but it is not used anywhere
 afterwards.
 Here is the script again:


 TOC_NI <- read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv",
 sep=";", dec=",", encoding="UTF-8")


This line just needs trimming.  Not sure how I missed that on my copy...
Anyway, order() puts the data.frame in order of the given vector; the default
behavior sorts in ascending order unless you specify decreasing=TRUE.

circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]


and it should work


 plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
 abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
 points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)

 Thanks a lot for your help!
 GeO
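
For reference, a minimal sketch of how the pieces of this thread might fit
together, assuming the intent is to circle only the four rows stored in circ.
The TOC_NI data below is made up for illustration (the real data comes from
the CSV above), and draw_outlier_plot is a hypothetical helper name. The
sketch passes circ to points() and uses cex for the symbol size, since base
graphics has no size parameter.

```r
# Made-up stand-in for the TOC_NI data frame read from the CSV in the thread
set.seed(42)
TOC_NI <- data.frame(TOC = runif(30, 0, 450), NI = rexp(30, rate = 1 / 20))

# Rows holding the four largest NI values
circ <- TOC_NI[order(TOC_NI$NI, decreasing = TRUE), ][1:4, ]

# Hypothetical helper: scatter plot, regression line, circles around 'circ'
draw_outlier_plot <- function() {
  plot(NI ~ TOC, data = TOC_NI, col = "blue", pch = 16, xlim = c(0, 450))
  abline(lm(NI ~ TOC, data = TOC_NI), col = "red", lwd = 3)
  # data = circ (not TOC_NI), so only the top four points get a circle
  points(NI ~ TOC, data = circ, col = "red", pch = 1, cex = 3)
}
```

Calling draw_outlier_plot() draws the figure on the current graphics device.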



 --
 View this message in context:
 http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4282481.html
 Sent from the R help mailing list archive at Nabble.com.




Re: [R] match matrices of different lengths

2012-01-05 Thread Justin Haynes
see ?merge

> merge(xx, aa, by.x='x', by.y='a')
            x   y   b
1 2.00112e+11 1.0 1.2
2 2.00112e+11 1.1 1.9

Making the two matrices time series does not mean that R knows that the
first column is a datetime.
And depending on your desired result, that may not be important.
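
Since the original question asks for NAs where either matrix has no data, a
sketch of the same merge() call with all = TRUE may be closer to the goal.
The vectors are the ones from the question. The variable names both and time,
and the UTC time zone, are my own choices for illustration.

```r
# Data from the original question
x <- c(200112030003, 200112030004, 200112030005, 200112030006)
y <- c(0.1, 1, 1.1, 1.5)
a <- c(200112030004, 200112030005, 200112030007, 200112030008, 200112030009)
b <- c(1.2, 1.9, 2.0, 2.5, 2.1)
xx <- cbind(x, y)
aa <- cbind(a, b)

# all = TRUE keeps time stamps found in only one matrix and fills the
# missing side with NA (a full outer join)
both <- merge(xx, aa, by.x = "x", by.y = "a", all = TRUE)

# Optional: turn the yearmonthdayhourminute number into a real date-time
both$time <- as.POSIXct(sprintf("%.0f", both$x),
                        format = "%Y%m%d%H%M", tz = "UTC")
```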

hope that helps,

Justin


On Thu, Jan 5, 2012 at 5:51 AM, Thijs vanden Bergh 
bergh.thijsvan...@gmail.com wrote:

 I was trying to match matrices of different lengths that have date and
 time info (yearmonthdayhourminute) in the first column. The routine
 needs to return NAs where data in either of the matrices is
 nonexistent.

 I have been trying the following:

 
 x <- c(200112030003, 200112030004, 200112030005, 200112030006)
 y <- c(0.1, 1, 1.1, 1.5)
 a <- c(200112030004, 200112030005, 200112030007, 200112030008,
 200112030009)
 b <- c(1.2, 1.9, 2.0, 2.5, 2.1)

 xx <- cbind(x, y)
 aa <- cbind(a, b)

 xxnew <- ts(xx)
 aanew <- ts(aa)

 cc <- ts.union(xxnew, aanew)
 cc
 

 However, this does not give the desired result: it simply cbinds
 the two matrices and fills the empty spots, created because one
 matrix is shorter than the other, at the bottom end of the
 shorter matrix. I really want the routine to match matrices xx and aa
 on the time in the first column of both.

 any help towards this end would be much appreciated,

 th.



Re: [R] ggplot2 - tricky problem

2012-01-05 Thread Justin Haynes
How about:

library(reshape2)  # melt(); the older 'reshape' package also provides it
library(plyr)      # ddply(), summarise
library(ggplot2)
dat <- data.frame(id=1:4,city=c('berlin','munich'),likeability=c(5,4,6,5),uniqueness=c(3,4,4,4))

ggplot(ddply(melt(dat,
  id.vars=c('id','city')),
  .(variable,city),
  summarise,
  value=mean(value)),
  aes(x=factor(city),y=value)) +
geom_point() +
facet_wrap(~variable)

the line drawing is a bit more tricky...  Since the x values are factors
rather than continuous, fitting a line to them is kind of nonsense.  It
matters which order they are in for example.  If instead you want to plot
something like:

ggplot(dat,aes(x=likeability,y=uniqueness,colour=city))+geom_point()+geom_smooth(aes(group=city),method='lm')

You could draw fit lines that make a bit more sense.  Forgive me if I'm
over simplifying your problem!


Justin

On Thu, Jan 5, 2012 at 7:46 AM, Mario Giesel rr.gie...@yahoo.de wrote:

 Hello, R friends,

  I've been struggling quite a bit with ggplot2.
 Having worked through Hadleys book twice I still wonder how to solve this
 task.


 1. Short example Dataframe:

 id  city    Likeability  Uniqueness
 1   Berlin  5            3
 2   Munich  4            4
 3   Berlin  6            4
 4   Munich  5            4

 2. Task:

 a) Facetting plots for each attitude (1 plot for likeability and
 uniqueness each, horizontally on one page)
 b) Showing Berlin and Munich together on x axis
 c) Showing the means of Berlin and Munich on y axis (means of cities in
 likeability on first plot, means of cities in uniqueness on second plot)
 d) Drawing a line through mean points on each plot



 Hope I could explain it understandably. Any help is appreciated!

 Thanks a lot,
  Mario



Re: [R] [newbie] stack operations, or functions with side effects (or both)

2012-01-04 Thread Justin Haynes
do s[1] and s[-1] do what you're looking for?
those are just to display... if you want to change s, you need to reassign
it or fiddle with namespacing.  however, I'd say it is better to write R
code as though data structures are immutable until you explicitly re-assign
them rather than trying to deal with side effects and state...


> pop <- function(vec){
+   print(vec[1])
+   print(vec[-1])
+   return(vec[-1])
+ }
> s <- 1:5
> s <- pop(s)
[1] 1
[1] 2 3 4 5
> s
[1] 2 3 4 5
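
If a stack with real side effects is wanted anyway, one common approach is to
keep the state inside a closure and update it with <<-. The following is only
a sketch of that idea, not an existing stack API; make_stack, push, pop and
size are names made up here.

```r
# Hypothetical closure-based stack: 'items' lives in the enclosing
# environment of the returned functions, so they can modify it with <<-
make_stack <- function(items = NULL) {
  force(items)
  push <- function(x) {
    items <<- c(x, items)
    invisible(NULL)
  }
  pop <- function() {
    if (length(items) == 0) stop("stack is empty")
    top <- items[1]
    items <<- items[-1]   # the side effect: the stored vector shrinks
    top
  }
  size <- function() length(items)
  list(push = push, pop = pop, size = size)
}

s <- make_stack(1:5)
first <- s$pop()   # removes and returns the first element
```

The caller's own variables are still never modified. Only the vector hidden
inside the closure changes, which keeps the side effect in one place.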



On Wed, Jan 4, 2012 at 1:22 PM, Tom Roche tom_ro...@pobox.com wrote:


 summary: Specifically, how does one do stack/FIFO operations in R?
 Generally, how does one code functions with side effects in R?

 details:

 I have been a coder for years, mostly using C-like semantics (e.g.,
 Java). I am now trying to become a scientist, and to use R, but I don't
 yet have the sense of good R and R idiom (i.e., expressions that are
 to R what (e.g.) the Schwartzian transform is to Perl).

 I have a data-assimilation problem for which I see a solution that
 wants a stack--or, really, just a pop(...) such that

 * s <- c(1:5)
 * print(s)
 [1] 1 2 3 4 5
 * pop(s)
 [1] 1
 * print(s)
 [1] 2 3 4 5

 but in fact I get

 > pop(s)
 Error: could not find function "pop"

 and Rseek'ing finds me nothing. When I try to write pop(...) I get

 > pop1 <- function(vector_arg) {
 +   length(vector_arg) -> lv
 +   vector_arg[1] -> ret
 +   vector_arg <- vector_arg[2:lv]
 +   ret
 + }
 >
 > pop1(s)
 [1] 1
 > print(s)
 [1] 1 2 3 4 5

 i.e., no side effect on the argument

 > pop2 <- function(vector_arg) {
 +   length(vector_arg) -> lv
 +   vector_arg[1] -> ret
 +   assign("vector_arg", vector_arg[2:lv])
 +   return(ret)
 + }
 >
 > pop2(s)
 [1] 1
 > print(s)
 [1] 1 2 3 4 5

 ditto :-( What am I missing?

 * Is there already a stack API for R (which I would expect)? If so, where?

 * How to cause the desired side effect to the argument in the code above?

 TIA, Tom Roche tom_ro...@pobox.com



Re: [R] Combining characters

2012-01-04 Thread Justin Haynes
apply(expand.grid(x, y, z, stringsAsFactors=F), 1, paste, collapse=' ')
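
Note that expand.grid() varies its first argument fastest, so the one-liner
above yields "one yellow apple", "two yellow apple", and so on. A sketch that
reproduces the ordering listed in the question (z fastest, x slowest) is to
pass the vectors in reverse and paste the columns back in x, y, z order. The
names combos and phrases are made up here.

```r
x <- c("one", "two", "three")
y <- c("yellow", "blue", "green")
z <- c("apple", "cheese")

# z is listed first so that it varies fastest, x last so it varies slowest
combos <- expand.grid(z = z, y = y, x = x, stringsAsFactors = FALSE)

# paste() is vectorised over the columns, so no apply() is needed
phrases <- paste(combos$x, combos$y, combos$z)
```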



On Wed, Jan 4, 2012 at 8:32 AM, jeremy jeremynamer...@gmail.com wrote:

 Hi all,

 I'm trying to combine exhaustively several character arrays in R like:
 x=c("one","two","three")
 y=c("yellow","blue","green")
 z=c("apple","cheese")

 in order to get concatenation of

 x[1] y[1] z[1]  (one yellow apple)
 x[1] y[1] z[2] (one yellow cheese)
 x[1] y[2] z[1](one blue apple)
 ...
 x[length(x)] y[length(y)] z[length(z)]  (three green cheese)

 Anyone has a solution ?
 Thank in advance

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Combining-characters-tp4261888p4261888.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] a quick question about rbinom

2012-01-04 Thread Justin Haynes
homework or not,

?rbinom

should be plenty.
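
For what it is worth, a small sketch of what the +1 does: rbinom(n, 1, p)
returns 0s and 1s, and R vectors are indexed from 1, so adding 1 turns each
draw into a valid index (0 becomes 1, 1 becomes 2). The values of n and the
seed below are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed, only to make the sketch repeatable
n <- 10       # arbitrary sample size

draws <- rbinom(n, 1, 0.6)     # each draw is 0 or 1
X1 <- c("A", "B")[draws + 1]   # index 1 picks "A", index 2 picks "B"
```

So X1 is "B" with probability 0.6 and "A" otherwise.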




On Wed, Jan 4, 2012 at 1:38 PM, lynn.tsai vernal@gmail.com wrote:

 Hello, I have the following code using rbinom, but I don't understand what
 *+1* means in the code. Could someone help? Thanks so much,

  X1 <- c("A","B")[rbinom(n,1,0.6)+1]
  X2 <- c("C","D")[rbinom(n,1,0.1)+1]

 --
 View this message in context:
 http://r.789695.n4.nabble.com/a-quick-question-about-rbinom-tp4262977p4262977.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Applyiing mode() or class() to each column of a data.frame XXXX

2011-12-30 Thread Justin Haynes
there is also colwise in the plyr package.

> library(plyr)
> colwise(class)(data6)
      v13     v14       v15     f4     v16
1 integer numeric character factor logical


Justin


On Thu, Dec 29, 2011 at 4:47 PM, Jean V Adams jvad...@usgs.gov wrote:

 Dan Abner wrote on 12/29/2011 06:13:11 PM:

  Hi everyone,
 
  I am attempting to use the apply() function to obtain the mode and class
 of
  each column in a data frame, however, I am encountering unexpected
 results.
  I have the following example data:
 
 
  v13 <- 1:6
  v14 <- c(1,2,3,3,NA,1)
  v15 <- c("Good","Bad",NA,"Good","Bad","Bad")
  f4 <- factor(rep(c("Blue","Red","Green"),2))
  v16 <- c(F,T,F,F,T,F)
  data6 <- data.frame(v13,v14,v15,f4,v16)
  data6
 
 
  Here is my function definition:
 
 
  contents <- function(x){
   output <- data.frame(Varnum=1:ncol(x),
                        Name=names(x),
                        Mode=apply(x,2,mode),
                        Class=apply(x,2,class))
   print(output)
  }


 Use sapply() instead of apply().  In the help file for apply() it says: 
 If X is not an array but an object of a class with a non-null dim value
 (such as a data frame), apply attempts to coerce it to an array via
 as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
 This coercion to a matrix might be causing the unexpected result. sapply()
 and lapply() are designed specifically for lists (which a data frame is).
 I also simplified the function a bit ...

 contents <- function(x){
   data.frame(Varnum=1:ncol(x), Name=names(x),
              Mode=sapply(x,mode), Class=sapply(x,class))
 }
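
A short sketch of the difference, using the example data from the question.
stringsAsFactors = FALSE is set here explicitly (my choice) so that v15 stays
a character column regardless of the R version.

```r
data6 <- data.frame(
  v13 = 1:6,
  v14 = c(1, 2, 3, 3, NA, 1),
  v15 = c("Good", "Bad", NA, "Good", "Bad", "Bad"),
  f4  = factor(rep(c("Blue", "Red", "Green"), 2)),
  v16 = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# apply() first coerces the data frame to a character matrix,
# so every column reports the same class
via_apply <- apply(data6, 2, class)

# sapply() visits the columns as list elements, keeping their own classes
via_sapply <- sapply(data6, class)
```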

 Jean


  
 
  When I call the function, I obtain the following:
 
 
   > contents(data6)
       Varnum Name      Mode     Class
   v13      1  v13 character character
   v14      2  v14 character character
   v15      3  v15 character character
   f4       4   f4 character character
   v16      5  v16 character character
 
  =
 
  Any help is appreciated.
 
  Thank you,
 
  Dan
 


Re: [R] Help with code

2011-12-20 Thread Justin Haynes
The short answer (which is a guess, because you didn't provide a
reproducible example) is:

Your column (I think it's called t1d_ptype[1:25]) is a factor, and using
factors is dangerous at best.

You can check with ?str.

See ?factor for how to convert back to strings, and see if your code then works.
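
A minimal sketch of the two usual ways around the "invalid factor level"
warning, using a made-up vector rather than the poster's data. The names
ptype, ptype_chr and ptype_f are hypothetical.

```r
# Made-up stand-in for the t1d_ptype column
ptype <- factor(c("T1D", "Ctrl", "Ctrl_FDR", "T1D"))

# Option 1: convert to character, then any string can be assigned
ptype_chr <- as.character(ptype)
ptype_chr[1] <- "T1D_w"

# Option 2: stay with a factor, but register the new level first
ptype_f <- ptype
levels(ptype_f) <- c(levels(ptype_f), "T1D_w")
ptype_f[1] <- "T1D_w"
```

Assigning "T1D_w" to ptype directly, without either step, produces the
warning and an NA.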



to answer your second question, yes I'm sure there is a better simple way
to do this, but i can't follow what you're doing... for example, I don't
know what c1 is...

but, the place I would look is at the plyr package.  its excellent at
splitting and reordering data.


and one final note, you should avoid naming things with pre-existing R
functions (e.g. data).

Justin


On Tue, Dec 20, 2011 at 11:14 AM, 1Rnwb sbpuro...@gmail.com wrote:

 hello gurus,

 i have a data frame like this
   HTN HTN_FDR Dyslipidemia CAD t1d_ptype[1:25]
 1Y   YY T1D
 2   T1D
 3  Ctrl_FDR
 4   T1D
 5Y Ctrl
 6  Ctrl
 7  Ctrl_FDR
 8   T1D
 9YY T1D
 10  T1D
 11 Ctrl_FDR
 12   YY T1D
 13   Y   YY T1D
 14  T1D
 15 Ctrl
 16 Ctrl
 17 Ctrl_FDR
 18  T1D
 19  T1D
 20   Y  T1D
 21 Ctrl_FDR
 22 Ctrl_FDR
 23 Ctrl
 24 Ctrl
 25  T1D

 i am converting it to define the groups more uniformly using this code:

 for (i in 1:dim(c1)[1])
 {
   num_comp <- 0
   for (j in 1:dim(c1)[2])
     if (c1[i,j] == 2) num_comp = num_comp + 1  #Y=2
   for (j in 1:dim(c1)[2])
     if (num_comp > 0)
     {
       if (data$t1d_ptype[i] == "T1D" && c1[i,j] == 2) c2[i,j] <- "T1D_w"
       if (data$t1d_ptype[i] == "T1D" && c1[i,j] == 1) c2[i,j] <- "T1D_oc"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl" && c1[i,j] == 2)
         c2[i,j] <- "Ctrl_w"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl" && c1[i,j] == 1)
         c2[i,j] <- "Ctrl_oc"
     }
     else
     {
       if (data$t1d_ptype[i] == "T1D") c2[i,j] <- "T1D_noc"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl") c2[i,j] <- "Ctrl_noc"
     }
 }

 it is giving me error
 In `[<-.factor`(`*tmp*`, iseq, value = structure(c(NA,  ... :
  invalid factor level, NAs generated

 Also it there a simple way to do this.
 Thanks
 Sharad

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Help-with-code-tp4218989p4218989.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help with code

2011-12-20 Thread Justin Haynes
Fair enough and good point.  How about, dangerous when used unknowingly!


On Tue, Dec 20, 2011 at 1:01 PM, William Dunlap wdun...@tibco.com wrote:

 Re
  your column (i think its called t1d_ptype[1:25]) is a factor and using
  factors is dangerous at best.

 This depends on how you want to define dangerous.  If t1d_ptype ought
 take values from a certain set of strings then making it a factor gives
 you some safety, since it warns you when you go outside of that set and
 try to give it an illegal value.  E.g.,
 > sex <- factor(c("M","F","F"), levels=c("F", "M"))
 > sex[2] <- "no"
 Warning message:
 In `[<-.factor`(`*tmp*`, 2, value = "no") :
   invalid factor level, NAs generated

 It does take more work to set up, since you need to enumerate the set
 of good strings.  That is tedium, not danger.

 If t1d_ptype might take any value, then make it a character vector.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Justin Haynes
  Sent: Tuesday, December 20, 2011 11:54 AM
  To: 1Rnwb
  Cc: r-help@r-project.org
  Subject: Re: [R] Help with code
 

Re: [R] how to manually enter an double quote as data feed?

2011-12-13 Thread Justin Haynes
"\"" is how it's displayed on the screen.  However, if you write your object
to a csv it will be correct.  R can't display """ as it is, so it is escaping
the second double quote for you.

However, "'" (double quote, single quote, double quote) does display
correctly as well as save correctly.
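
A small sketch showing that the backslash is only part of the printed
representation, not of the stored string. The vector name char follows the
question; shown and raw_out are names made up here.

```r
char <- character(10)
char[10] <- '"'   # a single double-quote character

n_chars <- nchar(char[10])                   # one character is stored
shown   <- capture.output(print(char[10]))   # print() escapes it: [1] "\""
raw_out <- capture.output(writeLines(char[10]))  # writeLines() shows it raw: "
```

print() shows strings the way they would be typed in code, while writeLines()
and cat() write the actual characters.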

If that doesn't answer your question, some more back story on what you're
trying to do would help.

Justin

On Tue, Dec 13, 2011 at 2:03 PM, bonnieyuan bby2...@columbia.edu wrote:

 I'm doing a text mining project where I have to manually enter a double
 quote
 as an element inside a vector.

 I tried

 char[10]='"' #where I enclosed the double quote in a pair of single quotes.

 But the result is [1] "\"". Somehow a backslash is added automatically.

 I also tried to enclose the double quote in a pair of double quotes. That
 didn't work either.

 I'm using Mac and latest release of R.

 Thank you!

 Bonnie Yuan


 --
 View this message in context:
 http://r.789695.n4.nabble.com/how-to-manually-enter-an-double-quote-as-data-feed-tp4192283p4192283.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] using sample

2011-12-07 Thread Justin Haynes
Emma,

If you haven't spent much time on the r-help forums, please do read the
posting guide.

You need to provide reproducible examples for us to help you.

We don't know anything about your data...

What is event.details? (If you can't provide the data, often ?str will do.)

Since I don't know what event.details is, I can't figure out what the line:


obs = (1:133429)[event.details[,2] == i]

is supposed to do.

But if I had to guess... ?sample says it expects the first argument as a
vector.  I assume obs is not a vector but a larger structure?

Feel free to post more info about your data (see ?str and ?dput) or if you
can generate made up data that replicates your problem that works too.
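
One possibility worth ruling out (an assumption on my part, since the data
was not posted): if no element of obs passes the filter, the subset is empty,
and sample() with a positive size stops with "invalid first argument". A
sketch of a guard follows. safe_sample is a hypothetical helper, and it
indexes with sample.int() so that a single remaining candidate is not treated
as the range 1:x.

```r
# Hypothetical helper: sample with replacement, tolerating empty input
safe_sample <- function(candidates, size) {
  if (length(candidates) == 0) return(candidates)   # nothing to draw from
  candidates[sample.int(length(candidates), size = size, replace = TRUE)]
}

# An empty first argument reproduces the error from the question
empty_result <- try(sample(integer(0), size = 2, replace = TRUE), silent = TRUE)
```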


Justin


On Wed, Dec 7, 2011 at 9:16 AM, bevare emma.ra...@jbaconsulting.co.uk wrote:

 Hi,

 Can anyone help sort out the problem with the following script - I am a R
 newbie and I am self taught.

 obs.all = c()
 for(i in 1:386){
  if (n.sim[i] > 0){
    obs = (1:133429)[event.details[,2] == i]
    obs.all = c(obs.all, sample(obs[obs < n.sim[i]], size = n.sim[i],
 replace=T))
  }
 }

 Basically, in the sample bit, I only want to get obs.all if the value of
 obs
 is less than the value of n.sim[i]. I get the error message

 Error in sample(obs[obs < n.sim[i]], size = n.sim[i], replace = T) :
  invalid first argument

 length(n.sim)  is 386

 Thanks in advance for your suggestions

 Emma







 --
 View this message in context:
 http://r.789695.n4.nabble.com/using-sample-tp4169747p4169747.html
 Sent from the R help mailing list archive at Nabble.com.



[R] adding hyperlinked text to pdf plot

2011-12-05 Thread Justin Fincher
Howdy,
   I have read that if you put a URL in the text of a plot being saved
into pdf, the result is a functional hyperlink. I am interested in
having text in a plot that is linked to a URL, but I would like the
text to be something other than the URL. Is this possible? Thank you.

- Fincher



Re: [R] adding hyperlinked text to pdf plot

2011-12-05 Thread Justin Fincher
For example, say I am plotting some data that is genomic and therefore
maps to a specific locus on the human genome, like a gene.  I was
hoping to have the title of the plot display the gene name, but have
it be a link so that clicking on it would take you to those
coordinates on a public browser, like UCSC's genome browser.  So
basically, I was hoping to have text in a plot generated by R function
as a normal html-style link.

- Fincher


On Mon, Dec 5, 2011 at 14:09, Yihui Xie x...@yihui.name wrote:
 It seems I missed the context of this post -- who is you, and what
 is something other than the URL?

 I feel the tikzDevice package should be an option for the task.

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Phone: 515-294-2465 Web: http://yihui.name
 Department of Statistics, Iowa State University
 2215 Snedecor Hall, Ames, IA



 On Mon, Dec 5, 2011 at 12:39 PM, Justin Fincher finc...@cs.fsu.edu wrote:
 Howdy,
   I have read that if you put a URL in the text of a plot being saved
 into pdf, the result is a functional hyperlink. I am interested in
 having text in a plot that is linked to a URL, but I would like the
 text to be something other than the URL. Is this possible? Thank you.

 - Fincher



Re: [R] hour in x-axis

2011-11-29 Thread Justin Haynes
without knowing much about your data or the base plotting...

I'd use the library ggplot2.

First, you'll need to format your dates to POSIXct

AggData$time <- as.POSIXct(AggData$time,format='%H:%M')

Then plotting is trivial.
ggplot(AggData,aes(x=time,y=value))+geom_point()

or +geom_line() if you'd rather.
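
One caveat: the posted times run past midnight, and times parsed without a
date all land on the same day. Below is a base-R sketch of one way to handle
that, assuming the readings are in chronological order. The start date
2011-11-28 is made up, the four rows are copied from the question, and
draw_time_plot is a hypothetical helper name.

```r
clock <- c("22:19:35", "23:59:35", "01:39:35", "03:19:35")
value <- c(80.41558, 77.01051, 77.19687, 78.20762)

# Parse with a placeholder date (an assumption; the real date is unknown)
t0 <- as.POSIXct(paste("2011-11-28", clock),
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Whenever the clock time jumps backwards, a new day has started
day_shift <- cumsum(c(0, diff(as.numeric(t0)) < 0))
when <- t0 + day_shift * 86400

# Axis labels in H:M format, without seconds
tick_labels <- format(when, "%H:%M")

# Hypothetical helper: plot against real time with H:M tick labels
draw_time_plot <- function() {
  plot(when, value, type = "b", xaxt = "n", xlab = "time")
  axis.POSIXct(1, at = when, format = "%H:%M")
}
```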


Hope that helps,

Justin

On Tue, Nov 29, 2011 at 10:07 AM, threshold r.kozar...@gmail.com wrote:


 Dear R useres, got the following problem. Given the AggData (listed below)
 I need to plot AggData[,2] vs time (AggData[,1]) for chosen 'rows'. Ive
 done
 already:

 plot(AggData[rows,2], xaxt='n')
 axis(1,at=seq(1,length(rows),1),sub(,, AggData[rows,1]))

 which works, but I need to list only chosen data points, say full hours or
 every 60th point, something like:

 axis(1,at=seq(1,seq(1,length(rows),60)),sub(, ,
 AggData[day.rows[seq(1,length(rows),60)],2]))

 but does not work. Could be nice if time on the x-axis is in H:m format (no
 seconds).

 In the original data time bout is 1 minute, e.g. 17:19:35, 17:20:35,
 17:21:35 . Taken every 100th for brevity yields

  (AggData[seq(1,length(rows),100),c(2,7)])

           time    value
 1     17:19:35 80.68327
 101   18:59:35 80.97230
 201   20:39:35 78.30810
 301   22:19:35 80.41558
 401   23:59:35 77.01051
 501   01:39:35 77.19687
 601   03:19:35 78.20762
 701   04:59:35 77.13315
 801   06:39:35 76.29110
 901   08:19:35 75.32090
 1001  09:59:35 85.32890
 1101  11:39:35 79.86978
 1201  13:19:35 83.32418
 1301  14:59:35 78.26018
 1401  16:39:35 79.06434


 Thanks in advance.
 Best, robert




 --
 View this message in context:
 http://r.789695.n4.nabble.com/hour-in-x-axis-tp4120142p4120142.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] aggregate syntax for grouped column means

2011-11-29 Thread Justin Haynes
look at just your data that is in that first id category and I bet you can
figure it out!

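> myData[myData$id=='0m11',]
    var1  var2   id
10 30.79 32.15 0m11
11 30.79 32.39 0m11
12 30.94    NA 0m11

The formula interface of aggregate() defaults to na.action = na.omit, so any
row containing an NA is dropped entirely before FUN is applied; row 12 is
lost, thus a var1 mean of 30.79.  data.table and plyr apply na.rm to each
column separately.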
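
If the per-column behaviour is wanted from aggregate() as well, passing
na.action = na.pass is one way to get it. A sketch using only the three
0m11 rows shown above (dropped and kept are names made up here):

```r
myData <- data.frame(var1 = c(30.79, 30.79, 30.94),
                     var2 = c(32.15, 32.39, NA),
                     id   = "0m11")

# Default na.action = na.omit: the row with the NA is removed first
dropped <- aggregate(. ~ id, data = myData, FUN = mean, na.rm = TRUE)

# na.pass keeps every row; na.rm = TRUE then acts within each column
kept <- aggregate(. ~ id, data = myData, FUN = mean, na.rm = TRUE,
                  na.action = na.pass)
```

With na.pass the var1 mean uses all three rows, matching the data.table and
plyr results.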


Justin

On Tue, Nov 29, 2011 at 12:21 PM, Juliet Hannah juliet.han...@gmail.com wrote:

 I am calculating the mean of each column grouped by the variable 'id'.
 I do this using aggregate, data.table, and plyr. My aggregate results
 do not match the other two, and I am trying to figure out what is
 incorrect with my syntax. Any suggestions? Thanks.

 Here is the data.

 myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
 30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11,
 30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75,
 30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37,
 29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25,
 30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59,
 NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66,
 NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44,
 31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56,
 NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c("0m4", "0m4", "0m4",
 "0m5", "0m5", "0m5", "0m6", "0m6", "0m6", "0m11", "0m11", "0m11",
 "0m12", "0m12", "0m12", "205m1", "205m1", "205m1", "205m4", "205m4",
 "205m4", "205m5", "205m5", "205m5", "205m6", "205m6", "205m6",
 "205m7", "205m7", "205m7", "600m1", "600m1", "600m1", "600m3",
 "600m3", "600m3", "600m4", "600m4", "600m4", "600m5", "600m5",
 "600m5", "600m7", "600m7", "600m7")), .Names = c("var1", "var2",
 "id"), row.names = c(NA, -45L), class = "data.frame")

  head(myData)
    var1  var2  id
 1 31.59 33.78 0m4
 2 32.21 33.25 0m4
 3 31.78    NA 0m4
 4 31.34 32.05 0m5
 5 31.61 32.59 0m5
 6 31.61    NA 0m5



 results1 <- aggregate(. ~  id ,data=myData,FUN=mean,na.rm=TRUE)
  head(results1,1)
 #     id  var1  var2
 # 1 0m11 30.79 32.27

 library(data.table)
 mydt <- data.table(myData)
 setkey(mydt,id)
 results2 <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=id]
  head(results2,1)
 #        id  var1  var2
 # [1,] 0m11 30.84 32.27

 library(plyr)
 results3 <- ddply(myData,.(id),colwise(mean),na.rm=TRUE)
  head(results3,1)
 #     id  var1  var2
 # 1 0m11 30.84 32.27

  sessionInfo()
 R version 2.14.0 (2011-10-31)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] plyr_1.6 data.table_1.7.3



Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
Very cool.  Sadly, as far as I can tell, it doesn't work with ggplot though :(


 x <- runif(1e6)
 y <- runif(1e6)
 system.time(plot(x,y,pch='.'))
    user  system elapsed
   0.824   0.012   0.845
 system.time(plot(x,y))
    user  system elapsed
  33.422   0.016  33.545
 system.time(print(qplot(x,y)))
    user  system elapsed
  45.142   0.228  45.687
 system.time(print(qplot(x,y,pch='.')))
    user  system elapsed
  47.483   1.060  49.040
 system.time(print(qplot(x,y,shape='.')))
    user  system elapsed
  44.807   0.689  45.710
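
For anyone who wants to repeat the base-graphics comparison without an
interactive window, here is a minimal sketch. It assumes a temporary PDF
device is an acceptable stand-in for the screen; the helper time_plot and
the point count are made up for this example, and the gap between symbols
depends heavily on the device, so the timings above should not be expected
to reproduce.

```r
set.seed(1)
n <- 1e4                      # far fewer points than in the post, to keep it quick
x <- runif(n)
y <- runif(n)

# Hypothetical helper: time one scatterplot drawn to a throwaway PDF file.
time_plot <- function(pch) {
  out <- tempfile(fileext = ".pdf")
  pdf(out)
  on.exit({ dev.off(); unlink(out) })
  system.time(plot(x, y, pch = pch))[["elapsed"]]
}

timings <- c(dot = time_plot("."), circle = time_plot(1))
timings
```

The same helper can be pointed at other pch values (0, 2, 3, 4) to repeat
the per-symbol comparison from the original tip.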


On Fri, Nov 18, 2011 at 11:03 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:

 Hi all,

 I'm working with a bunch of large graphs, and stumbled across
 something useful. Probably many of you know this, but I didn't and so
 others might benefit.

 Using pch="." speeds up plotting considerably over using symbols.

  x <- runif(100)
  y <- runif(100)
  system.time(plot(x, y, pch="."))
   user  system elapsed
  1.042   0.030   1.077
  system.time(plot(x, y))
   user  system elapsed
  37.865   0.033  38.122

 If you have enough points, the result is also more legible.

 Choice of which pch symbol makes a difference too, the default pch=1 being
 the slowest of what I tried, but "." is by far the speediest.

  system.time(plot(x, y, pch=0))
   user  system elapsed
  11.191   0.011  11.270
  system.time(plot(x, y, pch=1))
   user  system elapsed
  38.024   0.008  38.245
  system.time(plot(x, y, pch=2))
   user  system elapsed
  14.140   0.027  14.270
  system.time(plot(x, y, pch=3))
   user  system elapsed
  15.696   0.011  15.799
  system.time(plot(x, y, pch=4))
   user  system elapsed
  18.770   0.007  18.888

 This is a vanilla R session, 2.13.1 for x86_64-redhat-linux-gnu. I
 haven't tried it on any other OS, but it's making my life a lot
 smoother right now.

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
That is a function I did not know about, thanks Hadley!

I still don't see the speed increase that you get with base graphics, but
I'm sticking with ggplot anyway!

 x <- runif(1e6)
 y <- runif(1e6)
 system.time(print(qplot(x,y)))
    user  system elapsed
  42.234   0.520  43.061
 system.time(print(qplot(x,y,pch=I('.'))))
    user  system elapsed
  32.370   0.204  33.868


On Fri, Nov 18, 2011 at 12:39 PM, Hadley Wickham had...@rice.edu wrote:

 You need: system.time(print(qplot(x,y,pch=I('.'))))

 Hadley




 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/




Re: [R] apply on rows and columns?

2011-11-16 Thread Justin Haynes
To expand on what Sarah and Michael said:

If you have a 3d array:

 x <- array(1:4,c(2,2,4))
 x
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 3

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 4

     [,1] [,2]
[1,]    1    3
[2,]    2    4

 apply(x,c(1,2),sum)
     [,1] [,2]
[1,]    4   12
[2,]    8   16

a margin of c(1,2) makes more sense: each cell of the result is the sum over
the third dimension.  Hope that clarifies things.
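
To make the role of MARGIN concrete, here is a small sketch on the same
array. The variable names are made up for this example; rowSums(x, dims = 2)
is used only as a cross-check that c(1,2) collapses the third dimension.

```r
x <- array(1:4, c(2, 2, 4))

# MARGIN = c(1, 2): one result per [row, column] cell, summing over slices.
cell_sums <- apply(x, c(1, 2), sum)

# MARGIN = 3: one result per slice, summing each 2 x 2 matrix.
slice_sums <- apply(x, 3, sum)

# Cross-check: rowSums with dims = 2 also sums over the third dimension.
same_as_rowsums <- all(cell_sums == rowSums(x, dims = 2))

cell_sums
slice_sums
```

On a plain matrix there is no third dimension left to sum over, which is why
apply(m, c(1,2), sum) simply returns m.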


Justin

On Wed, Nov 16, 2011 at 12:18 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Hi,

 On Wed, Nov 16, 2011 at 3:13 PM,  rkevinbur...@charter.net wrote:

 I have the following scenario:

 m <- matrix(1:4, ncol=2)
 m
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4
 apply(m, 2, sum)
 [1] 3 7
 apply(m, 1, sum)
 [1] 4 6

 So I can apply to rows *or* columns. According to the documentation
 (?apply)

 MARGIN a vector giving the subscripts which the function will be applied
 over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2)
 indicates rows and columns. Where X has named dimnames, it can be a
 character vector selecting dimension names.


 But I get the following results:

 apply(m, c(1,2), sum)
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4

 How am I to interpret this result?

 I'm pretty sure R is taking the sum of m[1,1] and putting it in [1,1],
 and the sum of m[1,2] and putting it in [1,2] and so on. You
 instructed apply() to work on rows and columns *simultaneously*,
 rather than sequentially.

 apply() on c(1,2) is useful if you have an array that's three-dimensional,
 but not so much if it's two-dimensional.

 What are you trying to accomplish?

 Sarah




 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] Extract pattern from string

2011-11-15 Thread Justin Haynes
Take a look at the structure of what Sys.time() returns:

str(Sys.time())

and now at ?strptime!

 format(Sys.time(),format='%d-%H-%M-%S')
[1] "15-09-55-55"

 format(Sys.time(),format='%Y')
[1] "2011"
 format(Sys.time(),format='%m')
[1] "11"
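
A sketch of the three substrings exactly as requested, with underscores
rather than hyphens. It uses the fixed timestamp from the question instead
of Sys.time() so the results are repeatable; the variable names are made up
for this example.

```r
# Fixed timestamp from the question, so the output does not depend on the clock.
stamp <- as.POSIXct("2011-11-15 16:25:55", tz = "GMT")

year     <- format(stamp, "%Y")            # "2011"
month    <- format(stamp, "%m")            # "11"
day_time <- format(stamp, "%d_%H_%M_%S")   # "15_16_25_55"
```

Replacing stamp with Sys.time() gives the same pieces for the current time;
the conversion codes are listed in ?strptime.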



Hope that helps,

Justin

On Tue, Nov 15, 2011 at 9:48 AM, syrvn ment...@gmx.net wrote:
 Hello,

 with Sys.time() you get the following string:

 2011-11-15 16:25:55 GMT

 How can I extract the following substrings:

 year - 2011

 month - 11

 day_time - 15_16_25_55


 Cheers,

 Syrvn



Re: [R] Create design matrix

2011-11-03 Thread Justin Haynes
?expand.grid

 expand.grid(c("M","F"),c("Y","O"))
  Var1 Var2
1    M    Y
2    F    Y
3    M    O
4    F    O
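
A slightly larger sketch with named arguments, which become the column
names. The factor names sex, age and dose and the levels of dose are
invented here purely to show how the row count grows.

```r
# One row per combination of levels; the first factor varies fastest.
design <- expand.grid(sex  = c("M", "F"),
                      age  = c("Y", "O"),
                      dose = c("low", "mid", "high"))

nrow(design)   # 2 * 2 * 3 = 12
head(design, 4)
```

The number of rows is the product of the numbers of levels, so a design with
more than 1000 rows needs no manual work.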



Justin

On Thu, Nov 3, 2011 at 10:56 AM, Bond, Stephen stephen.b...@cibc.com wrote:
 Greetings useRs,

 What is the easiest way to create a design matrix of several factor 
 variables? Function gendata in Design seems to do that for a fitted model, 
 but how to do that only on several factor vectors??

 The result should be a df with one row for each distinct combination of 
 levels of factors eg for (M,F) (Y,O)
 We get
 M Y
 M O
 F Y
 F O

 In reality I will have more than 1000 rows so doing by hand not good.
 Maybe there is a way with outer, but I couldn't see it.
 All the best to everybody.

 Stephen



[R] mysterious warning message regarding bytecode...

2011-11-02 Thread Justin Haynes
While running a long script which source()s other scripts I get the
following warning:

Warning message:
In t(object$S[[1]]) : bytecode version mismatch; using eval


I cannot replicate it if I run the sourced files line by line though...

What is that error?  And do I care about it?  It doesn't seem to
affect my output as far as I can tell.


Thanks!
Justin


 sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] mgcv_1.7-9        stringr_0.5       RPostgreSQL_0.2-0 biglm_0.8
 [5] DBI_0.2-5         doMC_1.2.3        multicore_0.1-7   foreach_1.3.2
 [9] codetools_0.2-8   iterators_1.0.5   cairoDevice_2.19  pixmap_0.4-11
[13] gridExtra_0.8.5   splancs_2.01-29   sp_0.9-91         ellipse_0.3-5
[17] ggplot2_0.8.9     proto_0.3-9.2     reshape_0.8.4     plyr_1.6
[21] MASS_7.3-14

loaded via a namespace (and not attached):
[1] compiler_2.13.2 digest_0.5.1    lattice_0.19-33 Matrix_1.0-1    nlme_3.1-102



Re: [R] factor level issue after subsetting

2011-11-01 Thread Justin Haynes
First of all, the subsetting line is overly complicated.

dat.sub <- dat[dat$treat!='cont',]

will work just fine.  R is doing exactly what you describe: a factor keeps
its full set of levels, so removing the 'cont' rows from the data does not
remove the 'cont' level from the factor:

 df <- data.frame(let=factor(sample(letters[1:5],100,replace=TRUE)),num=rnorm(100))
 str(df)
'data.frame':   100 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...

 df.sub <- df[df$let!='a',]
 str(df.sub)
'data.frame':   82 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...

 unique(df.sub$let)
[1] e d c b
Levels: a b c d e

 df.sub$let <- factor(df.sub$let)
 unique(df.sub$let)
[1] e d c b
Levels: e d c b

 str(df.sub$let)
 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...


By redefining your factor you can eliminate the problem.  The other
option, if you don't want factors to begin with, is:

options(stringsAsFactors=FALSE)  # to set the global option

or

# to set the option locally for this single read.csv call
dat <- read.csv("~/MyFiles/data.csv", stringsAsFactors=FALSE)
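
Base R also offers droplevels(), which removes unused levels from a factor
or from every factor column of a data frame in one call. The sketch below
uses a cut-down, typed-in version of the data from the question, so the
values and the name levels_before are illustrative only.

```r
# Small stand-in for the data in the question (not the full data set).
dat <- data.frame(treat = factor(c("cont", "cont", "10", "10", "30", "30")),
                  yield = c(98.7, 97.2, 103.0, 101.3, 121.1, 123.1))

dat.sub <- dat[dat$treat != "cont", ]
levels_before <- levels(dat.sub$treat)   # "cont" is still listed here

# Drop every level that no longer occurs in the subset.
dat.sub <- droplevels(dat.sub)
levels(dat.sub$treat)                    # only the levels still in use
```

After this, plot(dat.sub$treat, dat.sub$yield) should no longer reserve an
empty slot for 'cont'.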


On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan
stefan.schrei...@ales.ualberta.ca wrote:
 Dear list,

 I cannot figure out why, after sub-setting my data, that particular item
 which I don't want to plot is still in the newly created subset (please
 see example below). R somehow remembers what was in the original data
 set. A work around is exporting and importing the new subset. Then it's
 all fine; but I don't like this idea and was wondering what am I missing
 here?

 Thanks!
 Stefan

 P.S. I am using R 2.13.2 for Mac.

 dat <- read.csv("~/MyFiles/data.csv")
 class(dat$treat)
 [1] "factor"
 dat
   treat yield
 1   cont  98.7
 2   cont  97.2
 3   cont  96.1
 4   cont  98.1
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3
 plot(dat$treat,dat$yield)
 dat.sub <- dat[which(dat$treat!='cont'),]
 class(dat.sub$treat)
 [1] "factor"
 dat.sub
   treat yield
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3
 plot(dat.sub$treat,dat.sub$yield)
