date:20120423

Hi

I am looking for an efficient way to estimate all parameters in a
data.frame using a specific function:

for example 
ln(T)=b_0 + b_1*ln(Y_i*Y_j) + b_2*ln()+ ... + etc.

Thanks,
Ph

--
View this message in context: 
http://r.789695.n4.nabble.com/OLS-Estimating-tp4580055p4580055.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] OLS Estimating

2012-04-23 Thread Uwe Ligges




On 23.04.2012 12:42, phillip03 wrote:

Hi

I am looking for an efficient way to estimate all parameters in a
data.frame using a specific function:

for example
ln(T)=b_0 + b_1*ln(Y_i*Y_j) + b_2*ln()+ ... + etc.



Sounds like you are looking for lm().

Uwe Ligges
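
A minimal sketch of how lm() could fit a log-log model of the kind asked about. The data are simulated and the column names (T_obs, Yi, Yj, Z) are hypothetical stand-ins, since the original post does not show the data or the argument of the second ln() term:

```r
# Simulated data; all column names here are assumptions for illustration
set.seed(1)
n <- 200
dat <- data.frame(Yi = runif(n, 1, 10),
                  Yj = runif(n, 1, 10),
                  Z  = runif(n, 1, 10))
dat$T_obs <- exp(0.5 + 0.8 * log(dat$Yi * dat$Yj) - 0.3 * log(dat$Z) +
                 rnorm(n, sd = 0.1))

# Transformations can be written directly in the formula
fit <- lm(log(T_obs) ~ log(Yi * Yj) + log(Z), data = dat)
coef(fit)      # estimates of b_0, b_1, b_2
summary(fit)   # standard errors, t values, R-squared
```

Every term on the right-hand side gets its own coefficient, so further ln() terms are added with "+".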




Re: [R] lm()

2012-04-23 Thread Stephen Sefick

Rpart has a built-in cross-validation procedure.  The xerror and so on
are based on a ten-fold cross-validation, I believe.  Read the manual
for more details.  If you plot the fit with post() and then look at the
PostScript file, it should have a misclassification rate under the
terminal nodes.

HTH,

Stephen
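
A minimal sketch of checking a fitted tree against held-out test data with predict(), assuming the rpart package is installed. The iris data and the 100/50 split are stand-ins for the poster's own train and test sets:

```r
library(rpart)

# Stand-in data: split iris into training and test sets
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Grow the classification tree on the training data only
fit <- rpart(Species ~ ., data = train, method = "class")

# Classify the test rows with the same tree
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and overall misclassification rate on the test set
conf_mat <- table(observed = test$Species, predicted = pred)
conf_mat
misclass_rate <- mean(pred != test$Species)
misclass_rate

# Cross-validated error (xerror) from the training fit, for comparison
printcp(fit)
```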

On Mon 23 Apr 2012 02:58:37 AM CDT, Mariam wrote:

Hi, Jorge!
Help me please! I made a classification tree (rpart package) from my
training data and a set of variables. How can I validate it on my test data? I want to
check whether the test data will be classified properly by the same tree. Thanks
cheers
Maria


Tue, 10 Apr 2012 08:44:39 -0700 (PDT) from Jorge I Velez [via
R] ml-node+s789695n4546022...@n4.nabble.com:


  Hi Mariam,

Check out the ?poly function.

Best,
Jorge.-


On Tue, Apr 10, 2012 at 10:12 AM, Mariam  wrote:


People, help me please!
How do I use the lm() function to find the coefficients of a degree-7
polynomial, and what expression should I put in /formula/?
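
For the degree-7 polynomial question, a minimal sketch of how poly() can be used inside an lm() formula. The x and y values are simulated for illustration only:

```r
# Simulated data for illustration
set.seed(1)
x <- seq(-1, 1, length.out = 100)
y <- 1 + 2 * x - x^3 + 0.5 * x^7 + rnorm(100, sd = 0.05)

# Orthogonal polynomial basis (the default): numerically stable
fit_orth <- lm(y ~ poly(x, 7))

# Raw powers x, x^2, ..., x^7: coefficients refer to the powers themselves
fit_raw <- lm(y ~ poly(x, 7, raw = TRUE))

coef(fit_raw)   # intercept plus seven polynomial coefficients

# Both parameterizations describe the same fitted curve
all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw)), tolerance = 1e-6)
```

The orthogonal form is usually the safer choice for fitting; raw = TRUE is mainly useful when the coefficients of the individual powers are wanted.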

--
View this message in context:
http://r.789695.n4.nabble.com/lm-tp4545740p4545740.html
Sent from the R help mailing list archive at Nabble.com.


--
View this message in context: 
http://r.789695.n4.nabble.com/lm-tp4545740p4579717.html
Sent from the R help mailing list archive at Nabble.com.


--
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are 
so little or so large that all they really do for us is puff us up and 
make us feel like gods.  We are mammals, and have not exhausted the 
annoying little problems of being mammals.


   -K. Mullis

A big computer, a complex algorithm and a long time does not equal 
science.


 -Robert Gentleman


Re: [R] Survreg

2012-04-23 Thread Terry Therneau




On 04/22/2012 05:00 AM, r-help-requ...@r-project.org wrote:

I am trying to run Weibull PH model  in R.

Assume in the data set  I  have x1  a continuous variable and x2 a
categorical  variable with two classes (0= sick and 1= healthy).  I fit the
model in the following way.

Test <- survreg(Surv(time, cens) ~ x1 + x2, dist = "weibull")

My questions are

1. Is it Weibull PH model or Weibull AFT model?
Call:
survreg(formula = Surv(time, delta) ~ x1 + x2, data = nn,
    dist = "weibull")
                Value Std. Error     z         p
(Intercept)  5.652155   3.54e-02 159.8  0.00e+00
x1           0.492592   1.92e-02  25.7 3.65e-145
x2          -0.000212   5.64e-06 -37.6  0.00e+00
Log(scale)  -0.269219   1.57e-02 -17.1  1.24e-65
Scale= 0.764

The Weibull model can be viewed as either.  The cumulative hazard for a
Weibull is t^p.  Viewed as an AFT model we have (at)^p [multiply time];
viewed as PH we have a(t^p) [multiply the hazard].  The survreg routine
uses the AFT parameterization found in Chapter 2 of Kalbfleisch and
Prentice, The Statistical Analysis of Failure Time Data.

For the routine, our multiplier a above is exp(X beta), for the usual
reason that negative multipliers should be avoided -- a negative
multiplier would correspond to time running backwards.  In the above, x1
makes time run faster and x2 makes time run slower.

  Terry T
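
A minimal sketch of moving between the two views for a fitted Weibull model, assuming the survival package and using its lung data as a stand-in for the poster's data. It relies on the standard relation for survreg's Weibull parameterization: shape = 1/scale, and the log hazard ratio is minus the AFT coefficient divided by the scale:

```r
library(survival)

# Stand-in data set shipped with the survival package
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

aft_coef <- coef(fit)          # AFT scale: effects on log(time)
shape    <- 1 / fit$scale      # Weibull shape p in t^p

# Same model on the PH scale: log hazard ratios
ph_coef <- -aft_coef[-1] / fit$scale

aft_coef
shape
exp(ph_coef)                   # hazard ratios for age and sex
```

A covariate that lengthens survival time (positive AFT coefficient) lowers the hazard, so the two sets of coefficients always have opposite signs.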


Re: [R] How to take ID of number 7.

2012-04-23 Thread Yellow

Thanks for saying. :) 

This morning I tried out some things with this code, and different variants
with it. 
Works nice. 

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-take-ID-of-number-7-tp4577998p4580364.html
Sent from the R help mailing list archive at Nabble.com.


[R] PCA sensitive to outliers?

2012-04-23 Thread Igor Carron

I could not reply directly to the initial thread with the same title.

There are two sorts of Robust PCA, those that were devised before the
recent string of Low Rank approaches and then the new set of algorithms
that provide robust PCA in light of sparse but potentially large
errors/outliers (typically the sort of outliers that break normal PCA).
These recent algorithms initially come from some of the folks  involved in
compressive sensing.

I am keeping a list of all these new solvers here in the Matrix
Factorization Jungle Page @

https://sites.google.com/site/igorcarron2/matrixfactorizations

Most are written in Matlab and should not be too difficult to
translate into R.

To get a sense of what these new Robust PCA techniques can do, a friend and
I applied them to different YouTube videos; you can see some of the entries
listed here:

http://nuit-blanche.blogspot.com/p/its-cai-cable-and-igors-adventures-in.html





Igor Carron, Ph.D.
http://nuit-blanche.blogspot.com


[R] save model summary

2012-04-23 Thread Eiko Fried

Hello,

I'm working with RStudio, which does not display enough lines in the
console that I can read the summary of my (due to the covariance-matrix
rather long) model. There are no ways around this, so I guess I need to
export the summary into a file in order to see it ...

I'm new to R, and searching Google for "R save model summary" doesn't help,
neither does help(save) or help(write.csv). If I try the commands I get the
error:

write.csv(summary(m2), file="data.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class 'structure(mer, package = lme4)' into a
data.frame

Thanks for your help


[R] How does survreg ordered factors vs not ordered factors?

2012-04-23 Thread wwreith

Consider the following generic code for a survival model

survobj <- Surv(data$Time, data$Satisfactory)
survmodel <- survreg(survobj ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, dist = "weibull")
survsum <- summary(survmodel)
survsum

My question: Does anyone know what exactly survreg() does differently if

x1 <- factor(data$x1, ordered = TRUE)
x2 <- factor(data$x2, ordered = TRUE)

vs.

x1 <- factor(data$x1)
x2 <- factor(data$x2)

Thanks,

William

--
View this message in context: 
http://r.789695.n4.nabble.com/How-does-survreg-ordered-factors-vs-not-ordered-factors-tp4580395p4580395.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] save model summary

2012-04-23 Thread Berend Hasselman


On 23-04-2012, at 15:20, Eiko Fried wrote:

 Hello,
 
 I'm working with RStudio, which does not display enough lines in the
 console that I can read the summary of my (due to the covariance-matrix
 rather long) model. There are no ways around this, so I guess I need to
 export the summary into a file in order to see it ...
 
 I'm new to R, and searching Google for "R save model summary" doesn't help,
 neither does help(save) or help(write.csv). If I try the commands I get the
 error:
 
 write.csv(summary(m2), file="data.csv")
 Error in as.data.frame.default(x[[i]], optional = TRUE) :
 cannot coerce class 'structure(mer, package = lme4)' into a
 data.frame

?sink
?capture.output

Berend
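
A minimal sketch of how the two help pages pointed to above can be used to write a printed summary to a text file. An lm() fit on mtcars stands in for the lme4 model m2, and the file name is a hypothetical choice:

```r
# Stand-in model; the original post uses an lme4 fit called m2
fit <- lm(mpg ~ wt + hp, data = mtcars)
out_file <- file.path(tempdir(), "model_summary.txt")

# Option 1: capture the printed summary as lines of text and write them out
writeLines(capture.output(summary(fit)), out_file)

# Option 2: divert console output to the file, print, then restore the console
sink(out_file, append = TRUE)
print(summary(fit))
sink()

# The file can now be opened in any text editor
length(readLines(out_file))
```

Both approaches save what print() would show, so they work for objects that cannot be coerced to a data.frame.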


[R] clusplot() for all used variables - not only for 2 principal components?

2012-04-23 Thread greatest.possible.newbie

Hi there,

To see the results of my clustering graphically I was using clusplot. But it
only provides a look at the two most important components of the dataset.
I recently found the Mclust() function which produces very nice colored pair
plots for the clustered dataset.
see  Graph: http://www.statmethods.net/advstats/images/cluster4.jpg
on Website: http://www.statmethods.net/advstats/cluster.html

I am looking for such a pair plot when I have done the clustering with other
clustering methods than model based ones. E.g. when I clustered simply with
hclust().

Can anyone give me a suggestion?
Many thanks!
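
One possible approach, sketched under the assumption that a plain scatterplot matrix coloured by cluster membership is what is wanted: cut the hclust() tree into k groups with cutree() and pass the group labels as colours to pairs(). The iris measurements and k = 3 are stand-ins for illustration:

```r
# Stand-in data: four numeric variables, standardized
x <- scale(iris[, 1:4])

# Hierarchical clustering, then cut the tree into k groups
hc <- hclust(dist(x), method = "average")
cl <- cutree(hc, k = 3)

# Scatterplot matrix of all variables, one colour per cluster
pairs(x, col = cl, pch = 19,
      main = "hclust() clusters on all variable pairs")

table(cl)   # cluster sizes
```

The same idea works for any method that returns a vector of cluster labels.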

--
View this message in context: 
http://r.789695.n4.nabble.com/clusplot-for-all-used-variables-not-only-for-2-principal-components-tp4580387p4580387.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] R-help Digest, Vol 110, Issue 23

2012-04-23 Thread Terry Therneau

Yes, the (start, stop] formalism is the easiest way to deal with time
dependent data.

Each individual only needs to have sufficient data to describe them, so
if id number 4 is in house 1, their housemate #1 was eaten at time 2,
and they were eaten at time 10, the following is sufficient data for
that subject:

id house time1 time2 status discovered
 4     1     0     2      0      false
 4     1     2    10      1       true

We don't need observations for each intermediate time, only that from
0-2 they were not yet discovered and that from 2-10 they were. The
status variable tells whether an interval ended in disaster. Use
Surv(time1, time2, status) on the left side of the equation.

Since the time scale is discrete you should technically use
method='exact' in a Cox model, but the default Efron approximation will
be very close.

Interval censoring isn't necessary. You will have a model of time to
discovery instead of time to eaten, but with a fixed examination
schedule such as you have there is no information in the data to help
you move from one to the other. The standard interval approach would
just assume deaths happened at the midpoint between examinations.

Terry T.
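
A minimal sketch of building the (start, stop] rows described above from a one-row-per-individual data sheet, assuming the survival package. The six individuals and the helper split_one() are hypothetical; id 4 is chosen so that its rows reproduce the two-line example above:

```r
library(survival)

# Hypothetical one-row-per-individual data: time of event or censoring
dat <- data.frame(
  id     = c(1, 4, 5, 6, 7, 8),
  house  = c(1, 1, 1, 2, 2, 2),
  time   = c(2, 10, 14, 14, 14, 6),
  status = c(1, 1, 0, 0, 1, 1)
)

# Time of the first event in each house (Inf if the house was never discovered)
first_event <- tapply(ifelse(dat$status == 1, dat$time, Inf), dat$house, min)
dat$disc_time <- as.numeric(first_event[as.character(dat$house)])

# Split an individual's follow-up at the discovery time when it extends past it
split_one <- function(r) {
  if (r$time > r$disc_time) {
    data.frame(id = r$id, house = r$house,
               time1 = c(0, r$disc_time), time2 = c(r$disc_time, r$time),
               status = c(0, r$status), discovered = c(FALSE, TRUE))
  } else {
    data.frame(id = r$id, house = r$house,
               time1 = 0, time2 = r$time,
               status = r$status, discovered = FALSE)
  }
}
long <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) split_one(dat[i, ])))
long

# Counting-process response for use on the left side of a model formula
with(long, Surv(time1, time2, status))
```

With real data of sufficient size, the model would then be along the lines of coxph(Surv(time1, time2, status) ~ discovered, data = long).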

On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear R users,

I fear this is terribly trivial but I'm struggling to get my head around it.

First of all, I'm using the survival package in R 2.12.2 on Windows Vista with the
RExcel plugin. You probably only need to know that I'm using survival for this.

I have data collected from 180 or so individuals that were checked 7 times
throughout a trial with set start and end times. Once the event happens (death
by predator) there are no more checks for that individual. This means that I
check on each individual up to 7 times with either an event recorded or the
final time being censored.

At the moment, I have a data sheet with one observation per individual; that is
either the event time (the observation time when the individual had had an
event) or the censored time. However, I'd like to add a time dependent factor
and I also wonder if this data should be treated as interval censored.

The time dependent factor is like this. The individuals are grouped in houses,
and once one individual in a group has an event, it makes biological sense that
the rest of them should be at greater risk, as the predator is likely to have
discovered the others in the house as well (the predator is able to consume many
individuals). At the moment I'm coding this as a normal two-level factor
(discovered), where all individuals alive after the first event in that house are
TRUE and the first individuals in a house to be eaten are FALSE. All individuals
in houses that were not discovered at all are also FALSE. Obviously, all
individuals that were eaten were first discovered, then eaten. However, the first
individuals in a house to be eaten had not been previously discovered by the
predator (not observably so, anyway).

Should I write up this data set with a start and stop time for every check I
made so each individual has up to 7 records, one for each time I checked?

Is there a quick and easy way to do this in R or would I have to go through the
data set manually?

Does coding the discovered factor the way I have, make statistical sense?

Should I worry about proportional hazards of the discovered factor? It seems
to me that it would often turn out not proportional because of its nature.

Sorry, lots of stats questions. I don't mind if you don't answer all of these.
Just knowing how to best feed this data into R would help me no end. The rest I
can probably glean from the millions of survival analysis books I have lying
about.


Re: [R] [BioC] Overlay Gene Expression on SNP (copy number) data

2012-04-23 Thread Paul Shannon

Another possibility for visual display of several kinds of data is RCytoscape, 
for which an example can be seen here:

http://rcytoscape.systemsbiology.net/versions/current/gallery/TCGA/subnet.TCGA.02.0014.png

This portrays
  1) gene expression (green: under-expression; red: over-expression) 
  2) copy number (blue: high, black: low)
  3) gene type (node shape: hexagons are kinases, arrows are ligands, diamonds 
are receptors, circles for everything else)
  4) mutation status (not SNPs, but non-synonymous amino acid substitutions)
  5) in the context of gene relationships, from KEGG

A vignette for a larger version of this network is nearly complete.  The 
Cytoscape network map is created from data and R code.

 - Paul

On Apr 23, 2012, at 5:15 AM, Steve Lianoglou wrote:

 Hi,
 
 On Mon, Apr 23, 2012 at 7:33 AM, Ekta Jain ekta_j...@jubilantbiosys.com 
 wrote:
 Hello,
 Can anyone please suggest any packages in R that can be used to overlay gene 
 expression data on SNP (affymetrix) copy number ?
 
 I guess you mean visually? If so, I'd suggest skimming through the
 vignettes of the following packages to see which one might suit you
 best:
 
 * Gviz
 * ggbio
 * GenomeGraphs
 
 -steve
 
 -- 
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
 
 ___
 Bioconductor mailing list
 bioconduc...@r-project.org
 https://stat.ethz.ch/mailman/listinfo/bioconductor
 Search the archives: 
 http://news.gmane.org/gmane.science.biology.informatics.conductor


Re: [R] How does survreg ordered factors vs not ordered factors?

2012-04-23 Thread Marc Schwartz

On Apr 23, 2012, at 8:29 AM, wwreith wrote:

 Consider the following generic code for a survival model
 
 survobj <- Surv(data$Time, data$Satisfactory)
 survmodel <- survreg(survobj ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, dist = "weibull")
 survsum <- summary(survmodel)
 survsum
 
 My question: Does anyone know what exactly survreg() does differently if
 
 x1 <- factor(data$x1, ordered = TRUE)
 x2 <- factor(data$x2, ordered = TRUE)
 
 vs. 
 
 x1 <- factor(data$x1)
 x2 <- factor(data$x2)
 
 Thanks,
 
 William


You might want to Google search for Orthogonal Polynomial Contrasts, which is 
what you get by default in R for ordered factors and that will apply not just 
for survreg, but for all typical modeling functions in R (lm, glm, etc.). There 
is a page here that might be helpful:

  http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm

and it is also covered, albeit briefly, in An Introduction to R:

  http://cran.r-project.org/doc/manuals/R-intro.html#Contrasts

as well as ?contr.poly. Briefly, it allows for an analysis/exploration of 
linear and higher order polynomial trends in the factor in relation to the 
response variable, which would be more typical for an ordinal, as compared to a 
nominal, independent variable.

For unordered factors, the default in R is to use what are called treatment 
contrasts, which compares each level of the factor with the base or reference 
level. Depending upon the nature of the analysis you are conducting and your 
underlying hypotheses, treatment contrasts are very commonly used for ordinal 
variables as well.

Regards,

Marc Schwartz
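
A small sketch of the difference described above, using base R only. The three-level factor and the simulated response are made up for illustration:

```r
# The same levels, once as an unordered and once as an ordered factor
lev <- c("low", "mid", "high")
f_unordered <- factor(lev, levels = lev)
f_ordered   <- factor(lev, levels = lev, ordered = TRUE)

getOption("contrasts")    # defaults: contr.treatment / contr.poly
contrasts(f_unordered)    # treatment contrasts: each level vs. "low"
contrasts(f_ordered)      # polynomial contrasts: linear (.L), quadratic (.Q)

# The coding changes the coefficients, not the fitted model
set.seed(1)
d <- data.frame(g = factor(rep(lev, each = 10), levels = lev))
d$y <- as.integer(d$g) + rnorm(30)

fit_u <- lm(y ~ g, data = d)
fit_o <- lm(y ~ factor(g, ordered = TRUE), data = d)

coef(fit_u)
coef(fit_o)
all.equal(fitted(fit_u), fitted(fit_o))
```

The same applies to survreg(): only the interpretation of the individual coefficients differs between the two codings.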


Re: [R] Solve an ordinary or generalized eigenvalue problem in R

2012-04-23 Thread John C Nash

This thread reveals that R has some holes in the solution of some of the linear algebra 
problems that may arise. It looks like Jim Ramsay used a quick and dirty approach to the 
generalized eigenproblem by using B^(-1) %*% A, which is usually not too successful due to 
issues with condition of B and making a symmetric/Hermitian problem unsymmetric.


In short, the problem is stated as follows:

 Find the eigenvalues e and vectors v that satisfy

A v  =  e B v

 There is a matrix form (I think   A V = B V E is usual, though I tend to work on them 
one at a time in my mind.)


The real trouble is that A and B can have different forms, e.g., real, complex,
or Hermitian, and B may or may not be positive definite. I published a code for
the complex Hermitian case with a posdef (but complex) metric B in 1974 in
Computer Physics Communications. Around the same time there was Linda Kaufman's
LZ code (TOMS / ACM 496) and various QZ codes (van Loan and Smith), see
TOMS / ACM 535. There are descendants of these in LAPACK. And I suspect there
are some sparse variants too.


The nice place for these would be in the Matrix package, but as a shorter-term solution,
it could be useful to put together a suite for dense matrices. Using existing Fortran
codes would allow this to be done fairly quickly, and it could be a nice summer or term
project for a student (Google Summer of Code 2013?). I'll be happy to kibitz and mentor,
but I'd rather stay on the sidelines of actual package building, as I no longer have a
direct need. My application was the quantum electronic structure of polymers in 1972/3.
However, I think the code is still reasonable.


Best, JN
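
For the simplest case only (A real symmetric, B real symmetric positive definite), a minimal base-R sketch of a reduction that keeps the problem symmetric instead of forming B^(-1) %*% A. It uses the Cholesky factor of B; this is a textbook reduction, not the code referred to above, and the test matrices are random stand-ins:

```r
# Random stand-in matrices: A symmetric, B symmetric positive definite
set.seed(1)
n <- 4
M <- matrix(rnorm(n * n), n)
A <- (M + t(M)) / 2
N <- matrix(rnorm(n * n), n)
B <- crossprod(N) + diag(n)

# Cholesky factor: B = t(R) %*% R with R upper triangular
R    <- chol(B)
Rinv <- backsolve(R, diag(n))

# Reduced standard symmetric problem: C y = e y, with v = R^{-1} y
C  <- t(Rinv) %*% A %*% Rinv
C  <- (C + t(C)) / 2            # remove rounding asymmetry
es <- eigen(C, symmetric = TRUE)

e <- es$values                  # generalized eigenvalues
V <- Rinv %*% es$vectors        # generalized eigenvectors (columns)

# Check A V = B V E; the residual should be near machine precision
max(abs(A %*% V - B %*% V %*% diag(e)))
```

When B is ill-conditioned or not positive definite, this reduction is not appropriate and a QZ-type routine is the usual route.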


On 04/23/2012 06:00 AM, r-help-requ...@r-project.org wrote:

Message: 36
Date: Sun, 22 Apr 2012 21:27:23 +0200
From: Berend Hasselman b...@xs4all.nl
To: Jonathan Greenberg j...@illinois.edu
Cc: R Project Help r-help@r-project.org
Subject: Re: [R] Solve an ordinary or generalized eigenvalue problem in R?
Message-ID: 378c084b-5cf5-4087-9aba-116c5155d...@xs4all.nl
Content-Type: text/plain; charset=us-ascii

On 22-04-2012, at 21:08, Jonathan Greenberg wrote:

  Thanks all (particularly to you, Berend) -- I'll push forward with these solutions and
integrate them into my code.  I did come across geigen while rooting around in the CCA code,
but it's not formally documented (it just says "for internal use" or something
along those lines) and, as you found out above, it does not produce the same solution as
dggev.  It would be nice to have a more complete set of formal packages for doing LA in R
(rather than having to hand-write .Fortran calls), but I'll leave that to someone with more
expertise in linear algebra than me.  Something that perhaps matches the SciPy set of
functions (both in terms of input and output):

  http://docs.scipy.org/doc/scipy/reference/linalg.html

  Some of these are already implemented, but clearly not all of them.

Package CCA has package fda as dependency.
And package fda defines a function geigen.
The first 14 lines of this function are

geigen <- function(Amat, Bmat, Cmat)
{
   #  solve the generalized eigenanalysis problem
   #
   #max {tr L'AM / sqrt[tr L'BL tr M'CM] w.r.t. L and M
   #
   #  Arguments:
   #  AMAT ... p by q matrix
   #  BMAT ... order p symmetric positive definite matrix
   #  CMAT ... order q symmetric positive definite matrix
   #  Returns:
   #  VALUES ... vector of length s = min(p,q) of eigenvalues
   #  LMAT   ... p by s matrix L
   #  MMAT   ... q by s matrix M

It's not clear to me how it is used and exactly what it is doing and how that 
compares with Lapack.

Berend




Re: [R] [BioC] Overlay Gene Expression on SNP (copy number) data

2012-04-23 Thread Tengfei Yin

On Mon, Apr 23, 2012 at 6:33 AM, Ekta Jain ekta_j...@jubilantbiosys.com wrote:

 Hello,
 Can anyone please suggest any packages in R that can be used to overlay
 gene expression data on SNP (affymetrix) copy number ?

Hi Ekta,

If you mean visually, as Steve suggested, you could try packages like
ggbio, Gviz, or RCytoscape. Which one fits depends on how you plan to
visualize your data (track-based, circular view, or network) and on the
format your data are in.

For example, in ggbio you can arrange your data into a GRanges object
manually, or just provide a file in a format that rtracklayer supports,
such as BED, and then call autoplot(). It accepts different objects, such
as GRanges, IRanges, BAM files, or character, and allows some
transformations such as coverage. For files like BED, it automatically
uses bars to represent your data and uses score as y (you can specify
another y). The function tracks() allows you to bind or overlay any
graphics produced by ggbio or ggplot2, so you could work from a
data.frame too; it will help you align your plots after the graphics are
produced. For genomic structure, if you want to overlay it with your
data, try autoplot() on a TranscriptDb object. And if you want to show
interactions between genes, you could try either arches in a linear view
or links in a circular view (layout_circle).

http://tengfei.github.com/ggbio/

This website is still under development and just shows some possible cases;
it will be rebuilt against R 2.15, and more case studies are coming.

HTH

Tengfei




 Thanks,
 Ekta
 Senior Research Associate
 Bioinformatics Department
 Jubilant Biosys Pvt Ltd,
 #96, Industrial Suburb, 2nd Stage
 Yeshwantpur, Bangalore 560 022
 Ph No : +91-80-66628346





-- 
Tengfei Yin
MCDB PhD student
1620 Howe Hall, 2274,
Iowa State University
Ames, IA,50011-2274
Homepage: www.tengfei.name


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868

Hi Rui,

Yes you're right. It's me again ^^
This post is the last part (I hope) of my job. You helped me a lot last time
for the correlation matrices. 
I have to leave work now, so I'll check and test your proposal
tomorrow. But there is no doubt that it'll help me a lot again.
I'll tell you tomorrow. Thanks Rui!

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4580898.html
Sent from the R help mailing list archive at Nabble.com.


[R] How can I run package ca (correspondence analysis), which needs rgl, without X11?

2012-04-23 Thread Jocelyn Ireson-Paine

I want to invoke R on a Linux Web server from Java, in order to analyse 
data in ways that would take too long to code, and run too slowly, in 
Java. In particular, to do correspondence analyses. To this end, I've 
installed R version 2.15.0 on my Web host's x86_64 GNU/Linux machine, and 
tried using package ca to run the author example of correspondence 
analysis on page 3 of http://cran.r-project.org/web/packages/ca/ca.pdf .


But there's a problem anent X11. My Web host's Linux doesn't have X11, and 
its administrator says that it would be hairy for him to install, because 
of its dependencies. So when I installed R, I did so without X11. Since 
I'm not going to plot graphs interactively, I wasn't expecting to need it 
anyway.


However, I find that ca requires it. When I did
  install.packages("ca")
(from Bristol mirror), it downloaded rgl_0.92.879.tar.gz and 
ca_0.33.tar.gz, then started its checks, and then complained:

  checking for X... no
  configure: error: X11 not found but required, configure aborted.
  ERROR: configuration failed for package 'rgl'
  * removing '/home/jp/r/R-2.15.0/library/rgl'
  ERROR: dependency 'rgl' is not available for package 'ca'
  * removing '/home/jp/r/R-2.15.0/library/ca'

Why? Not all uses of ca require X11. For example, if you merely call the 
ca function without plotting anything, that surely can't need it. 
Moreover, one can plot to PDF without needing X11. I managed to do so from 
another correspondence-analysis package, ade4, by redirecting my plot to 
PDF and then running the Bordeaux example near the top of 
http://pbil.univ-lyon1.fr/ade4/ade4-html/bordeaux.html . (I then converted 
the PDF to PNG using ImageMagick.) So if ade4 can do this, ca ought to be 
able to. How can I make it? I don't mind ca complaining when it indeed 
does need X11, but couldn't the dependency check be postponed until 
runtime, so as not to spoil things for people to whom it's irrelevant?


Thanks,

Jocelyn Ireson-Paine
http://www.j-paine.org

Jocelyn's Cartoons:
http://www.j-paine.org/blog/jocelyns_cartoons/


[R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Paul Miller

Hello All,

Started out awhile ago trying to select columns in a dataframe whose names
contain some variation of the word "mutant" using code like:

names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using code like:

KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered though, is that this selects columns like "nonmutated" and
"unmutated" as well as columns like "mutated", "mutation", and "mutational".

So I'd like to know how to select columns that have some variation of the word
"mutant" without the "non" or the "un". I've been looking around for an example
of how to do that but haven't found anything yet.

Can anyone show me how to select the columns I need?

Thanks,

Paul


Re: [R] Selecting columns whose names contain mutated except when they also contain non or un



On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:


Hello All,

Started out awhile ago trying to select columns in a dataframe whose
names contain some variation of the word "mutant" using code like:


names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using
code like:


KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered though, is that this selects columns like
"nonmutated" and "unmutated" as well as columns like "mutated",
"mutation", and "mutational".


So I'd like to know how to select columns that have some variation
of the word "mutant" without the "non" or the "un". I've been
looking around for an example of how to do that but haven't found
anything yet.


Can anyone show me how to select the columns I need?


If you want only columns whose names _begin_ with muta then add the  
^ character at the beginning of your pattern:


names(KRASyn)[grep(^muta, names(KRASyn))]

(This should be explained on the ?regex page.)

--

David Winsemius, MD
West Hartford, CT


Re: [R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Paul Miller

Hello Dr. Winsemius,

Unfortunately, I also have terms like "krasmutated". So simply selecting words 
that start with "muta" won't work in this case. 

Thanks,

Paul



Re: [R] How can I run package ca (correspondence analysis), which needs rgl, without X11?

2012-04-23 Thread Duncan Murdoch


On 23/04/2012 12:05 PM, Jocelyn Ireson-Paine wrote:

I want to invoke R on a Linux Web server from Java, in order to analyse
data in ways that would take too long to code, and run too slowly, in
Java. In particular, to do correspondence analyses. To this end, I've
installed R version 2.15.0 on my Web host's x86_64 GNU/Linux machine, and
tried using package ca to run the author example of correspondence
analysis on page 3 of http://cran.r-project.org/web/packages/ca/ca.pdf .

But there's a problem anent X11. My Web host's Linux doesn't have X11, and
its administrator says that it would be hairy for him to install, because
of its dependencies. So when I installed R, I did so without X11. Since
I'm not going to plot graphs interactively, I wasn't expecting to need it
anyway.

However, I find that ca requires it. When I did
install.packages("ca")
(from Bristol mirror), it downloaded rgl_0.92.879.tar.gz and
ca_0.33.tar.gz, then started its checks, and then complained:
checking for X... no
configure: error: X11 not found but required, configure aborted.
ERROR: configuration failed for package 'rgl'
* removing '/home/jp/r/R-2.15.0/library/rgl'
ERROR: dependency 'rgl' is not available for package 'ca'
* removing '/home/jp/r/R-2.15.0/library/ca'

Why? Not all uses of ca require X11. For example, if you merely call the
ca function without plotting anything, that surely can't need it.
Moreover, one can plot to PDF without needing X11. I managed to do so from
another correspondence-analysis package, ade4, by redirecting my plot to
PDF and then running the Bordeaux example near the top of
http://pbil.univ-lyon1.fr/ade4/ade4-html/bordeaux.html . (I then converted
the PDF to PNG using ImageMagick.) So if ade4 can do this, ca ought to be
able to. How can I make it? I don't mind ca complaining when it indeed
does need X11, but couldn't the dependency check be postponed until
runtime, so as not to spoil things for people to whom it's irrelevant?


That sounds reasonable -- you should ask the maintainer of ca to 
consider listing rgl only as a suggestion, not a dependency.  It may 
require some other changes (e.g. checks for the presence of rgl before 
trying to use it).
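
A minimal sketch of what such a runtime check might look like, assuming a version of R that provides requireNamespace(). The helper name with_optional_pkg is hypothetical; it is not part of ca or rgl, and a package using this pattern would list rgl under Suggests in its DESCRIPTION:

```r
# Hypothetical helper: run `action` only if the optional package can be loaded;
# otherwise emit a message and return `fallback` instead of failing.
with_optional_pkg <- function(pkg, action, fallback = NULL) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    action()
  } else {
    message("Package '", pkg, "' is not available; skipping.")
    fallback
  }
}

# 'stats' ships with R, so the action runs; the second package name is made up.
with_optional_pkg("stats", function() stats::median(c(1, 5, 9)))
with_optional_pkg("noSuchPackageXYZ", function() "never runs", fallback = NA)
```

With a check of this kind, only the code paths that draw 3D plots would need rgl, and everything else would keep working on a machine without X11.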


Or perhaps you could ask for a minimal installation of X11.  rgl will be 
happy with the Xvfb virtual server.  It won't display anything, but 
the calls into X11 work.  I don't know if it's any easier to install 
than all of X11, though.


Duncan Murdoch




[R] plot function creating bars instead of lines

2012-04-23 Thread la mer

Hello,

I am having a problem where code that plots lines using a different data
frame plots bars with the current data frame (I intend to plot lines).
The code specifies lines (see below), so I can't figure out why the results
are bars. I suspect that it may have something to do with the fact that in
the data frame where the code worked as intended, both variables
specifying different lines were numeric, whereas in the current data frame
one of those variables (challenge) is a factor with 2 levels. Any
suggestions for getting this to plot as intended would be much appreciated.

Thank you!

# This is meant to plot a separate line for each subject for each challenge
for (subj in unique(lab.samples$subid)) {
    #par(new=T)
    plot.new()
    par(mfrow=c(2,1))
    par(mfg=c(1,1))
    plot(data=lab.samples, subset=(subid==subj), cortisol ~ Sample,
         type='n',
         main=paste('Cortisol and Amylase for subject ',
                    as.character(subj)))

    for ( t in unique(subset(lab.samples,subid==subj)$challenge) ) {
        par(mfg=c(1,1))
        lines(data=lab.samples, subset=(subid==subj & challenge==t),
              cortisol ~ Sample, type='b', pch=as.character(t),
              col=rainbow(2)[t])
    }
    par(mfg=c(2,1))
    plot(data=lab.samples, subset=(subid==subj), amylase ~ Sample, type='n')
    for ( t in unique(subset(lab.samples,subid==subj)$challenge) ) {
        par(mfg=c(2,1))
        lines(data=lab.samples, subset=(subid==subj & challenge==t),
              amylase ~ Sample, type='b', pch=as.character(t),
              col=heat.colors(2)[t])
    }
}
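
One thing that may be worth checking, offered only as a guess because lab.samples itself is not shown: when the right-hand side of a plotting formula is a factor, plot() draws one box per level, and with a single observation per level each box collapses to a short bar. A minimal sketch with made-up data (the values and the SampleNum column are hypothetical):

```r
# Toy data, invented for illustration; Sample is stored as a factor.
d <- data.frame(Sample   = factor(rep(1:4, 2)),
                cortisol = c(3, 5, 4, 2, 6, 7, 5, 4))

pdf(NULL)  # null device, so nothing is written or shown

# Factor on the right-hand side: one box per level rather than points or lines.
plot(cortisol ~ Sample, data = d)

# Converting the factor labels to numbers gives an ordinary x axis,
# on which lines() behaves as expected.
d$SampleNum <- as.numeric(as.character(d$Sample))
plot(cortisol ~ SampleNum, data = d, type = "n")
lines(cortisol ~ SampleNum, data = d[1:4, ], type = "b")

dev.off()
```

Running str(lab.samples) would show whether Sample (rather than challenge) is a factor or character column in the data frame that misbehaves.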


--
View this message in context: 
http://r.789695.n4.nabble.com/plot-function-creating-bars-instead-of-lines-tp4580765p4580765.html
Sent from the R help mailing list archive at Nabble.com.


[R] write a png inside a pdf for large graphics?

2012-04-23 Thread Adam Wilson

I routinely write graphics into multi-page PDFs, but some graphics (e.g.
plots of large spatial datasets using levelplot()) can result in enormous
files.  I'm curious if there is a better way.  For example:

#First, make some data:
library(lattice)
d=expand.grid(x=1:1000,y=1:1000)
d$z=rnorm(nrow(d))

#Now, the PDF.  The following produces a PDF that's ~50MB.
pdf(width=11,height=8.5,file="test1.pdf")
levelplot(z~x*y,data=d)
dev.off()

#If you write the same graphic to a png with reasonable resolution, the
#file size is ~500k:
png(width=1024,height=768,file="test1.png")
levelplot(z~x*y,data=d)
dev.off()

#  I would prefer to embed a png (or other raster format) inside a PDF
#  directly from R.
#  Is this possible?  I'm looking for some way to achieve something like
#  the following (of course this doesn't work):
pdf(width=11,height=8.5,file="test1.pdf")
 png(width=1024,height=768,file="current device")
 levelplot(z~x*y,data=d)
 dev.off()
dev.off()


Of course the PDF preserves vector scalability, but there are times it's
not worth the extra file size.  And you can write out the png's as separate
files and then merge them with imagemagick or ghostscript.  I currently get
around this by writing the graphics to a potentially very large (100MB)
PDF, then use ghostscript to convert *only* the large pages of the pdf to
png and put it back together as a PDF (a function I wrote for this is
described here:
http://planetflux.adamwilson.us/2010/06/shrinking-rs-pdf-output.html).

I'm curious if there is a way to do it directly by instructing R to write a
png and embed it within the already open PDF device.  Any ideas?

Adam


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread Rui Barradas

Hello,



jeff6868 wrote
 
 Hi Sarah,
 
 Thank you for your answer.
 Yes I know that my proposition is not necessary the better way to do it.
 But my problem concerns only big gaps of course (more than half a day of
 missing data, till several months of missing data).
 I've already filled small gaps with the interpolation that you were
 talking in your message (with the function na.approx of the package zoo).
 For the study, it's not important to have perfectly  identical values
 between the 2 correlated stations, because I'll calculate after the
 reconstruction the daily mean of each station. For my boss, it's enough to
 work on daily means. But before that, I need to rebuild the big missing
 data gaps of my stations (by the way I explained in the first message of
 my topic).
 Do you have any idea of the way to do it on R according to my first post?
 I forgot to precise that my examples are completely fakes! I chose these
 numbers in order for you to understand what I want to do (I chose easy and
 readable numbers). I tested on excel with 2 stations, it was not too bad
 when I filled the gaps (between the data of the 2 well correlated
 stations).
 

I remember this data set from some time ago. (Weeks?)

First of all, please use ?dput to post your data, it makes it much easier
for everyone to
just copy and paste to an R session. The output you should post looks like
this:

> dput(s1)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(1L, 2L, NA, 
4L)), .Names = c("time", "data"), row.names = c(NA, -4L), class =
"data.frame")
> dput(s2)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = 8:11), .Names = c("time", 
"data"), row.names = c(NA, -4L), class = "data.frame")
> dput(s3)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(123L, NA, NA, 
NA)), .Names = c("time", "data"), row.names = c(NA, -4L), class =
"data.frame")
> dput(m)
structure(c(1, 0.9, 0.8, 0.9, 1, 0.7, 0.8, 0.7, 1), .Dim = c(3L, 
3L), .Dimnames = list(c("Station1", "Station2", "Station3"), 
c("Station1", "Station2", "Station3")))


I've named your data.frames 's1', 's2' and made up an 's3'; 'm' is the
correlation matrix.

Now the problem.
Sarah's comment seems sensible, to just fill in missing values using some
other dataset isn't very canonical
but here it goes.
It assumes the data frames are in a list.

lst <- list(s1, s2, s3)
names(lst) <- paste("Station", seq.int(length(lst)), sep="")
lst



# station - list number or name, not the data.frame
# mat - correlation matrix
get.max.cor <- function(station, mat){
    mat[row(mat) == col(mat)] <- -Inf
    which( mat[station, ] == max(mat[station, ]) )
}

# x - data.frame to be transformed
# y - data.frame with greater correlation
na.fill <- function(x, y){
    i <- is.na(x$data)
    x$data[i] <- y$data[i]
    x
}

mx.cor <- get.max.cor(1, m)
mx.cor
na.fill(lst[[1]], lst[[mx.cor]])

Like it's said in the comments before the function, the call to the first
function could be

get.max.cor("Station1", m)

The two functions above solve the problem, all that's left to do is to
automate their calls.
Note that there might be a need for two passes through 'na.fill', if the
data.frame with greater correlation
also has NAs. This is the case of Station1 filling in values for Station3.
Try commenting out the second pass
in the function below


process.all <- function(df.list, mat){
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
    #
    n <- length(df.list)
    nms <- names(df.list)
    # First the max on each row (uses the 'mat' argument, not the global 'm')
    max.cor <- sapply(seq.int(n), get.max.cor, mat)
    # Note the two passes
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), f)
    # Makes nicer output
    names(df.list) <- nms
    df.list
}

process.all(lst, m)



Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4580845.html
Sent from the R help mailing list archive at Nabble.com.


[R] Can I specify POSIX[cl]t column classes inside read.csv?

2012-04-23 Thread Thomas Levine

I'm loading a nicely formatted csv file.

    #!/usr/bin/env Rscript
    kpi <- read.csv(
      # This is a dump of the username, date_joined and last_login columns
      # from the auth_user Django table.
      'data/2012-04-23.csv',
      colClasses = c('character')
    )
    print(kpi[sample(nrow(kpi), 3),2:3])

Here's what the three rows I printed look like.

             last_login         date_joined
    2012-02-22 02:44:11 2011-09-19 03:07:35
    2011-09-16 01:34:41 2011-09-16 01:34:41
    2011-07-02 20:29:17 2011-07-02 20:29:17

Once I load them, I'm converting the datetimes to datetimes.

    kpi$last_login <- as.POSIXlt(kpi$last_login)
    kpi$date_joined <- as.POSIXlt(kpi$date_joined)

Can I do this inside of read.csv by specifying colClasses? It's
obviously not a problem if I can't; it just seems like I should be
able to.

Note that the following doesn't work because it doesn't save the times.

    colClasses = c('character', 'Date', 'Date')

Thanks

Tom


[R] glmnet sparse matrix error: dim specifies too large an array

2012-04-23 Thread Nathan Stephens

I'm running into an unexpected error using the glmnet and Matrix packages.

I have a matrix that is 8 million rows by 100 columns with 75% of the
entries being zero. When I run a vanilla glmnet logistic model on my server
with 300 GB of RAM, the task completes in 20 minutes:

> x # 8 million x 100 matrix
> model1 <- glmnet(x,y,'binomial',alpha=1) # run time 20 minutes

But if I convert the matrix to a sparse matrix using the Matrix package,
the model does not run at all:

> x2 <- Matrix(x,sparse=T) # 75% sparse
> model2 <- glmnet(x2,y,'binomial',alpha=1) # error
Error in array(0, c(n, p)) : 'dim' specifies too large an array

This result is the opposite of what I might have expected. The non-sparse
data runs fine, but the sparse data fails because it is too large. Is
this a glmnet issue or an R memory issue? Is there a way to fix this in
glmnet?

--Nathan


Re: [R] Splitting a dataframe by character vector

2012-04-23 Thread Ben Neal

Thanks Steve, your explanation of the == is very helpful, and I will look up 
?by. Ben 

-Original Message-
From: S Ellison [mailto:s.elli...@lgcgroup.com]
Sent: Mon 4/23/2012 3:19 AM
To: Ben Neal; r-help@r-project.org
Subject: RE: Splitting a dataframe by character vector

 -Original Message-
 I am just trying to split a dataframe of 750 observations of 
 29 variables by "Site", which is a vector in the dataframe 
 with five text names (ex. "PtaCaracol"). 
A couple of methods.
i) First, look up ?split, which chops your data frame into a list of five data 
frames. Then use lapply on the list to get a list of summaries, or sapply to 
get something that will (if the results of your summary are a simple number or 
vector) look like an array.

ii) look up ?by, which will do something like lapply (and returns a list with 
extra features) and will print a tidier result.

 I tried subset 
 # Divide dataframe by Site names
 Site1 <- subset(Cover, Site = "PtaCaracol")

Check your syntax again; that should have been
 Site1 <- subset(Cover, Site == "PtaCaracol")

'=' is a pairwise link or an assignment operator, not the equality test that 
subset would be looking for. 
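
The three approaches mentioned above (subset with ==, split plus sapply, and by) can be sketched on a small stand-in data frame. The Cover data here are invented, and the coral column and the second site name are hypothetical:

```r
# Made-up stand-in for the real Cover data frame.
Cover <- data.frame(
  Site  = rep(c("PtaCaracol", "SiteB"), each = 3),
  coral = c(10, 12, 14, 20, 22, 24),
  stringsAsFactors = FALSE
)

# Equality test with '==' keeps only the matching rows.
Site1 <- subset(Cover, Site == "PtaCaracol")

# split() gives one data frame per site; sapply() summarises each piece.
by_site    <- split(Cover, Cover$Site)
site_means <- sapply(by_site, function(d) mean(d$coral))

# by() does the split-and-apply in one call and prints a tidy result.
by(Cover$coral, Cover$Site, mean)
```

split() is convenient when the per-site data frames are needed later; by() is shorter when only a printed summary is wanted.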

Steve E

[R] How to insert filename as column in a file

2012-04-23 Thread Shivam

Hi,

I am relatively new to R. I have scoured the help files and the web but
haven't been able to find a solution.

I have around 250 csv files, one file for each date. They have columns of
all types, numeric, string etc. The name of each file is the date in the
form of 'mmdd'. There is no column within the file which helps me
identify the date on which the file was generated, only the filename has
that info.

I am selecting some data (using read.csv.sql) from each file and creating a
dataset for each day. Ultimately I will combine all the datasets. I can
accomplish the select and combine part, but after combining I won't have a
record as to the date corresponding to the data.

Hence I want to insert the filename as a column in the respective file to
help me in identifying to what date each data row belongs to.

Sorry for the long mail, but wanted to make myself clear. Any help would be
greatly appreciated.

Thanks in advance,
Shivam
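
One possible approach, sketched under assumptions: it uses base read.csv() rather than read.csv.sql, and the directory, file names, and the file_date column are all made up for the demonstration. The idea is to add the column right after each file is read, before the pieces are combined:

```r
# Create two small demo files in a temporary directory (hypothetical data).
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "0423.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(demo_dir, "0424.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record its name (minus directory and extension) in a column.
read_with_name <- function(f) {
  d <- read.csv(f)
  d$file_date <- sub("\\.csv$", "", basename(f))
  d
}

combined <- do.call(rbind, lapply(files, read_with_name))
combined
```

The same pattern should carry over to any reader function: whatever returns the per-file data frame, the file name can be attached to it before rbind().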


[R] check for difference.

2012-04-23 Thread aoife doherty

Hello
I have two lists of numbers, each list is ~800 numbers long. I want to know
if the two lists are significantly different from each other.
Could anyone suggest what library in R to use?

I think maybe the Mann-Whitney test, as it is not parametric, but I am
unsure if it is suitable as my lists of items are so long. So I am unsure
which library would suit best.

Aaral.
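
For reference, the Mann-Whitney (Wilcoxon rank-sum) test is available as wilcox.test() in the stats package that ships with R, so no extra library is needed. A minimal sketch on simulated vectors of about the size described; the data and the shift of 0.3 are invented for illustration:

```r
set.seed(1)                    # reproducible toy data
a <- rnorm(800)                # first list of ~800 numbers
b <- rnorm(800, mean = 0.3)    # second list, shifted slightly

res <- wilcox.test(a, b)       # two-sample Wilcoxon / Mann-Whitney test
res$p.value
```

Whether this test suits the real data depends on how the two lists were collected (for example, paired measurements would call for paired = TRUE), which the question does not say.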


Re: [R] How can I run package ca (correspondence analysis), which needs rgl, without X11?

2012-04-23 Thread Prof Brian Ripley


On 23/04/2012 17:33, Duncan Murdoch wrote:

On 23/04/2012 12:05 PM, Jocelyn Ireson-Paine wrote:

I want to invoke R on a Linux Web server from Java, in order to analyse
data in ways that would take too long to code, and run too slowly, in
Java. In particular, to do correspondence analyses. To this end, I've
installed R version 2.15.0 on my Web host's x86_64 GNU/Linux machine, and
tried using package ca to run the author example of correspondence
analysis on page 3 of http://cran.r-project.org/web/packages/ca/ca.pdf .

But there's a problem anent X11. My Web host's Linux doesn't have X11,
and
its administrator says that it would be hairy for him to install, because
of its dependencies. So when I installed R, I did so without X11. Since
I'm not going to plot graphs interactively, I wasn't expecting to need it
anyway.

However, I find that ca requires it. When I did
install.packages("ca")
(from Bristol mirror), it downloaded rgl_0.92.879.tar.gz and
ca_0.33.tar.gz, then started its checks, and then complained:
checking for X... no
configure: error: X11 not found but required, configure aborted.
ERROR: configuration failed for package 'rgl'
* removing '/home/jp/r/R-2.15.0/library/rgl'
ERROR: dependency 'rgl' is not available for package 'ca'
* removing '/home/jp/r/R-2.15.0/library/ca'

Why? Not all uses of ca require X11. For example, if you merely call the
ca function without plotting anything, that surely can't need it.
Moreover, one can plot to PDF without needing X11. I managed to do so
from
another correspondence-analysis package, ade4, by redirecting my plot to
PDF and then running the Bordeaux example near the top of
http://pbil.univ-lyon1.fr/ade4/ade4-html/bordeaux.html . (I then
converted
the PDF to PNG using ImageMagick.) So if ade4 can do this, ca ought to be
able to. How can I make it? I don't mind ca complaining when it indeed
does need X11, but couldn't the dependency check be postponed until
runtime, so as not to spoil things for people to whom it's irrelevant?


That sounds reasonable -- you should ask the maintainer of ca to
consider listing rgl only as a suggestion, not a dependency. It may
require some other changes (e.g. checks for the presence of rgl before
trying to use it).

Or perhaps you could ask for a minimal installation of X11. rgl will be
happy with the Xvfb virtual server. It won't display anything, but the
calls into X11 work. I don't know if it's any easier to install than all
of X11, though.


Unfortunately, it is usually harder.  Not least because rgl does not 
need just Xvfb, but Xvfb with GL (and some extensions) support in the 
server.  We have this problem on the Solaris Sparc check machine: it is 
a server with no GL hardware support, so rgl will not run on it (we ssh 
-X from a server with fuller X11 to work around this).  And I had a 
similar issue with Ubuntu pre-12.04 on a prototype server: no GL drivers 
yet.


I do think 'ca' should 'Suggests: rgl' only.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] write a png inside a pdf for large graphics?

2012-04-23 Thread Duncan Murdoch


On 23/04/2012 10:49 AM, Adam Wilson wrote:

I routinely write graphics into multi-page PDFs, but some graphics (e.g.
plots of large spatial datasets using levelplot()) can result in enormous
files.  I'm curious if there is a better way.  For example:

#First, make some data:
library(lattice)
d=expand.grid(x=1:1000,y=1:1000)
d$z=rnorm(nrow(d))

#Now, the PDF.  The following produces a PDF that's ~50MB.
pdf(width=11,height=8.5,file="test1.pdf")
levelplot(z~x*y,data=d)
dev.off()

#If you write the same graphic to a png with reasonable resolution, the
#file size is ~500k:
png(width=1024,height=768,file="test1.png")
levelplot(z~x*y,data=d)
dev.off()

#  I would prefer to embed a png (or other raster format) inside a PDF
#  directly from R.
#  Is this possible?  I'm looking for some way to achieve something like
#  the following (of course this doesn't work):
pdf(width=11,height=8.5,file="test1.pdf")
  png(width=1024,height=768,file="current device")
  levelplot(z~x*y,data=d)
  dev.off()
dev.off()


Of course the PDF preserves vector scalability, but there are times it's
not worth the extra file size.  And you can write out the png's as separate
files and then merge them with imagemagick or ghostscript.  I currently get
around this by writing the graphics to a potentially very large (100MB)
PDF, then use ghostscript to convert *only* the large pages of the pdf to
png and put it back together as a PDF (a function I wrote for this is
described here:
http://planetflux.adamwilson.us/2010/06/shrinking-rs-pdf-output.html).

I'm curious if there is a way to do it directly by instructing R to write a
png and embed it within the already open PDF device.  Any ideas?


I haven't tried this, but rasterImage() can plot to PDF.  So you just 
need to get your PNG display into a raster image.
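
A rough sketch of the rasterImage() route, using base graphics only. It does not read a PNG file; instead it assumes the data are already in a matrix and builds the raster directly from it, so the colour mapping, the 200 x 200 size, and the output path are all illustrative choices:

```r
set.seed(42)
z <- matrix(rnorm(200 * 200), nrow = 200)

# Map each value to one of 64 colours (the palette choice is arbitrary).
pal <- heat.colors(64)
idx <- cut(z, breaks = 64, labels = FALSE)
img <- as.raster(matrix(pal[idx], nrow = nrow(z)))

out <- tempfile(fileext = ".pdf")
pdf(out, width = 11, height = 8.5)
# Empty plot to set up axes, then paint the raster inside the plot region.
plot(c(1, 200), c(1, 200), type = "n", xlab = "x", ylab = "y")
rasterImage(img, 1, 1, 200, 200, interpolate = FALSE)
dev.off()

file.info(out)$size  # size in bytes of the resulting PDF
```

The image is stored in the PDF as a bitmap while axes and labels stay as vector graphics. Note that a raster is drawn with its first row at the top, so the matrix may need flipping to match a levelplot() orientation.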


Duncan Murdoch


Re: [R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Bert Gunter

Below.

-- Bert

On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller pjmiller...@yahoo.com wrote:
Hello All,

Started out awhile ago trying to select columns in a dataframe whose names
contain some variation of the word "mutant" using code like:

names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using code like:

KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered though, is that this selects columns like "nonmutated" and
"unmutated" as well as columns like "mutated", "mutation", and "mutational".

So I'd like to know how to select columns that have some variation of the
word "mutant" without the "non" or the "un". I've been looking around for an
example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what
can be selected and what can't. Software can only do what you tell it
to -- it cannot read minds. Once you have provided a complete and
accurate specification of inclusion/exclusion criteria, it should be
easy to write a regex procedure.

The fault, dear Brutus, lies not in the stars but in ourselves.

-- Bert

Can anyone show me how to select the columns I need?

Thanks,

Paul

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Re: [R] Selecting columns whose names contain mutated except when they also contain non or un



On Apr 23, 2012, at 12:25 PM, Paul Miller wrote:


Hello Dr. Winsemius,

Unfortunately, I also have terms like "krasmutated". So simply  
selecting words that start with "muta" won't work in this case.


You are aware that negative indexing can be used with grep aren't you?

--
David.
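
As a sketch of that hint, using the column names mentioned in this thread as a toy vector: grep() returns integer positions, so matches of an exclusion pattern can be dropped with a negative index, or with invert = TRUE. The variable names are illustrative only.

```r
x <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

# Keep names containing "muta", then drop those matching the exclusion pattern.
cand <- grep("muta", x, value = TRUE)
excl <- grep("nonmuta|unmuta", cand)
# Guard: a negative index built from an empty result would drop everything.
selected <- if (length(excl)) cand[-excl] else cand

# Same selection without the guard, using invert = TRUE.
selected2 <- grep("nonmuta|unmuta", cand, invert = TRUE, value = TRUE)
```

The invert = TRUE form is the safer of the two when the exclusion pattern might match nothing.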



David Winsemius, MD
West Hartford, CT


Re: [R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Bert Gunter

But maybe ... (see below)
-- Bert

On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller pjmiller...@yahoo.com wrote:
 Hello Dr. Winsemius,

 Unfortunately, I also have terms like "krasmutated". So simply selecting 
 words that start with "muta" won't work in this case.

 Thanks,

 Paul


 
  So I'd like to know how to select columns that have
 some variation of the word "mutant" without the "non" or the
 "un". I've been looking around for an example of how to do
 that but haven't found anything yet.

If this **is** a complete specification then wouldn't simply:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

do it?

e.g.
> x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
[1] FALSE FALSE  TRUE  TRUE  TRUE
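
To connect this back to the original goal of summing the selected columns, here is a small sketch on a made-up data frame. The column values are invented, and it assumes the "muta but not nonmuta/unmuta" rule really is the full specification:

```r
# Toy stand-in for KRASyn with the column names discussed in the thread.
KRASyn <- data.frame(
  nonmutated  = c(1, 0),
  unmutated   = c(0, 1),
  mutation    = c(1, 1),
  mutated     = c(0, 1),
  krasmutated = c(1, 0)
)

x    <- names(KRASyn)
keep <- grepl("muta", x) & !grepl("nonmuta|unmuta", x)

# Logical column selection, then row-wise sum of the kept columns.
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])
KRASyn
```

Computing keep before adding Mutant_comb matters: the new column name itself does not contain "muta" here, but doing the selection first avoids surprises if the naming changes.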

 



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Can I specify POSIX[cl]t column classes inside read.csv?



On Apr 23, 2012, at 11:48 AM, Thomas Levine wrote:


I'm loading a nicely formatted csv file.

#!/usr/bin/env Rscript
kpi <- read.csv(
  # This is a dump of the username, date_joined and last_login columns
  # from the auth_user Django table.
  'data/2012-04-23.csv',
  colClasses = c('character')
)
print(kpi[sample(nrow(kpi), 3),2:3])

Here's what the three rows I printed look like.

 last_login date_joined
2012-02-22 02:44:11 2011-09-19 03:07:35
2011-09-16 01:34:41 2011-09-16 01:34:41
2011-07-02 20:29:17 2011-07-02 20:29:17

Once I load them, I'm converting the datetime strings to datetimes.

kpi$last_login <- as.POSIXlt(kpi$last_login)
kpi$date_joined <- as.POSIXlt(kpi$date_joined)

Can I do this inside of read.csv by specifying colClasses?



Possibly. If there is an "as" function for a particular class, it can
be used in the colClasses vector of read.* functions. It appears that
your input file might have the right combination of formats and
separators for this to succeed.





It's
obviously not a problem if I can't; it just seems like I should be
able to.

Note that the following doesn't work because it doesn't save the  
times.


colClasses = c('character', 'Date', 'Date')



Try instead:

colClasses = c('character', 'POSIXlt', 'POSIXlt')
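
A small self-contained sketch of the same idea. It uses "POSIXct" rather than "POSIXlt", because "POSIXct" is one of the class names listed for colClasses in ?read.table; the temporary file and the user names are invented, and the timestamps are the ones printed above.

```r
# Write a tiny stand-in for the CSV described in the question.
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "username,last_login,date_joined",
  "alice,2012-02-22 02:44:11,2011-09-19 03:07:35",
  "bob,2011-09-16 01:34:41,2011-09-16 01:34:41"
), csv)

# Ask read.csv to convert the two timestamp columns while reading.
kpi <- read.csv(csv, colClasses = c("character", "POSIXct", "POSIXct"))
str(kpi)

# If the POSIXlt representation is really wanted, convert afterwards.
kpi$last_login <- as.POSIXlt(kpi$last_login)
```

The times are interpreted in the session's time zone, exactly as a later call to as.POSIXct() on the character column would do.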


--

David Winsemius, MD
West Hartford, CT


[R] glmnet_1.7.3 on windows

2012-04-23 Thread Trevor Hastie

We are aware that glmnet_1.7.3 does not pass on Windows
and are looking into the problem. It has something to do
with the gcc compiler being slightly different on
Windows versus Linux/Mac platforms. As soon as we have
resolved the issue, we will post a new version to CRAN.

Trevor Hastie
 

  Trevor Hastie   has...@stanford.edu  
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231 Fax: (650) 725-8977  
  URL: http://www.stanford.edu/~hastie  
   address: room 104, Department of Statistics, Sequoia Hall
   390 Serra Mall, Stanford University, CA 94305-4065  
 
--





[R] Problem extracting enough coefs from gam (mgcv package)

2012-04-23 Thread Martijn Wieling

Dear useRs,

I have been using the excellent mgcv package (version 1.7-12) to
create a generalized additive model (gam) including random effects -
represented with s(...,bs="re") - on the basis of dialect data.

My model contains two random-effect factors (Word and Key - the latter
representing a speaker) and I have added both random intercepts and
various random slopes for these random-effect factors. There is no
missing data in my dataset. When I try to extract the by-word random
intercepts from my model, using coef(model), I find 357 values, equal
to the number of words in my dataset. Using coef(model) I get
uninformative names: s(Word,1) until s(Word,357), but I'm assuming (I
might be wrong though?) that I can link the labels of the words to
these values by obtaining the 357 labels from the original dataset:
unique(dat[,c("Word")])

Unfortunately, I cannot use this procedure to label the by-word random
slopes, because I find a varying number of values for these (ranging
from 346 to 356) which is always less than 357. (The number of
by-speaker random slopes does equal the number of speakers, though.)

Does anybody i) have an idea why I obtain fewer by-word random slopes
than words, and/or ii) how I can link the random slopes which are
present to the correct labels of the words?
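
On the labelling part of the question only: for a plain random intercept s(Word, bs="re"), the coefficients follow the order of levels(Word), which need not match the order returned by unique(). The sketch below uses a made-up three-word data set to show the difference; it does not explain why fewer random slopes than words are returned.

```r
library(mgcv)

set.seed(1)
# Invented toy data: the words are deliberately not in alphabetical order.
dat <- data.frame(Word = factor(rep(c("pear", "apple", "fig"), each = 20)))
dat$y <- rnorm(60) + as.integer(dat$Word)

m <- gam(y ~ s(Word, bs = "re"), data = dat, method = "REML")

b  <- coef(m)
re <- b[grep("^s\\(Word\\)", names(b))]   # the by-word random intercepts

# Label by factor levels, not by order of appearance in the data.
names(re) <- levels(dat$Word)

as.character(unique(dat$Word))   # order of appearance: pear, apple, fig
names(re)                        # level order: apple, fig, pear
```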

(I did not include the model as it is 300 MB in size, but let me know
if this is necessary.)

Any help would be greatly appreciated!

With kind regards,
Martijn Wieling
University of Groningen
http://www.martijnwieling.nl


Re: [R] Assignment problems

2012-04-23 Thread Rui Barradas

Hello,

phillip03 wrote
 
 Thank you Rui
 
 Can you help me with my ifelse problem - I would like to add a column to my
 data.frame that holds avgflow ONLY in those rows where both countries of the
 pair are in the euro
 

First of all, try to use better (much, much better) code writing.
What are you trying to do here ???

trade <- data.frame(avgflow, EMU, stringsAsFactors=FALSE)

avgflowEURO <- rep(0, nrow(trade))

trade1 <- (for (i in 1:nrow(trade)){
    ifelse(EMU[i] == 1, avgflowEURO[i] <- avgflow[i], NA)
})

And please note that I've already INDENTED it.
And separated it in LINES of code.
And used SPACES before/after '==', '<-', etc.
And why TWO assignments in the same instruction? (At first I couldn't even
see this.)
You'll need far more programming experience before using 2 or more
assignments.

Now for the question. You don't need a loop/ifelse. Nor to duplicate the
data.frame 'trade'.


avgflowEURO <- avgflow
avgflowEURO[ !EMU ] <- NA
trade <- data.frame(avgflow, EMU, avgflowEURO, stringsAsFactors=FALSE)
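
A tiny worked example of the three lines above, with invented values for avgflow and EMU (EMU assumed to be a 0/1 flag):

```r
avgflow <- c(10, 20, 30, 40)
EMU     <- c(1, 0, 1, 0)     # 1 = both countries of the pair are in the euro

avgflowEURO <- avgflow
avgflowEURO[ !EMU ] <- NA    # !EMU is TRUE wherever EMU is 0
trade <- data.frame(avgflow, EMU, avgflowEURO, stringsAsFactors = FALSE)
trade
```

The negation turns the numeric 0/1 flag into a logical index, so no loop or ifelse() is involved.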

Rui Barradas



--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p458.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] OLS Estimating



On Apr 23, 2012, at 12:53 PM, phillip03 wrote:


So how would I use lm() to estimate b_0 and b_1, for example?

My Y_i and Y_j are data observations. How does lm() use my
data.frame?


Have you read "An Introduction to R"? Section 11 would appear to
cover all the needed topics.
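
For orientation, a minimal sketch of such a fit on simulated data. The column names flow, Yi and Yj and the "true" coefficients 0.5 and 0.8 are assumptions made up for the example; the model form ln(T) = b_0 + b_1*ln(Y_i*Y_j) is the one from the question.

```r
set.seed(42)
# Simulated data frame; lm() looks up the formula's variables in 'data'.
d <- data.frame(Yi = runif(50, 1, 10), Yj = runif(50, 1, 10))
d$flow <- exp(0.5 + 0.8 * log(d$Yi * d$Yj) + rnorm(50, sd = 0.1))

# Transformations can be written directly inside the formula.
fit <- lm(log(flow) ~ log(Yi * Yj), data = d)

coef(fit)      # estimates of b_0 (intercept) and b_1 (slope)
summary(fit)   # standard errors, t values, R-squared
```

Further terms are added to the right-hand side with +, for example log(flow) ~ log(Yi * Yj) + log(dist) for a hypothetical dist column.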



--
David Winsemius, MD
West Hartford, CT


Re: [R] Solve an ordinary or generalized eigenvalue problem in R?

2012-04-23 Thread Jonathan Greenberg

Building on Berend's suggestions, I think this function should work for most
people (I'm going to wrap it into a package but figured people may want to
grab this directly):

# Please see http://www.netlib.org/lapack/double/dggev.f for a description
# of inputs and outputs.
Rdggev <- function(A,B,JOBVL=FALSE,JOBVR=TRUE)
{
    # R implementation of the DGGEV LAPACK function (with generalized
    # eigenvalue computation)
    # See http://www.netlib.org/lapack/double/dggev.f
    # coded by Jonathan A. Greenberg i...@estarcion.net
    # Contributions from Berend Hasselman.

    if( .Platform$OS.type == "windows" ) {
        Lapack.so <-
            file.path(R.home("bin"),paste("Rlapack",.Platform$dynlib.ext,sep=""))
    } else {
        Lapack.so <-
            file.path(R.home("modules"),paste("lapack",.Platform$dynlib.ext,sep=""))
    }
    dyn.load(Lapack.so)

    if(JOBVL)
    {
        JOBVL="V"
    } else
    {
        JOBVL="N"
    }
    if(JOBVR)
    {
        JOBVR="V"
    } else
    {
        JOBVR="N"
    }

    if(!is.matrix(A)) stop("Argument A should be a matrix")
    if(!is.matrix(B)) stop("Argument B should be a matrix")
    dimA <- dim(A)
    if(dimA[1]!=dimA[2]) stop("A must be a square matrix")
    dimB <- dim(B)
    if(dimB[1]!=dimB[2]) stop("B must be a square matrix")
    if(dimA[1]!=dimB[1]) stop("A and B must have the same dimensions")
    if( is.complex(A) ) stop("A may not be complex")
    if( is.complex(B) ) stop("B may not be complex")

    # Input parameters
    N=dim(A)[[1]]
    LDA=N
    LDB=N
    LDVL=N
    LDVR=N
    LWORK=as.integer(max(1,8*N))

    Rdggev_out <- .Fortran("dggev", JOBVL, JOBVR, N, A, LDA, B, LDB,
        double(N), double(N), double(N),
        array(data=0,dim=c(LDVL,N)), LDVL, array(data=0,dim=c(LDVR,N)), LDVR,
        double(max(1,LWORK)), LWORK, integer(1))

    names(Rdggev_out)=c("JOBVL","JOBVR","N","A","LDA","B","LDB","ALPHAR","ALPHAI",
        "BETA","VL","LDVL","VR","LDVR","WORK","LWORK","INFO")

    # simplistic calculation of eigenvalues (see caveat in
    # http://www.netlib.org/lapack/double/dggev.f)
    if( all(Rdggev_out$ALPHAI==0) )
        Rdggev_out$GENEIGENVALUES <- Rdggev_out$ALPHAR/Rdggev_out$BETA
    else
        Rdggev_out$GENEIGENVALUES <- complex(real=Rdggev_out$ALPHAR,
            imaginary=Rdggev_out$ALPHAI)/Rdggev_out$BETA

    return(Rdggev_out)
}
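
For a quick sanity check that does not touch LAPACK directly, a generalized problem A v = lambda B v with an invertible, well-conditioned B can be reduced to an ordinary one in base R. This is only a cross-check under that assumption, not a replacement for dggev, and the matrices below are invented.

```r
# Invented 2 x 2 example with known generalized eigenvalues 2 and 3.
A <- matrix(c(2, 0,
              0, 6), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              0, 2), nrow = 2, byrow = TRUE)

# solve(B, A) computes B^{-1} A without forming the inverse explicitly.
e    <- eigen(solve(B, A))
vals <- e$values

# Verify A v = lambda B v for the first eigenpair.
v1       <- e$vectors[, 1]
residual <- A %*% v1 - vals[1] * (B %*% v1)
```

When B is singular or badly conditioned this shortcut breaks down, which is the case the ALPHAR/ALPHAI/BETA outputs of dggev are designed to handle.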

Thanks!

--j

On Sun, Apr 22, 2012 at 2:27 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 22-04-2012, at 21:08, Jonathan Greenberg wrote:

  Thanks all (particularly to you, Berend) -- I'll push forward with these
 solutions and integrate them into my code.  I did come across geigen while
 rooting around in the CCA code but its not formally documented (it just
 says for internal use or something along those lines) and as you found
 out above, it does not produce the same solution as the dggev.  It would be
 nice to have a more complete set of formal packages for doing LA in R
 (rather than having to hand-write .Fortran calls) but I'll leave that to
 someone with more expertise in linear algebra than me.  Something that
 perhaps matches the SciPy set of functions (both in terms of input and
 output):
 
  http://docs.scipy.org/doc/scipy/reference/linalg.html
 
  Some of these are already implemented, but clearly not all of them.

 Package CCA has package fda as dependency.
 And package fda defines a function geigen.
 The first 14 lines of this function are

 geigen <- function(Amat, Bmat, Cmat)
 {
  #  solve the generalized eigenanalysis problem
  #
  #max {tr L'AM / sqrt[tr L'BL tr M'CM] w.r.t. L and M
  #
  #  Arguments:
  #  AMAT ... p by q matrix
  #  BMAT ... order p symmetric positive definite matrix
  #  CMAT ... order q symmetric positive definite matrix
  #  Returns:
  #  VALUES ... vector of length s = min(p,q) of eigenvalues
  #  LMAT   ... p by s matrix L
  #  MMAT   ... q by s matrix M

 It's not clear to me how it is used and exactly what it is doing and how
 that compares with Lapack.

 Berend




-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html


Re: [R] How to insert filename as column in a file

2012-04-23 Thread jim holtman

This might do it for you:


for (i in fileNames){
    input <- read.table(i, ...)
    # you might want to use regular expressions to extract just the date.
    input$fileName <- i
    write.table(input, ...)
}
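
A runnable version of that loop might look like the sketch below. The temporary directory, the file names and the single column v are invented for the demonstration; the date is taken from the file name by stripping the extension.

```r
# Create two small demo files named after their dates.
demo_dir <- tempfile("daily_")
dir.create(demo_dir)
write.csv(data.frame(v = 1:2), file.path(demo_dir, "20120423.csv"), row.names = FALSE)
write.csv(data.frame(v = 3:4), file.path(demo_dir, "20120424.csv"), row.names = FALSE)

fileNames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

for (f in fileNames) {
    input <- read.csv(f)
    # Drop the directory and the ".csv" extension to keep just the date.
    input$date <- sub("\\.csv$", "", basename(f))
    write.csv(input, f, row.names = FALSE)
}

# Read one file back, keeping the date as text rather than a number.
check <- read.csv(fileNames[1], colClasses = c("integer", "character"))
```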

On Mon, Apr 23, 2012 at 12:29 PM, Shivam shivamsi...@gmail.com wrote:
 Hi,

 I am relatively new to R. Have scoured the help files and the www but
 haven't been able to get a solution.

 I have around 250 csv files, one file for each date. They have columns of
 all types, numeric, string etc. The name of each file is the date in the
 form of 'mmdd'. There is no column within the file which helps me
 identify the date on which the file was generated, only the filename has
 that info.

 I am selecting some data (using read.csv.sql) from each file and creating a
 dataset for each day. Ultimately I will combine all the datasets. I can
 accomplish the select and combine part, but after combining I won't have a
 record as to the date corresponding to the data.

 Hence I want to insert the filename as a column in the respective file to
 help me in identifying to what date each data row belongs to.

 Sorry for the long mail, but wanted to make myself clear. Any help would be
 greatly appreciated.

 Thanks in advance,
 Shivam




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Paul Miller

Hi Bert,

Yes, code like:

x <- names(yourdataframe)
grepl("muta", x) & !grepl("nonmuta|unmuta", x)

works perfectly.

Thanks very much for your help.

Paul




--- On Mon, 4/23/12, Bert Gunter gunter.ber...@gene.com wrote:

 From: Bert Gunter gunter.ber...@gene.com
 Subject: Re: [R] Selecting columns whose names contain mutated except when 
 they also contain non or un
 To: Paul Miller pjmiller...@yahoo.com
 Cc: David Winsemius dwinsem...@comcast.net, r-help@r-project.org
 Received: Monday, April 23, 2012, 12:15 PM
 But maybe ... (see below)
 -- Bert
 

[R] How to test if a slope is different than 1?

2012-04-23 Thread Mark Na

Dear R-helpers,

I would like to test if the slope corresponding to a continuous variable in
my model (summary below) is different than one.

I would appreciate any ideas for how I could do this in R, after having
specified and run this model?

Many thanks,

Mark Na



Call:
lm(formula = log(data$AB.obs + 1, 10) ~ log(data$SIZE, 10) +
   data$Y)

Residuals:
Min   1Q   Median   3Q  Max
-0.94368 -0.13870  0.04398  0.17825  0.63365

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        -1.18282    0.09120 -12.970  < 2e-16 ***
log(data$SIZE, 10)  0.56009    0.02564  21.846  < 2e-16 ***
data$Y2008          0.16825    0.04366   3.854 0.000151 ***
data$Y2009          0.20310    0.04707   4.315    0.238 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2793 on 228 degrees of freedom
Multiple R-squared: 0.6768, Adjusted R-squared: 0.6726
F-statistic: 159.2 on 3 and 228 DF,  p-value: < 2.2e-16
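
One common way to test a slope against 1 instead of 0 is to recompute the t statistic as (estimate - 1) / std.error and compare it with a t distribution on the residual degrees of freedom. The sketch below does this on simulated data (the variables x and y and the simulated slope are invented) and also applies the same arithmetic to the estimate and standard error reported above.

```r
set.seed(1)
n <- 100
x <- runif(n, 1, 100)
y <- 0.56 * log10(x) + rnorm(n, sd = 0.2)
fit <- lm(y ~ log10(x))

# t test of H0: slope = 1, using the fitted estimate and its standard error.
ctab   <- coef(summary(fit))
est    <- ctab["log10(x)", "Estimate"]
se     <- ctab["log10(x)", "Std. Error"]
t_stat <- (est - 1) / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# Equivalent formulation: with an offset, the coefficient estimates (slope - 1),
# so the usual summary() line tests slope = 1 directly.
fit2   <- lm(y ~ log10(x) + offset(log10(x)))
t_off  <- coef(summary(fit2))["log10(x)", "t value"]

# Same arithmetic on the numbers from the posted summary (228 residual df).
t_reported <- (0.56009 - 1) / 0.02564
p_reported <- 2 * pt(abs(t_reported), df = 228, lower.tail = FALSE)
```

With the posted values the statistic is about -17.2, so the slope for log(data$SIZE, 10) is clearly different from 1 under this test.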


[R] Newbie Question on making subsets for every element of a table column

2012-04-23 Thread cyclondude

Hello, very new to R, playing with tables, and I am trying to do 

x <- subset(data, columnlabel == "x")

for every element in my column that I could find by using 

table(data[, "columnlabel"])

I'd appreciate any useful help and I'm sorry if I didn't get the terminology
perfect.  Thanks. 
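
One way to get a subset for every distinct value in a column is split(), which returns a named list of data frames. The data frame below is invented; only the column name columnlabel comes from the question.

```r
data <- data.frame(columnlabel = c("a", "b", "a", "c"),
                   value       = 1:4)

# One data frame per distinct value of columnlabel, named by that value.
pieces <- split(data, data$columnlabel)

names(pieces)
pieces[["a"]]   # same rows as subset(data, columnlabel == "a")
```

Keeping the subsets in one list makes it easy to apply the same function to each of them with lapply().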


--
View this message in context: 
http://r.789695.n4.nabble.com/Newbie-Question-on-making-subsets-for-every-element-of-a-table-column-tp4581228p4581228.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Assignment problems

Hi RUI 

Thank you so much! I know I have a lot to learn, sorry for that.

If I want to make a new data.frame containing the NONEURO avgflows, how do
I do that?

Ph

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4581164.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] how to cut files from any folder to another folder?

2012-04-23 Thread MacQueen, Don

see file.rename()
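
A short sketch of a move with file.rename(), using throwaway folders under tempdir(). The folder names abc and xyz follow the question; the file name report.txt is invented. The fallback to copy-and-delete is an assumption for cases where a rename fails, for instance across file systems.

```r
base <- tempfile("move_demo_")
dir.create(file.path(base, "abc"), recursive = TRUE)
dir.create(file.path(base, "xyz"))

src <- file.path(base, "abc", "report.txt")
dst <- file.path(base, "xyz", "report.txt")
writeLines("hello", src)

# file.rename() returns TRUE on success and FALSE on failure.
moved <- file.rename(src, dst)
if (!moved) {
    # Fallback: copy, then delete the original.
    moved <- file.copy(src, dst) && file.remove(src)
}
```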

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 4/22/12 1:25 AM, sagarnikam123 sagarnikam...@gmail.com wrote:

I want to cut a file from one folder, e.g. abc, and put it into another location
with a different folder name, e.g. xyz.
How should I proceed?

--
View this message in context:
http://r.789695.n4.nabble.com/how-to-cut-files-from-any-folder-to-another-folder-tp4577818p4577818.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] How to insert filename as column in a file

2012-04-23 Thread MacQueen, Don

This little example might help.

> foo <- data.frame(a=1:10, b=letters[1:0])
> foo
    a b
1   1 a
2   2 a
3   3 a
4   4 a
5   5 a
6   6 a
7   7 a
8   8 a
9   9 a
10 10 a
> foo$date <- '20120423'
> foo
    a b     date
1   1 a 20120423
2   2 a 20120423
3   3 a 20120423
4   4 a 20120423
5   5 a 20120423
6   6 a 20120423
7   7 a 20120423
8   8 a 20120423
9   9 a 20120423
10 10 a 20120423


In other words, immediately after reading the data into a data frame, add
a date column as in the example. You'll have to extract the date from the
filename, of course.

-Don


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062






Re: [R] ggplot2 - geom_bar

2012-04-23 Thread Brian Diggs


On 4/23/2012 9:24 AM, Matthias Rieber wrote:

Hello,

I've some problem with the ggplot2. Here's a small example:

--8<--
library(ggplot2)
molten <- data.frame(date=c('01','01','01','01',
 '02','02','02','02'),
  channel=c('red','red','blue','blue',
'red','red','blue','blue'),
  product=c('product1','product2',
'product1','product2',
'product1','product2',
'product1','product2'),
  value=c(1,1,1,1,
  1,1,1,1))

str(molten)
molten

ggplot(molten, aes(date, weight = value, fill = channel)) +
   geom_bar(colour = I('black')) + facet_grid(product ~ .)
--8--

This gives the expected result (at least I expect this, see attachment
case1). When I change molten to:

molten <- data.frame(date=c('01','01','01','01',
 '02','02','02','02'),
  channel=c('red','red','blue','blue',
'red','red','blue','blue'),
  product=c('product1','product2',
'product1','product2',
'product1','product1',
'product1','product2'),
  value=c(1,1,1,1,
  1,1,1,1))

I get a strange result(see case2). I expect that for date=02 and
product=product1 the bar should show 2 'red', but it's just 1. So the
total sum is 7 instead of 8.

When I change molten again:

molten <- data.frame(date=c('01','01','01','01',
 '02','02','02','02'),
  channel=c('red','red','blue','blue',
'red','red','blue','blue'),
  product=c('product1','product2',
'product1','product2',
'product1','product1',
'product1','product2'),
  value=c(1,1,1,1,
  1,2,1,1))

I get the expected result, where the bar show 3 'red' and the total sum
is 9.

Is it wrong to use geom_bar with that kind of data? I could avoid this
issue when I cast the data.frame, but I like to avoid that solution.


There is nothing wrong with using bars with this sort of data. There is
a bug in the faceting code of 0.9.0 that will be fixed in 0.9.1 (see
https://github.com/hadley/ggplot2/issues/443 ) which caused duplicate
rows of data to be dropped when there was faceting. That is what you are
seeing in the second example; row 6 is identical to row 5 and is dropped
before plotting.  One easy workaround until 0.9.1 comes out is to add
a unique column to the data that is otherwise ignored:


molten <- data.frame(date=c('01','01','01','01',
'02','02','02','02'),
 channel=c('red','red','blue','blue',
   'red','red','blue','blue'),
 product=c('product1','product2',
   'product1','product2',
   'product1','product1',
   'product1','product2'),
 value=c(1,1,1,1,
 1,1,1,1),
 dummy=1:8)
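
To see which rows are affected, base R's duplicated() can be run on the data before and after adding the extra column. This is only a diagnostic sketch on the second example's data and does not need ggplot2.

```r
molten <- data.frame(
    date    = c('01','01','01','01','02','02','02','02'),
    channel = c('red','red','blue','blue','red','red','blue','blue'),
    product = c('product1','product2','product1','product2',
                'product1','product1','product1','product2'),
    value   = c(1,1,1,1,1,1,1,1))

# Rows that repeat an earlier row exactly.
dup_rows <- which(duplicated(molten))

# Adding a unique column makes every row distinct.
molten$dummy <- seq_len(nrow(molten))
still_dup <- any(duplicated(molten))
```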



Matthias


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


[R] .rda vs. .RData

2012-04-23 Thread Shi, Tao

Are they the same with .RData being the newer format?  Thanks,

...Tao



Re: [R] Assignment problems

2012-04-23 Thread Rui Barradas


 If I want to make a new data.frame where it is the NONEURO avgflows. how
 do I do that ?
 

Exactly like above, but without the negation (the exclamation mark).
You must also start to use the help system, for instance:

 ?"!"
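
A sketch of the non-euro version with invented values. One caveat, assuming EMU is stored as numeric 0/1 rather than logical: indexing with the bare numeric vector would select by position, so the comparison EMU == 1 is used to get a logical index.

```r
avgflow <- c(10, 20, 30, 40)
EMU     <- c(1, 0, 1, 0)          # numeric flag: 1 = both countries in the euro

avgflowNONEURO <- avgflow
avgflowNONEURO[EMU == 1] <- NA    # blank out the euro pairs
trade <- data.frame(avgflow, EMU, avgflowNONEURO, stringsAsFactors = FALSE)

# Alternatively, keep only the non-euro rows in a new data.frame.
noneuro <- trade[trade$EMU != 1, ]
```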

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4581434.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] How to insert filename as column in a file

2012-04-23 Thread Shivam

Thanks for the quick response. It works for an individual dataframe, but I
have many dataframes. This is the code so far

fnames = list.files(path = getwd())
for (i in 1:length(fnames)){
assign(paste("file",i,sep=""),read.csv.sql(fnames[i], sql = "select * from
file where V3 == 'XXX' and V5=='YYY'",header = FALSE, sep= '|', eol = "\n"))
}

This generates dataframes named file1,file2,...,file250. Is there a
way to do something like below within the same loop?

file1$date = substr(fnames[1],1,8)
file2$date = substr(fnames[2],1,8)
.
.
file250$date = substr(fnames[250],1,8)

assign(paste("file",i,sep=""))$date doesn't work.

Any help?
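
One possible approach is to collect the data frames in a list instead of creating file1 ... file250 with assign(), so the date column can be added inside the same function that reads each file. The sketch below is self-contained and therefore uses read.table() plus an ordinary row filter as a stand-in for read.csv.sql(); the file contents and the filter on V2 are invented.

```r
# Two small pipe-separated demo files named after their dates.
demo_dir <- tempfile("daily_")
dir.create(demo_dir)
writeLines(c("1|XXX|5", "2|ZZZ|6"), file.path(demo_dir, "20120423.csv"))
writeLines(c("3|XXX|7"),            file.path(demo_dir, "20120424.csv"))

fnames <- list.files(demo_dir, full.names = TRUE)

files <- lapply(fnames, function(f) {
    d <- read.table(f, sep = "|", header = FALSE, stringsAsFactors = FALSE)
    d <- d[d$V2 == "XXX", ]                 # stand-in for the SQL where clause
    d$date <- substr(basename(f), 1, 8)     # first 8 characters of the file name
    d
})
names(files) <- basename(fnames)

# Combine all days into one data frame; each row keeps its date.
combined <- do.call(rbind, files)
```

Individual days remain available as files[["20120423.csv"]] and so on, which replaces the need for separately named objects.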









-- 
*Victoria Concordia Crescit*


Re: [R] write a png inside a pdf for large graphics?

2012-04-23 Thread baptiste auguie

Hi,

On 24 April 2012 05:00, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 On 23/04/2012 10:49 AM, Adam Wilson wrote:

 I routinely write graphics into multi-page PDFs, but some graphics (i.e.
 plots of large spatial datasets using levelplot()) can result in enormous
 files.  I'm curious if there is a better way.  For example:

 #First, make some data:
 library(lattice)
 d=expand.grid(x=1:1000,y=1:1000)
 d$z=rnorm(nrow(d))

 #Now, the PDF.  The following produces a PDF that's ~50MB.
 pdf(width=11,height=8.5,file="test1.pdf")
 levelplot(z~x*y,data=d)
 dev.off()

 #If you write the same graphic to a png with reasonable resolution, the
 file size is ~500k:
 png(width=1024,height=768,file="test1.png")
 levelplot(z~x*y,data=d)
 dev.off()

 #  I would prefer to embed a png (or other raster format) inside a PDF
 directly from R.
 #  Is this possible?  I'm looking for some way to achieve something like
 the following (of course this doesn't work):
 pdf(width=11,height=8.5,file="test1.pdf")
      png(width=1024,height=768,file="current device")
              levelplot(z~x*y,data=d)
      dev.off()
 dev.off()


 Of course the PDF preserves vector scalability, but there are times it's
 not worth the extra file size.  And you can write out the png's as
 separate
 files and then merge them with imagemagick or ghostscript.  I currently
 get
 around this by writing the graphics to a potentially very large (100MB)
 PDF, then use ghostscript to convert *only* the large pages of the pdf to
 png and put it back together as a PDF (a function I wrote for this is
 described here:
 http://planetflux.adamwilson.us/2010/06/shrinking-rs-pdf-output.html).

 I'm curious if there is a way to do it directly by instructing R to write
 a
 png and embed it within the already open PDF device.  Any ideas?


 I haven't tried this, but rasterImage() can plot to PDF.  So you just need
 to get your PNG display into a raster image.

There's a corresponding panel.levelplot.raster function in lattice. It
usually results in smaller files than using rectangular tiles, and
it's also faster.
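
A minimal sketch of that suggestion, using a smaller grid than the original example so it runs quickly. The output path is a temporary file chosen for the demonstration.

```r
library(lattice)

set.seed(1)
d <- expand.grid(x = 1:200, y = 1:200)
d$z <- rnorm(nrow(d))

out <- tempfile(fileext = ".pdf")
pdf(width = 11, height = 8.5, file = out)
# Draw the cells as one raster image instead of one rectangle per cell.
# print() is needed when the call is not evaluated at top level.
print(levelplot(z ~ x * y, data = d, panel = panel.levelplot.raster))
dev.off()

file.size(out)   # compare with the size from the default panel function
```

The raster panel assumes a regular grid, which expand.grid() provides here.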

HTH,

baptiste


 Duncan Murdoch



[R] automating a script to read a file

2012-04-23 Thread Steve_Friedman


Hi,


The following script (which I did not develop) is used to calculate and
plot a skewed normal curve.  The script currently requires the user to
input six parameters, rather than reading these directly from a file.

I've been spinning wheels here, trying to figure out how to modify the
script to automate it.  I have four data sets, each in excess of 300
records that I need to process.

My initial thoughts were to use the  lapply and use a pdf graphic device to
capture the plots to do this, but my R programming skills are too limited
to determine how to best accomplish this.

If any one can provide assistance I would appreciate the help.


Thanks,

Steve


## Function set to find values in a skewed normal distribution

print("syntax:  plot.spdf(min, max, skewlocation, skewscale, skewshape, skewmax, skewtitle)")
flush.console()

#  sample input data could be the following:
#    -100, 1000, 976.02, 230, -34, 0.7543
#       0,  500,    270, 350, -13, 0.7707
#  or any other data of similar form


erf <- function(z) {
  ## Chebyshev fitting formula for erf(z) from
  ##  Vetterling, W.T., W.H. Press, S.A. Teukolsky, and B.P. Flannery. 1999.
  ##  Numerical Recipes: Example Book [C], Second Edition.
  ##  Cambridge University Press, NY. Chapter 6-2.

  t <- 1.0/(1.0 + 0.5 * abs(z))
  ## use Horner's method
  ans <- (1 - t * exp(-z * z - 1.26551223
    + t * (1.00002368
    + t * (0.37409196
    + t * (0.09678418
    + t * (-0.18628806
    + t * (0.27886807
    + t * (-1.13520398
    + t * (1.48851587
    + t * (-0.82215223
    + t * (0.17087277)))))))))))
  if (z >= 0) return(ans) else return(-1 * ans)
}

## standard normal density (note: this masks the pdf() graphics device)
pdf <- function(x) {
  return((1 / sqrt(2 * pi)) * exp(-1 * ((x * x) / 2)))
}

## standard normal cumulative distribution function
cdf <- function(x) {
  return( 0.5 * (1 + erf(x / sqrt(2))))
}

spdf <- function(x, skewlocation, skewscale, skewshape) {
  xmod <- (x - skewlocation) / skewscale
  return( 2 * pdf(xmod) * (cdf(xmod * skewshape)))
}


## Plotting Function
plot.spdf <- function(xmin, xmax, skewlocation, skewscale, skewshape,
                      skewmax, skewtitle) {

  if (missing(skewtitle)) {
    plottitle <- "Skewed Probability Density Function"
  } else {
    plottitle <- skewtitle
  }

  skip <- (xmax - xmin) / 100.0
  xArray <- numeric(100)
  yArray <- numeric(100)

  for (i in 1:100) {
    x <- xmin + i * skip
    y <- (spdf(x, skewlocation, skewscale, skewshape))/skewmax
    xArray[i] <- x
    yArray[i] <- y
  }

  plot(xArray, yArray, main=plottitle)

}





Steve Friedman Ph. D.
Ecologist  / Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


Re: [R] need advice on using excel to check data for import into R

2012-04-23 Thread Greg Snow

This is really a job for a database, and Excel is not a database (even
though many think it is).  I have some clients that I have convinced
to create an Access database rather than use Excel (still MS product
so it can't be that scary, right?).  They were often a little
reluctant at first because they would be using a new tool, and they
actually had to think about the design of the database up front, but
once they got to serious data entry they were very grateful for me
directing them to Access over Excel.  Databases have tools to validate
data on entry so there will be fewer cases where you need to ask them
for corrections (and it will be easier for them to fix any problems
that do sneak through).

On Sun, Apr 22, 2012 at 12:34 PM, Markus Weisner r...@themarkus.com wrote:
 I have created an S4 object type for conducting fire department data
 analysis.  The object includes validity check that ensures certain fields
 are present and that duplicate records don't exist for certain combinations
 of columns (e.g. no duplicate incident number / incident data / unit ID
 ensures that the data does not show the same fire engine responding twice
 on the same call).

 I am finding that I spend a lot of time taking client data, converting it
 to my S4 object, and then sending it back to the client to correct data
 validity issues.

 I am trying to figure out a clever way to have excel (typically the program
 used by my clients) check client data prior to them submitting it to me.  I
 have been working with somebody on trying to develop an excel toolbar
 add-in with limited success.

 My question is whether anybody can think of clever alternatives for clients
 to validate their data … for example, is there an R Excel plugin (that would
 be easily installed by a client) where I might be able to write some lines of
 R to check the data and output messages … or maybe some sort of server
 where they could upload their data and I could have some lines of R code
 that would check the data and send back potential error messages?
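
 For illustration only, the kind of check described above might be sketched in a few lines of base R. The column names (incident, date, unit), the sample data and the helper check_calls() are hypothetical and not taken from the S4 class mentioned in the post.

```r
# Hypothetical client data: one row per unit response
calls <- data.frame(
  incident = c("F001", "F001", "F002", "F002"),
  date     = as.Date(c("2012-04-01", "2012-04-01", "2012-04-02", "2012-04-02")),
  unit     = c("E1", "E2", "E1", "E1"),
  stringsAsFactors = FALSE
)

# Return a character vector of problems; an empty vector means the data passed
check_calls <- function(df, required = c("incident", "date", "unit")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    return(paste("Missing column:", missing_cols))
  }
  dup <- duplicated(df[, required])   # same incident / date / unit seen before
  if (any(dup)) {
    return(paste("Duplicate record at row", which(dup)))
  }
  character(0)
}

problems <- check_calls(calls)
problems
```

 A function like this could sit behind whatever front end the clients end up using, since it only needs a data frame as input.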

 I realize this is a fairly open ended question … just looking for some
 general ideas and directions to go. Getting a little frustrated with
 spending most of my work time dealing with data cleaning issues … guessing
 this is a problem shared by many of us that use R!

 Thanks,
 Markus





-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] Assignment problems

Thank you!

Do you know why ifelse() sometimes returns NULL ?

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4581491.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Assignment problems

2012-04-23 Thread Berend Hasselman


On 23-04-2012, at 21:37, phillip03 wrote:

 Thank you!
 
 Do you know why ifelse() sometimes returns NULL ?


Please provide a reproducible example for this phenomenon.

Berend


Re: [R] Newbie Question on making subsets for every element of a table column

2012-04-23 Thread Petr Savicky

On Mon, Apr 23, 2012 at 10:58:26AM -0700, cyclondude wrote:
 Hello, very new to R, playing with tables, and I am trying to do

 x <- subset(data, columnlabel == "x")

 for every element in my column that I could find by using

 table(data[, "columnlabel"])

Hi.

The following may be close to what you require.

  # prepare some data
  dat <- expand.grid(v1=letters[1:3], v2=1:3)
  dat

    v1 v2
  1  a  1
  2  b  1
  3  c  1
  4  a  2
  5  b  2
  6  c  2
  7  a  3
  8  b  3
  9  c  3

  out <- split(dat, dat$v1)

  # the first two groups are
  out[[1]]

    v1 v2
  1  a  1
  4  a  2
  7  a  3

  out[[2]]

    v1 v2
  2  b  1
  5  b  2
  8  b  3

Hope this helps.

Petr Savicky.


Re: [R] automating a script to read a file

2012-04-23 Thread Petr Savicky

On Mon, Apr 23, 2012 at 04:02:45PM -0400, steve_fried...@nps.gov wrote:
 
 Hi,
 
 
 The following script (which I did not develop) is used to calculate and
 plot a skewed normal curve.  The script currently requires the user to
 input six parameters, rather than reading these directly from a file.
 
 I've been spinning wheels here, trying to figure out how to modify the
 script to automate it.  I have four data sets, each in excess of 300
 records that I need to process.
 
 My initial thoughts were to use the  lapply and use a pdf graphic device to
 capture the plots to do this, but my R programming skills are too limited
 to determine how to best accomplish this.

Hi.

If you read the parameters from a file and put them to a matrix,
then all the plots may be produced using a loop like the following.

  # some parameters
  p <- matrix(1:18, nrow=3, ncol=6)
  for (i in 1:nrow(p)) {
      plot.spdf(p[i, 1], p[i, 2], p[i, 3], p[i, 4], p[i, 5], p[i, 6])
      readline("press Enter to continue")
  }

If you use pdf() for sending the graphics to a file, then remove
the readline command.
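
The file-reading step could be sketched as follows. This is an illustration under assumptions: the parameter file is taken to be a headerless CSV with six columns per record, plot_skew() is a hypothetical stand-in for the script's plot.spdf() built on dnorm() and pnorm(), and the temporary file names are placeholders. Because the script defines its own function called pdf(), the graphics device is called as grDevices::pdf() here.

```r
# Hypothetical stand-in for plot.spdf(): same curve, computed with dnorm()/pnorm()
plot_skew <- function(xmin, xmax, location, scale, shape, ymax,
                      title = "Skewed Probability Density Function") {
  x <- seq(xmin, xmax, length.out = 100)
  z <- (x - location) / scale
  y <- 2 * dnorm(z) * pnorm(shape * z) / ymax
  plot(x, y, main = title)
  invisible(data.frame(x = x, y = y))
}

# Assumed layout of the parameter file: one record per line, six values, no header
param_file <- tempfile(fileext = ".csv")
writeLines(c("-100,1000,976.02,230,-34,0.7543",
             "0,500,270,350,-13,0.7707"), param_file)

params <- read.csv(param_file, header = FALSE,
                   col.names = c("xmin", "xmax", "location",
                                 "scale", "shape", "ymax"))

# One page per record in a single PDF
out_file <- tempfile(fileext = ".pdf")
grDevices::pdf(out_file)
curves <- lapply(seq_len(nrow(params)), function(i) {
  do.call(plot_skew, as.list(params[i, ]))
})
invisible(dev.off())
```

With four data sets, the same block could be wrapped in a function of the input and output file names and called once per file.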

Hope this helps.

Petr Savicky.


[R] [R-pkgs] new version of QCAGUI

2012-04-23 Thread Adrian Duşa

Dear All,

I have just submitted a new version of the QCAGUI package on CRAN, it
should be propagated in a couple of days.
This version is nothing but a quick update to the latest Rcmdr base
package, and works (as usual) with the QCA package up to version 0.6-5

For the later versions of the QCA package, I will start adapting the
GUI in the shortest time possible.

Best wishes,
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
       +40 21 3120210 / int.101
Fax: +40 21 3158391

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

Re: [R] Selecting columns whose names contain mutated except when they also contain non or un

2012-04-23 Thread Greg Snow

Here is a method that uses negative lookbehind:

 tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
 grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

It looks for "muta" that is not immediately preceded by "un" or "non" (but
it would match "unusually mutated", since the "un" is not immediately
before the "muta").
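
Applied to column names, the same pattern might be used as in the sketch below. The data frame is a made-up stand-in with four columns; only the object name KRASyn and the new column Mutant_comb come from the question.

```r
# Made-up stand-in for the real KRASyn data frame
KRASyn <- data.frame(mutated    = c(1, 0),
                     mutation   = c(0, 1),
                     nonmutated = c(1, 1),
                     unmutated  = c(0, 0))

# value = TRUE returns the matching names instead of their positions
keep <- grep("(?<!un)(?<!non)muta", names(KRASyn), perl = TRUE, value = TRUE)

# Row-wise sum over the selected columns only
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])
```

Selecting by name first makes it easy to inspect keep before summing, which helps catch an unexpected column.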

Hope this helps,

On Mon, Apr 23, 2012 at 10:10 AM, Paul Miller pjmiller...@yahoo.com wrote:
 Hello All,

 Started out awhile ago trying to select columns in a dataframe whose names 
 contain some variation of the word mutant using code like:

 names(KRASyn)[grep("muta", names(KRASyn))]

 The idea then would be to add together the various columns using code like:

 KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

 What I discovered though, is that this selects columns like nonmutated and 
 unmutated as well as columns like mutated, mutation, and mutational.

 So I'd like to know how to select columns that have some variation of the 
 word mutant without the non or the un. I've been looking around for an 
 example of how to do that but haven't found anything yet.

 Can anyone show me how to select the columns I need?

 Thanks,

 Paul




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] How to test if a slope is different than 1?

2012-04-23 Thread Greg Snow

One option is to subtract the continuous variable from y before doing
the regression (this works with any regression package/function).
Probably the better way in R is to use the 'offset' function:

formula = I(log(data$AB.obs + 1, 10) - log(data$SIZE, 10)) ~
    log(data$SIZE, 10) + data$Y
formula = log(data$AB.obs + 1, 10) ~ offset(log(data$SIZE, 10)) +
    log(data$SIZE, 10) + data$Y

In either fit the coefficient on log(data$SIZE, 10) estimates
(slope - 1), so its usual t-test is a test of slope = 1.

Or you can use a function like 'confint' to find the confidence
interval for the slope and see if 1 is in the interval.
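
A small simulated check of the offset idea, not the poster's data: the variable names, the true slope of 0.56 and the noise level are all assumptions chosen for illustration. It shows that the t value reported for the offset fit equals (estimate - 1) / standard error from the ordinary fit.

```r
set.seed(1)
n   <- 200
dat <- data.frame(lsize = log10(runif(n, 1, 100)))
dat$y <- -1 + 0.56 * dat$lsize + rnorm(n, sd = 0.3)

fit     <- lm(y ~ lsize, data = dat)                  # ordinary fit
fit_off <- lm(y ~ offset(lsize) + lsize, data = dat)  # coefficient is slope - 1

est      <- coef(summary(fit))["lsize", ]
t_manual <- unname((est["Estimate"] - 1) / est["Std. Error"])
t_offset <- coef(summary(fit_off))["lsize", "t value"]

ci <- confint(fit)["lsize", ]   # does the interval contain 1?
c(t_manual = t_manual, t_offset = t_offset)
```

The p-value for slope = 1 can then be read directly from summary(fit_off), in the row for lsize.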

On Mon, Apr 23, 2012 at 12:11 PM, Mark Na mtb...@gmail.com wrote:
 Dear R-helpers,

 I would like to test if the slope corresponding to a continuous variable in
 my model (summary below) is different than one.

 I would appreciate any ideas for how I could do this in R, after having
 specified and run this model?

 Many thanks,

 Mark Na



 Call:
 lm(formula = log(data$AB.obs + 1, 10) ~ log(data$SIZE, 10) +
     data$Y)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.94368 -0.13870  0.04398  0.17825  0.63365

 Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
 (Intercept)        -1.18282    0.09120 -12.970  < 2e-16 ***
 log(data$SIZE, 10)  0.56009    0.02564  21.846  < 2e-16 ***
 data$Y2008          0.16825    0.04366   3.854 0.000151 ***
 data$Y2009          0.20310    0.04707   4.315 2.38e-05 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.2793 on 228 degrees of freedom
 Multiple R-squared: 0.6768,     Adjusted R-squared: 0.6726
 F-statistic: 159.2 on 3 and 228 DF,  p-value: < 2.2e-16





-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] Assignment problems

Hi

> EMU1993 <- (for (i in 1:nrow(data)){
+   ifelse(year==1992,sum(avgflowEMU),0)
+   })

> EMU1993
NULL

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4581590.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Assignment problems

That's not the ifelse(); that's the for loop returning NULL
(everything's a function!). If you put the assignment inside you'll
get the expected behavior.

x <- (for(i in 1:5) i)   # Strange: x is NULL
for(i in 1:5) x <- i     # Normal (but notice you only get the last value
                         # because previous ones are overwritten)
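
A short sketch of the three cases, for illustration: the value of a for loop itself, assignment inside the loop, and one common way to keep every iteration's result. The vapply() variant is an addition here, not something proposed in the thread.

```r
# The loop as an expression: its value is NULL
x <- (for (i in 1:5) i)
is.null(x)

# Assignment inside the loop: only the last value survives
last <- NULL
for (i in 1:5) last <- i

# Keeping every result: let vapply() collect them
squares <- vapply(1:5, function(i) i^2, numeric(1))
squares
```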

Michael


Re: [R] Assignment problems



On Apr 23, 2012, at 5:21 PM, R. Michael Weylandt wrote:


That's not the ifelse(); that's the for loop returning NULL
(everything's a function!). If you put the assignment inside you'll
get the expected behavior.

x <- (for(i in 1:5) i)   # Strange: x is NULL
for(i in 1:5) x <- i     # Normal (but notice you only get the last value
                         # because previous ones are overwritten)



Better would be to avoid the for-loop altogether. And phillip03 should  
note: The for-loop does not create an environment where column names  
are interpreted as object names and I'm guessing that 'avgflowEMU' is  
not an object but rather a column name.


Depending on the unstated goal (and quite a few other unstated  
concerns), something like this might make more sense:


EMU1993 <- sum( data[ data$year==1993, "avgflowEMU" ], na.rm=TRUE)

phillip03: please stop using 'data' as an object name. It is also the  
name of an R function.
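
A toy version of the vectorised sum, under the same guess that year and avgflowEMU are columns. The data frame flows and its values are invented for illustration; the tapply() line is an optional extra that gives every year's total at once.

```r
# Invented data; 'flows' is used instead of 'data' as the object name
flows <- data.frame(year       = c(1992, 1993, 1993, 1994),
                    avgflowEMU = c(10, 20, NA, 40))

# Total for one year, ignoring missing values
EMU1993 <- sum(flows[flows$year == 1993, "avgflowEMU"], na.rm = TRUE)

# Totals for all years in one call
by_year <- tapply(flows$avgflowEMU, flows$year, sum, na.rm = TRUE)
by_year
```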

--
David.





David Winsemius, MD
West Hartford, CT


Re: [R] Newbie Question on making subsets for every element of a table column

2012-04-23 Thread cyclondude

Yes. That is what I was looking for.  Is there a simple way to do (in this
scenario)

  out[[1]]

    v1 v2
  1  a  1
  4  a  2
  7  a  3

  a <- out[[1]]

for each one?

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Newbie-Question-on-making-subsets-for-every-element-of-a-table-column-tp4581228p4581775.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] automating a script to read a file

2012-04-23 Thread Steve_Friedman

Petr,

Thank you very much this works.  A little more tweaking and I'll have what
I need.

Thanks


Steve Friedman Ph. D.
Ecologist  / Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


   

Re: [R] Newbie Question on making subsets for every element of a table column

There are, but it's generally considered better style to keep them all
in a single list and use lapply() if you want to do things to each
element.
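
A sketch of both routes, reusing the dat/out example from earlier in the thread. The per-group sum is just a placeholder operation, and the separate environment e is an assumption made so the example does not write into the workspace.

```r
dat <- expand.grid(v1 = letters[1:3], v2 = 1:3)
out <- split(dat, dat$v1)

# Preferred: keep the pieces in the list and work on them there
sums <- sapply(out, function(g) sum(g$v2))

# If separate objects are really wanted, list2env() creates one per list name
e <- new.env()
list2env(out, envir = e)
e$a
```

Passing envir = globalenv() instead would create a, b and c directly in the workspace.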

Michael


Re: [R] Can I specify POSIX[cl]t column classes inside read.csv?

2012-04-23 Thread Jeff Newmiller

I recommend not putting POSIXlt vectors in data frames because of memory use 
and added complexity of the resulting data frame.  That is, use

 colClasses = c('character', 'POSIXct', 'POSIXct') 

instead. The POSIXlt values will still be created as temporary variables for 
reading in, but the data frame will store only the simpler and more compact 
type for later use.
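
A self-contained sketch of that colClasses call. The inline CSV text and the user names are invented; the column order follows the sample rows quoted below, and times are interpreted in the session's local time zone.

```r
# Invented stand-in for data/2012-04-23.csv
csv <- "username,last_login,date_joined
alice,2012-02-22 02:44:11,2011-09-19 03:07:35
bob,2011-09-16 01:34:41,2011-09-16 01:34:41"

kpi <- read.csv(text = csv,
                colClasses = c("character", "POSIXct", "POSIXct"))

str(kpi)
# Time of day survives, unlike with colClasses = "Date"
format(kpi$last_login[1], "%H:%M:%S")
```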
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

David Winsemius dwinsem...@comcast.net wrote:


On Apr 23, 2012, at 11:48 AM, Thomas Levine wrote:

 I'm loading a nicely formatted csv file.

 #!/usr/bin/env Rscript
 kpi <- read.csv(
   # This is a dump of the username, date_joined and last_login columns
   # from the auth_user Django table.
   'data/2012-04-23.csv',
   colClasses = c('character')
 )
 print(kpi[sample(nrow(kpi), 3),2:3])

 Here's what the three rows I printed look like.

  last_login date_joined
 2012-02-22 02:44:11 2011-09-19 03:07:35
 2011-09-16 01:34:41 2011-09-16 01:34:41
 2011-07-02 20:29:17 2011-07-02 20:29:17

 Once I load them, I'm converting the datetimes to datetimes.

 kpi$last_login <- as.POSIXlt(kpi$last_login)
 kpi$date_joined <- as.POSIXlt(kpi$date_joined)

 Can I do this inside of read.csv by specifying colClasses?


Possibly. If there is an "as" function for a particular class, it can
be used in the colClasses vector of read.* functions. It appears that
your input file might have the right combination of formats and
separators for this to succeed.



 It's
 obviously not a problem if I can't; it just seems like I should be
 able to.

 Note that the following doesn't work because it doesn't save the  
 times.

 colClasses = c('character', 'Date', 'Date')


Try instead:

colClasses = c('character', 'POSIXlt', 'POSIXlt')


-- 

David Winsemius, MD
West Hartford, CT


Re: [R] .rda vs. .RData

2012-04-23 Thread Jeff Newmiller

No, RData saves both the variable name and corresponding content of multiple 
variables.  rda saves content of one variable, with no associated name.  The 
latter allows for greater flexibility in importing the data later into 
different working environments, the former is convenient for recreating a 
particular working environment.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Shi, Tao shida...@yahoo.com wrote:

Are they the same with .RData being the newer format?  Thanks,

...Tao



[R] summing two probability density functions from Gompertz hazard model

2012-04-23 Thread piltdownpunk

Hi, r-help members.

I have a question about summing two density distributions.  I have two
samples from which I've estimated hazard parameters for a Gompertz mortality
model.  With those parameters, I can calculate the PDF (survival function
times hazard function) of ages-at-death in a birth cohort subject to the
hazard function at each age.  I'd like to combine these two density
functions and recalculate the resultant hazard parameters that describe the
combined group.  Is there any way to do so?  Unfortunately, the two
datasets, from which I estimated the initial parameters, are not in the same
format, so I can't just combine counts of individuals by age or age category
and calculate the parameters that way.   I've included my functions for the
two distributions below.  Any suggestions are welcome.  Thanks.

--Trey

group1 <- function(t) {
  x <- c(0.05893007, 0.03339980)
  a3 <- x[1]
  b3 <- x[2]
  shift <- 15

  S.t <- exp(a3/b3*(1-exp(b3*(t-shift))))   # survival function
  h.t <- a3*exp(b3*(t-shift))               # hazard function
  S.t*h.t                                   # density of ages at death
}

plot(seq(15,80,1),group1(seq(15,80,1)),type='l',xlab='Age (years)',
     ylim=c(0,0.12))   # plot age-at-death distribution for group 1


group2 <- function(t) {
  x <- c(0.05920472, 0.01128975)
  a3 <- x[1]
  b3 <- x[2]
  shift <- 15

  S.t <- exp(a3/b3*(1-exp(b3*(t-shift))))   # survival function
  h.t <- a3*exp(b3*(t-shift))               # hazard function
  S.t*h.t                                   # density of ages at death
}

plot(seq(15,80,1),group2(seq(15,80,1)),type='l',xlab='Age (years)',
     ylim=c(0,0.12))   # plot age-at-death distribution for group 2
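
One possible way to combine the two curves, offered only as a sketch: treat the combined group as a mixture of the two densities, then look for the single Gompertz curve closest to that mixture in the Kullback-Leibler sense, which amounts to maximising the expected Gompertz log-density under the mixture. The equal mixing weight w = 0.5, the upper age limit of 150 and the starting values are assumptions. A mixture of two Gompertz densities is not itself Gompertz, so the result is an approximation.

```r
# Gompertz age-at-death density, same form as group1()/group2() above
gomp_pdf <- function(t, a, b, shift = 15) {
  S <- exp(a / b * (1 - exp(b * (t - shift))))   # survival
  h <- a * exp(b * (t - shift))                  # hazard
  S * h
}

# Mixture of the two groups; w is the assumed share of group 1
mix_pdf <- function(t, w = 0.5) {
  w * gomp_pdf(t, 0.05893007, 0.03339980) +
    (1 - w) * gomp_pdf(t, 0.05920472, 0.01128975)
}

# Negative expected log-density of a Gompertz(a, b) under the mixture
neg_loglik <- function(par, shift = 15) {
  a <- exp(par[1]); b <- exp(par[2])             # keeps a and b positive
  integrand <- function(t) {
    u <- t - shift
    mix_pdf(t) * (log(a) + b * u + a / b * (1 - exp(b * u)))
  }
  tryCatch(-integrate(integrand, lower = shift, upper = 150)$value,
           error = function(e) 1e10)             # penalise failed integrals
}

fit      <- optim(log(c(0.06, 0.02)), neg_loglik)
combined <- exp(fit$par)                         # approximate a3 and b3
area     <- integrate(mix_pdf, 15, 150)$value    # should be close to 1
combined
```

If the two samples differ in size, w would more naturally be set to the share of individuals coming from group 1.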

-
Trey Batey---Anthropology Instructor
Division of Social Sciences
Mt. Hood Community College
Gresham, OR  97030
Alt. Email:  trey.batey[at]mhcc[dot]edu
--
View this message in context: 
http://r.789695.n4.nabble.com/summing-two-probability-density-functions-from-Gompertz-hazard-model-tp4581793p4581793.html
Sent from the R help mailing list archive at Nabble.com.


[R] Bivariate Von Mises Distributions

2012-04-23 Thread Cedric Neumann

All, 

I am trying to estimate the parameters of a bivariate Von Mises distributions. 
I am looking for somebody to point me in the direction of an R package or 
function that does this. I have noted the existing packages that allow for 
obtaining the density values once the parameters have been estimated, but I 
have not found anything related to the estimation of those parameters. 

I have found some java code at 
http://www.stat.tamu.edu/~dahl/software/cortorgles/. I have tried to make it 
work with rJava but I always end up with an error:

java.lang.NoClassDefFoundError: BivariateVonMises

Hope somebody can help

Thanks

Cedric

[R] change color scheme in mvpart

2012-04-23 Thread leanne heisler


Hello everyone, I am currently using the mvpart package and would like to 
change the color scheme it uses, and was hoping someone could help me out. All 
of the papers I have found have used a grayscale but I can't seem to figure out 
how they did that! Currently, mvpart plots barplots in a repeating sequence of 
3 shades of blue. So if you have 6 response variables the same shade of blue is 
used to represent two different response variables. I would like to use 
grayscale and a different shade of gray for each response variable (I have 7). 
However, the color is more important so if I can only use 3 shades of gray 
that's fine. Thank you!!

---

Leanne Heisler

Graduate Student

Department of Biology

University of Regina
  

Re: [R] .rda vs. .RData

2012-04-23 Thread Ista Zahn

Hi Jeff.

Can you point me toward the documentation for "rda saves content of
one variable, with no associated name"? I don't seem to find it in
?save, ?load, etc.
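
For comparison, a quick experiment in base R. As far as ?save and ?readRDS describe it, .rda and .RData are both just conventional extensions for files written by save(), which stores objects together with their names, while storing a single object without its name is what saveRDS()/readRDS() do (conventionally with the extension .rds). The object names and temporary files below are placeholders.

```r
a <- 1:3
b <- "x"

f_rda   <- tempfile(fileext = ".rda")
f_rdata <- tempfile(fileext = ".RData")
f_rds   <- tempfile(fileext = ".rds")

save(a, b, file = f_rda)     # names and contents of several objects
save(a, b, file = f_rdata)   # same function, different extension
saveRDS(a, f_rds)            # one object, no name stored

e1 <- new.env(); e2 <- new.env()
load(f_rda,   envir = e1)
load(f_rdata, envir = e2)
renamed <- readRDS(f_rds)    # caller chooses the name on import

ls(e1); ls(e2)
```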

Thanks,
Ista


[R] nnet multinom question

2012-04-23 Thread Moritaka Hosotsubo


I'd like to fit a multinomial log-linear model for 4 categories of the form:

 log[P(D=i | x)/P(D=0 | x)] = alpha + beta_i x for i = 1,2,3.

Is there a way to impose such a constraint in the multinom function of nnet
or another function of some library?

regards,

Hosotsubo

--
Moritaka Hosotsubo

National Institute for Science and Technology Policy, MEXT, Japan.

East Bulidg.16F, Central Government Building No.7,
3-2-2 Kasumigaseki, Chiyoda-ku, Tokyo, 100-0013
E-mail: hosot...@nistep.go.jp
FAX: +81-3-3503-3996


[R] zipfR help

2012-04-23 Thread meh L

Hi,

I have a question on generating random variables based on zipf-mandelbrot
distribution.

So when I execute the following lines:

ZM = lnre("zm", alpha = 2/3, B = 0.1)
zmsample = rlnre(ZM, n = 100)
zmsample

It generates 100 random values based on a zipf-mandelbrot distribution as
below.  But how do I make sure the generated random number is within the
range of 1 - 6000 only?  Can I include that as one of the parameters in
rlnre?

Thank you for your help!


[Printed sample omitted: the wrapped console output, indexed from [1] to
[497], was garbled in the archive and the individual values cannot be
recovered reliably. It includes values far above 6000, such as 12741 and
78057.]

[[alternative HTML version deleted]]
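
I am not aware of a range argument in rlnre(), so one hedged workaround is to sample directly from a Zipf-Mandelbrot law truncated to the ranks 1 to 6000. The sketch below uses base R only. The rank-probability form p(k) proportional to (k + b)^(-a), the function name rzm_truncated and the values of a and b are assumptions for illustration; they are not zipfR's alpha/B parametrisation.

```r
# Sketch: draw ranks from a Zipf-Mandelbrot law restricted to 1..6000.
# a and b are hypothetical shape parameters, chosen only for illustration.
rzm_truncated <- function(n, a = 1.5, b = 2, max_rank = 6000) {
  ranks <- seq_len(max_rank)
  w <- (ranks + b)^(-a)          # unnormalised probabilities
  sample(ranks, size = n, replace = TRUE, prob = w)
}

set.seed(42)
zmsample <- rzm_truncated(100)
range(zmsample)                   # always inside 1..6000
```

If the sample has to come from rlnre() itself, rejection sampling is another option: draw more values than needed and keep only those that do not exceed 6000.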


Re: [R] check for difference.

Lists of numbers of length ~1000 are no problem for the wilcox.test()
function (Mann-Whitney is a special case) if you leave the default
exact = NULL.

The choice of test is all yours.

Michael
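
A minimal sketch of the call described above, using simulated data in place of the two real lists (the vectors x and y and the shift of 0.5 are made up for illustration):

```r
# Sketch: two-sample Wilcoxon (Mann-Whitney) test on simulated data.
set.seed(123)
x <- rnorm(800, mean = 0)      # stand-in for the first list
y <- rnorm(800, mean = 0.5)    # stand-in for the second list

res <- wilcox.test(x, y)       # exact is left at its default, NULL
res$p.value
```

wilcox.test() is part of the stats package that ships with R, so no extra library needs to be installed.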

On Mon, Apr 23, 2012 at 12:39 PM, aoife doherty aaral.si...@gmail.com wrote:
 Hello
 I have two lists of numbers, each list is ~800 numbers long. I want to know
 if the two lists are significantly different from each other.
 Could anyone suggest what library in R to use?

 I think maybe the Mann-Whitney test, as it is not parametric, but I am
 unsure if it is suitable as my lists of items are so long. So I am unsure
 which library would suit best.

 Aaral.

        [[alternative HTML version deleted]]


Re: [R] .rda vs. .RData

2012-04-23 Thread Jeff Newmiller

My bad, I was thinking of rds (?saveRDS). RData and rda are mentioned under 
?data as alternate file extensions for the same data format. Sorry for posting 
without checking first.
---
Jeff Newmiller                        The     .       .  Go Live...
DCN:jdnew...@dcn.davis.ca.us        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.
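
The distinction in the correction above can be sketched as follows: save() and load() keep object names and work the same way whether the file is called .RData or .rda, while saveRDS() and readRDS() store a single object without its name. The object x and the temporary file names are made up for illustration.

```r
# Sketch: save()/load() versus saveRDS()/readRDS() in base R.
x <- c(a = 1, b = 2)
f_rdata <- tempfile(fileext = ".RData")
f_rda   <- tempfile(fileext = ".rda")
f_rds   <- tempfile(fileext = ".rds")

save(x, file = f_rdata)            # same format, different extension
save(x, file = f_rda)
saveRDS(x, file = f_rds)           # one object, name not stored

e <- new.env()
loaded <- load(f_rda, envir = e)   # returns the names of restored objects
y <- readRDS(f_rds)                # caller chooses the new name
```

Loading into a separate environment, as done with e here, avoids overwriting an existing x in the workspace.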

Shi, Tao shida...@yahoo.com wrote:

I would like to know that too.  Many thanks.

Tao



- Original Message -
 From: Ista Zahn istaz...@gmail.com
 To: Jeff Newmiller jdnew...@dcn.davis.ca.us
 Cc: Shi, Tao shida...@yahoo.com; r-help@r-project.org
r-help@r-project.org
 Sent: Monday, April 23, 2012 3:55 PM
 Subject: Re: [R] .rda vs. .RData
 
 Hi Jeff.
 
 Can you point me toward the documentation for "rda saves content of
 one variable, with no associated name"? I don't seem to find it in
 ?save ?load etc.
 
 Thanks,
 Ista
 
 On Mon, Apr 23, 2012 at 5:58 PM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
  No, RData saves both the variable name and corresponding content of
  multiple variables.  rda saves content of one variable, with no
  associated name. The latter allows for greater flexibility in importing
  the data later into different working environments, the former is
  convenient for recreating a particular working environment.
 
 ---
  Jeff Newmiller                        The     .       .  Go Live...
  DCN:jdnew...@dcn.davis.ca.us        Basics: ##.#.       ##.#.  Live Go...
                                        Live:   OO#.. Dead: OO#..  Playing
  Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
  /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
 ---
  Sent from my phone. Please excuse my brevity.
 
  Shi, Tao shida...@yahoo.com wrote:
 
 Are they the same with .RData being the newer format?  Thanks,
 
 ...Tao
 
 



Re: [R] plot function creating bars instead of lines

It is indeed the fact that you're plotting factors, but unless you say what
"as intended" is, it's hard to provide exactly what you're seeking.
Perhaps this will help though:

X <- factor(sample(letters[1:5], 15, TRUE))
Y <- rnorm(15)

dats <- data.frame(X, Y)

plot(Y ~ X, data = dats) # No good
plot(X ~ Y, data = dats) # Also probably not what you want

plot(Y ~ as.numeric(X), data = dats) # Good but ugly labels

plot(Y ~ as.numeric(X), data = dats, xaxt = "n", xlab = "X")
axis(1, at = seq_along(levels(X)), labels = levels(X)) # Good

But perhaps easier is

library(ggplot2)
qplot(X, Y, data = dats)

Michael
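
If the goal is one line per level of the factor, a hedged base-graphics sketch is to set up an empty plot on a numeric x variable and then add the lines level by level. The data frame dat and its columns are invented for illustration and only mimic the structure described in the question.

```r
# Sketch: one line per level of a factor in base graphics.
set.seed(1)
dat <- data.frame(
  Sample    = rep(1:5, times = 2),
  challenge = factor(rep(c("A", "B"), each = 5)),
  cortisol  = c(cumsum(rnorm(5)), cumsum(rnorm(5)))
)

lev  <- levels(dat$challenge)
cols <- setNames(rainbow(length(lev)), lev)

# Empty frame first; Sample is numeric, so plot() draws no boxplots
plot(cortisol ~ Sample, data = dat, type = "n")
for (l in lev) {
  d <- dat[dat$challenge == l, ]
  lines(d$Sample, d$cortisol, type = "b", col = cols[[l]], pch = l)
}
legend("topleft", legend = lev, col = cols, lty = 1)
```

Indexing the colours by level name rather than by the loop variable directly keeps the lookup valid whether the loop yields character strings or factor codes.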

On Mon, Apr 23, 2012 at 11:25 AM, la mer melissarosenkr...@gmail.com wrote:
 Hello,

 I am having a problem where code that plots lines using a different data
 frame plots bars with the current data frame (I intend to plot lines).
 The code specifies lines (see below), so I can't figure out why the results
 are bars. I suspect that it may have something to do with the fact that in
 the data frame where the code worked as intended, both variables
 specifying different lines were numeric, whereas in the current data frame
 one of those variables (challenge) is a factor with 2 levels. Any
 suggestions for getting this to plot as intended would be much appreciated.

 Thank you!

 # This is meant to plot a separate line for each subject for each challenge
 for (subj in unique(lab.samples$subid)) {
         #par(new=T)
         plot.new()
         par(mfrow=c(2,1))
         par(mfg=c(1,1))
         plot(data=lab.samples, subset=(subid==subj), cortisol ~ Sample, type='n',
                 main=paste('Cortisol and Amylase for subject ', as.character(subj)))

         for ( t in unique(subset(lab.samples,subid==subj)$challenge) ) {
                 par(mfg=c(1,1))
                 lines(data=lab.samples, subset=(subid==subj & challenge==t),
                         cortisol ~ Sample, type='b', pch=as.character(t), col=rainbow(2)[t])
         }
         par(mfg=c(2,1))
         plot(data=lab.samples, subset=(subid==subj), amylase ~ Sample, type='n')
         for ( t in unique(subset(lab.samples,subid==subj)$challenge) ) {
                 par(mfg=c(2,1))
                 lines(data=lab.samples, subset=(subid==subj & challenge==t),
                         amylase ~ Sample, type='b', pch=as.character(t), col=heat.colors(2)[t])
         }
 }


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/plot-function-creating-bars-instead-of-lines-tp4580765p4580765.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] zipfR help