Re: [R] Parsing txt file

2010-11-10 Thread Mike Marchywka


> From: santosh.srini...@gmail.com
> To: karthick.laksh...@gmail.com; r-help@r-project.org
> Date: Wed, 10 Nov 2010 16:00:26 +0530
> Subject: Re: [R] Parsing txt file
>
> You could use the following to achieve your objective. To start with
>
> ?readLines
> ?strsplit
> ?for
> ?ifelse
>
> As you try, you may receive more specific answers for the issues you come up
> with.

If you don't have some compelling reason to do it in R, it may be worth
the learning curve to pick up something like awk, perl, or even sed.
These tasks come up in many settings, and those tools are quite versatile
for ad hoc text manipulation. You can save your reformatted text
file in a form that is easy for R to read.
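
If you do stay in R, a minimal sketch along the lines of the
readLines/strsplit hints above (data.txt and the field positions here are
made up for illustration):

lines  <- readLines("data.txt")          # one string per input line
fields <- strsplit(lines, "\t")          # one character vector per line
# e.g. keep lines whose first field is "keep", then pull out field 2
keep   <- sapply(fields, function(f) f[1] == "keep")
wanted <- sapply(fields[keep], `[`, 2)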


>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of karthicklakshman
> Sent: 10 November 2010 15:06
> To: r-help@r-project.org
> Subject: [R] Parsing txt file
>
>
> Hello,
>
> I have a tab-delimited text document with multiple lines, as mentioned below,
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "repeated"repeated measures in ANOVA or mixed model

2010-11-10 Thread Mike Marchywka






> Date: Tue, 9 Nov 2010 18:25:18 -0800
> From: djmu...@gmail.com
> To: paul.rhee...@up.ac.za
> CC: r-help@r-project.org
> Subject: Re: [R] "repeated"repeated measures in ANOVA or mixed model
>
> Hi:
>
> This sounds like a 'doubly repeated measures problem'. Are any treatments
> assigned to individuals or is this a purely observational study?
>
> Is the time horizon of the between-visit factor (much?) longer than that of
> the within-visit factor? You could try to assess the strength of correlation
> of measurements between visits; if it's close to zero, you might be able to
> get away with treating visit as a non-repeated measures factor, which would
> simplify the analysis.
>
> If not, then the between-visit factor is nested within subject and the
> within-visit factor is nested within visits (obviously :) within subjects,
> so you would have one within-subject correlation structure to deal with for
> visits and another for times within visits. The question comes down to how
> easily the form of the overall covariance matrix can be specified.
>
> It's within the realm of possibility that the within-visit relationship is
> nonlinear in time (as in any number of pharmacokinetic models). If the
> visits are more or less uncorrelated in time, it might be reasonable to
> combine the data over visits in the hope that leads to a better fitting
> model. In that (comparatively happy) situation, the nlmer() function in the
> lme4 package (or the nlme() function in package nlme) would be a good place
> to look.
I thought the OP's question was just related to R rather than analysis, but
now that you have opened it up: I often find myself coming out against
graphics in favor of numbers, but sometimes a picture is worth a thousand
words, and maybe some scatterplots would let you come up with post hoc
hypotheses to test. Non-linear doesn't mean unrelated; for example, you
could have a parabolic or at least saturating dose-response curve that was
or was not anticipated a priori, or, for that matter, withdrawal/rebound
effects in a time series of clinical effects even if the drug/metabolite
elimination kinetics are monotonic in time. Personally I'd be open minded,
stare at some pictures, employ the informal US FDA thought that "a p-value
is no substitute for a brain," and see what the numbers say when you start
testing ideas. The goal of a model fit is to explain, not rationalize, the
data, but this is always difficult post hoc.

For that matter, it is not a waste of time to run negative controls.
That is, do your measures have some periodicity that relates to the state
of your lab equipment, the time of day, etc.? Confirmation bias can be a
problem if you get numbers you like early on.
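
To make the scatterplot suggestion concrete, a minimal sketch with made-up
data (the column names and the lattice call are just one way to do it):

dd <- data.frame(subject = factor(rep(1:10, each = 15)),
                 visit   = factor(rep(rep(1:3, each = 5), 10)),
                 time    = rep(c(0, 30, 60, 90, 120), 30),
                 y       = rnorm(150))
library(lattice)
# one panel per visit, one line per subject: eyeball trends before testing
xyplot(y ~ time | visit, groups = subject, data = dd, type = "b")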

>
> If it is necessary for you to deal with two types of nontrivial
> within-subject correlation, then I'm not so sure that nlme/lme4 is the right
> path to follow, but I'm not a mixed model expert so I could easily be wrong
> about that and others with more expertise are welcome to chime in with
> alternatives.
>
> You might also consider sending the initial mail and follow-ups to the
> R-mixed-models list, to which you can subscribe from here if you're not a
> list member:
>
> http://www.r-project.org/mail.html
>
> Scroll down to the bottom of the web page to find the list of special
> interest groups (SIGs).
>
> Sounds like an interesting problem...
>
> Cheers,
> Dennis
>
> On Mon, Nov 8, 2010 at 10:02 PM, Paul Rheeder  wrote:
>
> > dear List
> > I have a dataset with blood measurements at 5 points in TIME
> > (0,30,60,90,120) taken on 3 VISITS (same subjects). the interest is to
> > compare these measurements between Visits, overall and at the different time
> > points.
> >
> I have problems setting up repeated measures ANOVA with 2 repeated measures
> > (VISIT and TIME) (and then doing post hoc testing) or doing it with a linear
> > mixed model ( both VISIT and TIME are repeated).
> > Any suggestions?
> > Paul
> >
> >
> > Prof P Rheeder
> > School of Health Systems and Public Health
> > Faculty of Health Sciences
> > University of Pretoria
> > Room 6:12
> > HW Snyman North
> > Tel: 012 354 1488
> > Fax: 012 354 1750
> > Mobile: 082 779 3054
> >

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing R and an editor on a USB drive

2010-11-10 Thread Mike Marchywka













> Date: Wed, 10 Nov 2010 12:47:10 +0200
> From: hka...@gmail.com
> To: R-help@r-project.org
> Subject: [R] Installing R and an editor on a USB drive
>
> Hi,
>
> I have adviced my students to install R and an editor on a USB drive for
> working in the computer class. With R everything works fine following these
> instructions: http://personal.bgsu.edu/~mrizzo/Rmisc/usbR.htm.
>
> But several editors (e.g., Tinn-R and WinEdt) require administrator rights.
> Emacs, on the other hand (e.g., Vincent Goulet's distribution,
> http://vgoulet.act.ulaval.ca/en/ressources/emacs/windows) can be installed
> without administrator rights. There is, however, one problem. I have edited
> the site-start.el file and adjusted the path variable:
>
> (setq inferior-R-program-name "G:/r-2.12.0/bin/i386/rterm.exe")
>

If you really need hard absolute paths, I guess you could have Emacs look
for R in a startup script. Removable or plug-and-play media presumably
won't get consistent drive letters, and it isn't hard to scan a few of
them. For example, I just wrote this on cygwin (hotmail and the R spam
filter have been redacting my mail; there should be a one-line script
here, LOL):

$ for f in `mount | cut -c 1 | sort | uniq `; do echo `cygpath $f:` ; done
/cygdrive/c
/cygdrive/d
/cygdrive/f

The above lists all my drives (f is a flash stick); you could scan them for
one that contains R and write that path into the startup file.



> since R is installed on the G drive. Everything works if R is on G, but when
> changing the computer, R is usually on another drive and Emacs cannot find
> it. Is it possible to handle this case?
>

> Emacs also looks for its init file in a subdirectory of the "home" directory that is on the hard drive. Is it
> possible to have the init.el file on the USB drive?
>
> Thank you!
>
> Hannu
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the formula of quantile regression for panel data, which is correct?

2010-11-09 Thread Mike Marchywka








> Date: Tue, 9 Nov 2010 00:41:26 -0800
> From: 523541...@qq.com
> To: r-help@r-project.org
> Subject: [R] the formula of quantile regression for panel data, which is 
> correct?
>
>
> Hi,everyone
> I have some trouble in understanding the formula.
> http://r.789695.n4.nabble.com/file/n3033305/%E6%9C%AA%E5%91%BD%E5%90%8D.jpg
> http://r.789695.n4.nabble.com/file/n3033305/%E6%9C%AA%E5%91%BD%E5%90%8D1.jpg
>
> which is correct?

I didn't see anyone else answer this, but presumably more context would
help, including things like variable definitions. How are the deltas and
lambdas defined, or doesn't that matter? Sign and other conventions change
between works, especially between disciplines. When in doubt, of course,
paper and pencil with test data can be quite illuminating too :) I guess
you could assume that someone who knows the answer would recognize the
source of the nice pictures and the meanings of the variables, but that
isn't always the case.
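
As a sketch of the test-data idea (this assumes the quantreg package, and
it does not answer which panel formula is right; it just shows how to poke
at a quantile regression numerically):

set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)            # known coefficients to recover
library(quantreg)
fit <- rq(y ~ x, tau = c(0.25, 0.5, 0.75))
coef(fit)                              # slopes should all be near 2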


>
> best wish.
> thanks
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/the-formula-of-quantile-regression-for-panel-data-which-is-correct-tp3033305p3033305.html
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] try (nls stops unexpectedly because of chol2inv error

2010-11-08 Thread Mike Marchywka






> Date: Mon, 8 Nov 2010 06:18:49 -0800
> From: monte.shaf...@gmail.com
> To: r-help@r-project.org
> Subject: [R] try (nls stops unexpectedly because of chol2inv error
>
> Hi,
>
> I am running simulations that does multiple comparisons to control.
>
> For each simulation, I need to model 7 nls functions. I loop over 7 to do
> the nls using try
>
> if try fails, I break out of that loop, and go to next simulation.
>
> I get warnings on nls failures, but the simulation continues to run, except
> when the internal call (internal to nls) of the chol2inv fails.
>
> 
> Error in chol2inv(object$m$Rmat()) :
> element (2, 2) is zero, so the inverse cannot be computed
> In addition: Warning messages:
> 1: In nls(myModel.nlm, fData, start = initialValues, control =
> nls.control(warnOnly = TRUE), :
> number of iterations exceeded maximum of 50
> 2: In nls(myModel.nlm, fData, start = initialValues, control =
> nls.control(warnOnly = TRUE), :
> singular gradient
> ===
>
> Any suggestions on how to prevent chol2inv from breaking my simulation...

Since no one else has answered, let me supply some thoughts and Google hits.

I'm not sure what your question is. The error message suggests the matrix
has no inverse, i.e. no A^-1 with A %*% A^-1 = I can be found; usually
these things happen because the data is not a good fit to the model. Is
the message not literally true, as in you know that A has an inverse? It
does seem you posted a good complete example, but it may take a bit of
effort for someone to debug.

The reason it is non-invertible probably has to do with the singular-gradient
issue; in any case, some good hits on Google like these may help:

https://stat.ethz.ch/pipermail/r-help/2008-March/158329.html

( http://www.google.com/?#hl=en&q=r+nls+singular+gradient&fp=1 ) 

Personally, I tend to use SVD in my C++ code, since it is the only method
I know of that provides a good diagnostic on how close I came to an
ill-posed model. In your case, presumably either your model, data, or code
is creating an exactly singular matrix; this may be easier to find than
the almost-singular situations that often create odd results :) To qualify
my generic response, I would ask if anyone has more thoughts on inverting
matrices for model fits, as someone previously mentioned that R uses a QR
decomposition for this task.
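
On the original question of keeping the simulation alive, a minimal
sketch; the model, data, and starting values are made up, and the same
tryCatch wrapper goes around any chol2inv() call you make yourself:

set.seed(2)
d <- data.frame(x = 1:20)
d$y <- 2.5 * exp(0.1 * d$x) + rnorm(20)
fit <- tryCatch(
    nls(y ~ a * exp(b * x), data = d, start = list(a = 1, b = 0.05)),
    error = function(e) NULL)       # any error lands here, loop can continue
if (is.null(fit)) {
    # record the failure and go on to the next simulation
} else {
    sv <- svd(vcov(fit))$d          # the SVD diagnostic mentioned above:
    sv[1] / sv[length(sv)]          # a huge ratio means nearly singular
}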




> The point of the simulation is to address power. As our data goes down to
> N, of the 100 simulations, only 53 are good simulations because we don't
> have enough data for nls or chol2inv to work correctly.
>
>
> monte
>
> {x:
>
> ###
>
>
> ## case I ## EQUAL SAMPLE SIZES and design points
> nsim = 100;
> N_i = M_i = 10; ## also try (10, 30, 50, 100, 200)
> r = M_i / N_i;
>
> X.start = 170; # 6 design points, at 170,180,190, etc. where each point has
> N_i elements
> X.increment = 10;
> X.points = 6;
> X.end = 260;
> Xval = seq(X.start,length.out=X.points,by=X.increment );
> Xval = seq(X.start,X.end,length.out=X.points);
>
> L = 7; ## 6 + control
> k = 3;
> varY = 0.15;
>
> ### for each simulation, we need to record all of this information, write to
> a table or file.
>
> ### Under the null of simulation, we assign all locations to have same model
> ## we assume these are the true parameters
> b = 2.87; d = 0.0345; t = 173;
>
>
> B = seq(2.5,4.5,length.out=21);
> #B = seq(2.75,3.25,length.out=21);
> #B = seq(2.85,2.95,length.out=21);
> #B = seq(2.8,3.0,length.out=21);
> B = seq(2.5,3.2,length.out=21);
> D = seq(0.02,0.04,length.out=21);
> T = seq(165,185,length.out=21);
>
> alpha = .05;
> nu = k; ## number of parameters
> tr = L-1; ## number of treatments (including control)
> rho = 1/(1+r); ## dependency parameter
> myCritical = qmvchi(alpha,nu,tr,rho);
> ## we change one parameter at a time until the results fail most of the
> time.
>
>
> ## do independent for now, but let's store the parameters and quantiles???
> INFO for one location
> # beta delta tau nsim %Reject(V.pooled) %Reject(V.total) [Simulation level]
> resultS
> # beta delta tau i of nsim max(V.pooled) max(V.total) [Individual level]
> resultI
>
> resultS = resultI = NULL;
> for(p1 in 1:length(D))
> {
> print(paste(p1, " [D] of ",length(D))); flush.console();
> print(D[p1]);
> myReject.pooled = myReject.pooled.1 = MAX.pooled = rep(-1,nsim);
> gsim = 0; ## good simulations
> for(i in 1:nsim)
> {
> doubleBreak = F;
> print(paste(i, " of ",nsim)); flush.console();
> tData = NULL;
> pooledNum = matrix(0,nrow=k,ncol=k); ##numerator as weighted sum AS
> (n_k-1)cov.scaled
> pooledDen = 0; ##denominator as correction AS N-k
> #Sigma_pooled = ((omit.1-1)*summary.nls.1$cov.scaled +
> (omit.2-1)*summary.nls.2$cov.scaled +
> (omit.L-1)*summary.nls.L$cov.scaled)/(sum(omit.1,omit.2,omit.L)-L);
>
>
> for(j in 1:L)
> {
> Y = numeric(N_i);
>

Re: [R] can't load nlme on windoze 7

2010-11-07 Thread Mike Marchywka








> Date: Sun, 7 Nov 2010 20:57:19 +0100
> From: lig...@statistik.tu-dortmund.de
> To: marchy...@hotmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] can't load nlme on windoze 7
>


Trying "R CMD INSTALL ...zip" said that  unpackPkgZip was missing
so rather than figure it out I just decided to go ahead and try
the source, which is what I want to do eventually anyway ok
it is a bit of a diversion from actually using R.



>
> install.packages("nlme")
>
> should do the trick?

Well, I did have to find a mirror, and indeed

$ cat getStuff.R
options(repos=c("http://cran.stat.ucla.edu"))
install.packages(c("nlme"), dep=TRUE)

seems to have worked, as library("nlme") ran fine.
Judging from all the dependencies it pulled in, the original approach may
have just failed because of those; hard to know, since I haven't read the
docs, LOL.





>
> Otherwise, see the "R Installation and Administration" manual for details

>
> Note that nlme is a recommended package and included in the binary
> distribution of R anyway.

The above script seemed to have to go and get a lot, and in fact it crashed
the first time, probably due to anti-malware stuff (netstat -abn returned
about 14000 entries to a firewall, and none of my browsers would work,
etc.).

Thanks; at least I can get back to trying to use R instead of developing
C++ code, but I am now interested in that too.


>
> Uwe Ligges
>
>
>
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can't load nlme on windoze 7

2010-11-07 Thread Mike Marchywka



Either hotmail or the list spam filter butchered my dll list, but I would
mention that the cygwin dll error occurs when I try to load the nlme
library. I posted the build settings in my first post and am now
floundering around in the tar file looking for the Makefile, so I can
change the flags to avoid cygwin, with something like -mno-cygwin. Can you
point me to the equivalent of the Makefile? I tried setting the default
gcc to version 3, but no change, etc. I can play with the build flags if I
know where they are.

Thanks.



> From: marchy...@hotmail.com
> To: lig...@statistik.tu-dortmund.de
> Date: Sun, 7 Nov 2010 10:31:04 -0500
> CC: r-help@r-project.org
> Subject: Re: [R] can't load nlme on windoze 7
>
>
> > Date: Sun, 7 Nov 2010 16:19:16 +0100
> > From: lig...@statistik.tu-dortmund.de
> > To: marchy...@hotmail.com
> > CC: tal.gal...@gmail.com; r-help@r-project.org
> > Subject: Re: [R] can't load nlme on windoze 7
> >
> > I wonder why cygwin is mentioned here.
>
> Yeah, that was my first question but once I get tied up in these
> things I get confused easily and need input :) I haven't used
> R in quite a while and this is my first exposure to the 64 bit OS and
> all the "stuff" to consider about build issues. There could be a cygwin
> R build, I dunno.
>
>
> > and does not run under cygwin. cygwin1.dll should not be required
> > anywhere. The cygwin platform is not supported.
>
>  I have reported even
> more bizarre results on the cygwin list from time to time on this machine...
>
>
> The library(nlme) call AFAIK is hanging in a 'dohs 7 stack trace with
> no obvious use of cygwin. In fact, if I run cygcheck to find dll
> list it returns this without obvious cygwin need. I did find some notes about
> system32 and WOW64 redirects, not sure if there is anything relevant in that
> mess but I was hoping to avoid this. Maybe I should just try a reinstall.
>
>
> $ cygcheck ../R.exe
> C:\pfs\R\R-2.11.1\bin\junk\..\R.exe
>

 Stupid hotmail or spam filter seems to have wrecked this...

> >
> > Best,
> > Uwe Ligges
> >
> >
> >
> > On 07.11.2010 14:37, Mike Marchywka wrote:
> > >
> > >
> > > On further investigation, I clicked on the "R" picture using
> > > windoze explorer and ran as admin. First, I got a prompt saying
> > > it could not find cygwin1.dll. I changed env variables to add cygwin
> > > to path and now it just silently hangs.
> > >
> > >
> > > If I ask for more details and run R --verbose in gdb and then ctrl-C,

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can't load nlme on windoze 7

2010-11-07 Thread Mike Marchywka







> Date: Sun, 7 Nov 2010 16:19:16 +0100
> From: lig...@statistik.tu-dortmund.de
> To: marchy...@hotmail.com
> CC: tal.gal...@gmail.com; r-help@r-project.org
> Subject: Re: [R] can't load nlme on windoze 7
>
> I wonder why cygwin is mentioned here.

Yeah, that was my first question but once I get tied up in these
things I get confused easily and need input :) I haven't used
R in quite a while and this is my first exposure to the 64 bit OS and
all the "stuff" to consider about build issues. There could be a cygwin
R build, I dunno. 


> and does not run under cygwin. cygwin1.dll should not be required
> anywhere. The cygwin platform is not supported.

 I have reported even
more bizarre results on the cygwin list from time to time on this machine...


The library(nlme) call AFAIK is hanging in a 'dohs 7 stack trace with
no obvious use of cygwin. In fact, if I run cygcheck to find dll 
list it returns this without obvious cygwin need. I did find some notes about
system32 and WOW64 redirects, not sure if there is anything relevant in that
mess but I was hoping to avoid this. Maybe I should just try a reinstall.


$ cygcheck ../R.exe
C:\pfs\R\R-2.11.1\bin\junk\..\R.exe








[dll list garbled in transit]







>
> Best,
> Uwe Ligges
>
>
>
> On 07.11.2010 14:37, Mike Marchywka wrote:
> >
> >
> > On further investigation, I clicked on the "R" picture using
> > windoze explorer and ran as admin. First, I got a prompt saying
> > it could not find cygwin1.dll. I changed env variables to add cygwin
> > to path and now it just silently hangs.
> >
> >
> > If I ask for more details and run R --verbose in gdb and then ctrl-C,
> > it seems to end up with these thread,
> >
> > #0 0x76a76a6f in NlsUpdateSystemLocale ()
> >

> >
> > #1 0x55c3e074 in ?? ()
> >
> >
> > I get stuff like this,
> > $ gdb
> > GNU gdb 6.8.0.20080328-cvs (cygwin-special)
> > Copyright (C) 2008 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "i686-pc-cygwin".
> > (gdb) target exec ../R
> > (gdb) run --verbose
> > Starting program: /cygdrive/c/pfs/R/R-2.11.1/bin/R --verbose
> > [New thread 42148.0xa088]
> > Error: dll starting at 0x7742 not found.
> > Error: dll starting at 0x769c not found.
> > Error: dll starting at 0x7742 not found.
> > Error: dll starting at 0x7754 not found.
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > Error: dll starting at 0x4a29 not found.
> > now dyn.load("C:/pfs/R/R-2.11.1/library/methods/libs/methods.dll") ...
> >
> > R version 2.11.1 (2010-05-31)
> > Copyright (C) 2010 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> >
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
> >
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and
> > 'citation()' on how to cite R or R packages in publications.
> >
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
> >
> > [Previously saved workspace restored]
> >
> > now dyn.load("C:/pfs/R/R-2.11.1/library/grDevices/libs/grDevices.dll") ...
> > Garbage collection 1 = 0+0+1 (level 2) ...
> > 2.8 Mbytes of cons cells used (29%)
> > 0.6 Mbytes of vectors used (8%)
> > now dyn.load("C:/pfs/R/R-2.11.1/library/stats/libs/stats.dll&

Re: [R] can't load nlme on windoze 7

2010-11-07 Thread Mike Marchywka
> The other link I cited finally sent the problem to the r-developers
> list.
>
>
>
>
> >
> > Best,
> > Tal
> >
> >
> > Contact
> > Details:---
> > Contact me: tal.gal...@gmail.com |
> > 972-52-7275845
> > Read me: www.talgalili.com (Hebrew) |
> > www.biostatistics.co.il (Hebrew) |
> > www.r-statistics.com (English)
> > --
> >
> >
> >
> >
> > On Sun, Nov 7, 2010 at 3:39 AM, Mike Marchywka
> > > wrote:
> >
> > Hi,
> >
> > I've got a problem that sounds a lot like this,
> >
> > http://r.789695.n4.nabble.com/Re-R-R-2-12-0-hangs-while-loading-RGtk2-on-FreeBSD-td3005929.html
> >
> > under windoze 7.
> >
> > but it seems to hang with this stack trace,
> >
> > #0 0x77830190 in ntdll!LdrFindResource_U ()
> >
>
> >
> >
> >
> >
> > building goes as follows,
> >
> > $ ./R CMD INSTALL --no-test-load nlme_3.1-97.tar.gz
> > * installing to library 'C:/pfs/R/R-2.11.1/library'
> > * installing *source* package 'nlme' ...
> > ** libs

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can't load nlme on windoze 7

2010-11-07 Thread Mike Marchywka







> From: tal.gal...@gmail.com
> Date: Sun, 7 Nov 2010 09:27:10 +0200
> Subject: Re: [R] can't load nlme on windoze 7
> To: marchy...@hotmail.com
> CC: r-help@r-project.org
>
> Hello Mike,
>
> Most of my problems with win7 were permission problems.
> You can check if that is the case by setting R to run with
> administrator privileges, and seeing if that solves the problem.


Yes, I have had problems like that, but most of the other packages appear
to build and load OK, although I haven't done many on this machine yet. I
checked the permission strings and just did chmod -R 777 * on the library
directory, but no help. In any case, something should detect a permission
problem and fail rather than hang. The problem is that an attempt to load,
either the test load during the install or a call to library(nlme), hangs
forever.

The other link I cited finally sent the problem to the r-developers
list.




>
> Best,
> Tal
>
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |
> 972-52-7275845
> Read me: www.talgalili.com (Hebrew) |
> www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ------
>
>
>
>
> On Sun, Nov 7, 2010 at 3:39 AM, Mike Marchywka
> > wrote:
>
> Hi,
>
> I've got a problem that sounds a lot like this,
>
> http://r.789695.n4.nabble.com/Re-R-R-2-12-0-hangs-while-loading-RGtk2-on-FreeBSD-td3005929.html
>
> under windoze 7.
>
> but it seems to hang with this stack trace,
>
> #0 0x77830190 in ntdll!LdrFindResource_U ()
>

>
>
>
>
> building goes as follows,
>
> $ ./R CMD INSTALL --no-test-load nlme_3.1-97.tar.gz
> * installing to library 'C:/pfs/R/R-2.11.1/library'
> * installing *source* package 'nlme' ...
> ** libs
> making DLL ...
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> corStruct.c -
> o corStruct.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> gnls.c -o gnl
> s.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> init.c -o ini
> t.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> matrix.c -o m
> atrix.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> nlOptimizer.c
> -o nlOptimizer.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> nlme.c -o nlm
> e.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> nlmefit.c -o
> nlmefit.o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> nls.c -o nls.
> o
> gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall -std=gnu99 -c
> pdMat.c -o pd
> Mat.o
> gcc -shared -s -static-libgcc -o nlme.dll tmp.def corStruct.o gnls.o
> init.o matr
> ix.o nlOptimizer.o nlme.o nlmefit.o nls.o pdMat.o -LC:/pfs/R/R-2.11.1/bin -lR
> installing to C:/pfs/R/R-2.11.1/library/nlme/libs
> ... done
> ** R
> ** data
> ** moving datasets to lazyload DB
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices ...
>
> * DONE (nlme)
>
>
> $ gcc --version
> gcc (GCC) 4.3.4 20090804 (release) 1
> Copyright (C) 2008 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
>
> $ gdb
> GNU gdb 6.8.0.20080328-cvs (cygwin-special)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-pc-cygwin".
> (gdb) target exec R.exe
> (gdb) run
> Starting program: /cygdrive/c/pfs/R/R-2.11.1/bin/R.exe
> [New thread 20844.0x5368]
> Error: dll starting at 0x7742 not found.
> Error: dll starting at 0x769c not found.
> Error: dll starting at 0x7742 not found.
> Error: dll starting at 0x7754 not found.
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no debugging symbols found)
> (no 

[R] can't load nlme on windoze 7

2010-11-06 Thread Mike Marchywka

Hi,

I've got a problem that sounds a lot like this,

http://r.789695.n4.nabble.com/Re-R-R-2-12-0-hangs-while-loading-RGtk2-on-FreeBSD-td3005929.html

under windoze 7.

but it seems to hang with this stack trace,

#0  0x77830190 in ntdll!LdrFindResource_U ()

   from /cygdrive/c/Windows/system32/ntdll.dll




building goes as follows,

$ ./R CMD INSTALL --no-test-load nlme_3.1-97.tar.gz
* installing to library 'C:/pfs/R/R-2.11.1/library'
* installing *source* package 'nlme' ...
** libs
  making DLL ...
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c corStruct.c -
o corStruct.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c gnls.c -o gnl
s.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c init.c -o ini
t.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c matrix.c -o m
atrix.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c nlOptimizer.c
 -o nlOptimizer.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c nlme.c -o nlm
e.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c nlmefit.c -o
nlmefit.o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c nls.c -o nls.
o
gcc -I"C:/pfs/R/R-2.11.1/include" -O3 -Wall  -std=gnu99 -c pdMat.c -o pd
Mat.o
gcc -shared -s -static-libgcc -o nlme.dll tmp.def corStruct.o gnls.o init.o matr
ix.o nlOptimizer.o nlme.o nlmefit.o nls.o pdMat.o -LC:/pfs/R/R-2.11.1/bin -lR
installing to C:/pfs/R/R-2.11.1/library/nlme/libs
  ... done
** R
** data
**  moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...

* DONE (nlme)


$ gcc --version
gcc (GCC) 4.3.4 20090804 (release) 1
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ gdb
GNU gdb 6.8.0.20080328-cvs (cygwin-special)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-cygwin".
(gdb) target exec R.exe
(gdb) run
Starting program: /cygdrive/c/pfs/R/R-2.11.1/bin/R.exe
[New thread 20844.0x5368]
Error: dll starting at 0x7742 not found.
Error: dll starting at 0x769c not found.
Error: dll starting at 0x7742 not found.
Error: dll starting at 0x7754 not found.
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
Error: dll starting at 0x4a0b not found.

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(nlme)
[New thread 20844.0x5154]
[Switching to thread 20844.0x5154]
Quit
(gdb) bt
#0  0x77830190 in ntdll!LdrFindResource_U ()
   from /cygdrive/c/Windows/system32/ntdll.dll
(gdb)






Mike Marchywka | V.P. Technology

415-264-8477
marchy...@phluant.com

Online Advertising and Analytics for Mobile
http://www.phluant.com


  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova(lme.model)

2010-11-06 Thread Mike Marchywka

> Date: Sat, 6 Nov 2010 07:45:26 -0700
> From: gunter.ber...@gene.com
> To: sibylle.stoec...@gmx.ch
> CC: r-help@r-project.org
> Subject: Re: [R] anova(lme.model)
>
> Sounds to me like you should really be seeking help from your local
> statistician, not this list. What you request probably cannot be done.


I'm still bringing my install up to speed, so I can't immediately read
the cited R stuff below, but it sounds like the OP mentions a controversy
documented in the R packages. Is there a list for discussing these topics?
Offhand that seems legitimate for a user help list, unless you want people
to believe that "it came out of a computer so it must be right, whatever a
P value is."


>
> What is wrong with what you get from lme, whose results seem fairly
> clear whether the P values are accurate or not?
>
> Cheers,
> Bert
>
>
>
>
>
> On Sat, Nov 6, 2010 at 4:04 AM, "Sibylle Stöckli"
>  wrote:
> > Dear R users
> >
> > Topic: Linear mixed-effects model fitting using the nlme package (recommended
> > by Pinheiro et al. 2008 for unbalanced data sets).
> >
> > The R help provides much info about the controversy around using the
> > anova(lme.model) function to present numerator df and F values.
> > Additionally, different p-values calculated by lme and anova are reported.
> > However, I come across the same problem, and I would very much appreciate
> > some R help to fit an anova function to get similar p-values compared to
> > the lme function and additionally to provide corresponding F-values. I
> > tried to use contrasts and to deal with the 'unbalanced data set'.
> >
> > Thanks
> > Sibylle
> >
> >> Kaltenborn<-read.table("Kaltenborn_YEARS.txt", na.strings="*", header=TRUE)
> >>
> >>
> >> library(nlme)
> >
> >> model5c<-lme(asin(sqrt(PropMortality))~Diversity+ 
> >> Management+Species+Height+Height*Diversity, data=Kaltenborn, 
> >> random=~1|Plot/SubPlot, na.action=na.omit, 
> >> weights=varPower(form=~Diversity), subset=Kaltenborn$ADDspecies!=1, 
> >> method="ML")
> >
> >> summary(model5c)
> > Linear mixed-effects model fit by maximum likelihood
> >  Data: Kaltenborn
> >  Subset: Kaltenborn$ADDspecies != 1
> >AIC   BIC   logLik
> >  -249.3509 -205.4723 137.6755
> >
> > Random effects:
> >  Formula: ~1 | Plot
> >(Intercept)
> > StdDev:  0.06162279
> >
> >  Formula: ~1 | SubPlot %in% Plot
> >(Intercept)   Residual
> > StdDev:  0.03942785 0.05946185
> >
> > Variance function:
> >  Structure: Power of variance covariate
> >  Formula: ~Diversity
> >  Parameter estimates:
> >power
> > 0.7302087
> > Fixed effects: asin(sqrt(PropMortality)) ~ Diversity + Management + Species 
> > +  Height + Height * Diversity
> >  Value  Std.Error  DF   t-value p-value
> > (Intercept)   0.5422893 0.05923691 163  9.154585  0.
> > Diversity-0.0734688 0.02333159  14 -3.148896  0.0071
> > Managementm+  0.0217734 0.02283375  30  0.953562  0.3479
> > Managementu  -0.0557160 0.02286694  30 -2.436532  0.0210
> > SpeciesPab   -0.2058763 0.02763737 163 -7.449198  0.
> > SpeciesPm 0.0308005 0.02827782 163  1.089210  0.2777
> > SpeciesQp 0.0968051 0.02689327 163  3.599602  0.0004
> > Height   -0.0017579 0.00031667 163 -5.551251  0.
> > Diversity:Height  0.0005122 0.00014443 163  3.546270  0.0005
> >  Correlation:
> > (Intr) Dvrsty Mngmn+ Mngmnt SpcsPb SpcsPm SpcsQp Height
> > Diversity-0.867
> > Managementm+ -0.173 -0.019
> > Managementu  -0.206  0.005  0.499
> > SpeciesPab   -0.253  0.085  0.000  0.035
> > SpeciesPm-0.239  0.058  0.001  0.064  0.521
> > SpeciesQp-0.250  0.041 -0.001  0.032  0.502  0.506
> > Height   -0.518  0.532 -0.037 -0.004  0.038  0.004  0.033
> > Diversity:Height  0.492 -0.581  0.031 -0.008 -0.149 -0.099 -0.069 -0.904
> >
> > Standardized Within-Group Residuals:
> >Min  Q1 Med  Q3 Max
> > -2.99290873 -0.60522612 -0.05756772  0.62163049  2.80811502
> >
> > Number of Observations: 216
> > Number of Groups:
> >              Plot  SubPlot %in% Plot
> >                16                 48
> >
> >> anova(model5c)
> > numDF denDF   F-value p-value
> > (Intercept)  1   163 244.67887  <.0001
> > Diversity        1    14   1.53025

Re: [R] How to extract particular rows and column from a table

2010-11-05 Thread Mike Rennie
Hi Mauluda,

Next time, please read the posting guide; helping you is made a lot easier
if you provide the code that didn't work.

It sounds like you might want something like this?

#make a data frame, with some column names assigned...
aa<-data.frame(c(rep("a",5), rep("c",3)),c(rep(7,5), rep(2,3)))
aa
colnames(aa)<-c("cola", "colb")
aa

#select your items of interest...
ab<-aa$colb[aa$cola=="a"]
ab

HTH,

Mike
On Fri, Nov 5, 2010 at 11:26 AM, Mauluda Akhtar  wrote:

> Hello,
> I'm a new user of R. I have a very big table with the following structure
> (suppose the variable is named "aa"). From this table I want to make a new
> table containing just the two columns V2 and V6, with some particular rows
> (suppose the variable is named "bb"). I'd like to mention that the V2
> column represents the id that corresponds to column V6, which represents
> the base position of the DNA. In this bb table, just as an example, I want
> to extract all the corresponding rows of column V2 where column V6 is
> "30049831" (in my table the same base position is repeated). I tried this
> but failed to solve it.
> Could you please let me know how I can solve this?
>
> Thank you.
> Mauluda
>
>
> V1   V2
> V3V4  V5   V6  V7V8   V9 V10  V11
> ESMEHEP0102102796h05.w2kF59780SCF:32  CpGVariation6
> 3004983130049831+.-1NA
> ESMEHEP0102102796h05.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.31NA
> ESMEHEP0102102796h05.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.48NA
> ESMEHEP0102102796h05.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.30NA
> ESMEHEP0102102796h05.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.38NA
> ESMEHEP0102102796h05.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.14NA
> ESMEHEP0102102796h05.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796h05.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.1NA
> ESMEHEP0102102796a04.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.25NA
> ESMEHEP0102102796a04.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.28NA
> ESMEHEP0102102796a04.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.28NA
> ESMEHEP0102102796a04.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.45NA
> ESMEHEP0102102796a04.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.29NA
> ESMEHEP0102102796a04.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796a04.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.8NA
> ESMEHEP0102102796e06.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.20NA
> ESMEHEP0102102796e06.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.28NA
> ESMEHEP0102102796e06.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.44NA
> ESMEHEP0102102796e06.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.-1NA
> ESMEHEP0102102796e06.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.22NA
> ESMEHEP0102102796e06.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102102796e06.w2kF59780SCF:382CpGVariation6
> 3005018130050181+.0NA
> ESMEHEP0102102788c04.w2kF59780SCF:32  CpGVariation6
> 3004983130049831+.-1NA
> ESMEHEP0102102788c04.w2kF59780SCF:114CpGVariation6
> 3004991330049913+.38NA
> ESMEHEP0102102788c04.w2kF59780SCF:154CpGVariation6
> 3004995330049953+.31NA
> ESMEHEP0102102788c04.w2kF59780SCF:170CpGVariation6
> 3004996930049969+.54NA
> ESMEHEP0102102788c04.w2kF59780SCF:172CpGVariation6
> 3004997130049971+.36NA
> ESMEHEP0102102788c04.w2kF59780SCF:245CpGVariation6
> 3005004430050044+.27NA
> ESMEHEP0102102788c04.w2kF59780SCF:363CpGVariation6
> 3005016230050162+.0NA
> ESMEHEP0102

Re: [R] NFFT on a Zoo?

2010-11-05 Thread Mike Marchywka







> Date: Fri, 5 Nov 2010 00:14:15 -0700
> From: flym...@gmail.com
> To: marchy...@hotmail.com
> CC: ggrothendi...@gmail.com; r-help@r-project.org; 
> rpy-l...@lists.sourceforge.net
> Subject: Re: [R] NFFT on a Zoo?
>
> FWIW: It turns out I dove into a rabbit hole:
>
> 1. Though the gaps in my 3-axis accelerometer data represent 10% data
> loss (OMG!), the number of gaps represents only 0.1% of the 3 million
> data points (BFD).
>
> 2. The data is noisy enough that 0.1% discontinuity can't affect an
> FFT. Each gap was removed simply by adjusting subsequent timestamps.
>
> 3. With the gaps removed, the remaining jitter in the timestamps is both
> small and nearly normally distributed (no systematic errors). So the
> timestamps were eliminated from further processing, and the mean
> inter-sample time was used as the sampling period.
>
> So, neither NFFT nor Zoo are needed, since a regular FFT now works just
> fine.

Well, again, it is easy to simulate these effects in R by sampling a sine
wave or other known signal at the wrong times, taking the DFT, and seeing
what the spectrum looks like. I suggested a sine wave to start, but any
nonlinearities can be hard to estimate without a little work. You can also
do two-tone tests by hand and see IMD, harmonics, etc.
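
A minimal sketch of that simulation (the frequency and jitter size are
arbitrary):

set.seed(3)
n  <- 1024
t0 <- (0:(n - 1)) / n               # nominal, uniform sample times
tj <- t0 + rnorm(n, sd = 0.1 / n)   # jittered actual sample times
x  <- sin(2 * pi * 50 * tj)         # 50-cycle sine sampled at the wrong times
S  <- Mod(fft(x))[1:(n / 2)]        # DFT as if the samples were uniform
plot(S, type = "l", log = "y")      # jitter shows up as a raised noise floor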


>
> The Moral of the Story is: "Take a closer look at the data before
> deciding difficult processing is needed."
>
> Homer Simpson translation: "Doh!"
>
I think Bart said "if it is in a book it must be true," and computers, of
course, don't make mistakes :)


> A big "Thanks!" to all who responded to my newbie posts: The R
> Community is richly blessed with wisdom, kindness and patience.

This is the kind of thing you can play with on the back of an 
envelope when bored once you get started. 

>
> -BobC
>
>
>
> On 11/03/2010 01:22 PM, Mike Marchywka wrote:
> > 
> >
> >> From: ggrothendi...@gmail.com
> >> Date: Wed, 3 Nov 2010 15:27:13 -0400
> >> To: flym...@gmail.com
> >> CC: r-help@r-project.org; rpy-l...@lists.sourceforge.net
> >> Subject: Re: [R] NFFT on a Zoo?
> >>
> >> On Wed, Nov 3, 2010 at 2:59 PM, Bob Cunningham wrote:
> >>
> >>> I have an irregular time series in a Zoo object, and I've been unable to
> >>> find any way to do an FFT on it. More precisely, I'd like to do an NFFT
> >>> (non-equispaced / non-uniform time FFT) on the data.
> >>>
> >>> The data is timestamped samples from a cheap self-logging accelerometer.
> >>> The data is weakly regular, with the following characteristics:
> >>> - short gaps every ~20ms
> >>> - large gaps every ~200ms
> >>> - jitter/noise in the timestamp
> >>>
> >>> The gaps cover ~10% of the acquisition time. And they occur often enough
> >>> that the uninterrupted portions of the data are too short to yield useful
> >>> individual FFT results, even without timestamp noise.
> >>>
> >>> My searches have revealed no NFFT support in R, but I'm hoping it may be
> >>> known under some other name (just as non-uniform time series are known as
> >>> 'zoo' rather than 'nts' or 'nuts').
> >>>
> >>> I'm using R through RPy, so any solution that makes use of numpy/scipy 
> >>> would
> >>> also work. And I care more about accuracy than speed, so a non-library
> >>> solution in R or Python would also work.
> >>>
> >>> Alternatively, is there a technique by which multiple FFTs over smaller
> >>> (incomplete) data regions may be combined to yield an improved view of the
> >>> whole? My experiments have so far yielded only useless results, but I'm
> >>> getting ready to try PCA across the set of partial FFTs.
> >>>
> >>>
> >>
> >
> > I'm pretty sure all of this is in Oppenheim and Schafer meaning it
> > is also in any newer books. I recall something about averaging
> > but you'd need to look at details. Alternatively, and this is from
> > distant memory so maybe someone else can comment, you can just
> > feed a regularly spaced time series to anyone, go get FFTW for example,
> > and insert zeroes for missing data. This is equivalent to multiplying
> > your real data with a window function that is zero at missing points.
> > I think you can prove that multiplication
> > in time domain is convolution in FT domain so you can back this out
>

Re: [R] Sorting data from one column with strings

2010-11-04 Thread Mike Rennie
(apologies for any double hits; forgot to reply all...)

Or, you could just go back to basics, and write yourself a general loop that
goes through whatever levels of a variable and gives you back whatever
statistics you want... below is an example where you estimate means for each
level, but you could estimate any number of statistical parameters...

dat<-data.frame(c(rep("A",5), rep("B",5),rep("C",5)),c(1:15))
results<-NULL
for(i in levels(dat[,1]))
  {
  sub.dat<-subset(dat, dat[,1]==i)
  res<-mean(sub.dat[,2])
  results<-c(results,i,res)
  }
results.mat<-matrix(results, ncol=2, byrow=TRUE)
results.mat
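
For what it's worth, the same per-level means can also come from a
built-in one-liner, using the dat object above (aggregate() generalizes to
other statistics via FUN):

aggregate(dat[, 2], by = list(level = dat[, 1]), FUN = mean)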


HTH,

Mike

On Thu, Nov 4, 2010 at 7:28 AM, Ramsvatn Silje wrote:

>
> Hello,
>
> I have tried to find this out some other way, but unsuccessful I have to
> try this list.
> I assume this should be quite simple.
>
> I have a dataset with 4 columns, "Sample_no", "Species", "Nitrogen",
> "Carbon" in csv format. In the species column I have many different
> species with varying number of obs per species
>
> Eg
>
> "Sample_no" "Species"   "Nitrogen"  "Carbon"
> 1   Cod 15.2-19.0
> 2   Haddock 14.8-20.2
> 3   Cod 15.6-18.5
> 4   Cod 13.2-20.1
> 5   Haddock 14.3-18.8
> Etc..
>
> And I want to calculate mean, standard dev, etc. per species for the
> observations "Nitrogen" and "Carbon", and later do plots and stats with
> the different species. I will in the end have many species, so it needs
> to be "automatic"; I can't enter code for every species separately.
>
> Can anyone help me with this? Or if this is the wrong list to send this
> question to, where do I send it?
>
> Thank you very much in advance.
>
>
> Best regards
>
> Silje Ramsvatn
>
> PhD-candidate
> University of Tromsø
> Norway
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Converting Strings to Variable names

2010-11-04 Thread Mike Rennie
Hi Anand,

Try creating a variable where you can store your data, and append it in your
loop. See added lines of code to include below...

On Thu, Nov 4, 2010 at 9:43 AM, Anand Bambhania wrote:

> Hi all,
>
> I am processing 24 samples data and combine them in single table called
> CombinedSamples using following:
>
> CombinedSamples<-rbind(Sample1,Sample2,Sample3)
>
> Now variables Sample1, Sample2 and Sample3 have many different columns.
>
> To make it more flexible for other samples I'm replacing above code with a
> for loop:
>
> #Sample is a string vector containing all 24 sample names
>

#create a variable to stick your results

res<- NULL

>
> for (k in 1:length(Sample))
> {
>  CombinedSamples<-rbind(get(Sample[k]))
>
  res <- rbind(res, CombinedSamples)  # rbind, not c(): c() would flatten the data frames

> }
>
> Now, every iteration of your loop should append CombinedSamples to res, and
you won't overwrite your results every time.
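
As an aside, the same combining can be done without the explicit loop; a
minimal sketch, with toy stand-ins for the real sample objects:

Sample  <- c("Sample1", "Sample2")   # character vector of object names
Sample1 <- data.frame(a = 1:2)
Sample2 <- data.frame(a = 3:4)
CombinedSamples <- do.call(rbind, lapply(Sample, get))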

HTH,

Mike



> This code only stores last sample data as CombinedSample gets overwritten
> every time. Using "CombinedSamples[k]" or "CombinedSamples[k,]" causes
> dimension related errors as each Sample has several rows and not just 24.
> So
> how can I assign data of all 24 samples to CombinedSamples?
>
> Thanks,
>
> Anand
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NFFT on a Zoo?

2010-11-03 Thread Mike Marchywka







> From: ggrothendi...@gmail.com
> Date: Wed, 3 Nov 2010 15:27:13 -0400
> To: flym...@gmail.com
> CC: r-help@r-project.org; rpy-l...@lists.sourceforge.net
> Subject: Re: [R] NFFT on a Zoo?
>
> On Wed, Nov 3, 2010 at 2:59 PM, Bob Cunningham  wrote:
> > I have an irregular time series in a Zoo object, and I've been unable to
> > find any way to do an FFT on it.  More precisely, I'd like to do an NFFT
> > (non-equispaced / non-uniform time FFT) on the data.
> >
> > The data is timestamped samples from a cheap self-logging accelerometer.
> >  The data is weakly regular, with the following characteristics:
> > - short gaps every ~20ms
> > - large gaps every ~200ms
> > - jitter/noise in the timestamp
> >
> > The gaps cover ~10% of the acquisition time.  And they occur often enough
> > that the uninterrupted portions of the data are too short to yield useful
> > individual FFT results, even without timestamp noise.
> >
> > My searches have revealed no NFFT support in R, but I'm hoping it may be
> > known under some other name (just as non-uniform time series are known as
> > 'zoo' rather than 'nts' or 'nuts').
> >
> > I'm using R through RPy, so any solution that makes use of numpy/scipy would
> > also work.  And I care more about accuracy than speed, so a non-library
> > solution in R or Python would also work.
> >
> > Alternatively, is there a technique by which multiple FFTs over smaller
> > (incomplete) data regions may be combined to yield an improved view of the
> > whole?  My experiments have so far yielded only useless results, but I'm
> > getting ready to try PCA across the set of partial FFTs.
> >
>


I'm pretty sure all of this is in Oppenheim and Schafer, meaning it is
also in any newer books. I recall something about averaging, but you'd
need to look at the details. Alternatively, and this is from distant
memory so maybe someone else can comment, you can just feed a regularly
spaced time series to any FFT routine, FFTW for example, and insert zeroes
for the missing data. This is equivalent to multiplying your real data by
a window function that is zero at the missing points. Multiplication in
the time domain is convolution in the FT domain, so you can back this out
by deconvolving with your window function's spectrum. This probably is not
painless; the window spectrum will have badly placed zeroes, etc., but it
may be helpful. Apparently this is still a bit of an open issue:

http://books.google.com/books?id=BW1PdOqZo6AC&pg=PA2&lpg=PA2&dq=dft+window+missing+data&source=bl&ots=fSY-iRoCNN&sig=30cC0SdkrDcp62iWc-Mv26mfNjI&hl=en&ei=AMTRTNmyMYP88AauxtzKDA&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDEQ6AEwBTgK#v=onepage&q&f=false



You should be able to do the case of a sine wave with pencil and paper
and see if or how this really would work. 


> Check out the entire thread that starts here.
>
> http://www.mail-archive.com/r-help@r-project.org/msg36349.html
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question in using nlme and lme4 for unbalanced data

2010-11-02 Thread Mike Marchywka









> Date: Mon, 1 Nov 2010 17:38:54 -0700
> From: djmu...@gmail.com
> To: cy...@email.arizona.edu
> CC: r-help@r-project.org
> Subject: Re: [R] question in using nlme and lme4 for unbalanced data
>
> Hi:
>
> On Mon, Nov 1, 2010 at 3:59 PM, Chi Yuan  wrote:
>
> > Hello:
> > I need some help about using mixed for model for unbalanced data. I
> > have an two factorial random block design. It's a ecology
[...]
>
> Unbalanced data is not a problem in either package. However, five blocks is
> rather at the boundary of whether or not one can compute reliable variance
> components and random effects. Given that the variance estimate of blocks in
> your models was nearly zero, you're probably better off treating them as
> fixed rather than random and analyzing the data with a fixed effects model
> instead.
>
> Another question is about p values.
> > I kind of heard the P value does not matter that much in the mixed
> > model because it's not calculate properly.
>
>
> No. p-values are not calculated in lme4 (as I understand it) because,
> especially in the case of severely unbalanced data, the true sampling
> distributions of the test statistics in small to moderate samples are not
> necessarily close to the asymptotic distributions used to compute the
> corresponding p-values. It's the (sometimes gross) disparity between the
> small-sample and asymptotic distributions that makes the reported p-values
> based on the latter unreliable, not an inability to calculate the p-value
> properly. I can assure you that Prof. Bates knows how to compute a p-value.

To add my own question on terminology [even the statements here should be
taken as questions]: assuming the null hypothesis is true and you have
some underlying population distribution of various attributes, you get
some distribution for your test statistic over repeated experiments. The
asymptotic distribution, I take it, is the large-sample distribution of
that statistic, which may not be well reflected in your (small) sample?
Usually people justify non-parametrics by saying they help in the
small-sample/outlier cases. Alternatively, if you have some reasonable
basis for knowing the true population distributions, you could use that
for the p-value calculation and/or do Monte Carlo and just measure the
number of times you incorrectly reject the null hypothesis, etc. Of
course, Monte Carlo code needs to be debugged too, so nothing will be a
sure thing. Introducing new things like an independently known population
distribution may not be statistically rigorous by some criteria (comments
welcome, LOL), but you are free to examine it for analysis.
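
A minimal sketch of that Monte Carlo idea, drawing two groups from the
same distribution so the null is true and counting false rejections:

set.seed(5)
reject <- replicate(2000, {
    g1 <- rnorm(10)                 # both groups from the same population
    g2 <- rnorm(10)
    t.test(g1, g2)$p.value < 0.05
})
mean(reject)                        # should land near the nominal 0.05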


>
> Is there any other way I can
> > tell whether the treatment has a effect not? I know AIC is for model
> > comparison,

Get more data? In this case, it would seem the goal of statistical
analysis is to make some guesses about causality. Presumably this is one
piece of evidence in a larger "case" that includes theory and other
observations. To paraphrase the popular legal phrase, "if the model
doesn't fit, you must not quit." Or, as someone at the US FDA is quoted as
saying, "A p-value is no substitute for a brain."



> > do I report this in formal publication?

I guess that depends on the journal (LOL). Personally I'd be more worried
about getting a convincing story together than playing to a specific
audience. However, many questions of detail do relate to the audience and
journal: you want to use the math to determine reality; what you present
depends on the publication. There is nothing wrong with presenting novel
analyses in full detail to the right audience, but it may not be for
everyone :)

> >
>
> As mentioned above, I would suggest analyzing this as a fixed effects
> problem. Since the imbalance is not too bad, and it is not unusual in field
> experiments to have more control EUs than treatment EUs within each level of
> treatment, a fixed effects analysis may be sufficient. It wouldn't hurt to
> consult with a local statistician to discuss the options.

  


Re: [R] biglm: how it handles large data set?

2010-11-01 Thread Mike Marchywka







> Date: Sun, 31 Oct 2010 00:22:12 -0700
> From: tim@netzero.net
> To: r-help@r-project.org
> Subject: [R] biglm: how it handles large data set?
>
>
>
> I am trying to figure out why 'biglm' can handle large data set...
>
> According to the R document - "biglm creates a linear model object that uses
> only p^2 memory for p variables. It can be updated with more data using
> update. This allows linear regression on data sets larger than memory."

I'm not sure anyone answered the question, but let me make some
comments, having done something similar with non-R code before, and motivate
my earlier comments about "streaming" data into a stats widget.
Probably this creates a matrix of some sort holding various moments /
sums-of-powers
of the data, like what the stats books IIRC call "computing formulas."
Each new data point simply adds to the matrix
elements, so it needn't be stored by itself- in the simple case of finding
an average, for example, each data point just adds to N and to a running sum, and
you divide the two when finished. So, anyway, up to the limits
of the floating point implementation ( when each new "y^n" is too small to
add a non-zero delta to the current sum LOL ), you can keep updating the matrix
elements with very large data sets, and your memory requirement is due to the
matrix elements, not the number of data points. Finally you invert
the matrix to get your "answer." The orders you quote seem about
right IIRC from when I tried to fit some image-related data to a polynomial.
You can probably just write the equations yourself, rearrange terms to
express them as sums over past data, and see that your coefficients come from
the matrix inverse.
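
A minimal sketch of that streaming idea in R ( note this is the plain
normal-equations version, for illustration only; biglm itself, per the source
quoted below, uses an incremental QR, which is numerically more stable ):

p <- 2                                 # intercept + one predictor
XtX <- matrix(0, p, p); Xty <- numeric(p)
for (chunk in 1:100) {                 # pretend each chunk arrives from disk
    x <- cbind(1, rnorm(1000))         # made-up chunk of the design matrix
    y <- x %*% c(3, 5) + rnorm(1000)
    XtX <- XtX + crossprod(x)          # accumulate X'X
    Xty <- Xty + crossprod(x, y)       # accumulate X'y
}
solve(XtX, Xty)   # ~ c(3, 5); memory is O(p^2) no matter how many rows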


>
> After reading the source code below, I still could not figure out how
> 'update' implements the algorithm...
>
> Thanks for any light shed upon this ...
>
> > biglm::biglm
>
> function (formula, data, weights = NULL, sandwich = FALSE)
> {
>     tt <- terms(formula)
>     if (!is.null(weights)) {
>         if (!inherits(weights, "formula"))
>             stop("`weights' must be a formula")
>         w <- model.frame(weights, data)[[1]]
>     }
>     else w <- NULL
>     mf <- model.frame(tt, data)
>     mm <- model.matrix(tt, mf)
>     qr <- bigqr.init(NCOL(mm))
>     qr <- update(qr, mm, model.response(mf), w)
>     rval <- list(call = sys.call(), qr = qr, assign = attr(mm,
>         "assign"), terms = tt, n = NROW(mm), names = colnames(mm),
>         weights = weights)
>     if (sandwich) {
>         p <- ncol(mm)
>         n <- nrow(mm)
>         xyqr <- bigqr.init(p * (p + 1))
>         xx <- matrix(nrow = n, ncol = p * (p + 1))
>         xx[, 1:p] <- mm * model.response(mf)
>         for (i in 1:p) xx[, p * i + (1:p)] <- mm * mm[, i]
>         xyqr <- update(xyqr, xx, rep(0, n), w * w)
>         rval$sandwich <- list(xy = xyqr)
>     }
>     rval$df.resid <- rval$n - length(qr$D)
>     class(rval) <- "biglm"
>     rval
> }
> 



Re: [R] Odp: connecting points into a smooth curve

2010-11-01 Thread Mike Marchywka


 >
> >
> > Doing much of anything meaningful with 5 points would probably require a
> model
> > as the other
> > poster suggested- your model would need to be solved depending on its
> particulars.
> >
> > You sometimes see these kinds of wild interpolation issues with the
> drawing
> > programs and free-form input "smoothing" where it tries to fit a smooth
> curve
> > to your mouse moves.
>
> And of course in Excel graphs you can find something like "smooth"
> connection to points. Therefore the method depends on what the OP means by
> smooth curve.
>

I think some of these let you manually set various parameters too, to make
it look right - if the OP follows my advice and tries to define "smooth" in terms
of something data-relevant, probably related to properties of the derivatives,
he will probably get to a point
where he has more parameters than data and can set these things
to suit his imagination or artistic inclinations :)

> Regards
> Petr
>




  


Re: [R] Odp: connecting points into a smooth curve

2010-11-01 Thread Mike Marchywka







> To: kai...@berkeley.edu
> From: petr.pi...@precheza.cz
> Date: Mon, 1 Nov 2010 11:50:17 +0100
> CC: r-help@r-project.org
> Subject: [R] Odp: connecting points into a smooth curve
>
> Hi
>
> r-help-boun...@r-project.org napsal dne 01.11.2010 07:18:47:
>
> >
> > If I have, say, five scatter points and want to connect them together
> into a
> > smooth curve.
>
> As you are not much specific about what you consider "smooth curve" here
> are some options


I think the OP said he had 5 points, so that leaves a lot to the imagination.
Presumably smooth means something like "a curve that goes through all 5 points
and has as many continuous derivatives as possible." So you write a general
fitting function - a Taylor expansion would be a reasonable basis, I suppose -
impose the above conditions, and see if there
is an R package that provides the coefficients you want?
Someone else was asking a related question and there I offered sinc
interpolation,
and indeed that can be exact given a band-limited signal and no time base
errors etc.

Doing much of anything meaningful with 5 points would probably require a model,
as the other
poster suggested- your model would need to be solved depending on its
particulars.

You sometimes see these kinds of wild interpolation issues with drawing
programs and free-form input "smoothing," where the program tries to fit a
smooth curve to your mouse moves.
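
For what it's worth, base R already does the "through all the points, with
continuous derivatives" version via cubic splines; a minimal sketch on
made-up data:

x <- 1:5
y <- c(2.0, 3.5, 3.1, 5.0, 4.2)       # made-up points
plot(x, y)
lines(spline(x, y, n = 200))          # interpolating cubic spline through all 5
# splinefun(x, y) returns the interpolant as a function if values are needed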


>
> use some model ?lm and plot predicted values with line
> ??smooth will give you many functions for smoothing data e.g.
> ?loess
> ?supsmu
> ?spline
> and some others
>
> Regards
> Petr
>
> > I did plot(x,y,type="l"), but the graph is five segments connecting with
> > each other, but not a smooth curve.
[[elided Hotmail spam]]
> >
> >


Re: [R] runtime on ising model

2010-10-28 Thread Mike Marchywka








> Date: Thu, 28 Oct 2010 09:58:40 -0700
> From: wdun...@tibco.com
> To: dwinsem...@comcast.net; mike...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] runtime on ising model
>
> > -Original Message-
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> > Sent: Thursday, October 28, 2010 9:20 AM
> > To: Michael D
> > Cc: r-help@r-project.org
> > Subject: Re: [R] runtime on ising model
> >
> >
> > On Oct 28, 2010, at 11:52 AM, Michael D wrote:
> >
> > > Mike, I'm not sure what you mean about removing foo but I
> > think the
> > > method
> > > is sound in diagnosing a program issue and the results speak for
> > > themselves.

Agreed on the first part but not the second- empirical debugging rarely
produces compelling results in isolation. As a collection
of symptoms it is fine, but it is not conclusive- if you learn C++ you will
find out about all kinds of things, like memory corruption, that
never make sense :) Here, the big concern is issues with memory,
as you never determined that you are CPU limited, although based on
others' comments you likely are in any case.

> > >
> > > I did invert my if statement at the suggestion of a CS professor
> > > (who also
> > > suggested recoding in C, but I'm in an applied math program and
> > > haven't had
> > > the time to take programming courses, which i know would be helpful)
> > >
> > > Anyway, with the statement as:
> > >
> > > if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
> > > #do nothing
> > > } else {
> > > q <- q+1
> > > Out[[q]] <- M
> > > }
> > >
> > > run times were back to around 20 minutes.
>
> Did that one change really make a difference?
> R does not evaluate anything in the if or else
> clauses of an if statement before evaluating
> the condition.

What is at issue here? That is, the OP claimed inverting the polarity
sped things up, suggesting that the branch mattered. AFAIK he
never actually proved which branch was taken. This could
imply many things or nothing: one branch may be slow, or cause
a page fault, or the test may fail fast but succeed slowly ( testing a
huge array for equality, for example ).


>
> > Have you tried replacing all of those 10^x operations with their
> > integer equivalents, c(1L, 10L, 100L)? Each time through
> > the loop you are unnecessarily calling the "^" function 4 times. You
> > could also omit the last one. 10^7, during testing since M at the
> > last iteration (k=10^7) would be the final value and you could just
> > assign the state of M at the end. So we have eliminated 4*10^7
> > unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS
> > professor is perhaps used to having the C compiler do all
> > thinking of
> > this sort for him.)
>
> %in% is a relatively expensive function. Use == if you
> can. E.g., compare the following 2 ways of stashing
> something at times 1e4, 1e5, and 1e6:
>
> > system.time({z <- integer()
> for(k in seq_len(1e6))
> if(k %in% set) z[length(z)+1]<-k
> print(z)})
> [1] 1 10 100
> user system elapsed
> 46.790 0.023 46.844
>
> > system.time({z <- integer()
> nextCheckPoint <- 10^4
> for(k in seq_len(1e6))
> if( k == nextCheckPoint ) {
> nextCheckPoint <- nextCheckPoint * 10
> z[length(z)+1]<-k
> }
> print(z)})
> [1] 1 10 100
> user system elapsed
> 4.529 0.013 4.545
>
> With such a large number of iterations it pays to
> remove unneeded function calls in arithmetic expressions.
> R does not optimize them out - it is up to you to
> do that. E.g.,
>
> > system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
> user system elapsed
> 6.802 0.014 6.818
> > system.time(for(i in seq_len(1e6)) -sign(pi))
> user system elapsed
> 3.896 0.011 3.911
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> >
> > --
> > David
> >
> > > So as best I can tell something
> > > happens in the if statement causing the computer to work
> > ahead, as the
> > > professor suggests. I'm no expert on R (and have no desire to try
> > > looking at
> > > the R source code (it would only confuse me)) but if anyone
> > can offer
> > > guidance on how the if statement works (Does R try to work ahead?
> > > Under what
> > > conditions does it try to "work ahead" so I can try to exploit this
> > > behavior) I would greatly

Re: [R] runtime on ising model

2010-10-26 Thread Mike Marchywka








> Date: Tue, 26 Oct 2010 12:53:14 -0400
> From: mike...@gmail.com
> To: j...@bitwrit.com.au
> CC: r-help@r-project.org
> Subject: Re: [R] runtime on ising model
>
> I have an update on where the issue is coming from.
>
> I commented out the code for "pos[k+1] <- M[i,j]" and the if statement for
> time = 10^4, 10^5, 10^6, 10^7 and the storage and everything ran fast(er).
> Next I added back in the "pos" statements and still runtimes were good
> (around 20 minutes).
>
> So I'm left with something is causing problems in:

I haven't looked at this since some passing interest in magnetics
decades ago, something about 8-tracks and cassettes, but you have
to be careful with conclusions like "I removed foo and the problem
went away, therefore the problem was foo." Performance issues are often
caused by memory, not CPU, limitations. Removing anything with a big
memory footprint could speed things up. IO can be a real bottleneck.
If you are talking about things on minute timescales, look at the task
manager and see if you are even CPU limited. Look for page faults
or IO etc. If you really need performance and have a task which
is relatively simple, don't ignore C++ as a way to generate data
points and then import these into R for analysis.

In short, just because you are focusing on the math, it doesn't mean
the computer is limited by that.
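
If you do want to measure rather than guess, R's sampling profiler is a start
( a minimal sketch with a stand-in loop, not the OP's actual simulation ):

Rprof("prof.out")                            # start profiling
z <- 0
for (k in seq_len(2e6))
    if (k %in% c(1e4, 1e5, 1e6)) z <- z + 1  # the suspect test from this thread
Rprof(NULL)                                  # stop profiling
head(summaryRprof("prof.out")$by.self)       # which calls eat the self time

It won't show you page faults or IO waits, though- for that you still need the
OS tools.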


>
> ## Store state at time 10^4, 10^5, 10^6, 10^7
> if( k %in% c(10^4,10^5,10^6,10^7) ){
> q <- q+1
> Out[[q]] <- M
> }
>
> Would there be any reason R is executing the statements inside the "if"
> before getting to the logical check?
> Maybe R is written to hope for the best outcome (TRUE) and will just throw
> out its work if the logic comes up FALSE?
> I guess I can always break the for loop up into four parts and store the
> state at the end of each, but thats an unsatisfying solution to me.
>
>
> Jim, I like the suggestion of just pulling one big sample, but since I can
> get the runtimes under 30 minutes just by removing the storage piece I doubt
> I would see any noticeable changes by pulling large sample vectors.
>
> Thanks,
> Michael
>
> On Tue, Oct 26, 2010 at 6:22 AM, Jim Lemon  wrote:
>
> > On 10/26/2010 04:50 PM, Michael D wrote:
> >
> >> So I'm in a stochastic simulations class and I having issues with the
> >> amount
> >> of time it takes to run the Ising model.
> >>
> >> I usually don't like to attach the code I'm running, since it will
> >> probably
> >> make me look like a fool, but I figure its the best way I can find any
> >> bits
> >> I can speed up run time.
> >>
> >> As for the goals of the exercise:
> >> I need the state of the system at time=1, 10k, 100k, 1mill, and 10mill
> >> and the percentage of vertices with positive spin at all t
> >>
> >> Just to be clear, i'm not expecting anyone to tell me how to program this
> >> model, cause I know what I have works for this exercise, but it takes far
> >> too long to run and I'd like to speed it up by replacing slow operations
> >> wherever possible.
> >>
> >> Hi Michael,
> > One bottleneck is probably the sampling. If it doesn't grab too much
> > memory, setting up a vector of the samples (maybe a million at a time if 10
> > million is too big - might be able to rewrite your sample vector when you
> > store the state) and using k (and an offset if you don't have one big
> > vector) to index it will give you some speed.
> >
> > Jim
> >
> >
>
>


Re: [R] Time series data with dropouts/gaps

2010-10-26 Thread Mike Marchywka








> From: ggrothendi...@gmail.com
> Date: Tue, 26 Oct 2010 00:37:05 -0400
> To: flym...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Time series data with dropouts/gaps
>
> On Tue, Oct 26, 2010 at 12:28 AM, Bob Cunningham  wrote:
> > I have time-series data from a pair of inexpensive self-logging 3-axis
> > accelerometers (http://www.gcdataconcepts.com/xlr8r-1.html).  Since I'm not
> > sure of the vibration/shock spectrum I'm measuring, for my initial sensor
> > characterization run the units were mounted together with the sample rate
> > set to the maximum of 640 samples/sec.
> >
> > Unfortunately, at this sample rate there are significant data dropouts at
> > various scales (a phenomenon not present at data rates of 160 Hz and below):
> >
> > 1. Approximately every 20ms, a few samples are dropped (believed to be due
> > to internal buffer wrapping).
> >
> > 2. Approximately every 200ms, about 50 samples are dropped (believed to be
> > due to flash write times).
> >
> > 3. At seemingly random intervals, a sample will appear with an out-of-order
> > timestamp (vendor is diagnosing).
> >
> > Initially, I'm trying to answer the following questions:
> >
> > A. How well do the 2 units compare?  (Calibration, time-base drift, etc.)
> >
> > B. Can I use a lower sample rate?  (What is the observed spectrum?)
> >
> > I started attacking the problem in Python (numpy/scipy), where I've done
> > lots of prior time-series sensor data analysis.  Unfortunately, the gaps
> > have made direct use of the data futile, and I found I was spending all my
> > time manipulating Python lists and numpy vectors rather than finding
> > answers.
> >
> > I hope R can help calm my sea of unruly data.  I'm presently working my way
> > through the abundant R references (tutorials, wiki, etc.), but I was hoping
> > to find pointers here to help me become productive sooner rather than later.
> >
> > Here's my present brute-force plan of attack:
> >
> > - Load both data sets (in CSV format).  Each data element is a timestamp +
> > 3-axis acceleration.
> > - Determine timebase offset: The unit clocks don't match perfectly, and the
> > units were started at slightly different times, so I expect to correlate
> > common events in the data.
> > - Find all overlapping data clusters (between superset of gaps).
> > - See if I have enough data to perform spectral analysis.  I'd like to
> > analyze all clusters together, but I suspect I may have to analyze them
> > independently, then combine the results.
> >
> > Thoughts?  Hints?

Is this a question about R or DSP? I think spectral analysis of non-uniformly
sampled data is covered in Oppenheim and Schafer or equivalent texts from this
century.
I guess you could use sinc interpolation if you really want to make up data,
although I should probably read the zoo documentation before commenting further
:)

I guess my thought at this point would be some simple paper-and-pencil
analysis to see what time base perturbations do in the time and FT domains,
and then look for related R functions. If you force these at a single
frequency, you may be able to get some idea of what is going on with your
apparent spectrum. It may help, of course, to start from known good
data ( you can generate this in R ), perturb it ( remove samples, jitter
the sample times, etc ), and verify that your analysis can back out the
problems you introduced. fwiw.
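
As a sketch of that "perturb known data" idea ( assuming the zoo package is
installed; the 640 Hz sine and drop rate are made up ):

library(zoo)
set.seed(1)
t <- seq(0, 1, by = 1/640)                   # ideal uniform time base
x <- sin(2*pi*50*t) + rnorm(length(t), sd = 0.1)
keep <- sort(unique(c(1, length(t), sample(seq_along(t), 576))))  # drop ~10%
z  <- merge(zoo(x[keep], t[keep]), zoo(, t)) # re-align to grid; gaps become NA
zf <- na.approx(z)                           # or na.spline(); sinc is up to you
spectrum(coredata(zf))                       # compare against spectrum(x)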


> >
>
> You can use read.zoo in the zoo package to create a zoo time series
> from a csv file. The zoo merge method can merge two or more series
> together and na.locf, na.approx or na.spline, also in zoo, could be
> used to fill in the NAs. There are three vignettes (pdf documents)
> that come with the zoo package that will get you up to speed.

See the comment above - I haven't read the docs, but the name suggests spline
interpolation, when the OP is probably looking for something like sinc
interpolation.


>

  


Re: [R] .R file

2010-10-25 Thread Mike Marchywka


> All three of these editors are external to R but have the capability of
> sending code from the editor to the console. All of them are good and have
> loyal user bases. Notepad++ is another option; but you have to copy/paste
> code to R - I mention it because it has syntax highlighting and is capable
> of saving files in numerous formats.
>
> In fact, all of the editors mentioned above have syntax highlighting
> capabilities, which is very useful when writing/developing code and
> documents, and all of them let you save a file with a .R extension (unlike
> certain Windows editors, for example). I'm certain I've missed a few,
> especially on the Linux and Mac platforms, and I expect others will chime in
> with their preferences.


You mention Notepad++; I'm still using vi under cygwin and an ancient copy of
UltraEdit.
When I was first learning Java and C++, the various commercial IDE
packages were great. However, I've found that with most things I end up
integrating
with bash scripts and end up in vi for "touch up" anyway- and if you take the
time to learn it, many of the odd commands are quite helpful. IIRC many
of the IDEs offer language-specific help beyond syntax highlighting, such as
signature suggestions as you type, or menu options for compiling and debugging,
that
pure text editors lack; not sure what the
R IDEs do. I finally developed alternate approaches for finding signatures and
it turned out to be easier to do this outside the IDE, although sometimes I
compile bad code due to a typo that an IDE would have flagged.
In the simple case on windows, I vi my .R file and then invoke a script
to run it. In many cases, the IDE menus for more language-specific actions
like compile and build are confusing and limited and have a learning curve,
and there too I end up editing bash scripts using command-line alternatives.
So, I guess my point is that I would not just run to an IDE for R or
anything else :)



>
> HTH,
> Dennis
>
> On Sun, Oct 24, 2010 at 11:57 PM, zhiji19  wrote:
>
> >
> > Hello everyone
> >
> > Can you please teach me how to save my homework as .R file?
> >
> > I write my code in RGui. When I tried to save my work, the RGui only allows
> > me to save it as .RData.
> >

  


Re: [R] How long does skipping in read.table take

2010-10-22 Thread Mike Marchywka











> From: ggrothendi...@gmail.com
> Date: Fri, 22 Oct 2010 18:28:14 -0400
> To: dimitri.liakhovit...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] How long does skipping in read.table take
>
> On Fri, Oct 22, 2010 at 5:17 PM, Dimitri Liakhovitski
>  wrote:
> > I know I could figure it out empirically - but maybe based on your
> > experience you can tell me if it's doable in a reasonable amount of
> > time:
> > I have a table (in .txt) with a 17,000,000 rows (and 30 columns).
> > I can't read it all in (there are many strings). So I thought I could
> > read it in in parts (e.g., 1 milllion) using nrows= and skip.
> > I was able to read in the first 1,000,000 rows no problem in 45 sec.
> > But then I tried to skip 16,999,999 rows and then read in things. Then
> > R crashed. Should I try again - or is it too many rows to skip for R?
> >
>
> You could try read.csv.sql in sqldf.
>
> library(sqldf)
> read.csv.sql("myfile.csv", skip = 1000, header = FALSE)
> or
> read.csv.sql("myfile.csv, sql = "select * from file 2000, 1000")
>
> The first skips the first 1000 lines including the header and the
> second one skips 1000 rows (but still reads in the header) and then
> reads 2000 rows. You may or may not need to specify other arguments
> as well. For example, you may need to specify eol = "\n" or other
> depending on your line endings.
>
> Unlike read.csv, read.csv.sql reads the data directly into an sqlite
> database (which it creates on the fly for you). The data does not go
> through R during this operation. From there it reads only the data
> you ask for into R so R never sees the skipped over data. After all
> that it automatically deletes the database.


The first time I saw this suggested, I thought I would wait to
reply because it seemed a bit of an odd suggestion, and I thought
I was missing some R-speak and a reply would waste everyone's time. However,
I still don't see what I'm missing here. A database is generally a big table
of data with various indices and locks that facilitate concurrent updates and
responses to arbitrary queries. This is fine for hotel reservation systems
where you need "ACID" performance, but makes little sense for constant
data which will be accessed sequentially. A fast DB could take milliseconds to
respond;
an anticipatory streaming system could always have data ready in nanoseconds.
Is this thing really acting as a "DB" or is there something more to it?
Is there no well-buffered streaming system for data you will use in order?

It sounds like you are just building indices and then deleting them,
but never really using random access. Is there not a better way?

Thanks



>

> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>






  


Re: [R] How long does skipping in read.table take

2010-10-22 Thread Mike Marchywka





> Date: Fri, 22 Oct 2010 17:17:58 -0400
> From: dimitri.liakhovit...@gmail.com
> To: r-help@r-project.org
> Subject: [R] How long does skipping in read.table take
>
> I know I could figure it out empirically - but maybe based on your
> experience you can tell me if it's doable in a reasonable amount of
> time:
> I have a table (in .txt) with a 17,000,000 rows (and 30 columns).
> I can't read it all in (there are many strings). So I thought I could
> read it in in parts (e.g., 1 milllion) using nrows= and skip.
> I was able to read in the first 1,000,000 rows no problem in 45 sec.
> But then I tried to skip 16,999,999 rows and then read in things. Then
> R crashed. Should I try again - or is it too many rows to skip for R?
>
I've seen this come up a few times already in my brief time on
the list. A quick goog search does turn up things like this to deal
with large datasets,

http://yusung.blogspot.com/2007/09/dealing-with-large-data-set-in-r.html

With most OO languages and the use of accessors, you can hide a lot of
things, and the data handler is free to return a value or values to
you however makes sense- memory, disk, or even socket is hidden. I'm
amazed that R is general enough to allow package creators this freedom.

Just generally, memory management is a big problem even among computer
people using "computer" languages ( hard-core programming rather than something
like R ). People assume "well gee,
I made an array, it must be all in memory." Often, however, the OS tries to
give you VM- which is probably worse than having a file in terms of performance.

One rule that is good to consider is to "act locally" - that is, try to
operate only on adjacent data, and do something like stream or block
your input data. An R streaming IO class could potentially be very fast
and give implementors a reason to think globally but act locally.
As is probably apparent, it is easy for even stats and math tasks to
become IO limited rather than CPU bound.

As an aside, you can use some external utilities to split the file, if
split files are OK to use with your R code; head and tail for example
can isolate line ranges. In the past I've created indexes of line offsets
and then used perl for random access, but I'm not sure how that would work with
R.
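
For example, a minimal blocked reader ( filename and separator made up ):

con <- file("bigfile.txt", open = "r")
header <- readLines(con, n = 1)            # consume the header once
repeat {
    block <- readLines(con, n = 1e6)       # one block in RAM at a time
    if (length(block) == 0) break
    tc <- textConnection(block)
    df <- read.table(tc, sep = "\t", stringsAsFactors = FALSE)
    close(tc)
    # ...summarize df here, keep only the summaries...
}
close(con)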


> Thank you!

Thank google. 

>










  


Re: [R] comparing two data files

2010-10-19 Thread Mike Marchywka


> From: nicol...@buffalo.edu
> Date: Tue, 19 Oct 2010 18:23:27 -0400
> To: r-help@r-project.org
> Subject: [R] comparing two data files
>
> I have 2 large data files that I need to compare and find the differences 
> between data file x and data file y in order to correct data entry error. 
> Theoretically both data files should be identical. I am trying to figure 
> ou[[elided Hotmail spam]]


I'm not sure why you want to use R for this - there may be very good reasons -
but generally I use text processing utilities like "diff" ( see the linux
or cygwin docs ) along with grep, sed, awk, and maybe perl.
Generally these are not sophisticated with numbers and just process
strings, so if your validation and correction relies on R features, R
may be worthwhile. If you are really just looking for diffs in strings,
these others could be a good alternative and possibly worth the learning curve,
especially if your largest motivation for doing this in R is to learn more R.
I guess the next question is, "what do you want to do if they are not equal?"

  


Re: [R] Sine function fitting

2010-10-18 Thread Mike Marchywka








> Date: Mon, 18 Oct 2010 05:46:14 -0700
> From: a...@walla.co.il
> To: r-help@r-project.org
> Subject: [R] Sine function fitting
>
>
> Hi,
>
> Is there a package to perform a sine function fitting to XY data?

Since no one replied AFAIK: are you asking about an FFT or something else?
It isn't really clear if you have 2D data or Y as a function of X.
I suppose that given some objective and a set of data, you could minimize
some error metric to fit your not-exactly-sinusoidal samples. You could
also just peak-detect the FFT as one approach, and then you get a detailed
error analysis ( the spectrum ) as well.
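
A minimal sketch of that approach on made-up data: grab a frequency guess from
the spectrum peak, after which the rest of the fit is linear:

set.seed(1)
x  <- seq(0, 10, by = 0.05)
y  <- 2*sin(2*pi*0.4*x + 1) + 0.5 + rnorm(length(x), sd = 0.2)
sp <- spectrum(y, plot = FALSE)
f0 <- sp$freq[which.max(sp$spec)] / 0.05     # cycles/sample -> cycles per unit x
b  <- unname(coef(lm(y ~ sin(2*pi*f0*x) + cos(2*pi*f0*x))))
c(A = sqrt(b[2]^2 + b[3]^2), phi = atan2(b[3], b[2]), C = b[1])  # ~ 2, 1, 0.5

nls() could refine f0 as a free parameter from there, if the peak guess is too
coarse.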



>
> Thx,
> Ashz
>


Re: [R] running a long R process on Linux using putty - best practice to disconnect

2010-10-14 Thread Mike Marchywka








> Date: Thu, 14 Oct 2010 18:07:02 +0200
> From: martin.to...@geo.uzh.ch
> To: r-help@r-project.org
> Subject: [R] running a long R process on Linux using putty - best practice to 
> disconnect
>
> Dear all,
> I am sure this has been solved before, googling did not help much,.
> Warning, I am not great with putty/linux servers
>
> Situation: I have been given access to an R installation on a Linux
> server to do some larger number crunching that was killing my machine. I
> use putty to connect, and then go R, source("myscript") and all is fine
> But this will take potentially days, and I would like to disconnect
>
> What is the best practice? Someone mentionne nohup, but this would
> require starting the R script with an argument to R, I think. Is this
> doable?

In short, I think "R CMD BATCH script" does what you want.
I routinely run R from scripts and use nohup; the two should work well
together AFAIK ( e.g. nohup R CMD BATCH myscript.R & , then disconnect ).
I have encapsulated the R stuff into a script ( "myR" )
and have something like this,

R=R   # wherever your R executable lives

if [ "$1" == "-run" ]
then
    echo running $2
    $R CMD BATCH $2
    shift ; shift
fi

if [ "$1" == "-go" ]
then
    echo starting R with no params
    $R
    shift
fi

>
> Another option was using screen, btu that is not available on the server.
> Any help is welcome.
>
> Thanks
> Martin
>


Re: [R] Drop matching lines from readLines

2010-10-14 Thread Mike Marchywka







> From: santosh.srini...@gmail.com
> To: r-help@r-project.org
> Date: Thu, 14 Oct 2010 11:27:57 +0530
> Subject: [R] Drop matching lines from readLines
>
> Dear R-group,
>
> I have some noise in my text file (coding issues!) ... I imported a 200 MB
> text file using readlines
> Used grep to find the lines with the error?
>
> What is the easiest way to drop those lines? I plan to write back the
> "cleaned" data set to my base file.

Generally for text processing I've been using utilities external to R,
although there may be R alternatives that work better for you. You
mention grep; I've suggested sed as a general way to fix formatting problems,
and there is also something called "uniq" on linux or cygwin.
I have gotten into the habit of using these for a variety of data
manipulation tasks, and only feed clean data into R.

$ echo -e a bc\\na bc
a bc
a bc

$ echo -e a bc\\na bc | uniq
a bc

$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input),
writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count   prefix lines by the number of occurrences
  -d, --repeated    only print duplicate lines
  -D, --all-repeated[=delimit-method]  print all duplicate lines
    delimit-method={none(default),prepend,separate}
    Delimiting is done with blank lines
  -f, --skip-fields=N   avoid comparing the first N fields
  -i, --ignore-case ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique  only print unique lines
  -z, --zero-terminated  end lines with 0 byte, not newline
  -w, --check-chars=N   compare no more than N characters in lines
  --help display this help and exit
  --version  output version information and exit

A field is a run of blanks (usually spaces and/or TABs), then non-blank
characters.  Fields are skipped before chars.

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use `sort -u' without `uniq'.
Also, comparisons honor the rules specified by `LC_COLLATE'.
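
If you would rather stay inside R for this particular job, dropping the flagged
lines is short work once grepl has marked them ( pattern and filenames made up ):

txt <- readLines("base.txt")
bad <- grepl("the-error-pattern", txt)   # whatever your grep already finds
writeLines(txt[!bad], "base_clean.txt")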










>
> Thanks.

  


Re: [R] [OT] (slightly) - OpenOffice Calc and text files

2010-10-13 Thread Mike Marchywka





> From: dwinsem...@comcast.net
> To: bsch...@anest.ufl.edu
> Date: Wed, 13 Oct 2010 14:52:21 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files
>
>
> On Oct 13, 2010, at 1:13 PM, Schwab,Wilhelm K wrote:
>
> > Hello all,
> >
> > I had a very strange looking problem that turned out to be due to
> > unexpected (by me at least) format changes to one of my data files.
> > We have a small lab study in which each run is represented by a row
> > in a tab-delimited file; each row identifies a repetition of the
> > experiment and associates it with some subjective measurements and
> > times from our notes that get used to index another file with lots
> > of automatically collected data. In short, nothing shocking.
> >
> > In a moment of weakness, I opened the file using (I think it's
> > version 3.2) of OpenOffice Calc to edit something that I had mangled
> > when I first entered it, saved it (apparently the mistake), and
> > reran my analysis code. The results were goofy, and the problem was
> > in my code that runs before R ever sees the data. That code was
> > confused by things that I would like to ensure don't happen again,
> > and I suspect that some of you might have thoughts on it.
> >
> > The problems specifically:
> >
> > (1) OO seems to be a little stingy about producing tab-delimited

> > filter and folks (presumably like us) saying that it deserves to be
> > a separate option.
>
> You have been little stingy yourself about describing what you did. I
> see no specifics about the actual data used as input nor the specific
> operations. I just opened an OO.o Calc workbook and dropped a
> character vector, "1969-12-31 23:59:50" copied from help(POSIXct) into



> > Have any of you found a nice (or at least predictable) way to use OO
> > Calc to edit files like this?
>
> I didn't do anything I thought was out of the ordinary and so cannot
> reproduce your problem. (This was on a Mac, but OO.o is probably going
> to behave the same across *NIX cultures.)
>
> --
> David
>
> > If it insists on thinking for me, I wish it would think in 24 hour
> > time and 4 digit years :)
>
> Is it possible that you have not done enough thinking for _it_?
>
> > I work on Linux, so Excel is off the table, but another spreadsheet
> > or text editor would be a viable option, as would configuration
> > changes to Calc.
> >
> > Bill

Probably instead of guessing and seeing how various things react, you
could get a utility like octal dump, or open the file in an editor that
has a hex mode, and see what actually happened. This could be anything- CRLF
conventions,
someone turned it into Unicode, etc. On linux or cygwin I think you have
"od" available. Then of course, if you know what R likes, you can use
sed to fix it...
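
You can even do the byte-peeking without leaving R; a minimal sketch
( filename made up ):

bytes <- readBin("labnotes.txt", what = "raw", n = 48)
bytes   # look for ef bb bf (UTF-8 BOM), ff fe (UTF-16), 0d 0a (CRLF) etc.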





  


Re: [R] Slow reading multiple tick data files into list of dataframes

2010-10-11 Thread Mike Marchywka






> Date: Mon, 11 Oct 2010 14:39:54 -0700
> From: aqua...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Slow reading multiple tick data files into list of dataframes
[...]
> Is there a better/quicker or more R way of doing this ?

While there may be an obvious R-related answer, usually it helps if you
can determine where the bottleneck is in terms of
resources on your platform- often on older machines you
run out of real memory, and then all the time is spent paging the data
between RAM and the disk. Can you tell if you are CPU or
memory limited by using the task manager?

It could in fact be that the best solution involves not trying
to hold your entire data set in memory at once; hard to know without
knowing your platform etc. In the
past, I've found that actually sorting data, a slow process
itself, can speed things up a lot due to less thrashing
of the memory hierarchy during the later analysis. I doubt
that helps your immediate problem, but it does point
to one possible non-obvious "optimization" depending
on what is slowing you down.
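
A crude first check from inside R ( directory and pattern made up ):

files <- list.files("ticks", pattern = "\\.csv$", full.names = TRUE)
st <- system.time(dfs <- lapply(files, read.csv, stringsAsFactors = FALSE))
st   # elapsed much bigger than user + sys hints at IO or paging, not CPU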


>
> Thanks,
> Chris
>


Re: [R] Memory management in R

2010-10-10 Thread Mike Marchywka








> Date: Sun, 10 Oct 2010 15:27:11 +0200
> From: lorenzo.ise...@gmail.com
> To: dwinsem...@comcast.net
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
>
> > I already offered the Biostrings package. It provides more robust
> > methods for string matching than does grepl. Is there a reason that you
> > choose not to?
> >
>
> Indeed that is the way I should go for and I have installed the package
> after some struggling. Since biostring is a fairly complex package and I
> need only a way to check if a certain string A is a subset of string B,
> do you know the biostring functions to achieve this?
> I see a lot of methods for biological (DNA, RNA) sequences, and they may
> not apply to my series (which are definitely not from biology).

Generally the differences relate to alphabet and "things you may want
to know about them." Unless you are looking for reverse complement
text strings, there will be a lot of stuff you don't need. Offhand,
I'd be looking for things like computational linguistics packages
as you are looking to find patterns or predictability in human readable 
character sequences. Now, humans can probably write hairpin-text( look
at what RNA can do LOL) but this is probably not what you care about. 

However,  as I mentioned earlier, I had to write my own regex compiler ( 
coincidently
for bio apps ) to get required performance. Your application and understanding
may benefit from things like building dictionaries that aren't really
part of regex and that can easily be done in a few lines of c++ code
using STL containers. To get statistically meaningful samples, you almost
will certainly need faster code.
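
One cheap thing to try first, since your substrings look literal ( no regex
metacharacters ): fixed = TRUE bypasses the regex compiler entirely. A sketch
mimicking the failing case from this thread:

past_string <- paste(rep("12653a6", 500), collapse = "#")
fut_string  <- paste(rep("12653a6", 240), collapse = "#")
# grepl(fut_string, past_string)              # regex engine: slow / 'Out of memory'
grepl(fut_string, past_string, fixed = TRUE)  # literal substring search: TRUE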




> Cheers
>
> Lorenzo
>


Re: [R] Memory management in R

2010-10-08 Thread Mike Marchywka








> From: dwinsem...@comcast.net
> To: lorenzo.ise...@gmail.com
> Date: Fri, 8 Oct 2010 19:30:45 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
>
> On Oct 8, 2010, at 6:42 PM, Lorenzo Isella wrote:
>

> > Please find below the R snippet which requires an input file (a
> > simple text file) you can download from
> >
> > http://dl.dropbox.com/u/5685598/time_series25_.dat
> >
> > What puzzles me is that the list is not really long (less than 2000
> > entries) and I have not experienced the same problem even with
> > longer lists.
>
> But maybe your loop terminated in them eaarlier/ Someplace between
> 11*225 and 11*240 the grepping machine gives up:
>
> > eprs <- paste(rep("aa", 225), collapse="#")
> > grepl(eprs, eprs)
> [1] TRUE
>
> > eprs <- paste(rep("aa", 240), collapse="#")
> > grepl(eprs, eprs)
> Error in grepl(eprs, eprs) :
> invalid regular expression
> 'aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#aa#a
> In addition: Warning message:
> In grepl(eprs, eprs) : regcomp error: 'Out of memory'
>
> The complexity of the problem may depend on the distribution of
> values. You have a very skewed distribution with the vast majority
> being in the same value as appeared in your error message :
>

>
> HTH (although I think it means you need to construct a different
> implementation strategy);

You really need to look at the question posed by your regex and consider
the complexity of what you are asking, and what likely implementations
would do with it. Something like this probably needs to be implemented
in dedicated code to handle the more general case, or you need to determine
whether the input data is pathological given your regex. Being able to write
something
concisely doesn't mean the execution of that something is simple. Even if
it does manage to return a result, it will likely be very slow. In the
past I have had to write my own simple regex compilers, handling a limited
class of expressions, to make the speed reasonable. In this case, depending
on your objectives, dedicated code may even help you understand
the algorithm.

>
> David.
>
>
> > Many thanks
> >
> > Lorenzo
> >

  


Re: [R] Memory management in R

2010-10-08 Thread Mike Marchywka







> Date: Fri, 8 Oct 2010 13:30:59 -0400
> From: jholt...@gmail.com
> To: lorenzo.ise...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Memory management in R
>
> More specificity: how long is the string, what is the pattern you are
> matching against? It sounds like you might have a complex pattern
> that in trying to match the string might be doing a lot of back
> tracking and such. There is an O'Reilly book on Mastering Regular
> Expression that might help you understand what might be happening. So
> if you can provide a better example than just the error message, it
> would be helpful.


This is possibly a stack issue. Error messages are often not literal;
I have seen "out of memory" for graphic device objects :) The regex suggests
a stack issue, but that is just a guess at the mechanism of death-
what you probably really want is a simpler regex :)



>
> On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella  wrote:
> > Dear All,
> > I am experiencing some problems with a script of mine.
> > It crashes with this message
> >
> > Error in grepl(fut_string, past_string) :
> >  invalid regular expression
> > '12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12
> > Calls: entropy_estimate_hash -> total_entropy_lz -> entropy_lz -> grepl
> > In addition: Warning message:
> > In grepl(fut_string, past_string) : regcomp error:  'Out of memory'
> > Execution halted
> >
> > To make a long story short, I use some functions which eventually call grepl
> > on very long strings to check whether a certain substring is part of a
> > longer string.
> > Now, the script technically works (it never crashes when I run it on a
> > smaller dataset) and the problem does not seem to be RAM memory (I have
> > several GB of RAM on my machine and its consumption never shoots up so my
> > machine never resorts to swap memory).
> > So (though I am not an expert) it looks like the problem is some limitation
> > of grepl or R memory management.
> > Any idea about how I could tackle this problem or how I can profile my code
> > to fix it (though it really seems to me that I have to find a way to allow R
> > to process longer strings).
> > Any suggestion is appreciated.
> > Cheers
> >
> > Lorenzo
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
  


Re: [R] R CMD SHLIB changes function name when compiling

2010-10-08 Thread Mike Marchywka










> Date: Sat, 9 Oct 2010 01:21:47 +1030
> From: stephen.peder...@adelaide.edu.au
> To: rip...@stats.ox.ac.uk
> CC: r-help@r-project.org
> Subject: Re: [R] R CMD SHLIB changes function name when compiling
>
> I think I should also add that I have compiled R from source so am
> pretty confident that I have the correct set of Rtools.

Assuming the problem is name mangling or symbol export,
how about extern "C" ? That is, wrap the declaration in an
extern "C" { ... } block so a C++ compiler exports the plain,
unmangled name. I haven't done this lately and my memory
is bad, but a quick goog search suggests this is the idiom, and IIRC
it sounds about right.

>
>
> On 8/10/2010 8:06 PM, Prof Brian Ripley wrote:
> > On Fri, 8 Oct 2010, Steve Pederson wrote:
> >
> >> Hi,
> >>
> >> I'm trying to write a function in C for implementation with .Call.
> >> When compiling using R CMD SHLIB characters seem to be added to the
> >> function name.
> >>
> >> Here's the complete C code from the file summariseMCMC.c:
> >>
> >> #include
> >> #include
> >> #include
> >>
> >> void summariseMCMC(SEXP data) {
> >>
> >> PROTECT(data=AS_NUMERIC(data));
> >> UNPROTECT(1);
> >>
> >> }
> >>
> >> Then after compiling (R CMD SHLIB summariseMCMC.c) & loading the .dll
> >>
> >> dyn.load("C:/R/R-2.11.1/bin/summariseMCMC.dll")
> >> is.loaded("_Z13summariseMCMCP7SEXPREC")
> >> [1] TRUE
> >> is.loaded("summariseMCMC")
> >> [1] FALSE
> >>
> >> Just wondering if anyone had any pointers for getting rid of this, or
> >> have I missed something outrageously obvious?
> >
> > You have. This was not done by'R CMD SHLIB', but by a C++ compiler -- it
> > is called 'name mangling'. Unfortunately you didn't show us the output
> > from that command, when the cause would probably have been 'outrageously
> > obvious'.
> >
> > The fix is to make sure you use a C compiler to compile C code, and
> > we've almost no idea why that is not being done on your system.
> > But as a guess, check that the environment variable CC is not set.
> >
> >>
> >> Thanks,
> >>
> >> Steve
> >>
> >>
> >> sessionInfo()
> >> R version 2.11.1 (2010-05-31)
> >> i386-pc-mingw32
> >>
> >> locale:
> >> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1
> >> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
> >> [5] LC_TIME=English_Australia.1252
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >>
> >> other attached packages:
> >> [1] limma_3.4.4 Biobase_2.8.0 aroma.affymetrix_1.7.0
> >> [4] aroma.apd_0.1.7 affxparser_1.20.0 R.huge_0.2.0
> >> [7] aroma.core_1.7.0 aroma.light_1.16.0 matrixStats_0.2.1
> >> [10] R.rsp_0.3.6 R.cache_0.3.0 R.filesets_0.8.3
> >> [13] digest_0.4.2 R.utils_1.4.4 R.oo_1.7.3
> >> [16] R.methodsS3_1.2.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.11.1
> >>


Re: [R] SVM functions

2010-10-06 Thread Mike Marchywka



I'm not sure if you want general literature or R-specific answers.
In any case, check either source below; you may have better
luck contacting the authors, as many may use R but not be on this list.


> Date: Wed, 6 Oct 2010 06:28:02 -0700
> From: nikkiha...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] SVM functions
>
>
> What are the ways by which one can validate the SVM results and its
> significance? Are there any papers or articles regarding my question? Please
> do let me know.

For open-ended research with no leads, gscholar is OK, or use this,




>
> Thank you.
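
For an in-R sanity check, cross-validation is built into the usual SVM
implementation; a minimal sketch assuming you are using the e1071 package:

library(e1071)
data(iris)
fit <- svm(Species ~ ., data = iris, cross = 10)  # 10-fold cross-validation
summary(fit)                                      # per-fold and total accuracy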
>







Re: [R] binary tree construction in R

2010-10-05 Thread Mike Marchywka


Is there a trick to posting links here, or is wikipedia blocked?
I see in my sent folder that I included this wikipedia
link on threaded binary trees, but it didn't show up in
the mail I got back from the list? Sorry,
I'm new to this list :)

http://en.wikipedia.org/wiki/Threaded_binary_tree






> From: marchy...@hotmail.com
> To: matl...@cs.ucdavis.edu; r-help@r-project.org

> > Date: Tue, 5 Oct 2010 14:57:40 -0700
> > From: matl...@cs.ucdavis.edu
> > To: r-help@r-project.org
> > Subject: Re: [R] binary tree construction in R
> >

> >
> > Not sure what you mean by a "threaded" binary tree, but I am enclosing
>
> thought others may be interested, but I must admit I'm surprised
> people are exploring data structures in R
> ( and I just used something like this in an indexing system I wrote ).
> I guess it wouldn't be too far afield to discuss the benefits
> of data structure exploration in R vs C++ or Java- especially
> for something like this where you may want to time it in a multithreaded
> setting- you can always instrument something like that, collect lots
> of Monte Carlo results, and then import the statistical data into R
> for analysis, I would think.

  


Re: [R] binary tree construction in R

2010-10-05 Thread Mike Marchywka









> Date: Tue, 5 Oct 2010 14:57:40 -0700
> From: matl...@cs.ucdavis.edu
> To: r-help@r-project.org
> Subject: Re: [R] binary tree construction in R
>
> MK wrote:
>
> > Hi all,
> >
> > I'm very new to R and I'm trying to construct a threaded binary tree using
> > recursive functions.
> >
> > I'm very confused was wondering if anyone had any R sample code they would
> > share. I've come across a lot of C++ code(nothing in R) and this is not
> > helping.
> >
> > best,
> >
> > MK
>
> Not sure what you mean by a "threaded" binary tree, but I am enclosing

thought others may be interested, but I must admit I'm surprised
people are exploring data structures in R
( and I just used something like this in an indexing system I wrote ).
I guess it wouldn't be too far afield to discuss the benefits
of data structure exploration in R vs C++ or Java- especially
for something like this where you may want to time it in a multithreaded
setting- you can always instrument something like that, collect lots
of Monte Carlo results, and then import the statistical data into R
for analysis, I would think.



> code below. It is from my forthcoming book on software development in
> R.
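
The enclosed code isn't reproduced in this archive copy; as a stand-in, here
is a minimal unthreaded binary search tree sketch in plain R (an illustration
only, not the code from the book):

bst_insert <- function(node, value) {
    ## grow a BST stored as nested lists; R's copy-on-modify semantics
    ## mean each insert returns a rebuilt path down to the new leaf
    if (is.null(node))
        return(list(value = value, left = NULL, right = NULL))
    if (value < node$value)
        node$left <- bst_insert(node$left, value)
    else
        node$right <- bst_insert(node$right, value)
    node
}
bst_inorder <- function(node) {
    ## in-order traversal returns the stored values in sorted order
    if (is.null(node)) return(numeric(0))
    c(bst_inorder(node$left), node$value, bst_inorder(node$right))
}
tree <- NULL
for (v in c(5, 2, 8, 1, 3)) tree <- bst_insert(tree, v)
bst_inorder(tree)   # 1 2 3 5 8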

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help

2010-10-05 Thread Mike Marchywka







> Date: Tue, 5 Oct 2010 08:18:10 -0700
> From: djmu...@gmail.com
> To: tott...@yahoo.com
> CC: r-help@r-project.org
> Subject: Re: [R] R-help
>
> Hi:
>
> This problem is a useful lesson in the power of vectorizing calculations in
> R.
>
> A remanufacturing of your function:
[...]

> The vectorized version is over 200 times faster than the loop. In R,
> vectorization pays
> dividends (sometimes big ones) when the operation is amenable to it. This
> is an
> important lesson to learn when migrating to R from a traditional programming
> language.

Correct me if I'm wrong, as I have already demonstrated that I am not that
familiar with R's details, but more generally what you have done is relegate
the inner loops to internal R code, which is presumably optimized native code
and doesn't require the R interpreter to keep executing your "for" or other
loop code. The same effect is well known in Java and other scripting
languages. I'm not just trying to be defensive ( LOL ), but your result is
probably more general than you have stated it: vectorized idioms and
specialized built-in functions will usually be much faster than rolling your
own loops in R code.
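
A toy illustration of that point (made-up data, not the original poster's
function):

x <- runif(1e6)
system.time({         # explicit loop: the interpreter executes every iteration
    s <- 0
    for (xi in x) s <- s + xi
})
system.time(sum(x))   # the same loop runs inside sum()'s compiled C code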


>
> HTH,
> Dennis
>
>

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread Mike Marchywka






> From: pda...@gmail.com
> Date: Tue, 5 Oct 2010 13:25:52 +0200
> To: j_hirsch...@yahoo.com
> CC: r-help@r-project.org
> Subject: Re: [R] read columns of quoted numbers as factors
>
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
> > Suppose I have a data file (possibly with a huge number of columns), where 
> > the
> > columns with factors are coded as "1", "2", "3", etc ... The default 
> > behavior of
> > read.table is to convert these columns to integer vectors.
> >
> > Is there a way to get read.table to recognize that columns of quoted numbers
> > represent factors (while unquoted numbers are interpreted as integers), 
> > without
> > explicitly setting them with colClasses ?
>
> I don't think there's a simple way, because the modus operandi of read.table 
> is to read everything as character and then see whether it can be converted 
> to numeric, and at that point any quotes will have been lost.
>
> One possibility, somewhat dependent on the exact file format, would be to 
> temporarily set quote="", see which columns contains quote characters, and, 
> on a second pass, read those columns as factors, using a computed colClasses 
> argument. It will break down if you have space-separated columns with quoted 
> multi-word strings, though.
>
>

While this specific example may or may not lend itself to a solution within R,
I would just mention that it is not a faux pas to modify your data file
with something like sed or awk prior to feeding it to a program like R.
Quotes, spaces, commas, etc. may be something that the target app can handle,
or it may just be easier to change the format with a familiar tool designed
for that.
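
A rough sketch of the two-pass idea suggested above, assuming a hypothetical
tab-separated file "data.txt" with no header row and no quoted multi-word
strings:

## pass 1: disable quote processing so the quote characters survive
first <- read.table("data.txt", sep = "\t", quote = "", nrows = 1,
                    colClasses = "character")
quoted <- grepl("\"", unlist(first))   # which columns carry quote characters
## pass 2: read quoted columns as factors, the rest as integers
classes <- ifelse(quoted, "factor", "integer")
dat <- read.table("data.txt", sep = "\t", colClasses = classes)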

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble with RODBC -- chopping off part of column names

2010-10-04 Thread Mike Williamson
Marc, et. al,

Below is all of the pertinent version info.  However, I think versions,
etc., are somewhat irrelevant.  Instead, somewhere there must be an
environment variable or similar setting that 'R' is talking to that forces
each column name to fit within a set column width, and I need to either blow
away that variable or make it much larger.
Again, to recap:  if I make SQL queries outside of 'R', I am able to
grab column names properly.  If I make SQL queries *within 'R' and through a
Windows server*, I grab all column names properly.  However, if I make SQL
queries *within 'R' and through the linux/unix server described below*, the
column names are cut off at a fixed length of 30 characters.
Any advice as to where to look for this environment variable or whatever
setting would help greatly!

 Thanks!
Mike

*OS information:
*Linux
2.6.18-8.1.15.el5
#1 SMP Mon Oct 22 08:32:28 EDT 2007
x86_64
x86_64
x86_64
GNU/Linux


*DB info:
*Microsoft SQL Server Management Studio10.0.2531.0
Microsoft Data Access Components (MDAC)
6.1.7600.16385
Microsoft MSXML3.0 4.0 5.0 6.0
Microsoft Internet Explorer8.0.7600.16385
Microsoft .NET Framework2.0.50727.4952
Operating System6.1.7600


*'R' info:
*> R.Version()
$platform
[1] "x86_64-redhat-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "2"

$minor
[1] "10.0"

$year
[1] "2009"

$month
[1] "10"

$day
[1] "26"

$`svn rev`
[1] "50208"

$language
[1] "R"

$version.string
[1] "R version 2.10.0 (2009-10-26)"





"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


On Sat, Oct 2, 2010 at 6:31 AM, Marc Schwartz  wrote:

> On Oct 1, 2010, at 6:26 PM, Mike Williamson wrote:
>
> > Hello all,
> >
> >I have a strange / interesting problem that might be 'R' settings
> > themselves, or it might be something with the OS.
> >
> >I am using the RODBC library.  I have a script that goes out and,
> before
> > making a query for a big data set, will first query for the column names
> of
> > the data set.  The column names could sometimes be quite long (e.g.,
> "Time
> > Background Estimation (seconds)" ).  If I make this query for the column
> > names from my Windows laptop or from a Windows server, using
> odbcConnect() &
> > sqlQuery(), I get the column names properly.  However, if I run this via
> > unix, it will chop off part of the column name.  (E.g., with "Time
> > Background Estimation (seconds)", it becomes "Time Background Estimation
> > (se", which is 30 characters long.)
> >
> >Does anyone have a clue what might be causing this (settings in 'R',
> > something within unix, etc.)?  I am not even sure how to debug, and I
> can't
> > really get around this because I cannot simply query all of the columns,
> the
> > data set would become too large.
> >
> >  Thanks!
> >  Mike
>
>
> Mike,
>
> You indicated 'unix' above. Is that Solaris or are you being generic in a
> reference to a Linux platform?
>
> We need the details of your OS, the version of R you are running and
> whether it is 32 or 64 bit, as well as the database that you are connecting
> to (eg. Oracle).
>
> You can use:
>
>  vignette("RODBC")
>
> from the R command line to bring up a PDF carefully written by Prof.
> Ripley, that contains additional details the use of RODBC and some OS/DB
> specific documentation for the package.
>
> For further details on how we can better help you, see the R Posting Guide:
>
>  http://www.R-project.org/posting-guide.html
>
> Lastly, for future reference, there is an R-SIG-DB list:
>
>  https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
> Marc Schwartz
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tyring to save plots using windoze 7 and cygwin

2010-10-03 Thread Mike Marchywka








> Date: Sat, 2 Oct 2010 16:35:03 -0700
> Subject: Re: [R] tyring to save plots using windoze 7 and cygwin
> From: jwiley.psych gmail.com
> To: marchy...@hotmail.com
> CC: r-help@r-project.org
>
> Hi Mike,
>

> > sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-pc-mingw32


> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
>




> ##
> # Initialize pdf (or whatever) device
> pdf("myfile.pdf")
> # plot your graph
> plot(xyz, main="imp rate", xlab="Time(GMT)",ylab="imp/minute")
> # add the grid lines
> grid()
> # shut the device down
> dev.off()
> 


Yes, that works fine, thanks. I guess it has been a while :)

Hopefully now I can find the 3D plotting stuff and the other things I need;
I seem to recall that many years ago those took a while to find...

Thanks.





>
> You would use a similar process for postscript(), png(), etc. What
> version of R are you using? I do not have the save.plot() function
> (at least in the packages that load by default). You can learn more
> by poking around the help pages ?dev.copy ?Devices ?windows
>
> HTH,
>
> Josh
>
> On Sat, Oct 2, 2010 at 3:43 PM, Mike Marchywka  wrote:
> >
> >
> > Hi,
> > I'd been using R in the past and recently installed it on a new windoze 7
> > machine.
> > There have been many issues with compatibility and 32/64-bit apps etc.,
> > and I did find via Google an isolated complaint that savePlot failed in
> > scripts a long time ago.
> > R seems to work fine except that script-based plot saving as pdf has not
> > worked. I have tried the following, none of which seem to function:
> >
> >
> > xyz <-read.table("time_frac2")
> > x=plot(xyz,main="imp rate", xlab="Time(GMT)",ylab="imp/minute")
> > grid()
> >
> > dev2bitmap("xxx.pdf",type="pdfwrite")
> > save.plot(x,file="xxx.pdf",format="pdf")
> > dev.copy("pdf","auto_pdf.pdf")
> > dev.off()
> > savePlot("./auto_hit_rate.pdf",type="pdf")
> >
> > q()
> >
> >
> >
> > Now apparently R does save the plot in a default file, Rplots.pdf, which is
> > just fine for my immediate needs but may have limitations for future uses.
> >
> > Just curious to know what others may have gotten to work or not work.
> > Thanks.
> >
> > - - - - - -
> >
> > Mike Marchywka | V.P. Technology
> >
> > 415-264-8477
> > marchy...@phluant.com
> >
> > Online Advertising and Analytics for Mobile
> > http://www.phluant.com
> >
> >
> >
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Output Graphics GIF

2010-10-03 Thread Mike Marchywka








Date: Sat, 2 Oct 2010 23:59:50 -0300
From: nilzabar...@gmail.com
To: tal.gal...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] Output Graphics GIF


On Mon, Sep 27, 2010 at 11:31 AM, Tal Galili  wrote:

> I am guessing you are saving the plot using the menu system.
>
> If that is the case, have a look at:
>
> ?pdf
> ?png
>
> Generally, I like saving my graphics to pdf since it is vectorized.

BTW, is SVG supported at all? Now that you mention it, that could be a good
option for some plots. I just used pdf earlier for testing, but if you just
have a simple plot as a picture, then an image format should be a better
choice. I've always complained about the cost-benefit of pdf compared to the
alternatives, but used properly it can be a good choice in some cases ( I
think I tried to explain some objections I had to pdf files on the iText
mailing list, a package which may be of interest to the other poster
interested in manipulating pdf files ).

Use a format beneficial for the type of data you have.
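
For the record, base R does provide an svg() device on builds with cairo
support; a minimal check (the file name is just a placeholder):

if (capabilities("cairo")) {
    svg("myplot.svg", width = 6, height = 4)   # vector output, like pdf()
    plot(1:10)
    dev.off()
}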

>
> Cheers,
> Tal
>
>

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tyring to save plots using windoze 7 and cygwin

2010-10-02 Thread Mike Marchywka


Hi,
I'd been using R in the past and recently installed it on a new windoze 7
machine.
There have been many issues with compatibility and 32/64-bit apps etc., and I
did find via Google an isolated complaint that savePlot failed in scripts a
long time ago.
R seems to work fine except that script-based plot saving as pdf has not
worked. I have tried the following, none of which seem to function:


xyz <-read.table("time_frac2")
x=plot(xyz,main="imp rate", xlab="Time(GMT)",ylab="imp/minute")
grid()

dev2bitmap("xxx.pdf",type="pdfwrite")
save.plot(x,file="xxx.pdf",format="pdf")
dev.copy("pdf","auto_pdf.pdf")
dev.off()
savePlot("./auto_hit_rate.pdf",type="pdf")

q()



Now apparently R does save the plot in a default file, Rplots.pdf, which is
just fine for my immediate needs but may have limitations for future uses.

Just curious to know what others may have gotten to work or not work.
Thanks.

- - - - - -

Mike Marchywka | V.P. Technology

415-264-8477
marchy...@phluant.com

Online Advertising and Analytics for Mobile
http://www.phluant.com



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trouble with RODBC -- chopping off part of column names

2010-10-01 Thread Mike Williamson
Hello all,

I have a strange / interesting problem that might be 'R' settings
themselves, or it might be something with the OS.

I am using the RODBC library.  I have a script that goes out and, before
making a query for a big data set, will first query for the column names of
the data set.  The column names could sometimes be quite long (e.g., "Time
Background Estimation (seconds)" ).  If I make this query for the column
names from my Windows laptop or from a Windows server, using odbcConnect() &
sqlQuery(), I get the column names properly.  However, if I run this via
unix, it will chop off part of the column name.  (E.g., with "Time
Background Estimation (seconds)", it becomes "Time Background Estimation
(se", which is 30 characters long.)

Does anyone have a clue what might be causing this (settings in 'R',
something within unix, etc.)?  I am not even sure how to debug, and I can't
really get around this because I cannot simply query all of the columns, the
data set would become too large.

  Thanks!
      Mike




"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pairs and mfrow

2010-09-27 Thread Mike Harwood
Is there an alternative to par(mfrow=c(2,1)) to get stacked scatterplot
matrixes generated with "pairs"?

I am using version 2.11.1 on Windows XP.  The logic I am using follows, and
the second "pairs" plot replaces the first plot in the current graphics
device, which is not what I expected (or desired).

par(mfrow=c(2,1))
pairs(b2007, main="6/2000 - 12/2006")
pairs(a2007, main="1/2007 - 06/2009")

Thanks in advance!

Mike

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting multiple animal tracks against Date/Time

2010-09-23 Thread Mike Rennie
By range on the y-axis, do you mean distance? It would have to be if time is
on your x? Or am I misreading this?

You could just plot() with the data for your first individual, and then add
additional individuals after that using lines(), specifying a different
colour and/or line type for each individual, and could even plot points as
well to identify your actual data collection points.

Alternatively, you could do a bunch of multi-panel plots with the same axes,
with each individual's data given in its own panel. It sounds like this is
what you want to do by referencing stackplot() (based on its description;
I've never used it)? If so, see ?par for details, specifically mfrow, mfcol.

By doing it with calls to par(), I think you'll have more control over the
appearance of the plot than with stackplot().
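
A minimal sketch of that plot()/lines() route, using hypothetical stand-in
tracks for two individuals:

ind1 <- data.frame(Date = as.POSIXct("2006-08-18 22:00:00", tz = "GMT") +
                          (0:5) * 60,
                   Distance = rep(1815.8, 6))
ind2 <- data.frame(Date = as.POSIXct("2006-08-18 09:50:00", tz = "GMT") +
                          (0:5) * 600,
                   Distance = c(0, 0, 0, 0, 0, 100.2))
plot(ind1$Date, ind1$Distance, type = "o", pch = 16,
     xlim = range(ind1$Date, ind2$Date),
     ylim = range(ind1$Distance, ind2$Distance),
     xlab = "Date/Time", ylab = "Distance (m)")
lines(ind2$Date, ind2$Distance, type = "o", pch = 17, lty = 2, col = "blue")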

Mike

On Thu, Sep 23, 2010 at 8:50 AM, Struve, Juliane wrote:

> Dear list,
>
> I would like to create a time series plot in which the paths of several
> individuals are stacked above each other, with the x-axis being the total
> observation period of three years ( 1.1.2004 to 31.12.2007) and the y-axis
> being  some defined range[min,max].
>
> My data consist of Date/Time information and the paths of 45 individual as
> the distance from the location of release. An example data set for 2
> individuals is given below.The observation period and frequency of
> observations varies between individuals.
>
> I believe stackplot() may be able to do this task, but I am not sure how to
> handle the variable time period and frequency of observations for different
> individuals. Could someone advise if stackplot() is suitable or if there is
> a better approach or package ?
>
> Thank you very much for your time and best wishes,
>
> Juliane
>
>
> Individual 1
>
> DateDistance [m]
>
> 2006-08-18 22:05:15 1815.798
> 2006-08-18 22:06:35 1815.798
> 2006-08-18 22:08:33 1815.798
> 2006-08-18 22:09:49 1815.798
> 2006-08-18 22:12:50 1815.798
> 2006-08-18 22:16:26 1815.798
>
> Individual 2
>
> Date  Distance [m]
> 2006-08-18 09:53:20  0.0
> 2006-08-18 09:59:07  0.0
> 2006-08-18 10:09:20  0.0
> 2006-08-18 10:21:14  0.0
> 2006-08-18 10:34:18  0.0
> 2006-08-18 10:36:44    100.2
>
>
>
> 2  Date                 Distance
> 6  2006-08-18 09:53:20  0.0
> 7  2006-08-18 09:59:07  0.0
> 8  2006-08-18 10:09:20  0.0
> 9  2006-08-18 10:21:14  0.0
> 10 2006-08-18 10:34:18  0.0
> 11 2006-08-18 10:36:44  100.2
> 2006-03-18 22:05:15 1815.798
> 2006-03-18 22:06:35 1815.798
> 2006-03-18 22:08:33 1815.798
> 2006-03-18 22:09:49 1815.798
> 2006-03-18 22:12:50 1815.798
> 2006-03-18 22:16:26 1815.798
>
>
>
>
> Dr. Juliane Struve
> Imperial College London
> Department of Life Sciences
> Silwood Park Campus
> Buckhurst Road
> Ascot, Berkshire,
> SL5 7PY, UK
>
> Tel: +44 (0)20 7594 2527
> Fax: +44 (0)1344 874 957
>
> http://www.aquaticresources.org
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging multiple data frames

2010-09-23 Thread Mike Rennie
First, you might want to start by generating a new column to identify your
"pdf" and "bdf" (or whatever) rows once they are merged.

For the merging, see

?merge

But as someone's already pointed out, it's not clear what you are trying to
merge by.

Also, as your example calculations show, you don't need to merge them to do
the calculations you want to do...
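
A tiny sketch of the tag-then-combine idea, using hypothetical cut-down
frames rather than the full data below:

pdf_df <- data.frame(SampleID = c("SDM001", "SDM002"), Day_0 = c(485.6, 465.5))
bdf_df <- data.frame(SampleID = c("SDM001", "SDM002"), Day_0 = c(88.6, 100.1))
pdf_df$source <- "pdf"             # identifier column, per the first point
bdf_df$source <- "bdf"
stacked <- rbind(pdf_df, bdf_df)   # one frame, rows tagged by their source
wide <- merge(pdf_df, bdf_df,      # or key-matched columns via merge()
              by = "SampleID", suffixes = c(".p", ".b"))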

On Thu, Sep 23, 2010 at 8:53 AM, Steve Lianoglou <
mailinglist.honey...@gmail.com> wrote:

> Hi,
>
> On Thu, Sep 23, 2010 at 9:04 AM, rasanpreet 
> wrote:
> >
> > hi guys
> > i have multiple data frames which i want to  merge.
> > there are four of them..eg
>
> Can you provide a (correct) example of what you want your merged
> data.frame to look like?
> What column do you want to use in your data.frame to merge against?
> I'm guessing SampleID(?), but then again, these aren't unique in your
> `pdf` data.frame. For instance, what would the row for "SDM001" look
> like in your merged data.frame?
>
> -steve
>
> > pdf
> >
> > SampleID UVDose_J RepairHours   Day_0  Day_45  Day_90
> > 1SDM001  1.0   3 485.612 465.142 490.873
> > 2SDM001  1.0   3 503.658 457.863 487.783
> > 3SDM001  1.0   2 533.193 451.044 456.973
> > 4SDM001  1.0   2 538.334 452.887 474.915
> > 5SDM001  1.0   1 526.034 481.123 477.801
> > 6SDM001  1.0   1 546.543 472.322 481.546
> > 7SDM001  1.0   0  NA  NA  NA
> > 8SDM001  1.0   0  NA  NA  NA
> > 9SDM001  0.5   3 432.134 457.245 497.975
> > 10   SDM001  0.5   3 432.605 450.184 489.468
> > 11   SDM001  0.5   2 450.335 496.520 488.784
> > 12   SDM001  0.5   2 439.590 474.371 470.182
> > 13   SDM001  0.5   1 510.480 489.561 525.029
> > 14   SDM001  0.5   1 487.934 467.258 488.784
> > 15   SDM001  0.5   0  NA  NA  NA
> > 16   SDM001  0.5   0  NA  NA  NA
> > 20   SDM002  1.0   3 465.549 528.715 501.374
> > 21   SDM002  1.0   3 458.168 505.480 489.244
> > 22   SDM002  1.0   2 447.317 464.009 478.058
> > 23   SDM002  1.0   2 452.020 438.446 470.996
> > 24   SDM002  1.0   1 441.718 458.760 499.221
> > 25   SDM002  1.0   1 447.017 402.616 548.797
> > 26   SDM002  1.0   0  NA  NA  NA
> > 27   SDM002  1.0   0  NA  NA  NA
> > 28   SDM002  0.5   3 421.409 448.870 476.392
> > 29   SDM002  0.5   3 404.089 446.413 477.080
> > 30   SDM002  0.5   2 399.775 432.678 465.015
> > 31   SDM002  0.5   2 427.157 443.418 477.048
> > 32   SDM002  0.5   1 389.674 449.353 482.264
> > 33   SDM002  0.5   1 418.147 455.983 495.486
> > 34   SDM002  0.5   0  NA  NA  NA
> > 35   SDM002  0.5   0  NA  NA  NA
> > 39   SDM005  1.0   3 579.836 441.040 476.382
> > 40   SDM005  1.0   3 578.525 443.875 472.867
> > 41   SDM005  1.0   2 564.266 432.116 469.416
> > 42   SDM005  1.0   2 571.045 447.658 458.233
> > 43   SDM005  1.0   1 564.664 427.673 524.122
> > 44   SDM005  1.0   1 568.182 458.039 477.237
> > 45   SDM005  1.0   0  NA  NA  NA
> > 46   SDM005  1.0   0  NA  NA  NA
> > 47   SDM005  0.5   3 556.534 424.786 501.658
> > 48   SDM005  0.5   3 474.027 441.418 507.635
> > 49   SDM005  0.5   2 481.355 430.346 468.021
> > 50   SDM005  0.5   2 478.922 466.933 471.025
> > 51   SDM005  0.5   1 505.539 937.759 460.985
> > 52   SDM005  0.5   1 497.913 457.932 493.152
> > 53   SDM005  0.5   0  NA  NA  NA
> > 54   SDM005  0.5   0  NA  NA  NA
> > 58   SDM006  1.0   3 589.164 459.578 509.565
> > 59   SDM006  1.0   3 608.477 480.233 519.785
> > 60   SDM006  1.0   2 598.354 449.266 487.058
> > 61   SDM006  1.0   2 617.823 456.908 507.467
> > 62   SDM006  1.0   1 566.477 500.189 526.744
> > 63   SDM006  1.0   1 622.170 462.463 550.675
> > 64   SDM006  1.0   0  NA  NA  NA
> > 65   SDM006  1.0   0  NA  NA  NA
> > 66   SDM006  0.5   3 546.472 457.880 468.129
> > 67   SDM006  0.5   3 525.069 444.575 505.154
> > 68   SDM006  0.5   2 569.068 446.196 473.739
> > 69   SDM006  0.5   2 534.205 470.366 476.570
> >
> > bdf
> > SampleID UVDose_J RepairHoursDay_0   Day_45  Day_90
> > 17SDM001  0.5   B  88.6145 388.3575 198.467
> > 36SDM002  0.5   B 100.0760 384.9505 234.740
> > 55SDM005  0.5   B 121.9595 300.3650 241.832
> > 74SDM006  0.5   B 174.7

Re: [R] Plotting densities

2010-09-23 Thread Mike Rennie
In your call to polygon(), include lty="dashed" or "dotted" or whatever you
want your line type to look like.

take a good look at all the options in

?par

for everything you can customize in plots. Alternatively, Paul Murrell's R
Graphics book is the best reference I know for this sort of stuff.
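
Applied to the example from the question, that would look something like:

x1 <- c(1, 2, 1, 3, 5, 6, 6, 7, 7, 8)
x2 <- c(1, 2, 1, 3, 5, 6, 5, 7)
plot(density(x1, na.rm = TRUE))
polygon(density(x2, na.rm = TRUE), border = "blue", lty = "dashed")
## if the closing bottom edge is unwanted altogether, draw only the curve:
lines(density(x2, na.rm = TRUE), col = "blue", lty = "dashed")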

Mike

On Thu, Sep 23, 2010 at 8:10 AM, Ralf B  wrote:

> Hi group,
>
> I am currently plotting two densities using the following code:
>
> x1 <- c(1,2,1,3,5,6,6,7,7,8)
> x2 <- c(1,2,1,3,5,6,5,7)
> plot(density(x1, na.rm = TRUE))
> polygon(density(x2, na.rm = TRUE), border="blue")
>
> However, I would like to avoid bordering the second density as it adds
> a nasty bottom line which I would like to avoid.
> I would also rather have a dashed or dotted line for the second
> (currently blue) density but without the bottom part.
> Any idea how to do that?
>
> Best,
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting bins and frequencies from frequency table

2010-09-22 Thread Mike Rennie
Hi Ralf

try hist()

obl<-hist(x1, plot=FALSE)

it returns midpoints and their respective frequencies. You can specify the
breakpoints as well.

?hist

for details.
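
For instance, with the data from the question:

x1  <- c(1, 5, 1, 1, 2, 2, 3, 4, 5, 3, 2, 3, 6, 4, 3, 8)
obl <- hist(x1, plot = FALSE)
obl$mids     # bin midpoints
obl$counts   # frequency in each bin
## or, pulled straight out of the table() already built:
t1   <- table(x1)
bins <- as.numeric(names(t1))
freq <- as.vector(t1)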

Mike

On Wed, Sep 22, 2010 at 1:44 PM, Ralf B  wrote:

> Dear R users,
>
> I would like to great a frequency table from raw data and then access
> the classes/bins and
> their respective frequencies separately. Here the code to create the
> frequency tables:
>
>
> x1 <- c(1,5,1,1,2,2,3,4,5,3,2,3,6,4,3,8)
> t1 <- table(x1)
> print(t1[1])
>
> Its easy to plot this, but how do I actually access the frequencies
> alone and the bins alone?
> Basically I am looking to get:
>
> bins <- c(1, 2, 3, 4, 5, 6, 8)
> freq <- c(3, 3, 4, 2, 2, 1, 1)
>
> When running
>
> print(t1[1])
>
> I only get one pair. It seems to be organized that way. Is there a
> better way? Perhaps 'table' is not the right approach?
>
> Thanks a lot,
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] ez version 2.0

2010-09-01 Thread Mike Lawrence
The ez package was developed to aid those that are new to statistical
programming. Over the course of several years of helping colleagues
and students learn R, I observed that folks are often initially turned
off R because they have difficulty obtaining SPSS-like results quickly
(SPSS is the dominant environment in my field, psychology). ez
attempts to fill this gap, providing quick and easy analysis and
graphics for common experimental designs. By easing the early portions
of the R learning curve, ez hopes to promote the spread of R as a
means of open source and reproducible analysis. ez now also attempts
to pique interest in cutting-edge statistical methods by providing
easy specification and visualization of simple* mixed effects models.

*--> mixed effects models are limited to those with a single random
effect (eg. Participant) and no numeric predictors.


New in version 2.0
- ezDesign(), a function to visualize the balance of data across a
specified experimental design (useful for diagnosing missing data)
- ezPrecis(), a function to summarize a data set (inspired by
summary(), str(), and YaleToolkit:whatis() )
- ezBoot() and ezPlotBoot(), functions to compute and visualize
(respectively) bootstrapped confidence intervals for either cell means
or predictions from a mixed effects model
- ezANOVA() updated with an improved measure of effect size:
generalized eta-square.
- ezPlot() updated to permit simultaneous plotting of multiple DV's,
with each mapped to a row of the plot facets.
- see changelog for further changes


Enjoy!

Mike

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Odp: Finding pairs

2010-08-25 Thread Mike Rhodes
Dear Mr Petr PIKAL
After reading the R code you provided, I realized that I would never have
figured out how this could be done. I am going to re-read your code again
and again to understand the logic and the commands you have provided.
Thanks again from the heart for your kind advice.
Regards
Mike

--- On Wed, 25/8/10, Petr PIKAL  wrote:

From: Petr PIKAL 
Subject: Re: [R] Odp:  Finding pairs
To: "Mike Rhodes" 
Cc: r-help@r-project.org
Date: Wednesday, 25 August, 2010, 9:01

Hm

r-help-boun...@r-project.org napsal dne 25.08.2010 09:43:26:

> Dear Mr Petr Pikal
> 
> I am extremely sorry for the manner I have raised the query. Actually 
that was
> my first post to this R forum and in fact even I was also bit confused 
while 
> drafting the query, for which I really owe sorry to all for consuming 
the 
> precious time. Perhaps I will try to redraft my query in a better way as 
follows. 
> 
> I have two datasets "A" and "B" containing the names of branch offices 
of a 
> particular bank say XYZ plc bank. The XYZ bank has number of main branch 

> offices (say Parent) and some small branch offices falling under the 
purview 
> of these main branch offices (say Child).
> 
> The datalist "A" and "B" consists of these main branch office names as 
well as
> small branch office names. B is subset of A and these branch names are 
coded. 
> Thus we have two datasets A and B as (again I am using only a
>  portion of a large database just to have some idea)
> 
> 
> A        B
> 144                (what is here in B? Empty space?)
> 145
> 146
> 147      144

How do you know that 144 from B relates to 147 in A? Is it according to 
its positions? I.e. 4th item in B belongs to 4.th item in A?

> 148                  145  
> 
> 149                  147
> 151                  148
> 
> 
> 
> Now the branch 144 appears in A as well as in B and in B it is mapped 
with 
> 147. This means branch 147 comes under the purview of main branch 144. 
Again 
> 147 is controlling the branch 149 (since 147 also has appeared in B and 
is 
> mapped with 149 of A).
> 
> Similarly, branch 145 is controlling branch 148 which further controls 
> operations of bank branch 151 and like wise.

Well as you did not say anything about structure of your data
A <- 144:151
B <- c(NA, NA, NA, 144:148)  ## padded with NA so both vectors have length 8
data.frame(A, B)
    A   B
1 144  NA
2 145  NA
3 146  NA
4 147 144
5 148 145
6 149 146
7 150 147
8 151 148
DF<-data.frame(A,B)
main<-DF$A[is.na(DF$B)]
branch1<-DF[!is.na(DF$B),]
selected.branch1<-branch1$A[branch1$B%in%main]
branch2<-branch1[!branch1$B%in%main,]
selected.branch2<-branch2$A[branch2$B%in%selected.branch1]

and for cbinding your data which has uneven number of values see Jim 
Holtman's answer to this

How to cbind DF:s with differing number of rows?

Regards
Petr


> 
> So in the end I need an output something like -
> 
> Main Branch    Branch office1    Branch office2
> 144            147               149
> 145            148               151
> 146            NA                NA
> 
...
> 
..
> 
>  
> I understand again I am not able to put forward my query properly. But I 
must 
> thank all of you for giving a patient reading to my query and for 
reverting 
> back earlier. Thanks once again.
> 
> With warmest regards
> 
> Mike 
> 
> 
> --- On Wed, 25/8/10, Petr PIKAL  wrote:
> 
> From: Petr PIKAL 
> Subject: Odp: [R] Finding
>  pairs
> To: "Mike Rhodes" 
> Cc: r-help@r-project.org
> Date: Wednesday, 25 August, 2010, 6:39
> 
> Hi
> 
> without other details it is probably impossible to give you any 
reasonable 
> advice. Do you have your data already in R? What is their form? Are they 

> in 2 columns in data frame? How did you get them paired?
> 
> So without some more information probably nobody will invest his time as 

> it seems no trivial to me.
> 
> Regards
> Petr
> 
> r-help-boun...@r-project.org napsal dne 24.08.2010 20:28:42:
> 
> > 
> > 
> > 
> > 
> > Dear R Helpers,
> > 
> > 
> > I am a newbie and recently got introduced to R. I have a large 
database 
> > containing the names of bank branch offices along-with other details. 
I 
> am 
> > into Operational
>  Risk as envisaged by BASEL II Accord. 

Re: [R] Odp: Finding pairs

2010-08-25 Thread Mike Rhodes
Dear Mr Petr Pikal

I am extremely sorry for the manner in which I raised the query. That was
actually my first post to this R forum, and in fact I was also a bit confused
while drafting the query, for which I apologize to all for consuming your
precious time. Perhaps I can redraft my query in a better way, as follows.

I have two datasets, "A" and "B", containing the names of branch offices of a
particular bank, say XYZ plc. The XYZ bank has a number of main branch
offices (say, Parent) and some small branch offices falling under the purview
of these main branch offices (say, Child).

The datalists "A" and "B" consist of these main branch office names as well
as the small branch office names. B is a subset of A, and the branch names
are coded. Thus we have two datasets A and B as follows (again, I am using
only a portion of a large database, just to give some idea):


A      B
144
145
146
147    144
148    145
149    147
151    148



Now, branch 144 appears in A as well as in B, and in B it is mapped with 147.
This means branch 147 comes under the purview of main branch 144. Again, 147
is controlling branch 149 (since 147 also appears in B and is mapped with 149
of A).

Similarly, branch 145 is controlling branch 148, which further controls the
operations of bank branch 151, and likewise.

So in the end I need an output something like -

Main Branch    Branch office1    Branch office2
144            147               149
145            148               151
146            NA                NA
...
..

 
I understand that, again, I may not have put forward my query properly. But I
must thank all of you for giving my query a patient reading and for replying
earlier. Thanks once again.

With warmest regards

Mike 


--- On Wed, 25/8/10, Petr PIKAL  wrote:

From: Petr PIKAL 
Subject: Odp: [R] Finding
 pairs
To: "Mike Rhodes" 
Cc: r-help@r-project.org
Date: Wednesday, 25 August, 2010, 6:39

Hi

without other details it is probably impossible to give you any reasonable 
advice. Do you have your data already in R? What is their form? Are they 
in 2 columns in data frame? How did you get them paired?

So without some more information probably nobody will invest his time as 
it seems no trivial to me.

Regards
Petr

r-help-boun...@r-project.org napsal dne 24.08.2010 20:28:42:

> 
> 
> 
> 
> Dear R Helpers,
> 
> 
> I am a newbie and recently got introduced to R. I have a large database 
> containing the names of bank branch offices along-with other details. I 
am 
> into Operational
 Risk as envisaged by BASEL II Accord. 
> 
> 
> I am trying to express my problem and I am using only an indicative data 
which
> comes in coded format.
> 
> 
> 
> 
> A (branch)                      B (controlled by)
> 
> 
> 144                   
> 145                      
> 146                   
> 147                                       144 
> 148                                       145 
> 149       
                                147
> 151                                       146  
>  ..                                      ...
>  
> ..                                      ...
> 
> 
> where 144's etc are branch codes in a given city and B is subset of A.
> 
> 
> 
> 
> If a branch code appearing in "A" also appears in "B" (which is paired 
with 
> some otehr element of A e.g. 144 appearing in A, also appears in "B" and 
is 
> paired with 147 of "A" and
 likewise), then that means 144 is controlling 

> operations of bank office 147. Again, 147 itself appears again in B and 
is 
> paired with bank branch coded 149. Thus, 149 is controlled by 147 and 
147 is 
> controlled by 144. Likewise there are more than 700 hundred branch name 
codes available.
> 
> 
> My objective is to group them as follows -
> 
> 
> Bank Branch
> 
> 
> 144      147    149 
> 
> 
> 145
> 
> 
> 146       151  
> 
> 
> 148
> .....
> 
> 
> or even the following output will do.
> 
> 
> 144
> 147
> 149
> 
> 
> 145
> 
> 
> 146
> 151
> 
> 
> 148
> 151
> ..
> 
> 
> I understand I should be writing some R
 code to begin with which I had 
tried 
> also but as of now I am helpless. Please guide 

[R] Finding pairs

2010-08-24 Thread Mike Rhodes




Dear R Helpers,


I am a newbie and recently got introduced to R. I have a large database
containing the names of bank branch offices along with other details. I am
into Operational Risk as envisaged by the BASEL II Accord.


I am trying to express my problem using only indicative data, which
comes in coded format.




A (branch)    B (controlled by)
144
145
146
147           144
148           145
149           147
151           146
...           ...


where 144's etc are branch codes in a given city and B is subset of A.




If a branch code appearing in "A" also appears in "B" (where it is paired
with some other element of A, e.g. 144, appearing in A, also appears in "B"
and is paired with 147 of "A", and likewise), then that means 144 is
controlling the operations of bank office 147. Again, 147 itself appears in B
and is paired with the bank branch coded 149. Thus, 149 is controlled by 147,
and 147 is controlled by 144. Likewise, there are more than 700 branch name
codes available.


My objective is to group them as follows -


Bank Branch


144      147    149 


145


146       151  


148
.....


or even the following output will do.


144
147
149


145


146
151


148
151
..


I understand I should be writing some R code to begin with, which I have
tried, but as of now I am helpless. Please guide me.


Mike




  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odd behavior of "summary" function

2010-08-24 Thread Mike Williamson
Hello All,

Using the standard "summary" function in 'R', I ran across some odd
behavior that I cannot understand.  Easy to reproduce:

Typing:

   summary(c(6,207936))

Yields:

      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         6   51990  104000  104000  156000  207900


None of these values are correct except for the minimum.  If I perform
"quantile(c(6, 207936))", it gives the correct values.  I originally
presumed that summary was merely calling "quantile" if it saw a numeric, but
this doesn't seem to be the case.
Anyone know what's going on here?  On a related note, what is the
statistically correct answer for calculating the 1st quartile & 3rd quartile
when only 2 values are present?  I presume one takes the mid-point between
the median (also calculated) and the min or max.  So in this case, 51988.5
for 1st & 155953.5 for 3rd (which is what quantile calculates).  But taking
25% & 75% of the sum of the 2 also seems "reasonable".  Either way,
"summary" is calculating the wrong number, and most disturbing is that it
mis-calculates the max.
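
A note for the archive: the discrepancy is display rounding rather than a
miscalculation, since summary() formats its output to about four significant
digits by default (see its "digits" argument), while quantile() prints the
unrounded values. A quick check:

summary(c(6, 207936))              # rounded for display: Max. shows 207900
summary(c(6, 207936), digits = 7)  # matches quantile(): Max. shows 207936
quantile(c(6, 207936))             # 51988.5, 103971, 155953.5 for the quartiles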

    Regards,
Mike


"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] some helpful tips on using RODBC

2010-08-13 Thread Mike Williamson
predict_value
 "float"

Another thing I find handy is to pre-generate the primary key in the data
table, since this is usually some unique numerical identifier, but otherwise
just gibberish.  Below I have written a wrapper function that works quite
nicely for me, and that I hope others out there might find handy.

A HUGE thanks to Dr. Ripley for making an excellent package!!

  Regards,
 Mike






db.populate <- function(dataSet=NULL, dbTable=NULL, primeKey=NULL,
                        db="blah", check.names=TRUE,
                        verbose=FALSE, safer=TRUE, fast=TRUE,
                        test=FALSE, nastring=NULL) {
    ## requires library(RODBC)
    iAm <- "db.populate"
    if (is.null(dataSet) | is.null(dbTable))
        stop(paste(iAm, ": Both \"dataSet\" and \"dbTable\" variables",
                   " must be provided.", sep=""))
    ### connect to the database & query the table's column types
    dbCon <- odbcConnect(db)
    tmp <- sqlColumns(dbCon, dbTable)
    varTypes <- as.character(tmp$TYPE_NAME)
    names(varTypes) <- as.character(tmp$COLUMN_NAME)

    if (!is.null(primeKey)) {
        ## pre-generate the primary key, continuing from the current maximum
        myQuery <- paste("Select max(", primeKey, ") from ", dbTable)
        primeKeys <- seq(1, dim(dataSet)[1]) + sqlQuery(dbCon, myQuery)[1, 1]
        dataSet <- cbind(primeKeys, dataSet)
        names(dataSet)[1] <- primeKey
    } ## end if clause to create prime key
    if (check.names) {
        if (length(setdiff(names(dataSet), names(varTypes))) != 0) {
            message(paste(iAm, ": column names of \"dataSet\" do not match",
                          " those of \"dbTable\", ", dbTable, sep=""))
            message("\n\tNames of \"dataSet\":")
            print(names(dataSet))
            message(paste("\n\tNames in \"dbTable\",", dbTable, ":"))
            print(names(varTypes))
            stop(paste(iAm, ": stopped due to this mis-match.", sep=""))
        } ## end if clause to see if "dataSet" & "dbTable" names match
        dataSet <- dataSet[, names(varTypes)]
    } ## end if clause to check names & re-arrange "dataSet" as needed

    message(paste(iAm, ": populating table ", dbTable, " with \"dataSet\"",
                  sep=""))
    sqlSave(dbCon, dataSet, dbTable, append=TRUE, rownames=FALSE,
            verbose=verbose, safer=safer, varTypes=varTypes, fast=fast,
            test=test, nastring=nastring)
    odbcClose(dbCon)
    message(paste(iAm, ": added \"dataSet\" to table ", dbTable, sep=""))
    if (verbose) {
        message(paste(iAm, ": ", dim(dataSet)[1], " rows added.", sep=""))
        if (!is.null(primeKey)) {
            message(paste(iAm, ": primary key ", primeKey, " updated.", sep=""))
            message(paste("\t", primeKey, " values from ", primeKeys[1],
                          " to ", primeKeys[length(primeKeys)],
                          " are the newly updated data", sep=""))
        }
    }
}
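
A hypothetical call (the table, DSN, and key names below are made up):

## append model scores to table "predictions" through DSN "myDSN",
## auto-generating the "pred_id" primary key
db.populate(dataSet = myScores, dbTable = "predictions",
            primeKey = "pred_id", db = "myDSN", verbose = TRUE)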






Finally, a few words to act as good keys if someone out there does a search
for info:

R
R-help
RODBC
ODBC
database
SQL
table
sqlSave
obdcConnect

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] compare gam fits

2010-08-05 Thread Mike Lawrence
Hi folks,

I originally tried R-SIG-Mixed-Models for this one
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004170.html),
but I think that the final steps to a solution aren't mixed-model
specific, so I thought I'd ask my final questions here.

I used gamm4 to fit a generalized additive mixed model to data from a
AxBxC design, where A is a random effect (human participants in an
experiment), B is a 2-level factor predictor variable, and C is a
continuous variable that is likely non-linear. I tell gamm4 to fit a
smooth across C to each level of B independently, and I can use
predict.gam(...,se.fit=T) to obtain predictions from the fitted model
as well as the standard error for the prediction. I'd like to
visualize the BxC interaction to see if smoothing C within each level
of B was really necessary, and if so, where it is (along the C
dimension) that B affects the smooth. It's easy enough to obtain the
predicted B1-B2 difference function, but I'm stuck on how to convey
the uncertainty of this function (e.g. computing the confidence
interval of the difference at each value of C).

One thought is that predict.gam(...,se.fit=T) returns SE values, so if
I could find out the N on which these SE values are computed, I could
compute the difference CI as
sqrt( ( (SD_B1)^2 + (SD_B2)^2 ) / N ) * qt( .975, df=N-1 )

However, I can't seem to figure out what value of N was used to
compute the SEs that predict.gam(...,se.fit=T) produces. Can anyone
point me to where I might find N?

Further, is N-1 the proper df for the call to qt()?

Finally, with a smooth function and 95% confidence intervals computed
at each of a large number of points, don't I run into a problem of an
inflated Type I error rate? Or does the fact that each point is not
independent from those next to it make this an inappropriate concern?
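
For later readers: one standard mgcv idiom (sketched here on simulated
stand-in data, so the design and names are illustrative, not the original
model) sidesteps the question of N by computing the pointwise standard error
of the difference directly from the coefficient covariance matrix:

library(mgcv)
set.seed(1)
d <- data.frame(C = rep(seq(0, 1, length.out = 50), 2),
                B = factor(rep(c("b1", "b2"), each = 50)))
d$y <- sin(3 * d$C) + ifelse(d$B == "b2", d$C, 0) + rnorm(100, sd = 0.1)
m <- gam(y ~ B + s(C, by = B), data = d)  # separate smooth over C per level of B

nd <- data.frame(C = seq(0, 1, length.out = 200))
X1 <- predict(m, data.frame(nd, B = factor("b1", levels = levels(d$B))),
              type = "lpmatrix")
X2 <- predict(m, data.frame(nd, B = factor("b2", levels = levels(d$B))),
              type = "lpmatrix")
Xd <- X1 - X2                                 # maps coefficients to B1-B2 difference
diff <- drop(Xd %*% coef(m))                  # predicted difference function
se   <- sqrt(rowSums((Xd %*% vcov(m)) * Xd))  # its pointwise standard error
band <- cbind(lower = diff - 1.96 * se, upper = diff + 1.96 * se)

Note that the band is pointwise, so the simultaneous-coverage concern raised
above still applies.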

Cheers,

Mike

-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 histograms... a subtle error found

2010-07-28 Thread Mike Williamson
Hello all,

I have a peculiar and particular bug that I stumbled across with
ggplot2.  I cannot seem to replicate it with anything other than my specific
data set.

Here is the problem:

   - when I try to plot a histogram, allowing for ggplot2 to decide the
   binwidths itself, I get the following error:
  - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
  adjust this.
  - Error: position_stack requires constant width

My code is simply:

ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()

or

qplot(myDataSet$myVarOI)

If I go ahead and set the binwidth to some value, then the plot can be
made without problems.

The problem is with the specific data that it is trying to plot.  I
suspect it is trying to create bins of different sizes, from the error
code.  Here are the basics of my data set:

   - length:  1936 entries
   - 1906 unique entries
   - stats:
          Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10



I cannot imagine this can be solved without my specifically uploading
the actual data.  If I simply attach it, will it be received by r-help?
Hadley, if you're interested, would you like me to send the data
directly to you?

  Regards,
     Mike





"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"

2010-07-13 Thread Mike Williamson
Hi everyone,

I have another "Random Forest" package question:

   - my (presumably incorrect) understanding of the varImpPlot is that it
   should plot the "% increase in MSE" and "IncNodePurity" exactly as can be
   found from the "importance" section of the model results.
  - However, the plot does not, in fact, match the "importance" section
  of the random forest model.

E.g., if you use the example given in the ?randomForest, you will see
the plot showing the highest few "%IncMSE" values around 17 or 18%.  But if
you look at the $importance, it is 9.7, 9.4, 7.7, and 7.3.  Perhaps more
importantly, for the plot, it will show "wt" is highest %MSE, then "disp",
then "cyl", then "hp"; whereas the $importance will show "wt", then "disp",
then "hp", then "cyl".  And the ratios look somewhat different, too.
Here is the code for that example:

set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
keep.forest=FALSE,
importance=TRUE)
varImpPlot(mtcars.rf)

I am using version 2.11.1 of 'R' and version 4.5-35 of Random Forest.

I don't really care or need for the varImpPlot to work just right.  But
I am not sure which is accurate:  the varImpPlot or the $importance
section.  Which should I trust more, especially when they disagree
appreciably?
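
A note for the archive, assuming the cause is the "scale" argument:
varImpPlot() scales the permutation importances (dividing each by its
standard error) by default, whereas the $importance component stores the raw
values, so the two can be reconciled explicitly:

imp_scaled <- importance(mtcars.rf, scale = TRUE)   # what varImpPlot() shows
imp_raw    <- importance(mtcars.rf, scale = FALSE)  # matches mtcars.rf$importance
varImpPlot(mtcars.rf, scale = FALSE)                # plot the raw values instead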

 Thanks!
 Mike



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Mike Williamson
Hadley,

Thanks!  Yes... as.data.frame() is quite slow.  (And it forces the
column names to become "acceptable" names, which is a hassle to fix all the
time.)  I just hadn't thought of something as clever as what you wrote
below.

I'll try out this suggestion.  :)

  Mike

"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


On Thu, Jul 1, 2010 at 5:07 PM, Hadley Wickham  wrote:

> Here's another version that's a bit easier to read:
>
> na.roughfix2 <- function (object, ...) {
>  res <- lapply(object, roughfix)
>  structure(res, class = "data.frame", row.names = seq_len(nrow(object)))
> }
>
> roughfix <- function(x) {
>  missing <- is.na(x)
>  if (!any(missing)) return(x)
>
>  if (is.numeric(x)) {
>x[missing] <- median.default(x[!missing])
>  } else if (is.factor(x)) {
>freq <- table(x)
>x[missing] <- names(freq)[which.max(freq)]
>  } else {
>stop("na.roughfix only works for numeric or factor")
>  }
>  x
> }
>
> I'm cheating a bit because as.data.frame is so slow.
>
> Hadley
>
> On Thu, Jul 1, 2010 at 6:44 PM, Mike Williamson 
> wrote:
> > Jim, Andy,
> >
> >Thanks for your suggestions!
> >
> >I found some time today to futz around with it, and I found a "home
> > made" script to fill in NA values to be much quicker.  For those who are
> > interested, instead of using:
> >
> >  dataSet <- na.roughfix(dataSet)
> >
> >
> >
> >I used:
> >
> > origCols <- names(dataSet)
> > ## Fix numeric values: replace NAs with the column median
> > dataSet <- as.data.frame(lapply(dataSet, FUN = function(x) {
> >     if (!is.numeric(x)) { x } else {
> >         ifelse(is.na(x), median(x, na.rm = TRUE), x) } }),
> >     row.names = row.names(dataSet))
> > ## Fix factors: replace NAs with the most frequent level
> > dataSet <- as.data.frame(lapply(dataSet, FUN = function(x) {
> >     if (!is.factor(x)) { x } else {
> >         levels(x)[ifelse(!is.na(x), x, which.max(table(x)))] } }),
> >     row.names = row.names(dataSet))
> > names(dataSet) <- origCols
> >
> >
> >
> >In one case study that I ran, the na.roughfix() algo took 296 seconds
> > whereas the homemade one above took 16 seconds.
> >
> >  Regards,
> >Mike
> >
> >
> >
> > "Telescopes and bathyscaphes and sonar probes of Scottish lakes,
> > Tacoma Narrows bridge collapse explained with abstract phase-space maps,
> > Some x-ray slides, a music score, Minard's Napoleanic war:
> > The most exciting frontier is charting what's already here."
> >  -- xkcd
> >
> > --
> > Help protect Wikipedia. Donate now:
> > http://wikimediafoundation.org/wiki/Support_Wikipedia/en
> >
> >
> > On Thu, Jul 1, 2010 at 10:05 AM, Liaw, Andy  wrote:
> >
> >>  You need to isolate the problem further, or give more detail about your
> >> data.  This is what I get:
> >>
> >> R> nr <- 2134
> >> R> nc <- 14037
> >> R> x <- matrix(runif(nr*nc), nr, nc)
> >> R> n.na <- round(nr*nc/10)
> >> R> x[sample(nr*nc, n.na)] <- NA
> >> R> system.time(x.fixed <- na.roughfix(x))
> >>user  system elapsed
> >>8.440.398.85
> >> R 2.11.1, randomForest 4.5-35, Windows XP (32-bit), Thinkpad T61 with
> 2GB
> >> ram.
> >>
> >> Andy
> >>
> >>  --
> >> *From:* Mike Williamson [mailto:this.is@gmail.com]
> >> *Sent:* Thursday, July 01, 2010 12:48 PM
> >> *To:* Liaw, Andy
> >> *Cc:* r-help
> >> *Subject:* Re: [R] anyone know why package "RandomForest" na.roughfix is
> >> so slow??
> >>
> >> Andy,
> >>

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Mike Williamson
Jim, Andy,

Thanks for your suggestions!

I found some time today to futz around with it, and I found a "home
made" script to fill in NA values to be much quicker.  For those who are
interested, instead of using:

  dataSet <- na.roughfix(dataSet)



I used:

origCols <- names(dataSet)
## Fix numeric values: replace NAs with the column median
dataSet <- as.data.frame(lapply(dataSet, FUN = function(x) {
                             if (!is.numeric(x)) { x } else {
                                 ifelse(is.na(x), median(x, na.rm = TRUE), x) } }),
                         row.names = row.names(dataSet))
## Fix factors: replace NAs with the most frequent level
## (note: which.max(table(x)) picks the index of that level)
dataSet <- as.data.frame(lapply(dataSet, FUN = function(x) {
                             if (!is.factor(x)) { x } else {
                                 levels(x)[ifelse(!is.na(x), x,
                                                  which.max(table(x)))] } }),
                         row.names = row.names(dataSet))
names(dataSet) <- origCols



In one case study that I ran, the na.roughfix() algo took 296 seconds
whereas the homemade one above took 16 seconds.

  Regards,
    Mike



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


On Thu, Jul 1, 2010 at 10:05 AM, Liaw, Andy  wrote:

>  You need to isolate the problem further, or give more detail about your
> data.  This is what I get:
>
> R> nr <- 2134
> R> nc <- 14037
> R> x <- matrix(runif(nr*nc), nr, nc)
> R> n.na <- round(nr*nc/10)
> R> x[sample(nr*nc, n.na)] <- NA
> R> system.time(x.fixed <- na.roughfix(x))
>user  system elapsed
>8.440.398.85
> R 2.11.1, randomForest 4.5-35, Windows XP (32-bit), Thinkpad T61 with 2GB
> ram.
>
> Andy
>
>  --
> *From:* Mike Williamson [mailto:this.is@gmail.com]
> *Sent:* Thursday, July 01, 2010 12:48 PM
> *To:* Liaw, Andy
> *Cc:* r-help
> *Subject:* Re: [R] anyone know why package "RandomForest" na.roughfix is
> so slow??
>
> Andy,
>
> You're right, I didn't supply any code, because my call was very simple
> and it was the call itself at question.  However, here is the associated
> code I am using:
>
>
> naFixTime <- system.time( {
>     if (fltrResponse) {  ## TRUE: there are no NA's in the response...
>                          ## cleared via earlier steps
>         message(paste(iAm, ": Missing values will now be imputed...\n",
>                       sep=""))
>         try( dataSet <- rfImpute(dataSet[, !is.element(names(dataSet), response)],
>                                  dataSet[, response]) )
>     } else {  ## In this case, there is no "response" column in the data set
>         message(paste(iAm, ": Missing values will now be filled in with median",
>                       " values or most frequent levels", sep=""))
>         try( dataSet <- na.roughfix(dataSet) )
>     }
> } )
>
>
>
> As you can see, the "na.roughfix" call is made as simply as possible:
> I supply the entire dataSet (only parameters, no responses).  I am not doing
> the prediction here (that is done later, and the prediction itself is not
> taking very long).
> Here are some calculation times that I experienced:
>
> # rows   # cols   time to run na.roughfix
> ======   ======   =======================
>   2046  2833 ~ 2 minutes
>   2066  5626 ~ 6 minutes
>   2134 14037 ~ 30 minutes
>
> These numbers are on a Windows server using the 64-bit version of 'R'.
>
>   Regards,
>Mike
>
>
> "Telescopes and bathyscaphes and sonar probes of Scottish lakes,
> Tacoma Narrows bridge collapse explained with abstract phase-space maps,
> Some x-ray slides, a music score, Minard's Napoleanic war:
> The most exciting frontier is charting what's already here."
>  -- xkcd
>
> --
> Help protect Wikipedia. Donate now:
> http://wikimediafoundation.org/wiki/Support_Wikipedia/en
>
>
> On Thu, Jul 1, 2010 at 8:58 AM, Liaw, Andy

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Mike Williamson
Andy,

You're right, I didn't supply any code, because my call was very simple
and it was the call itself at question.  However, here is the associated
code I am using:


naFixTime <- system.time( {
    if (fltrResponse) {  ## TRUE: there are no NA's in the response...
                         ## cleared via earlier steps
        message(paste(iAm, ": Missing values will now be imputed...\n",
                      sep=""))
        try( dataSet <- rfImpute(dataSet[, !is.element(names(dataSet), response)],
                                 dataSet[, response]) )
    } else {  ## In this case, there is no "response" column in the data set
        message(paste(iAm, ": Missing values will now be filled in with median",
                      " values or most frequent levels", sep=""))
        try( dataSet <- na.roughfix(dataSet) )
    }
} )



As you can see, the "na.roughfix" call is made as simply as possible:  I
supply the entire dataSet (only parameters, no responses).  I am not doing
the prediction here (that is done later, and the prediction itself is not
taking very long).
Here are some calculation times that I experienced:

# rows   # cols   time to run na.roughfix
======   ======   =======================
  2046  2833 ~ 2 minutes
  2066  5626 ~ 6 minutes
  2134 14037 ~ 30 minutes

These numbers are on a Windows server using the 64-bit version of 'R'.

  Regards,
   Mike


"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


On Thu, Jul 1, 2010 at 8:58 AM, Liaw, Andy  wrote:

> You have not shown any code on exactly how you use na.roughfix(), so I
> can only guess.
>
> If you are doing something like:
>
>  randomForest(y ~ ., mybigdata, na.action=na.roughfix, ...)
>
> I would not be surprised that it's taking very long on large datasets.
> Most likely it's caused by the formula interface, not na.roughfix()
> itself.
>
> If that is your case, try doing the imputation beforehand and run
> randomForest() afterward; e.g.,
>
> myroughfixed <- na.roughfix(mybigdata)
> randomForest(myroughfixed[list.of.predictor.columns],
> myroughfixed[[myresponse]],...)
>
> HTH,
> Andy
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Mike Williamson
> Sent: Wednesday, June 30, 2010 7:53 PM
> To: r-help
> Subject: [R] anyone know why package "RandomForest" na.roughfix is so
> slow??
>
> Hi all,
>
>I am using the package "random forest" for random forest
> predictions.  I
> like the package.  However, I have fairly large data sets, and it can
> often
> take *hours* just to go through the "na.roughfix" call, which simply
> goes
> through and cleans up any NA values to either the median (numerical
> data) or
> the most frequent occurrence (factors).
>I am going to start doing some comparisons between na.roughfix() and
> some apply() functions which, it seems, are able to do the same job more
> quickly.  But I hesitate to duplicate a function that is already in the
> package, since I presume the na.roughfix should be as quick as possible
> and
> it should also be well "tailored" to the requirements of random forest.
>
>Has anyone else seen that this is really slow?  (I haven't noticed
> rfImpute to be nearly as slow, but I cannot say for sure:  my "predict"
> data
> sets are MUCH larger than my model data sets, so cleaning the prediction
> data set simply takes much longer.)
>If so, any ideas how to speed this up?
>
>  Thanks!
>   Mike
>
>
>
> "Telescopes and bathyscaphes and sonar probes of Scottish lakes,
> Tacoma Narrows bridge collapse explained with abstract phase-space maps,
> Some x-ray slides, a music score, Minard's Napoleanic war:
> The most exciting frontier is charting what's already here."
>  -- xkcd
>
> --
> Help protect Wikipedia. Donate now:
> http://wikimediafoundation.org/wiki/Support_Wikipedia/en
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
>

[R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-06-30 Thread Mike Williamson
Hi all,

I am using the package "random forest" for random forest predictions.  I
like the package.  However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start doing some comparisons between na.roughfix() and
some apply() functions which, it seems, are able to do the same job more
quickly.  But I hesitate to duplicate a function that is already in the
package, since I presume the na.roughfix should be as quick as possible and
it should also be well "tailored" to the requirements of random forest.

Has anyone else seen that this is really slow?  (I haven't noticed
rfImpute to be nearly as slow, but I cannot say for sure:  my "predict" data
sets are MUCH larger than my model data sets, so cleaning the prediction
data set simply takes much longer.)
If so, any ideas how to speed this up?

  Thanks!
   Mike



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how can I evaluate a formula passed as a string?

2010-06-24 Thread Mike Williamson
Thank you, Peter!

I sure love this help group!!  :)



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


On Thu, Jun 24, 2010 at 10:29 AM, Peter Langfelder <
peter.langfel...@gmail.com> wrote:

> On Thu, Jun 24, 2010 at 10:16 AM, Mike Williamson 
> wrote:
> > Hey everyone,
> >
> >I've been using 'R' long enough that I should have some idea of what
> the
> > heck either   expression()  or eval()  are really ever useful for.  I
> come
> > across another instance where I WISH they would be useful, but I cannot
> get
> > them to work.
>
> Example: eval(parse(text = "3*3"))
>
> Peter
>
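For reference, the same parse/eval idea applied to the problem below (a
minimal sketch, not from the thread; 'makeFun' is a hypothetical helper that
turns a string such as "length(unique(x))" into a function of x):

makeFun <- function(s) {
    f <- function(x) {}             # skeleton with the right argument
    body(f) <- parse(text = s)[[1]] # swap in the parsed expression
    f
}

f <- makeFun("length(unique(x))")
f(c(1, 1, 2, 3))   # returns 3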

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how can I evaluate a formula passed as a string?

2010-06-24 Thread Mike Williamson
Hey everyone,

I've been using 'R' long enough that I should have some idea of what the
heck either   expression()  or eval()  are really ever useful for.  I come
across another instance where I WISH they would be useful, but I cannot get
them to work.

Here is the crux of what I would like to do:

presume df looks like this
 A  B  C
===  ===  ===
M  45   0
M  46   1
F   42   0
F   42   1
M  46   1

and I want to make it look like this, using the function below (allowing for
other options besides "count" or "median"):
  A BCount of CMedian of C
===  ===  ==  
M 45 1  0
M 46 2  1
F  42 2  0.5

blah <- function(df=df, dup="C", aligns=c("A","B"),
                 dupFuns=c("length(unique(x))","median(x)"), dupNames) {
### A function to "widen" the df based upon the "aligns" columns,
### taking care to summarize the "dup" column using the "dupFuns"
    tmp <- aggregate(df[,dup], by=as.list(df[,aligns]), FUN=function(x) {
        eval(dupFuns[1]) } )
    names(tmp)[length(names(tmp))] <- dupNames[1]
    for (i in c(2:length(dupFuns))) {
        tmp <- merge(tmp, aggregate(df[,dup], by=as.list(df[,aligns]),
                     FUN=function(x) { eval(dupFuns[i]) } ), all = TRUE)
        names(tmp)[length(names(tmp))] <- dupNames[i]
    }
    tmp
}

Everything in it works OK, except that the "eval" function doesn't work
as it should.  How can I get, for instance, "length(unique(x))" (as a
character string) to become length(unique(x))  (a function to be evaluated
as shown above).

  Thanks!
   Mike



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
 -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] meta analysis with repeated measure-designs?

2010-06-18 Thread Mike Cheung
Dear Gerrit,

Sorry. There was an error in my previous code. As a record, the
following is the revised code.


#### Robust SE based on Hedges et al. (2010), Eq. 6, Research Synthesis Methods
#### rma.obj: object fitted by metafor()
#### cluster: indicator for clusters of studies
robustSE <- function(rma.obj, cluster=NULL, CI=.95) {
  # m: no. of clusters; assumed independent if not specified
  # rma.obj$not.na: complete cases
  if (is.null(cluster)) {
  m=length(rma.obj$X[rma.obj$not.na,1])
  } else {
  m=nlevels(unique(as.factor(cluster[rma.obj$not.na])))
  }
  res2 <- diag(residuals(rma.obj)^2)
  X <- rma.obj$X
  b <- rma.obj$b
  W <- diag(1/(rma.obj$vi+rma.obj$tau2))     # Use vi+tau2
  meat <- t(X) %*% W %*% res2 %*% W %*% X    # W is symmetric
  bread <- solve( t(X) %*% W %*% X )
  V.R <- bread %*% meat %*% bread            # Robust sampling covariance matrix
  p <- length(b)                             # no. of predictors including intercept
  se <- sqrt( diag(V.R)*m/(m-p) )            # small sample adjustment (Eq.7)
  tval <- b/se
  pval <- 2*(1-pt(abs(tval),df=(m-p)))
  crit <- qt( (1-CI)/2, df=(m-p), lower.tail=FALSE )
  ci.lb <- b-crit*se
  ci.ub <- b+crit*se
  data.frame(estimate=b, se=se, tval=tval, pval=pval, ci.lb=ci.lb, ci.ub=ci.ub)
}

library(metafor)
data(dat.bcg)

### calculate log relative risks and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat <- cbind(dat.bcg, dat)

### random-effects model
fit1 <- rma(yi, vi, data=dat, method="DL")
summary(fit1)

estimate      se     zval    pval    ci.lb    ci.ub
 -0.7141  0.1787  -3.9952  <.0001  -1.0644  -0.3638  ***

robustSE(fit1)
  estimatese  tvalpval ci.lb ci.ub
intrcpt -0.7141172 0.1791445 -3.986265 0.001805797 -1.104439 -0.323795

### mixed-effects model with two moderators (absolute latitude and
publication year)
fit2 <- rma(yi, vi, mods=cbind(ablat, year), data=dat, method="DL")
summary(fit2)

         estimate       se     zval    pval     ci.lb    ci.ub
intrcpt   -1.2798  25.7550  -0.0497  0.9604  -51.7586  49.1990
ablat     -0.0288   0.0090  -3.2035  0.0014   -0.0464  -0.0112  **
year       0.0008   0.0130   0.0594  0.9526   -0.0247   0.0262

robustSE(fit2)
           estimate           se        tval        pval        ci.lb       ci.ub
intrcpt -1.2797914381 22.860353022 -0.05598301 0.956458098 -52.21583218 49.65624930
ablat   -0.0287644840  0.007212163 -3.98832970 0.002566210  -0.04483418 -0.01269478
year     0.0007720798  0.011550188  0.06684565 0.948022174  -0.02496334  0.02650750

-- 
-
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-----

On Wed, Jun 16, 2010 at 4:47 PM, Mike Cheung  wrote:
> Dear Gerrit,
>
> If the correlations of the dependent effect sizes are unknown, one
> approach is to conduct the meta-analysis by assuming that the effect
> sizes are independent. A robust standard error is then calculated to
> adjust for the dependence. You may refer to Hedges et. al., (2010) for
> more information. I have coded it here for reference.
>
> Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance
> estimation in meta-regression with dependent effect size estimates.
> Research Synthesis Methods, 1(1), 39-65. doi:10.1002/jrsm.5
>
> Regards,
> Mike
> --
> -
>  Mike W.L. Cheung               Phone: (65) 6516-3702
>  Department of Psychology       Fax:   (65) 6773-1843
>  National University of Singapore
>  http://courses.nus.edu.sg/course/psycwlm/internet/
> -
>
> library(metafor)
>
> #### Robust SE based on Hedges et al. (2010), Eq. 6
> #### rma.obj: object fitted by metafor()
> #### cluster: indicator for clusters of studies
> robustSE <- function(rma.obj, cluster=NULL, CI=.95) {
>  # m: no. of clusters; assumed independent if not specified
>  if (is.null(cluster)) {
>      m=nrow(rma.obj$X)
>  } else {
>      m=nlevels(unique(as.factor(cluster)))
>  }
>  res2 <- diag(residuals(rma.obj)^2)
>  X <- rma.obj$X
>  b <- rma.obj$b
>  W <- diag(1/(rma.obj$vi+rma.obj$tau2))     # Use vi+tau2
>  meat <- t(X) %*% W %*% res2 %*% W %*% X    # W is symmetric
>  bread <- solve( t(X) %*% W %*% X)
>  V.R <- bread %*% meat %*% bread            # Robust sampling covariance matrix
>  p <- length(b)                             # no. of predictors
> including int

Re: [R] meta analysis with repeated measure-designs?

2010-06-16 Thread Mike Cheung
Dear Gerrit,

If the correlations of the dependent effect sizes are unknown, one
approach is to conduct the meta-analysis by assuming that the effect
sizes are independent. A robust standard error is then calculated to
adjust for the dependence. You may refer to Hedges et. al., (2010) for
more information. I have coded it here for reference.

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance
estimation in meta-regression with dependent effect size estimates.
Research Synthesis Methods, 1(1), 39-65. doi:10.1002/jrsm.5

Regards,
Mike
-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-

library(metafor)

#### Robust SE based on Hedges et al. (2010), Eq. 6
#### rma.obj: object fitted by metafor()
#### cluster: indicator for clusters of studies
robustSE <- function(rma.obj, cluster=NULL, CI=.95) {
  # m: no. of clusters; assumed independent if not specified
  if (is.null(cluster)) {
  m=nrow(rma.obj$X)
  } else {
  m=nlevels(unique(as.factor(cluster)))
  }
  res2 <- diag(residuals(rma.obj)^2)
  X <- rma.obj$X
  b <- rma.obj$b
  W <- diag(1/(rma.obj$vi+rma.obj$tau2))     # Use vi+tau2
  meat <- t(X) %*% W %*% res2 %*% W %*% X    # W is symmetric
  bread <- solve( t(X) %*% W %*% X )
  V.R <- bread %*% meat %*% bread            # Robust sampling covariance matrix
  p <- length(b)                             # no. of predictors including intercept
  se.R <- sqrt( diag(V.R)*m/(m-p) )          # small sample adjustment (Eq.7)
  t.R <- b/se.R
  p.R <- 2*(1-pt(abs(t.R),df=(m-p)))
  crit <- qt( 1-CI/2, df=(m-p) )
  ci.lb <- b-crit*se.R
  ci.ub <- b+crit*se.R
  data.frame(estimate=b, se.R=se.R, t.R=t.R, p.R=p.R, ci.lb=ci.lb, ci.ub=ci.ub)
}


## Example: calculate log relative risks and corresponding sampling variances
data(dat.bcg)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat <- cbind(dat.bcg, dat)

## random-effects model
fit1 <- rma(yi, vi, data=dat, method="DL")
summary(fit1)

estimate      se     zval    pval    ci.lb    ci.ub
 -0.7141  0.1787  -3.9952  <.0001  -1.0644  -0.3638  ***

## Robust SE
robustSE(fit1)
          estimate      se.R       t.R         p.R      ci.lb      ci.ub
intrcpt -0.7141172 0.1791445 -3.986265 0.001805797  -0.725588 -0.7026465

## mixed-effects model with two moderators (absolute latitude and
publication year)
fit2 <- rma(yi, vi, mods=cbind(ablat, year), data=dat, method="DL")
summary(fit2)

         estimate       se     zval    pval     ci.lb    ci.ub
intrcpt   -1.2798  25.7550  -0.0497  0.9604  -51.7586  49.1990
ablat     -0.0288   0.0090  -3.2035  0.0014   -0.0464  -0.0112  **
year       0.0008   0.0130   0.0594  0.9526   -0.0247   0.0262

### Robust SE
robustSE(fit2)
            estimate          se.R         t.R         p.R          ci.lb         ci.ub
intrcpt -1.2797914381  22.860353022 -0.05598301 0.956458098  -2.749670e+00   0.190086881
ablat   -0.0287644840   0.007212163 -3.98832970 0.002566210  -2.922821e-02  -0.028300755
year     0.0007720798   0.011550188  0.06684565 0.948022174   2.942412e-05   0.001514735




On Mon, Jun 14, 2010 at 4:58 PM, Gerrit Hirschfeld
 wrote:
> Hi,
>
> thanks for the references I will try the sensitivitiy-analysis in R and try 
> out winbugs if that does not work (little afraid of switching programmes).
>
> I also had an idea for a reasonable estimate of the correlations. Some 
> studies report both results from paired t-tests and means and SDs, and thus 
> allow to calculate two estimates for d one based on M and SD alone the other 
> on t. The difference between the two estimates should be systematically 
> related to the correlations of measures.
>
> I will keep you posted, if I have a solution or hit a wall.
>
> efachristo and dank je wel!
>
> Gerrit
>
>
> On 12.06.2010, at 15:59, Viechtbauer Wolfgang (STAT) wrote:
>
>> Dear Gerrit,
>>
>> the most appropriate approach for data of this type would be a proper 
>> multivariate meta-analytic model (along the lines of Kalaian & Raudenbush, 
>> 1996). Since you do not know the correlations of the reaction time 
>> measurements across conditions for the within-subject designs, a simple 
>> solution is to "guestimate" those correlations and then conduct sensitivity 
>> analyses to make sure your conclusions do not depend on those guestimates.
>>
>> Best,
>>
>> --
>> Wolfgang Viechtbauer                        http://www.wvbauer.com/
>> Department of Methodology and Statistics    Tel: +31 

Re: [R] equivalent of stata command in R

2010-06-09 Thread mike mick






Thanks for your response.
Yes, I know I didn't specify the indexes when I wrote the 2nd mail; in fact,
in the 1st mail I already wrote that I don't have a problem with the
estimation of the model... that's why I didn't write it: the issue is not
estimating the model but getting the marginal effect.
Anyway, I figured out that predict() doesn't work for panel data, and,
contrary to your guess, I couldn't figure out the rest of the calculations,
since I am not that experienced in R.
One last bit of help from you would get rid of this silly problem!
Thanks again...


> Date: Wed, 9 Jun 2010 12:40:42 +0200
> Subject: Re: [R] equivalent of stata command in R
> From: jorism...@gmail.com
> To: saint-fi...@hotmail.com
> CC: r-help@r-project.org
> 
> plm does not have a predict function, so forget my former mail. To get
> to the coefficients, you just use:
> coef(mdl)
> 
> The rest of the calculations you can figure out I guess.
> 
> I'm also not sure if you're doing what you think you're doing. You
> never specified the index stno in your plm call. Read the help files
> again. And while you're at it, read the posting guide for the list as
> well:
> http://www.R-project.org/posting-guide.html
> 
> Cheers
> Joris
> 
> 
> On Wed, Jun 9, 2010 at 11:54 AM, mike mick  wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> > From: saint-fi...@hotmail.com
> > To: saint-fi...@hotmail.com
> > Subject: RE:
> > Date: Wed, 9 Jun 2010 09:53:20 +
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > OK! Sorry, that's my fault.
> >
> > Here are the translations of the Stata commands.
> > The 1st step is to get the mean values of the variables; that doesn't
> > need explanation, I guess.
> >
> > The 2nd step is to estimate the model with a panel-data estimation
> > method, which is:
> > mdl<-plm(lnLP~lnC+lnL+lnM+lnE+Eco+Inno+Eco*Inno+Eco*lnM+Eco*lnE+year,data=newdata,model="within")
> >
> > Basically I need to get the marginal effect of the variable "Eco"
> > at the sample mean (step 3), but I am not that good in R, so any
> > additional help is welcome!
> > Thanks
> > From: saint-fi...@hotmail.com
> > To: r-help@r-project.org
> > Subject:
> > Date: Wed, 9 Jun 2010 09:45:16 +
> >
> >
> >
> >
> >
> >
> >
> > It helps if you translate the Stata commands. Not everybody is fluent
> > in those. It would help even more if you would enlighten us about the
> > function you used to fit the model. Getting the marginal effects is
> > not that hard at all, but how depends a bit on the function you used
> > to estimate the model.
> >
> > You can try
> > predict(your_model,type="terms",terms="the_term_you're_interested_in")
> >
> > For exact information, look at the respective predict function, e.g. if
> > you use lme, do ?predict.lme
> > Be aware of the fact that R normally chooses the correct predict
> > function without you having to specify it. predict() works for most
> > model objects. Yet, depending on the model, each predict function can
> > have different options or different functionality. That information is
> > in the help files of the specific function.
> >
> > Cheers
> > Joris
> >
> > Dear all,
> >
>
> > I need to use R for one estimation; I have the Stata command readily
> > available, but I also need the R version of the same command.
> > The estimation in Stata is as follows:
> >   1. Compute mean values of relevant variables
> >
> >
> >
> > . sum inno lnE lnM
> >
> >
> >
> >     Variable |       Obs        Mean    Std. Dev.        Min        Max
> > -------------+---------------------------------------------------------
> >         inno |    146574    .0880374    .2833503          0          1
> >          lnE |    146353    .9256239    1.732912  -4.473922   10.51298
> >          lnM |    146209    4.281903    1.862192  -4.847253   13.71969
> >
> >
> >
> >2. Estimate model
> >
> >
> >
> > . xi: xtreg lnLP lnC lnL lnE lnM eco inno eco_inno eco_lnE eco_lnM i.year,
> > fe i(stno)
> >
> > i.year            _Iyear_1997-1999    (naturally coded; _Iyear_1997 omitted)
> >
> > Fixed-effects (within) regression               Number of obs      =    146167
> >
> > Group variable (i): stno                        Number of groups   =     48855
> >

Re: [R] equivalent of stata command in R

2010-06-09 Thread mike mick








From: saint-fi...@hotmail.com
To: saint-fi...@hotmail.com
Subject: RE:
Date: Wed, 9 Jun 2010 09:53:20 +









OK! Sorry, that's my fault.

Here are the translations of the Stata commands.
The 1st step is to get the mean values of the variables; that doesn't need
explanation, I guess.

The 2nd step is to estimate the model with a panel-data estimation method,
which is:
mdl<-plm(lnLP~lnC+lnL+lnM+lnE+Eco+Inno+Eco*Inno+Eco*lnM+Eco*lnE+year,data=newdata,model="within")
Basically I need to get the marginal effect of the variable "Eco" at the sample
mean (step 3), but I am not that good in R, so any additional help is welcome!
Thanks
From: saint-fi...@hotmail.com
To: r-help@r-project.org
Subject: 
Date: Wed, 9 Jun 2010 09:45:16 +







It helps if you translate the Stata commands. Not everybody is fluent
in those. It would help even more if you would enlighten us about the
function you used to fit the model. Getting the marginal effects is
not that hard at all, but how depends a bit on the function you used
to estimate the model.
 
You can try
predict(your_model,type="terms",terms="the_term_you're_interested_in")
 
For exact information, look at the respective predict function, e.g. if
you use lme, do ?predict.lme
Be aware of the fact that R normally chooses the correct predict
function without you having to specify it. predict() works for most
model objects. Yet, depending on the model, each predict function can
have different options or different functionality. That information is
in the help files of the specific function.
 
Cheers
Joris
 
Dear all,
 
I need to use R for one estimation; I have the Stata command readily
available, but I also need the R version of the same command.
The estimation in Stata is as follows:
   1. Compute mean values of relevant variables
 
 
 
. sum inno lnE lnM
 
 
 
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
        inno |    146574    .0880374    .2833503          0          1
         lnE |    146353    .9256239    1.732912  -4.473922   10.51298
         lnM |    146209    4.281903    1.862192  -4.847253   13.71969
 
 
 
2. Estimate model
 
 
 
. xi: xtreg lnLP lnC lnL lnE lnM eco inno eco_inno eco_lnE eco_lnM i.year, fe
i(stno)

i.year            _Iyear_1997-1999    (naturally coded; _Iyear_1997 omitted)

Fixed-effects (within) regression               Number of obs      =    146167
Group variable (i): stno                        Number of groups   =     48855

R-sq:  within  = 0.9908                         Obs per group: min =         1
       between = 0.9122                                        avg =       3.0
       overall = 0.9635                                        max =         3

                                                F(11,97301)        = 949024.29
corr(u_i, Xb)  = 0.2166                         Prob > F           =        0.
 
 
 
------------------------------------------------------------------------------
        lnLP |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnC |   .0304896   .0009509    32.06   0.000     .0286258    .0323533
         lnL |  -.9835998   .0006899 -1425.74   0.000     -.984952   -.9822476
         lnE |   .0652658   .0009439    69.14   0.000     .0634158    .0671159
         lnM |   .6729931   .0012158   553.53   0.000       .67061    .6753761
         eco |   .0610348   .0177048     3.45   0.001     .0263336     .095736
        inno |   .0173824   .0058224     2.99   0.003     .0059706    .0287943
    eco_inno |   .0080325   .0110815     0.72   0.469    -.0136872    .0297522
     eco_lnE |   .0276226    .004059     6.81   0.000      .019667    .0355781
     eco_lnM |  -.0214237   .0039927    -5.37   0.000    -.0292494   -.0135981
 _Iyear_1998 |  -.0317684   .0013978   -22.73   0.000     -.034508   -.0290287
 _Iyear_1999 |  -.0647261   .0027674   -23.39   0.000    -.0701501   -.0593021
       _cons |   1.802112    .009304   193.69   0.000     1.783876    1.820348
-------------+----------------------------------------------------------------
     sigma_u |  .38142386
     sigma_e |   .2173114
         rho |  .75494455   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(48854, 97301) =     3.30        Prob > F = 0.
 
 
 
3. Compute marginal effect of eco at sample mean
 
 
 
. nlcom (_b[eco]+_b[inno]*.0880374+_b[eco_lnE]*.9256239+_b[eco_lnM]*4.281903)
 
 
 
       _nl_1:  _b[eco]+_b[inno]*.0880374+_b[eco_lnE]*.9256239+_b[eco_lnM]*4.281903

------------------------------------------------------------------------------
        lnLP |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------

[R] equivalent of stata command in R

2010-06-09 Thread mike mick

Dear all,

I need to use R for one estimation; I have the Stata command readily
available, but I also need the R version of the same command.
The estimation in Stata is as follows:
   1. Compute mean values of relevant variables



. sum inno lnE lnM



    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
        inno |    146574    .0880374    .2833503          0          1
         lnE |    146353    .9256239    1.732912  -4.473922   10.51298
         lnM |    146209    4.281903    1.862192  -4.847253   13.71969



2. Estimate model



. xi: xtreg lnLP lnC lnL lnE lnM eco inno eco_inno eco_lnE eco_lnM i.year, fe
i(stno)

i.year            _Iyear_1997-1999    (naturally coded; _Iyear_1997 omitted)

Fixed-effects (within) regression               Number of obs      =    146167
Group variable (i): stno                        Number of groups   =     48855

R-sq:  within  = 0.9908                         Obs per group: min =         1
       between = 0.9122                                        avg =       3.0
       overall = 0.9635                                        max =         3

                                                F(11,97301)        = 949024.29
corr(u_i, Xb)  = 0.2166                         Prob > F           =        0.



------------------------------------------------------------------------------
        lnLP |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnC |   .0304896   .0009509    32.06   0.000     .0286258    .0323533
         lnL |  -.9835998   .0006899 -1425.74   0.000     -.984952   -.9822476
         lnE |   .0652658   .0009439    69.14   0.000     .0634158    .0671159
         lnM |   .6729931   .0012158   553.53   0.000       .67061    .6753761
         eco |   .0610348   .0177048     3.45   0.001     .0263336     .095736
        inno |   .0173824   .0058224     2.99   0.003     .0059706    .0287943
    eco_inno |   .0080325   .0110815     0.72   0.469    -.0136872    .0297522
     eco_lnE |   .0276226    .004059     6.81   0.000      .019667    .0355781
     eco_lnM |  -.0214237   .0039927    -5.37   0.000    -.0292494   -.0135981
 _Iyear_1998 |  -.0317684   .0013978   -22.73   0.000     -.034508   -.0290287
 _Iyear_1999 |  -.0647261   .0027674   -23.39   0.000    -.0701501   -.0593021
       _cons |   1.802112    .009304   193.69   0.000     1.783876    1.820348
-------------+----------------------------------------------------------------
     sigma_u |  .38142386
     sigma_e |   .2173114
         rho |  .75494455   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(48854, 97301) =     3.30        Prob > F = 0.



3. Compute marginal effect of eco at sample mean



. nlcom (_b[eco]+_b[inno]*.0880374+_b[eco_lnE]*.9256239+_b[eco_lnM]*4.281903)



       _nl_1:  _b[eco]+_b[inno]*.0880374+_b[eco_lnE]*.9256239+_b[eco_lnM]*4.281903

------------------------------------------------------------------------------
        lnLP |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |  -.0036011    .008167    -0.44   0.659    -.0196084    .0124061
------------------------------------------------------------------------------



In fact, I can find the means of the variables (step 1) and estimate the model
(step 2), but I couldn't find the equivalent of step 3 (computing the marginal
effect of eco at the sample mean). Can someone help me with this issue?

Cheers!
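For reference, a sketch of how step 3 might be done in R (not from the
thread). It assumes the plm fit 'mdl' and the data 'newdata' from the 2nd
mail; the interaction labels depend on the formula, so check
names(coef(mdl)) first. Because the marginal effect is a linear combination
c'b of the coefficients, its standard error is exactly sqrt(c' V c) with
V = vcov(mdl):

b <- coef(mdl)
V <- vcov(mdl)

cc <- setNames(numeric(length(b)), names(b))   # weights of the combination
cc["Eco"]     <- 1
cc["Inno"]    <- mean(newdata$Inno)   # mirrors the nlcom line above; if the
                                      # Eco:Inno term was intended, use that
cc["lnE:Eco"] <- mean(newdata$lnE)    # label names are assumptions; adjust
cc["lnM:Eco"] <- mean(newdata$lnM)    # to match names(coef(mdl))

est <- sum(cc * b)
se  <- sqrt(drop(t(cc) %*% V %*% cc))
c(estimate = est, se = se, t = est / se)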

  
_


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 4D Plot

2010-05-27 Thread Mike Prager
On Thu, 27 May 2010 10:42:54 +0200, "Spitzner, Andrea"
 wrote:

>Hello,
>I need some help with a 4D-Plot.

I can't offer a lot of help, other than to note that there is code for
a 4D plot at the R graphics gallery (I am the author), and perhaps
looking at that might help.

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=90

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Sweave not recognising \Sexpr{}

2010-05-25 Thread Mike White
I think I have solved the problem. In the Sweave manual it mentions that 
problems may occur after loading the R2HTML package.  I have not 
recently loaded this package but the proposed solution to problems 
caused by R2HTML also solves my problem with the evaluation of R code in 
\Sexpr. It seems that it is necessary to set the syntax option in the 
Sweave function as follows:

Sweave(..., syntax="SweaveSyntaxNoweb")

although I am not sure why this is required.
Mike
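For reference, a minimal call of that form (the file name here is just a
made-up example):

Sweave("example.Rnw", syntax = "SweaveSyntaxNoweb")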

On 19:59, Mike White wrote:
I am trying to run the Sweave example at 
http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-Rnews-2002-3.pdf
However, the \Sexpr{} code is not being evaluated, although the actual 
R code within the {} runs ok in R.
Below is part of the resulting .tex file.  Can anyone help identify 
the cause? I am using R 2.10.1 on Windows XP.


Consider the \texttt{cats} regression example from Venables \& Ripley
(1997). The data frame contains measurements of heart and body weight
of \Sexpr{nrow(cats)} cats (\Sexpr{sum(cats$Sex=="F")} female,
\Sexpr{sum(cats$Sex=="M")} male).

Thanks
Mike White




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with Sweave not recognising \Sexpr{}

2010-05-25 Thread Mike White
I am trying to run the Sweave example at 
http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-Rnews-2002-3.pdf
However, the \Sexpr{} code is not being evaluated, although the actual R 
code within the {} runs ok in R.
Below is part of the resulting .tex file.  Can anyone help identify the 
cause? I am using R 2.10.1 on Windows XP.


Consider the \texttt{cats} regression example from Venables \& Ripley
(1997). The data frame contains measurements of heart and body weight
of \Sexpr{nrow(cats)} cats (\Sexpr{sum(cats$Sex=="F")} female,
\Sexpr{sum(cats$Sex=="M")} male).

Thanks
Mike White

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: R apply() help -urgent

2010-05-11 Thread Mike White
Set up a function for the fisher.test on a 2x2 table and then include 
this in the apply function for columns as in the example below. The 
result is a list with names A to Z


# set up a dummy data set with 100 rows
Cat<-LETTERS[sample(1:6,100, replace=T)]
GL<-sample(1:6, 100, replace=T)
dat<-matrix(sample(c(0,1),100*27, replace=T), nrow=100)
colnames(dat)<-c(LETTERS[1:26],"pLoss")
data1<-data.frame(Cat, GL, dat)

# define a function for the fisher.test
ff<-function(x,y){
fisher.test(table(x,y))
}

# apply function to columns A to Z
results<-apply(data1[,LETTERS[1:26]],2, ff, y=data1[,"pLoss"])
# the results are in the form of a list with names A to Z
results$C
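A possible follow-up (not in the original reply): the elements of 'results'
are standard htest objects, so the 26 p-values can be collected in one go,
e.g.

pvals <- sapply(results, function(z) z$p.value)
p.adjust(pvals, method = "BH")   # optional multiple-testing adjustment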


On 19:59, Venkatesh Patel wrote:

-- Forwarded message --
From: Dr. Venkatesh
Date: Sun, May 9, 2010 at 4:55 AM
Subject: R apply() help -urgent
To: r-help@r-project.org


I have a file with 4873 rows of 1s or 0s and 26 alphabet letters (A-Z) as
columns. The 27th column also has 1s and 0s but stands for a different
variable (pLoss). Columns 1 and 2 are not significant, and hence let's ignore
them for now.

here is how the file looks

CatGL  A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q
   R   S   T   U   V   W   X   Y   Z pLoss
H  5   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
E  5   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
P  6   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
   0   0   0   0   0   0   0   0   0 1
P  5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
   0   0   0   0   0   0   0   0   0 1
F  6   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
E  4   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
H  5   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
J  4   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
J  4   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
E  5   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0 1
S  6   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   1   0   0   0   0   0   0   0 1
..
..
..
..
..
..

Alphabets A-Z stand for different categories of protein families and pLoss
stands for their presence or absence in an animal.

I intend to do Fisher's test for 26 individual 2X2 tables constructed from
each of these alphabets vs pLoss.

For example, here is what I did for alphabet A and then B and then C so
on. (I have attached R-input.csv for your perusal)

   

data1 <- read.table("R_input.csv", header = T)
datatable <- table(data1$A, data1$pLoss)  # create a new datatable2 or 3
  # with table(data1$B.. or (data1$C.. and so on
datatable
 

         0     1
   0    31  4821
   1     0    21

now run the Fisher's test for these datatables one by one for the 26
alphabets :(

fisher.test(datatable), ... fisher.test(datatable2)...

in this case, the task is just for 26 columns.. so I can do it manually.

But I would like to do an automated extraction and fisher's test for all the
columns.

I tried reading the tutorials and trying a few examples. Can't really come up
with anything sensible.

How can I use apply() in this regard? or is there any other way, a loop may
be? to solve this issue.

Please help.

Thanks a million in advance,

Dr Venkatesh Patel
School of Biological Sciences
University of Liverpool
United Kingdom





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tinn-R RGui Send problem

2010-05-07 Thread Mike White
A possible workaround would be to append the selection.r file to the .Rhistory
file and then reload the history, e.g.
file.append(".Rhistory", .trPaths[5])
loadhistory(file= ".Rhistory")
You can then access the code on the console, skipping the last 2 lines.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Tinn-R RGui Send problem

2010-05-06 Thread Mike White
A possible workaround would be to append the selection.r file to the .Rhistory
file and then reload the history, e.g.
file.append(".Rhistory", .trPaths[5])
loadhistory(file= ".Rhistory")
You can then access the code on the console, skipping the last 2 lines.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing the correlations coefficient of two (very) dependent samples

2010-05-03 Thread Mike Cheung
Dear Tal,

There are several approaches to doing it (see Steiger, 2003). It
should not be difficult to implement them in R.

Steiger, J.H. (2003). Comparing correlations. In A. Maydeu-Olivares
(Ed.) Psychometrics. A festschrift to Roderick P. McDonald. Mahwah,
NJ:  Lawrence Erlbaum Associates.
http://www.statpower.net/Steiger%20Biblio/Steiger03.PDF
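As a rough sketch (not from the original message) of one such approach:
Steiger's (1980) Z-test for two dependent correlations that share one
variable, as in the example below; the pooled-r variant is used, and the
formula is worth checking against the paper before serious use.

## H0: cor(x, y1) == cor(x, y2), where the two correlations share x
steigerZ <- function(x, y1, y2) {
    n    <- length(x)
    r1   <- cor(x, y1); r2 <- cor(x, y2); r12 <- cor(y1, y2)
    z1   <- atanh(r1);  z2 <- atanh(r2)       # Fisher's z transform
    rbar <- (r1 + r2) / 2                     # pooled r under H0
    s    <- (r12 * (1 - 2 * rbar^2) -
             0.5 * rbar^2 * (1 - 2 * rbar^2 - r12^2)) / (1 - rbar^2)^2
    z    <- (z1 - z2) * sqrt((n - 3) / (2 - 2 * s))
    c(z = z, p.value = 2 * pnorm(-abs(z)))
}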

Regards,
Mike
--
-
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-

On Tue, May 4, 2010 at 2:00 AM, Tal Galili  wrote:
> Hello all,
>
> I believe this can be done using bootstrap, but I am wondering if there is
> some other way that might be used to tackle this.
>
> #Let's say I have two pairs of samples:
> set.seed(100)
> s1 <- rnorm(100)
> s2 <- s1 + rnorm(100)
>
> x1 <- s1[1:99]
> y1 <- s2[1:99]
>
> x2 <- x1
> y2 <- s2[2:100]
>
> #And both yield the following two correlations:
> cor(x1,y1) # 0.7568969  (cor1)
> cor(x2,y2) # -0.2055501 (cor2)
>
> Now for my questions:
> 1) is cor1 larger then cor2?   (CI for the diff ?)
> 2) With what P value?
> 3) What if the values of s1 are not independent ?
>
> I found an older thread discussing such issues:
> http://tolstoy.newcastle.edu.au/R/e2/help/06/09/1035.html
> But wasn't sure how much this might be relevant to my case.
>
>
>
> Thanks for any help,
> Tal
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> --
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to generate Mackey-Glass time series with "ddesolve" package?

2010-05-02 Thread Mike Beddo
I could use some help generating a time series for the Mackey-Glass equation:
dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t) for the case where tau = 17.
I tried the "ddesolve" package, but the dde(...) function seems to hang, not
producing anything. Can someone show me an R script for doing this?


-  Mike Beddo

[[alternative HTML version deleted]]
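For reference, a minimal sketch (not from the thread) using the deSolve
package's dede() solver instead of ddesolve; it assumes a constant history
x = 0.5 before time zero:

library(deSolve)

## Mackey-Glass: dx/dt = 0.2*x(t-tau)/(1 + x(t-tau)^10) - 0.1*x(t)
mackeyGlass <- function(t, y, parms) {
    tau  <- parms["tau"]
    ylag <- if (t <= tau) 0.5 else lagvalue(t - tau)  # history: x = 0.5
    list(0.2 * ylag / (1 + ylag^10) - 0.1 * y)
}

out <- dede(y = c(x = 0.5), times = seq(0, 300, by = 0.1),
            func = mackeyGlass, parms = c(tau = 17))
plot(out, main = "Mackey-Glass, tau = 17")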

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with Tinn-R communicating with REvolution R

2010-04-27 Thread Mike White
I have been using Tinn-R with R without any problems but when I try to use it 
with REvolution R I get the following error message when Tinn-R runs the 
configuration script and gets to the trDDEInstall() function:

## Start DDE
trDDEInstall()
> trDDEInstall()
Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = 
"tclObj") :
  [tcl] invalid command name "dde".
In addition: Warning message:
In tclRequire("dde", warn = TRUE) : Tcl package 'dde' not found

I have not found anything about this on the Tinn-R forum and have had no 
response from the REvolution R forum.
I have checked other R-Help queries but these relate to adding the code for 
.trPaths which I already have.
Does anyone know how to solve this problem?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Upgrading R using the "global library folder" strategy - what do you think about it?

2010-04-26 Thread Mike Prager
I think it makes more sense for most users to have a global library
(as you call it), rather than put the library under the current
installation.  I have been doing that for years, and it saves a lot of
trouble.

When I have helped people learn R, the need to copy the library when
updating is a regular source of confusion and questions. Many users
are not particularly computer-savvy.

Given that, it would seem desirable for the R installation to default
to a global library location.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and S-Plus: Two programs separated by a common language?

2010-04-26 Thread Mike Prager
On Thu, 22 Apr 2010 12:00:13 -0700 (PDT), Paul Miller
 wrote:

>I was just wondering if anyone could give me some advice about the wisdom or 
>folly of trying to use both [R and S-Plus].

I suspect that trying to use both will give you heartburn. When I
switched from S-Plus to R, the most significant differences (for my
purposes) were in graphics. The languages differ in the way they
approach graphics, and even where the languages appear the same,
interpretation of some parameters (such as cex) can be different. 

Also, the way data are stored differs considerably. Because S-Plus
stores data on disk, while R keeps data in memory, S-Plus can have an
edge when analyzing huge data sets. (That reflects my understanding of
the situation about 5 or 6 yr ago.) 

I have gone back to old projects and tried to execute my S-Plus code
in R. In general, the code needed minor to major massaging to make
that happen, especially when the output was carefully annotated
graphs.

I would recommend that you concentrate on R, where much active
development is taking place.  Support from this newsgroup is better
than support from most commercial vendors, though perhaps not always
as sweet-natured.  As noted by others, Revolution R is available with
commercial support, if you need it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do you change library location ? (in R under windows XP)

2010-04-26 Thread Mike Prager
On Fri, 23 Apr 2010 15:22:45 +0300, Tal Galili 
wrote:

>Due to the new R 2.11 release, I want to implement Dirk's suggestion
>here
>.
>
>So for that I am asking - How can I (permanently) change R's library path?
>(The best solution would be one that can be run from within R)

To me, it seemed more straightforward to do this outside R.  

Just set the environment variable R_LIBS in Windows to something like

R_LIBS=c:/R/Library

Then, delete your R installation. Install the new version and all
desired packages.  The add-on packages will be located according to
your environment setting, and future updates will not require add-on
packages to be copied or reloaded.
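For the "from within R" part of the question, the closest equivalent is
.libPaths(); note that it only affects the current session, so the
environment variable above remains the permanent route:

.libPaths()                 # show the library trees R is currently using
.libPaths("c:/R/Library")   # put this tree first for the current session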

HTH

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Page width figures in Latex

2010-03-28 Thread Mike Prager
On Mon, 29 Mar 2010 12:21:40 +1100, Jim Lemon 
wrote:

>bRotheRs & sisteRs,
>I am once again attempting to learn enough Latex voodoo to get something 
>done, and failing comically. The document "RJAuthorguide.pdf"
>mentions that one can get page width figures through the use of the 
>"figure*" or "table* environments, but despite considerable searching 
>through the mail archives and reading Frank Harrell's discussion of 
>"Using Latex Figure Environments for Plots" until my eyes went on 
>strike, I am nowhere near a solution. Would anyone be kind enough to 
>point me to the Idiot's Guide to Latex Figure Environments?
>
>Jim

Jim,

You need a good book on Latex.  I like this one:

http://www.amazon.com/Guide-LaTeX-4th-Helmut-Kopka/dp/0321173856/ref=sr_1_4?ie=UTF8&s=books&qid=1269833347&sr=8-4

The width of the figure is controlled by the \includegraphics
statement, not any particular part of the environment specification.
That assumes you have loaded the graphicx package.  For example,

\begin{figure}[!th]
\begin{center}
\includegraphics[width=\textwidth]{myfig.eps}\\
\end{center}
\end{figure}%

HTH,
Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read SAS data

2010-03-25 Thread Mike Reese
Nicola,

I found it's easier if you convert your SAS data to the XPORT format (a SAS
transport format).  Here are instructions for creating the transport file in
SAS - use xport, not cport.

http://support.sas.com/documentation/cdl/en/movefile/59598/HTML/default/a002575816.htm

I also found the import works better in R if you use the sasxport.get function
(instead of read.xport). I believe this function is part of the Hmisc library.
 

BTW I'm looking for help with the write.dta command, specifically, exporting 
dataframes with variable labels that are readable in  STATA.  Do you know 
anything about that?

Mike
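A minimal sketch of the route described above (the file name is just an
example):

library(Hmisc)
mydata <- sasxport.get("mydata.xpt")  # returns a data frame, or a list of
                                      # them if the file holds several datasets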

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Nordlund, Dan (DSHS/RDA)
Sent: Thursday, March 25, 2010 2:32 PM
To: Nicola Sturaro Sommacal; r-help@r-project.org
Subject: Re: [R] Read SAS data

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Nicola Sturaro Sommacal
> Sent: Thursday, March 25, 2010 9:16 AM
> To: r-help@r-project.org
> Subject: [R] Read SAS data
> 
> Hi!
> 
> I need to import in R some SAS dataset (sas7bdat). I found two functions to
> do it:
> "read.ssd" from the package "foreign" and "sas.get" from "Hmisc".
> 
> df = read.ssd(libname = path2data, sectionnames = "sasSmallDataset",
> tmpXport = path2data, tmpProgLoc = path2data, sascmd = path2sas)
> sas.get(libraryName = path2data, member = "sasSmallDataset", formats =
> FALSE, sasprog = path2sas, keep.log = TRUE)
> 
> where path2data is the directory on which is contained the file sas,
> sasSmallDataset.sas7bdat are the data and path2sas is the path to SAS
> (C:/Programmi/SAS/SAS System/9.0/sas.exe).
> 
> I obtain the following messages:
> 
> from read.ssd:
> SAS failed.  SAS program at Z:/projects/QUANTIDE/import2R/.sas
> The log file will be import2R.log in the current directory
> Warning messages:
> 1: In file.symlink(oldPath, linkPath) :
>   symlinks are not supported on this platform
> 2: In read.ssd(libname = path2data, sectionnames = "sasSmallDataset",  :
>   SAS return code was 2
> 
> from sas.get:
> Error in if (status != 0) { : argument is of length zero
> 
> I have SAS 9.0 and R 2.10.1 running on Windows XP Pro.
> 
> Can you help me to found a solution or can you provide an alternative way to
> import SAS data directly from R?
> 
> Thank you very much.
> 
> --
>  Nicola Sturaro Sommacal
> Quantide srl
> 
> http://www.quantide.com

Here is one work-around for sas.get on MS Windows platforms.

http://finzi.psych.upenn.edu/Rhelp10/2008-December/182573.html

Here is another thread that you might find useful.

http://tolstoy.newcastle.edu.au/R/e9/help/10/02/6248.html

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R on Linux - a primer

2010-03-17 Thread Mike Miller
Thanks for sharing the interesting information about cran2deb.  I was 
unaware of that project (but I did know that Dirk E. had been doing Octave 
and R binaries for Debian for years).  Dirk Eddelbuettel (you spelled his 
name correctly) and Charles Blundell gave a talk at UseR! 2009...


http://dirk.eddelbuettel.com/papers/useR2009cran2deb.pdf

...where they seem to claim that they chose to develop for Debian partly 
because of its similarity to Ubuntu:


Common platform -- as Debian forms the base for Ubuntu and several other 
derivative or single-focus distributions


Audience -- given the reach of Debian and Ubuntu, large number of users 
can be reached with little effort


But that was in July.  This what he was saying in November:

http://www.mail-archive.com/r-sig-deb...@r-project.org/msg00892.html

And this is what he was saying a month ago:

https://stat.ethz.ch/pipermail/r-sig-debian/2010-February/001042.html

I guess we just don't know if cran2deb works on Ubuntu, and we don't know 
if Charles and Dirk will be providing binaries for Ubuntu.  Apparently, 
they want to reach the Ubuntu user base, but it is a lot of work for them 
and they aren't getting the resources they need to pull it off.


Mike


On Tue, 16 Mar 2010, Emmanuel Charpentier wrote:


On Sunday 14 March 2010 at 18:04 -0400, Axel Urbiz wrote:

Hi,

I'm looking to move from Windows into a 64-bit Linux environment. Which is
the best Linux Flavor to use within R? To install R on this environment, do
I need to do any compiling?


I'd like to add two cents of folly to the (wise) advice you've received
already.

Indeed, Ubuntu is one very good distribution whose management system
has made the care and feeding of a Linux system a *lot* easier for the
not-so-system-oriented people like yours truly. Whereas my first
contacts with a Unix-like system were about 30 years ago (Oh my, how
time flies, and how far away are Xenix and our 68000 systems ...), I'm
*still* not fond of system maintenance for it's own sake. Ubuntu added
an (almost) fool-proof maintenance system to an excellent distribution
called Debian, thus lowering the Linux entry bar to as low as it can be
humanly made. Some pretended that "Ubuntu" was a code word for "I'm too
stupid to configure Debian"; quite untrue! It only means "I'm too
busy|lazy to configure Debian", which is a Good Thing (TM).

But Debian has its strong points,  and one of them is *extremely* strong
for an R user : Dirk Eddelbuettel (whose name I'm almost surely
misspelling (sorry, Dirk!)) has created a marvelous system called
cran2deb which routinely creates binary Debian packages from (almost)
the 2000+ R packages available nowadays.

That might look like small change: the basic tools used for
developing/compiling most R packages are small beer (at least by today's
standards). But some of them might depend on fiendishly
difficult-to-maintain foreign libraries. Dirk's cran2deb takes care of
that and creates any information that Debian's dpkg maintenance system
needs to automate *your* R maintenance chores by integrating them in
Debian's maintenance scheme, which is as automatic as you can get
without becoming an incomprehensible beast.

In fact, cran2deb is so good that Im seriously tempted to go back to
Debian (after almost 8 years of Debian use, Ubuntu's ease-of-use, easy
access to no-so-exotic hardware drivers (and the then-incessant
politically correct yack-yacking on some Debian mailing lists...) made
me switch to an early Ubuntu distribution). I did not yet switch back
(mostly for not-so-"superficial" hardware support reasons), but I
maintain a backup Debian installation "for the hell of it" and to test
waters. So far, they have been a lot less rough than they used to be,
but there are still occasional rows (e.g. a recent gotcha with
openoffice.org, which would have rendered me unable to work with those
d*mn Word files for about a month, or forced me to do a manual repair
(which I hate...)).

So consider Debian as a (desirable) alternative to Ubuntu.

HTH,

Emmanuel Charpentier, DDS, MSc
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R on Linux - a primer

2010-03-15 Thread Mike Miller

On Sun, 14 Mar 2010, Jonathan Baron wrote:

Just to make this thoroughly confusing, I will say that I am very happy 
with Fedora



Just to make this less confusing: choose Ubuntu.  I say this because it is 
easy to use, has great repositories and it is the most popular Linux 
distro, so it should be easy to get help with it.  I have been running it 
on a number of machines doing a few different kinds of tasks and it has 
almost always been very easy to install.  I'm also happily running the 
Ubuntu Netbook Remix on a little Asus EeePC netbook.  To install R, just 
use the Synaptic program:


https://help.ubuntu.com/community/SynapticHowto

It couldn't be easier.

I don't work for Ubuntu and I don't have any friends or relatives working 
there either.


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nano syntax highlighting for R

2010-03-13 Thread Mike Miller
Nano is Free Software and a nice intro text editor.  It also starts up 
very quickly and has good syntax highlighting functionality, so it makes a 
nice file viewer.  The syntax highlighting is configured in the ~/.nanorc 
file.  See the attached code and screenshot.


I was looking for good nano syntax highlighting code for R when I found
Stephen Haptonstahl's code here:


http://srh.ucdavis.edu/drupal/node/20

He is happy for others to use the code, modify it and distribute it.  My 
edited version of his code is attached.  To use it, just append it to your 
~/.nanorc file.  There's also a systemwide way to use it; see the note 
further down.  To use nano in a view-only mode, just use the -v option to 
turn off editing functionality:


nano -v file.R

The .R file extension will cause Nano to use the R syntax highlighting, 
but it can also be triggered with the -Y option:


nano -v -Y R file.whatever

Of course, nano also is an editor, so just drop the -v option if you want 
to edit a file.
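
As for the systemwide way: I believe it amounts to saving the rules to a 
file and pointing /etc/nanorc at it with an include line, along these 
lines (untested):

include "/usr/share/nano/r.nanorc"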


The screenshot looks pretty nice now (I'm using xterm with white on black 
and haven't tested with black on white), but I'm sure it can be improved 
further.  I don't have time to work on it much now, but maybe someone else 
will want to pick it up.


Best,
Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
#  Syntax highlighting for R   #
#  by Stephen Haptonstahl  #
#  March 15, 2009  #
#  http://srh.ucdavis.edu/drupal/node/20   #
#  edited by Mike Miller   #


syntax "R" "\.R$"

# reserved words
color brightyellow 
"\b(if|else|repeat|while|function|for|in|next|break|TRUE|T|FALSE|F|NULL|Inf|NaN|NA|NA_integer_|NA_real_|NA_complex_|NA_character_|\.\.\.)\b"
color brightyellow "\.\.[0-9]"
color brightred "\b(require|library)\b"

# logicals
color brightgreen "(==|<=|>=|!=|!|<|>|\||\|\||&|&&|%in%|%%|%\*%|%/%|%o%|%x%)"

# strings
color cyan "'[^']*'"
color cyan ""[^"]*""
# "

# variable definitions
color blue "^.*?<-"

color yellow start="[...@%]" end="[[:alnum:]]*"

# function definitions
color magenta "\<([A-Za-z0-9\.]+)\>\("

# parameters -- not working yet
# icolor brightblue "[^,\(=]*=(?:[^=])"

# danger!
color black,red "(stop|warning|return)"
color red " = "

color yellow "[(){}[;|<>]"
color yellow "\]"
color brightred "<-"

## Comment highlighting
color brightblack "#.*$"
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A slight trap in read.table/read.csv.

2010-03-08 Thread Mike Prager
Rolf Turner  wrote:
> 
> I solved the problem by putting in a colClasses argument in my
> call to read.csv().  But I really think that the read functions
> are being too clever by half here.  If field entries are surrounded
> by quotes, shouldn't they be left as character?  Even if they are
> all F's and T's?
> 
> Furthermore using F's and T's to represent TRUE's and FALSE's is
> bad practice anyway.  Since FALSE and TRUE are reserved words it
> would make sense for the read function to assume that a field is
> logical if it consists entirely of these words.  But T's and F's
>  I don't think so.
> 
> I would argue that this behaviour should be changed.  I can see no
> downside to such a change.
> 

I agree with you, Rolf, that this is horrid behavior. It is such
automatic devices that have made people hate (e.g.) Microsoft
Word with a passion. 

Yet, in R this is a designed-in bug (er, feature) that
probably can't be changed without making some legacy code not
work. But at least, T and F could soon be removed as synonyms for
TRUE and FALSE. We have seen that "_" was removed as an
assignment operator, and the world did not crumble. The use of T
and F is no less error-prone, and possibly more so.

The only immediate solution to this accretion of overly clever
behavior would be for someone to write new functions (say,
Read.csv) that didn't do all those conversions behind the
scenes. I'm not about to do that. Are you?
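
For anyone who wants to see the behavior concretely, here is a minimal
sketch (made-up two-column data):

tc <- textConnection('x,y\n"T",1\n"F",2')
str(read.csv(tc))   # x comes back logical, quotes notwithstanding
close(tc)
tc <- textConnection('x,y\n"T",1\n"F",2')
str(read.csv(tc, colClasses=c("character", "numeric")))  # x stays character
close(tc)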

Best of luck!

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assign Name of Data Frame

2010-02-12 Thread Mike Harwood
Marc,

Thank you for your help.  I had tried "assign", but I did not enclose the
table name in quotes in my function call.  I needed to see what someone else
had written before I ever would have noticed it!
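
A returned-value variant also sidesteps the problem entirely -- sketch,
untested, with 'sears' being the RODBC channel from my original post:

MakeDF <- function(table) {
  # return the data frame rather than assigning it by name
  sqlQuery(sears, paste("select * from swprod.", table, sep=""))
}

sku_series <- MakeDF("sku_series")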

On Fri, Feb 12, 2010 at 10:00 AM, Marc Schwartz wrote:

> On Feb 12, 2010, at 8:19 AM, mah wrote:
>
> > Hello R Experts,
> >
> > How can I assign the name of a data frame with the argument of a
> > function?  Specifically I am using RODBC to build local dataframes
> > from SAS datasets on a
> > remote server.  I would like the local dataframe have the same name as
> > the source SAS dataset, and the function below is what I am
> > developing.  However, the "substitute(table)" on the left side of the
> > assignment
> > generates the error "Error in substitute(table) <<- sqlQuery(sears,
> > sql.stmt) :
> > could not find function "substitute<-".
> >
> > Thanks in advance
> >
> > MakeDF <- function(table)
> > #
> > # Function makes dataframe from UNIX SAS datasets
> > #
> > {
> > st.time <- Sys.time()
> > print(substitute(table))
> > sql.stmt <- paste("select * from swprod.", substitute(table),
> > sep="")
> > print(sql.stmt)
> > substitute(table) <<- sqlQuery(sears, sql.stmt)
> > #  deparse(substitute(table)) <<- sqlQuery(sears, sql.stmt)
> > end.time
> > print(end.time - st.time)
> > }
> > MakeDF(sku_series)
>
>
>
> My recommendation would be something like this:
>
> MakeDF <- function(table)
> {
>  DF <- sqlQuery(channel, paste("select * from swprod.", table, sep = ""))
>  assign(table, DF, envir = parent.frame())
> }
>
> Then use would be:
>
>  MakeDF("sku_series")
>
>
> The result would be a data frame called 'sku_series' in the calling
> environment.  You could substitute globalenv() for parent.frame() if you
> wanted to create the data frame in the global environment.
>
> See ?assign
>
> HTH,
>
> Marc Schwartz
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help Please!

2010-02-10 Thread Mike Williams
On Wed, Feb 10, 2010 at 2:47 AM, Nick Manginelli  wrote:
> So I have to use this table of min, max, and mean temps for certain 
> years http://www.stat.berkeley.edu/classes/s133/data/january.tab. I am 
> supposed to figure out which year had the hottest January and which had the 
> coldest. But I don't know how to!
>
> Nick Manginelli

For starters I'd suggest pruning the data with grep, then you can
pretty much eyeball the result.

[m...@localhost lab1]$ grep " 1$" january.txt
45.5    50.67   62.1    2005    1
50.7    55.02   59.5    2006    1
43.9    53.23   65.7    2007    1
42.2    52.16   64.7    2008    1
46.6    51.93   59.9    2009    1
53      57.75   63.4    2010    1

Although you have to decide if you want to use the min, max, or mean
temp to rank the years.  If you use min for coldest, it's 2008; using
mean it would be 2009.

Also, if you are going to play with this data in R, you probably want
to change the headings, because it will be confusing (to you if not to
R) to have column names that match R builtin commands. Maybe use minT
meanT maxT year day.
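
In R itself, something along these lines ought to do it (untested, and
assuming the file reads in with a header and the columns are in the
order shown above):

jan <- read.table("january.tab", header=TRUE)
names(jan) <- c("minT", "meanT", "maxT", "year", "day")
byYear <- tapply(jan$meanT, jan$year, mean)  # average temp per January
names(byYear)[which.max(byYear)]             # hottest
names(byYear)[which.min(byYear)]             # coldest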
I'm sure someone else here can help you with using R.  I'm just
learning R myself and also just about to go to sleep.

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] metafor package: effect sizes are not fully independent

2010-02-07 Thread Mike Cheung
Dear Gang,

It seems that it is possible to use a univariate meta-analysis to
handle your multivariate effect sizes. If you want to calculate a
weighted average first, Hedges and Olkin (1985) discuss this
approach.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for
meta-analysis. Orlando, FL: Academic Press.

Regards,
Mike
-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-

On Mon, Feb 8, 2010 at 6:48 AM, Gang Chen  wrote:
> Dear Mike,
>
> Thanks a lot for the kind help!
>
> Actually a few months ago I happened to read a couple of your posts on
> the R-help archive when I was exploring the possibility of using lme()
> in R for meta analysis.
>
> First of all, I didn't specify the meta analysis model for my cases
> correctly in my previous message. Currently I'm only interested in
> random- or mixed-effects meta analysis. So what you've suggested is
> directly relevant to what I've been looking for, especially for case
> (2). I'll try to gather those references you listed, and figure out
> the details.
>
> Also I think I didn't state my case (1) clearly in my previous post.
> In that case, all the effect sizes are the same and in the same
> condition too (e.g., happy), but each source has multiple samples of
> the measurement (and also measurement error, or standard error). Could
> this still be handled as a multivariate meta analysis since the
> samples for the same source are correlated? Or can the
> multiple measures from the same source somehow be summarized
> (weighted average?) before the meta analysis?
>
> Your suggestions are highly appreciated.
>
> Best wishes,
> Gang
>
>
> On Sun, Feb 7, 2010 at 10:39 AM, Mike Cheung  wrote:
>> Dear Gang,
>>
>> Here are just some general thoughts. Wolfgang Viechtbauer will be in a
>> better position to answer questions related to metafor.
>>
>> For multivariate effect sizes, we first have to estimate the
>> asymptotic sampling covariance matrix among the effect sizes. Formulas
>> for some common effect sizes are provided by Gleser and Olkin (2009).
>>
>> If a fixed-effects model is required, it is quite easy to write your
>> own GLS function to conduct the multivariate meta-analysis (see e.g.,
>> Becker, 1992). If a random-effects model is required, it is more
>> challenging in R. SAS Proc MIXED can do the work (e.g., van
>> Houwelingen, Arends, & Stijnen, 2002).
>>
>> Sometimes, it is possible to transform the multivariate effect sizes
>> into independent effect sizes (Kalaian & Raudenbush, 1996; Raudenbush,
>> Becker, & Kalaian, 1988). Then univariate meta-analysis, e.g.,
>> metafor(), can be performed on the transformed effect sizes. This
>> approach works if it makes sense to pool the multivariate effect sizes,
>> as in your case (2): the effect sizes are the same but in different
>> conditions (happy, sad, and neutral). However, this approach does not
>> work if the multivariate effect sizes are measuring different
>> concepts, e.g., verbal achievement and mathematical achievement.
>>
>> Hope this helps.
>>
>> Becker, B. J. (1992). Using results from replicated studies to
>> estimate linear models. Journal of Educational Statistics, 17,
>> 341-362.
>> Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect
>> sizes. In H. Cooper, L. V. Hedges, and J. C. Valentine (Eds.), The
>> handbook of research synthesis and meta-analysis, 2nd edition (pp.
>> 357-376). New York: Russell Sage Foundation.
>> Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed
>> linear model for meta-analysis. Psychological Methods, 1, 227-235.
>> Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling
>> multivariate effect sizes. Psychological Bulletin, 103, 111-120.
>> van Houwelingen, H.C., Arends, L.R., & Stijnen, T. (2002). Advanced
>> methods in meta-analysis: multivariate approach and meta-regression.
>> Statistics in Medicine, 21, 589-624.
>>
>> Regards,
>> Mike
>> --
>> -
>>  Mike W.L. Cheung               Phone: (65) 6516-3702
>>  Department of Psychology       Fax:   (65) 6773-1843
>>  National University of Singapore
>>  http://courses.nus.edu.sg/course/psycwlm/internet/
>> --

Re: [R] metafor package: effect sizes are not fully independent

2010-02-07 Thread Mike Cheung
Dear Gang,

Here are just some general thoughts. Wolfgang Viechtbauer will be in a
better position to answer questions related to metafor.

For multivariate effect sizes, we first have to estimate the
asymptotic sampling covariance matrix among the effect sizes. Formulas
for some common effect sizes are provided by Gleser and Olkin (2009).

If a fixed-effects model is required, it is quite easy to write your
own GLS function to conduct the multivariate meta-analysis (see e.g.,
Becker, 1992). If a random-effects model is required, it is more
challenging in R. SAS Proc MIXED can do the work (e.g., van
Houwelingen, Arends, & Stijnen, 2002).
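
For instance, a bare-bones GLS sketch (assuming y is the stacked vector
of effect sizes, X the design matrix, and V the known block-diagonal
sampling covariance matrix assembled from the formulas mentioned above):

glsMeta <- function(y, X, V) {
  W <- solve(V)                        # weight by the inverse covariance
  XtW <- t(X) %*% W
  beta <- solve(XtW %*% X, XtW %*% y)  # fixed-effects GLS estimates
  list(beta = beta,
       vcov = solve(XtW %*% X))        # asymptotic covariance of beta
}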

Sometimes, it is possible to transform the multivariate effect sizes
into independent effect sizes (Kalaian & Raudenbush, 1996; Raudenbush,
Becker, & Kalaian, 1988). Then univariate meta-analysis, e.g.,
metafor(), can be performed on the transformed effect sizes. This
approach works if it makes sense to pool the multivariate effect sizes,
as in your case (2): the effect sizes are the same but in different
conditions (happy, sad, and neutral). However, this approach does not
work if the multivariate effect sizes are measuring different
concepts, e.g., verbal achievement and mathematical achievement.

Hope this helps.

Becker, B. J. (1992). Using results from replicated studies to
estimate linear models. Journal of Educational Statistics, 17,
341-362.
Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect
sizes. In H. Cooper, L. V. Hedges, and J. C. Valentine (Eds.), The
handbook of research synthesis and meta-analysis, 2nd edition (pp.
357-376). New York: Russell Sage Foundation.
Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed
linear model for meta-analysis. Psychological Methods, 1, 227-235.
Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling
multivariate effect sizes. Psychological Bulletin, 103, 111-120.
van Houwelingen, H.C., Arends, L.R., & Stijnen, T. (2002). Advanced
methods in meta-analysis: multivariate approach and meta-regression.
Statistics in Medicine, 21, 589-624.

Regards,
Mike
--
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-

On Sat, Feb 6, 2010 at 6:07 AM, Gang Chen  wrote:
> In a classical meta analysis model y_i = X_i * beta_i + e_i, data
> {y_i} are assumed to be independent effect sizes. However, I'm
> encountering the following two scenarios:
>
> (1) Each source has multiple effect sizes, thus {y_i} are not fully
> independent with each other.
> (2) Each source has multiple effect sizes, each of the effect size
> from a source can be categorized as one of a factor levels (e.g.,
> happy, sad, and neutral). Maybe better denote the data as y_ij, effect
> size at the j-th level from the i-th source. I can code the levels
> with dummy variables into the X_i matrix, but apparently the data from
> the same source are correlated with each other. In this case, I would
> like to run a few tests one of which is, for example, whether there is
> any difference across all the levels of the factor.
>
> Can metafor handle these two cases?
>
> Thanks,
> Gang
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
-
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] control of scat1d tick color in plot.Predict?

2010-01-27 Thread Mike Babyak
Hi All,

I have a quick question about using plot.Predict now that the rms package
uses lattice.  I'd like to add tick marks along the regression line, which
are requested via data=llist(variablename) in the plot call.  The ticks show
up fine, but I'd like to alter their color.  I know the ticks are produced
by scat1d, but after spending a fair bit of time going through the
documentation, it still isn't clear to me how to do this in the context of
lattice.  Guidance would be greatly appreciated.

Thanks,

Mike Babyak
Duke University Medical Center

My code using R 2.10.1/windows XP

myx<-c(1,2,3,4)
myy<-c(1,2,3,5)

library(rms)

d<-datadist(myx)
options(datadist="d")

myfit<-ols(myy~myx,x=T,y=T)

p1<-Predict(myfit,myx =.)

library(lattice)

#change line to black
line <- trellis.par.get("plot.line")
line$col <- 1
trellis.par.set("plot.line", line)

plot(p1, data=llist(myx),col.fill="lightgray", lwd=1.5)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error compiling R 2.10.1 on AIX

2010-01-19 Thread Mike Waldron
I'm trying to compile R 2.10.1 on AIX 5.3, and am getting the following
error:

Error in read.dcf(file = descfile) : 
  Line starting 'Package: tools ...' is malformed!
Calls: makeLazyLoading ... code2LazyLoadDB -> loadNamespace ->
parseNamespaceFile -> read.dcf
Execution halted
make[3]: *** [all] Error 1
make[3]: Leaving directory
`/afs/.isis.unc.edu/pkg/r-2.10.1/.build/rs_aix53/R-patched/src/library/tools
'

My environment and configure settings are as follows:
   export PATH=/usr/local/bin:/opt/freeware/bin:$PATH
   export OBJECT_MODE=64
   export LIBICONV=/opt/freeware
   export CC="xlc_r -q64"
   export CFLAGS="-O -qstrict"
   export CXX="xlC_r -q64"
   export CXXFLAGS="-O -qstrict"
   export AR="ar -X64"
   export F77="xlf_r -q64"
   export CPPFLAGS="-I/afs/isis/pkg/libpng/include -I/usr/local/include
-I$LIBICONV/include -I/usr/lpp/X11/include/X11"
   export LDFLAGS="-L/usr/local/lib -L$LIBICONV/lib -L/usr/lib
-L/usr/X11R6/lib"
   export CAIRO_CFLAGS="-I/opt/freeware/include/cairo
-I/opt/freeware/include/freetype2"
   export CAIRO_LIBS="-L/opt/freeware/lib -lcairo"
   export JAVA_HOME=/usr/java14_64
   export JAVA_CPPFLAGS="-I/usr/java14_64/include"
   export LDR_CNTRL=USERREGS

./configure --prefix=/afs/.isis/pkg/r-2.10.1 --with-tcltk=/usr/local/lib
--with-tcl-config=/usr/local/lib/tclConfig.sh
--with-tk-config=/usr/local/lib/tkConfig.sh

Mike Waldron

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how can I use R functions in Fortran 90

2009-12-29 Thread Mike Prager
Anny Huang  wrote:

> Is there a way that I can import R functions into Fortran? Especially, I
> want to generate random numbers from some not-so-common distributions (e.g.
> inverted chi square) but did not find any routines written in Fortran that
> deal with distributions other than uniform and normal.

If you are interested in pure Fortran code for the problem, you
could look at routine G01FCF here:

http://gams.nist.gov/search.cgi?Pattern=chi+square&Boolean=AND&Match=Full&Limit=100&Show=Yes

GAMS (the Guide to Available Mathematical Software) has a lot of
good code.
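
If staying in R is an option, note also that an inverted chi-square
draw is just the reciprocal of a chi-square draw, so something along
these lines should work (sketch):

s2 <- 2                         # your scale parameter
x  <- 1/rchisq(1000, df=5)      # inverse chi-square, 5 df
y  <- 5*s2/rchisq(1000, df=5)   # scaled inverse chi-square, df=5, scale=s2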



-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple plots using summary in rms package

2009-12-07 Thread Mike Babyak
Dear All,

I wonder if someone can point me in the right direction here.  I'm working
with the rms library, R 2.9.2 under Windows XP.

I'm trying to arrange two plots side by side for a colleague.  mfrow or
mfcol do not seem to work, however, so I am obviously missing something
important.  I know that there have been changes in the graphics from Design
to rms, but am just not sure where to find specific documentation about this
particular issue.

Below is the code I'm using.  If I run the code except for the last plot
call, the first plot is correctly produced in the first column, with empty
space for the second plot in col 2.  But when I add the second plot call, it
just overwrites the first plot in col 1 on the left.

If I run the code below using something like plot(x~y) and plot(a~b) instead
of the two rms summary objects, it works fine, so I am assuming there is
something about rms I am missing.
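
One thought, in case it is relevant: par() settings such as mfcol only
affect base graphics, so if the plot method returns a lattice object the
arrangement would have to be done at print time instead, along the lines
of (p1 and p2 being hypothetical returned plot objects)

print(p1, split=c(1,1,2,1), more=TRUE)
print(p2, split=c(2,1,2,1))

but I have not managed to verify whether that applies to these summary
plots.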

I'd be happy to provide some data if that would help.

Any guidance would be greatly appreciated.

Thanks,

Mike Babyak
Duke University Medical Center



#summary of dietary variables broken out by Groups A, B, and C
sumfc<-summary(Group~fcgrain+fcveg+fcfruit+fcmeat+fcdairy+fcnsl+
fcfat+fcsatfat+fcsweet+fcsod,
method='reverse', overall=F,  test=F)

sumpc<-summary(Group~pcgrain+pcveg+pcfruit+pcmeat+pcdairy+
pcnsl+pcfat+pcsatfat+pcsweet+pcsod,
method='reverse', overall=F,  test=F)

par(mfcol=c(1,2),oma=c(1,0,4,0))
plot(sumfc, which='categorical',
main='Full Compliance',pch=c('A','B','C'))
 Key(0,-.1)

plot(sumpc, which='categorical',
main='At Least Partial Compliance',pch=c('A','B','C'))
mtitle("Dietary Adherence")

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting Sphericity Tests for Within Subject Repeated Measure Anova (using "car" package)

2009-11-10 Thread Mike Lawrence
Oops, I see now that despite repeated subject names, treatment is a
between-Ss variable, so you need to use this to get the equivalent of
Anova:

library(ez)
a = read.table( 'Sergios_wide_data.txt' , header=T )
b = melt.data.frame( data=a , id.vars=c('subject','treatment') ,
variable_name='day' )
b$subject=paste(b$subject,b$treatment) #create unique sids
ezANOVA( data=b , dv=.(value) , sid=.(subject) , within=.(day) ,
between=.(treatment) )




On Tue, Nov 10, 2009 at 6:47 AM, Mike Lawrence  wrote:
> Check out the reshape package for transforming data from long to wide
> and vice versa.
>
> Yet I still don't know what problem you've encountered with ezANOVA.
> Using the data you just sent, where Day now has 3 levels, I reformat
> back to the presumably original long format and find that ezANOVA
> returns the same sphericity tests as John's solution (which is
> expected because ezANOVA is a wrapper to Anova):
>
> library(ez)
> a = read.table( 'Sergios_wide_data.txt' , header=T )
> b = melt.data.frame( data=a , id.vars=c('subject','treatment') ,
> variable_name='day' )
> ezANOVA( data=b , dv=.(value) , sid=.(subject) , within=.(treatment,day) )
>
>
>
> On Mon, Nov 9, 2009 at 10:20 PM, Sergios (Sergey) Charntikov
>  wrote:
>> Thank you very much.  Finally got it to work.  However, I had to recode it 
>> from:
>> columns: subject/treatment/DV (where all my response data was in one
>> DV column) to columns: subject/treatment/day1/day2/day3/ (where my
>> response data is now in three different columns).
>>
>> Is there a way to do that without hand recoding (cutting and pasting
>> in spreadsheet) by hand? Thank you for your help.  Glad it works as
>> is.
>>
>>
>> Sincerely,
>>
>> Sergios Charntikov (Sergey), MA
>>
>> Behavioral Neuropharmacology Lab
>> Department of Psychology
>> University of Nebraska-Lincoln
>> Lincoln, NE 68588-0308  USA
>>
>>
>>
>>
>>
>> On Mon, Nov 9, 2009 at 7:12 PM, John Fox  wrote:
>>> Dear Sergios,
>>>
>>> Why don't you try what I suggested originally? Adapted to this data set,
>>>
>>> mod <- lm(cbind(day1, day2, day3) ~ Treatment, data=Dataset)
>>> idata <- data.frame(Day=factor(1:3))
>>> summary(Anova(mod, idata=idata, idesign=~Day))
>>>
>>> Peter Dalgaard also pointed toward an article that describes how to do the
>>> same thing with anova().
>>>
>>> Regards,
>>>  John
>>>
>>>> -Original Message-
>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>> On
>>>> Behalf Of Sergios (Sergey) Charntikov
>>>> Sent: November-09-09 7:13 PM
>>>> To: Mike Lawrence
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] Getting Sphericity Tests for Within Subject Repeated
>>> Measure
>>>> Anova (using "car" package)
>>>>
>>>> Hi Mike,
>>>>
>>>> I tried to run my data in SPSS and it works fine without any problems,
>>> plug
>>>> in my levels, plug in my covariate (since it is all within) and get my
>>>> Mauchly Tests.
>>>>
>>>> I tried to rearrange the data so it looks like this
>>>>
>>>> subj/treatment/day1/day2/day3
>>>>
>>>> subject    treatment    day1    day2    day3
>>>> 1    1    8    8    8
>>>> 1    2    5    7    5
>>>> 2    1    7    4    4
>>>> 2    2    4    5    7
>>>> 3    1    8    6    4
>>>> 3    2    5    2    4
>>>> 4    1    2    9    4
>>>> 4    2    1    9    1
>>>> 5    1    4    8    1
>>>> 5    2    7    8    2
>>>> 6    1    4    7    2
>>>> 6    2    4    5    2
>>>>
>>>>
>>>> When I try mlmfit <- lm(Dataset~1), I get "invalid type (list) for
>>> variable
>>>> 'Dataset"
>>>>
>>>> When I try
>>>>
>>>> mod <- lm(cbind(day1, day2, day3) ~ Treatment, data=Dataset)
>>>>
>>>> idata<- data.frame(factor(rep(c(Dataset$day1, Dataset$day2,
>>> Dataset$day3))),
>>>> ordered(Dataset$Treatment))
>>>>
>>>> Anova(mod, idata=idata, idesign=~Dataset$Treatment)
>>>>
>>>> I get: Terms in the intra-subject model matrix are not orthogonal.
>>>>
>>>> When I try is.matrix(Dataset) - I get no.

Re: [R] Getting Sphericity Tests for Within Subject Repeated Measure Anova (using "car" package)

2009-11-10 Thread Mike Lawrence
Check out the reshape package for transforming data from long to wide
and vice versa.
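
For instance (sketch, assuming long-format data b with columns subject,
treatment, day and value, as below):

library(reshape)
wide <- cast(b, subject + treatment ~ day)     # long to wide
long <- melt(wide, id.vars=c('subject','treatment'),
             variable_name='day')              # and back again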

Yet I still don't know what problem you've encountered with ezANOVA.
Using the data you just sent, where Day now has 3 levels, I reformat
back to the presumably original long format and find that ezANOVA
returns the same sphericity tests as John's solution (which is
expected because ezANOVA is a wrapper to Anova):

library(ez)
a = read.table( 'Sergios_wide_data.txt' , header=T )
b = melt.data.frame( data=a , id.vars=c('subject','treatment') ,
variable_name='day' )
ezANOVA( data=b , dv=.(value) , sid=.(subject) , within=.(treatment,day) )



On Mon, Nov 9, 2009 at 10:20 PM, Sergios (Sergey) Charntikov
 wrote:
> Thank you very much.  Finally got it to work.  However, I had to recode it 
> from:
> columns: subject/treatment/DV (where all my response data was in one
> DV column) to columns: subject/treatment/day1/day2/day3/ (where my
> response data is now in three different columns).
>
> Is there a way to do that without hand recoding (cutting and pasting
> in spreadsheet) by hand? Thank you for your help.  Glad it works as
> is.
>
>
> Sincerely,
>
> Sergios Charntikov (Sergey), MA
>
> Behavioral Neuropharmacology Lab
> Department of Psychology
> University of Nebraska-Lincoln
> Lincoln, NE 68588-0308  USA
>
>
>
>
>
> On Mon, Nov 9, 2009 at 7:12 PM, John Fox  wrote:
>> Dear Sergios,
>>
>> Why don't you try what I suggested originally? Adapted to this data set,
>>
>> mod <- lm(cbind(day1, day2, day3) ~ Treatment, data=Dataset)
>> idata <- data.frame(Day=factor(1:3))
>> summary(Anova(mod, idata=idata, idesign=~Day))
>>
>> Peter Dalgaard also pointed toward an article that describes how to do the
>> same thing with anova().
>>
>> Regards,
>>  John
>>
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On
>>> Behalf Of Sergios (Sergey) Charntikov
>>> Sent: November-09-09 7:13 PM
>>> To: Mike Lawrence
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Getting Sphericity Tests for Within Subject Repeated
>> Measure
>>> Anova (using "car" package)
>>>
>>> Hi Mike,
>>>
>>> I tried to run my data in SPSS and it works fine without any problems,
>> plug
>>> in my levels, plug in my covariate (since it is all within) and get my
>>> Mauchly Tests.
>>>
>>> I tried to rearrange the data so it looks like this
>>>
>>> subj/treatment/day1/day2/day3
>>>
>>> subject    treatment    day1    day2    day3
>>> 1    1    8    8    8
>>> 1    2    5    7    5
>>> 2    1    7    4    4
>>> 2    2    4    5    7
>>> 3    1    8    6    4
>>> 3    2    5    2    4
>>> 4    1    2    9    4
>>> 4    2    1    9    1
>>> 5    1    4    8    1
>>> 5    2    7    8    2
>>> 6    1    4    7    2
>>> 6    2    4    5    2
>>>
>>>
>>> When I try mlmfit <- lm(Dataset~1), I get "invalid type (list) for
>> variable
>>> 'Dataset"
>>>
>>> When I try
>>>
>>> mod <- lm(cbind(day1, day2, day3) ~ Treatment, data=Dataset)
>>>
>>> idata<- data.frame(factor(rep(c(Dataset$day1, Dataset$day2,
>> Dataset$day3))),
>>> ordered(Dataset$Treatment))
>>>
>>> Anova(mod, idata=idata, idesign=~Dataset$Treatment)
>>>
>>> I get: Terms in the intra-subject model matrix are not orthogonal.
>>>
>>> When I try is.matrix(Dataset) - I get no.
>>>
>>> My original mock Dataset (attached in txt) is below.  Maybe I am not
>> coding
>>> it right? I would hate to recode all my data for SPSS, since at the end I
>>> would need to show that Sphericity was not violated.
>>>
>>> Subj  Trtmt   Sessn   Response
>>>
>>> 1     N       1       5
>>>
>>> 1     D       1       6
>>>
>>> 1     N       2       4
>>>
>>> 1     D       2       7
>>>
>>> 2     N       1       8
>>>
>>> 2     D       1       9
>>>
>>> 2     N       2       2
>>>
>>> 2     D       2       1
>>>
>>> 3     N       1       4
>>>
>>> 3     D       1       5
>>>
>>> 3     N       2       6
>>>
>>> 3     D       2       2
>>>
>>> 4     N       1       5
>>>
>
