Re: [R] Very slow optim(): solved

2011-07-14 Thread Mike Marchywka





> Date: Thu, 14 Jul 2011 12:44:18 -0800
> From: toshihide.hamaz...@alaska.gov
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Very slow optim(): solved
>
> After Googling and trial and error, I found the major cause of the slow 
> optimization was not the functions but the data structure.
> Originally, I was using a data.frame for the likelihood calculation. Then I 
> changed the data.frame to a vector and matrix for the same likelihood 
> calculation. Now convergence takes ~14 sec instead of 25 min. I didn't know 
> this simple change would make such a huge computational difference.

Thanks, can you pass along any additional details, like the Google links you 
found, or comment on the resulting limitation (were you CPU limited converting 
data formats, or did this cause memory problems leading to VM thrashing)? I've 
often had C++ code that turned out to be IO limited when I expected I was doing 
really complicated computations; it never hurts to go beyond the usual 
suspects LOL. 
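As a sketch of the kind of difference the original poster describes (entirely
made-up data and likelihood, untested against his actual problem), the same
objective can be timed with data.frame indexing versus matrix indexing:

n  <- 1e5
df <- data.frame(x = rnorm(n), y = rnorm(n))
m  <- as.matrix(df)
negll_df <- function(p)   # repeated data.frame column access inside optim
  -sum(dnorm(df$y, mean = p[1] + p[2]*df$x, sd = exp(p[3]), log = TRUE))
negll_m  <- function(p)   # identical computation on a plain matrix
  -sum(dnorm(m[,"y"], mean = p[1] + p[2]*m[,"x"], sd = exp(p[3]), log = TRUE))
system.time(optim(c(0, 0, 0), negll_df))
system.time(optim(c(0, 0, 0), negll_m))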




>
>
>
> Toshihide "Hamachan" Hamazaki, 濱崎俊秀PhD
> Alaska Department of Fish and Game: アラスカ州漁業野生動物課
> Division of Commercial Fisheries: 商業漁業部
> 333 Raspberry Rd. Anchorage, AK 99518
> Phone: (907)267-2158
> Cell: (907)440-9934
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Ben Bolker
> Sent: Wednesday, July 13, 2011 12:21 PM
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Very slow optim()
>
> Hamazaki, Hamachan (DFG  alaska.gov> writes:
>
> >
> > Dear list,
> >
> > I am using the optim() function to fit ~55 parameters by MLE, but it is very
> slow to converge (~ 25 min), whereas I can do
> > the same in ~1 sec. using ADMB, and ~10 sec. using MS EXCEL Solver.
> >
> > Are there any tricks to speed up?
> >
> > Are there better optimization functions?
> >
>
> There's absolutely no way to tell without knowing more about your code. You
> might try method="CG":
>
> Method ‘"CG"’ is a conjugate gradients method based on that by
> Fletcher and Reeves (1964) (but with the option of Polak-Ribiere
> or Beale-Sorenson updates). Conjugate gradient methods will
> generally be more fragile than the BFGS method, but as they do not
> store a matrix they may be successful in much larger optimization
> problems.
>
> If ADMB works better, why not use it? You can use the R2admb
> package (on R forge) to wrap your ADMB calls in R code, if you
> prefer that workflow.
>
> Ben
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very slow optim()

2011-07-13 Thread Mike Marchywka




>
> Hamazaki, Hamachan (DFG  alaska.gov> writes:
>
> >
> > Dear list,
> >
> > I am using the optim() function to fit ~55 parameters by MLE, but it is very
> slow to converge (~ 25 min), whereas I can do
> > the same in ~1 sec. using ADMB, and ~10 sec. using MS EXCEL Solver.
> >
> > Are there any tricks to speed up?
> >
> > Are there better optimization functions?
> >
>
> There's absolutely no way to tell without knowing more about your code. You
> might try method="CG":

I guess the first thing to do is look at the task manager and see if it is 
memory or CPU limited. In the absence of gross algorithmic differences, the 
"other stuff"
that you may be doing could be a big deal. I saw similar performance issues
in my own C++ code, where the fast result was obtained just by sorting the input
data to stop VM thrashing. A sort on large data is normally considered slow
and wasteful if the later algorithm doesn't require it, but memory coherence 
can be a big deal.
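From inside R, a rough way to see where the time goes (rather than the OS task
manager) is the built-in profiler; the toy objective below just stands in for
a real likelihood:

negll <- function(p) sum((p - 1)^2)      # stand-in objective
Rprof("optim.out", memory.profiling = TRUE)
fit <- optim(rep(0, 10), negll)
Rprof(NULL)
summaryRprof("optim.out")$by.self        # which calls dominate the run time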





>
> Method ‘"CG"’ is a conjugate gradients method based on that by
> Fletcher and Reeves (1964) (but with the option of Polak-Ribiere
> or Beale-Sorenson updates). Conjugate gradient methods will
> generally be more fragile than the BFGS method, but as they do not
> store a matrix they may be successful in much larger optimization
> problems.
>
> If ADMB works better, why not use it? You can use the R2admb
> package (on R forge) to wrap your ADMB calls in R code, if you
> prefer that workflow.
>
> Ben
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using t tests

2011-07-10 Thread Mike Marchywka


( after getting confirmation of lack of posts, trying again, LOL )


> From: marchy...@hotmail.com
> To: r-help@r-project.org
> Subject: RE: [R] Using t tests
> Date: Sun, 10 Jul 2011 10:13:51 -0400
>
>
> ( sorry if this is a repost, but I meant to post to the list and never 
> received any indication it was sent to the list; thanks for asking for 
> comments about my approach to data analysis )
>
>
> > From: marchy...@hotmail.com
> > To: j...@bitwrit.com.au; gwanme...@aol.com
> > CC: r-help@r-project.org
> > Subject: RE: [R] Using t tests
> > Date: Sun, 10 Jul 2011 07:35:32 -0400
> >
> >
> >
> >
> >
> > 
> > > Date: Sat, 9 Jul 2011 18:40:43 +1000
> > > From: j...@bitwrit.com.au
> > > To: gwanme...@aol.com
> > > CC: r-help@r-project.org
> > > Subject: Re: [R] Using t tests
> > >
> > > On 07/08/2011 07:22 PM, gwanme...@aol.com wrote:
> > > > Dear Sir,
> > > >
> > > > I am doing some work on a population of patients. About half of them are
> > > > admitted into hospital with albumin levels less than 33. The other half 
> > > > have
> > > > albumin levels greater than 33, so I stratify them into 2 groups, x and 
> > > > y
> > > > respectively.
> > > >
> > > > I suspect that the average length of stay in hospital for the group of
> > > > patients (x) with albumin levels less than 33 is greater than those with
> > > > albumin levels greater than 33 (y).
> > > >
> > > > What command function do I use (assuming that I will be using the chi
> > > > square test) to show that the length of stay in hospital of those in 
> > > > group x is
> > > > statistically significantly different from those in group y?
> > > >
> > > Hi Ivo,
> > > Just to make things even more complicated for you, Mark's suggestion
> > > that the length_of_stay measure is unlikely to be normally distributed
> > > might lead you to look into a non-parametric test like the Wilcoxon (aka
> >
> > ( please correct any of the following which is wrong, but note that
> > the discussion is more interesting and useful with details of your goals )
> > I'm curious why people still jump to setting arbitrary cutoff points,
> > in this case based on what you happen to have sampled, rather than
> > trying to find a functional relationship between the two parametric
> > variables. Generally the thing that separates likely cause from
> > noise is smoothness, or something you can at least rationalize
> > in terms of physical mechanisms. If your question relates
> > to the reproducibility of a given result ( "well, this experiment showed
> > high and low were significantly different on hospital stays; maybe the
> > next experiment will show the same" ), you'd probably like to consider
> > the data in relation to possible causes. I'm not sure your disease process
> > would know about your median test results when patients walk in. BTW,
> > what is terminating the hospital stay: cure, death, or insurance exhaustion?
> > This sounds like you are just trying to reproduce something that is already
> > in the literature: the cutoff is on the low side of normal, and
> > hypoproteinemia is often suspected of being bad, so the higher group would
> > usually be expected to do better, no? Although
> > I suppose this could have something to do with dehydration etc., the point
> > of course is that data interpretation is difficult to do in a vacuum.
> >
> >
> >
> >
> >
> >
> >
> >
> > > Mann-Whitney in your case) test. You will have to split your
> > > length_of_stay measure into two like this (assume your data frame is
> > > named "losdf"):
> > >
> > > albumin_hilo <- albumin > 33
> > > wilcox.test(losdf$length_of_stay[albumin_hilo],
> > > losdf$length_of_stay[!albumin_hilo])
> > >
> > > or if you use wilcox_test in the "coin" package:
> > >
> > > albumin_hilo <- albumin > 33
> > > wilcox_test(length_of_stay~albumin_hilo,losdf)
> > >
> > > Do remember that the chi-square test is used for categorical variables,
> > > for instance if you dichotomized your length_of_stay into less than 10
> > > days or 10 days and over.
> > >
> > > Jim
> > >
> >
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

2011-07-08 Thread Mike Marchywka

Well, I'm not sure it is really an rgdal "problem" as much as 
having a stale .so, and I didn't want to go wreck the thing to capture
the exact error message; the point was that the apache build seems to have
come with an incompatible expat .so. In the apache/lib dir, I apparently had a
much different library, now renamed xxx, than the one in /lib64. If I dump
the symbols with objdump, the one rgdal complained about, IIRC XML_StopParser,
does not appear in either output, but they do grep differently,
as shown below:

-rw-r--r-- 1 root root 605314 May  1 07:09 libexpat.a
-rwxr-xr-x 1 root root    805 May  1 07:09 libexpat.la
lrwxrwxrwx 1 root root 17 May  1 07:09 libexpat.so -> libexpat.so.0.5.0
lrwxrwxrwx 1 root root 17 May  1 07:09 libexpat.so.0 -> libexpat.so.0.5.0
-rwxr-xr-x 1 root root 143144 Jul  6 16:29 libexpat.so.0.5.0
-rwxr-xr-x 1 root root 398422 Jul  6 16:27 libexpat.soxxx.0.5.0

[marchywka@351915-www1 lib]$ grep XML_StopParser libexpat*
Binary file libexpat.so matches
Binary file libexpat.so.0 matches
Binary file libexpat.so.0.5.0 matches
[marchywka@351915-www1 lib]$ 



libexpat.so.0.5.0:
    linux-vdso.so.1 =>  (0x2b4a1fdd2000)
    libc.so.6 => /lib64/libc.so.6 (0x0035aca0)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
libexpat.soxxx.0.5.0:
    linux-vdso.so.1 =>  (0x7fff8bdfc000)
    libc.so.6 => /lib64/libc.so.6 (0x2b143b19e000)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
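For checking this kind of thing from inside R itself (or an Rapache handler),
something like the following sketch may help; it assumes rgdal is installed
and just reports which copy of the shared object the process actually picked
up:

library(rgdal)                 # the load that was failing under Rapache
getLoadedDLLs()[["rgdal"]]     # full path of the .so that actually loaded
Sys.getenv("LD_LIBRARY_PATH")  # the search path the loader consulted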







> To: r-help@r-project.org
> Subject: Re: [R] problem loading rgdal with Rapache, problem solved due to 
> libexpat with apache.
>
> You should provide at least the output of sessionInfo() and the messages
> issued by rgdal on load. rgdal has recently been updated to try to address
> issues of interference between applications, and it would help very much to
> know your intentions, platform, and the versions of the software you are
> using.
>
> Roger
>
>
> Mike Marchywka wrote:
> >
> > Has anyone had problems with Rapache that don't show up on command-line
> > execution of R? I just ran into this loading rgdal in an Rapache page and
> > having a problem with loading the shared object. The final complaint was
> > that XML_StopParser
> > was undefined - this was surprising since grep -l showed it in the expat
> > lib and it worked
> > fine from the command line. Finally, I noted the search path had apache/lib
> > ahead
> > of the other places where expat exists. The libexpat there, although
> > apparently having the
> > same version, so.0.5.0, did not grep for this symbol. Anyway, copying one
> > into
> > the other fixed the problem and the page works fine, but I'm curious if
> > anyone has thoughts on what could have caused this. Sorry I don't have
> > specific output, but I thought you may remember if you ran into
> > it, and it is not worth trying to replicate. Thanks.
> >
> >
> >
> >
> >
>
>
> -
> Roger Bivand
> Department of Economics
> NHH Norwegian School of Economics
> Helleveien 30
> N-5045 Bergen, Norway
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/problem-loading-rgdal-with-Rapache-problem-solved-due-to-libexpat-with-apache-tp3650119p3652639.html
> Sent from the R help mailing list archive at Nabble.com.
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

2011-07-06 Thread Mike Marchywka


Has anyone had problems with Rapache that don't show up on command-line
execution of R? I just ran into this loading rgdal in an Rapache page and having
a problem with loading the shared object. The final complaint was that 
XML_StopParser
was undefined - this was surprising since grep -l showed it in the expat lib 
and it worked
fine from the command line. Finally, I noted the search path had apache/lib 
ahead
of the other places where expat exists. The libexpat there, although apparently 
having the
same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into
the other fixed the problem and the page works fine, but I'm curious if anyone
has thoughts on what could have caused this. Sorry I don't have
specific output, but I thought you may remember if you ran into
it, and it is not worth trying to replicate. Thanks.



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] superimposing network graphs

2011-07-06 Thread Mike Marchywka




> Date: Wed, 6 Jul 2011 13:10:14 +0200
> To: r-help@r-project.org
> CC: but...@uci.edu
> Subject: [R] superimposing network graphs
>
> Dear all,
>
> I have a undirected network (g), representing all the sexual relationships 
> that ever existed in a model community.
> I also have a directed edgelist (e) which is a subset of the edgelist of g. e 
> represents the transmission pathway of HIV.
> Now I would like to superimpose the picture of the sexual relationships with 
> arrows in a different colour, to indicate where in the network HIV was 
> transmitted.


If you can't find an R answer, I've found that "dot" from graphviz is really 
helpful. It should not be too hard to generate
the source code from your raw data file or R but others may have a more R 
answer. Package sna may be helpful for
an R-only approach. 
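For an R-only sketch, here is one way it might look with the igraph package
(my choice, not something the poster mentioned); the random graph and the five
"transmission" edges below are stand-ins for the real g and e:

library(igraph)
set.seed(1)
g <- erdos.renyi.game(20, 0.15)           # stand-in for the sexual network g
e <- get.edgelist(g)[1:5, ]               # pretend these edges carried HIV
E(g)$color <- "grey"                      # base colour for all relationships
E(g, P = as.vector(t(e)))$color <- "red"  # highlight the transmission edges
plot(g)

Getting true arrowheads on just the transmission edges is fiddlier, since g
itself is undirected; one option is plotting a directed copy of the subset on
top with add = TRUE.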




>
> Any ideas on how to do this?
> Many thanks,
>
> Wim
>
>
>
> Wim Delva MD, PhD
> International Centre for Reproductive Health
> Ghent University, Belgium
> www.icrh.org
>
> South African Centre for Epidemiological Modelling and Analysis
> Stellenbosch University, South Africa
> www.sacema.com
> epi update: www. sacemaquarterly.com
>
> Tel: +27 21 808 27 79 (work)
> Cell: +27 72 842 82 33
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Protecting R code

2011-07-04 Thread Mike Marchywka
Put it on Rapache or another server, but this seems like a waste depending on 
what you are doing. 

Server side is the only good way, but compiling to C++ may be an interesting 
test.
Sent from my Verizon Wireless BlackBerry

-Original Message-
From: Vaishali Sadaphal 
Date: Mon, 4 Jul 2011 16:48:13 
To: 
Cc: ; 
Subject: Re: [R] Protecting R code

Hey All,

Thank you so much for quick replies.
Looks like translation to C/C++ is the only robust option. Do you think 
there exists any ready-made R to C translator?

Thanks
--
Vaishali

Vaishali Paithankar Sadaphal
Tata Consultancy Services
Mailto: vaishali.sadap...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
    Business Solutions
    Outsourcing




From:
Spencer Graves 
To:
Barry Rowlingson 
Cc:
Vaishali Sadaphal , r-help@r-project.org
Date:
07/04/2011 08:42 PM
Subject:
Re: [R] Protecting R code



Hello:


On 7/4/2011 7:41 AM, Barry Rowlingson wrote:
> On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal
>   wrote:
>> Hi All,
>>
>> I need to give my R code to my client to use. I would like to protect 
the
>> logic/algorithms that have been coded in R. This means that I would not
>> like anyone to be able to read the code.
>    At some point the R code has to be run. Which means it has to be
> read by an interpreter that can handle R code. Which means, unless you
> rewrite the interpreter, the R code must exist as such.
>
>   Even if you could compile R into C code into machine code and
> distribute a .exe file, its still possible in theory to
> reverse-engineer it and get something like the original back - the
> original logic if not the original names of the variables and
> functions.
>
>   You could rewrite the interpreter to only run encrypted, signed code
> that requires a decryption key, but you still have to give the user
> the decryption key at some point in order to get the plaintext code.
> Again, its an obfuscation problem of hiding the key somewhere, and
> hence is going to fail.
>
>   It all depends on how much expense you want to go to in order to make
> the expense of circumventing your solution more than its worth. Tell
> me how much that is, and I will tell you the solution.
>
>   For total security[1], you need to run the code on servers YOU
> control, and only give access via a network API. You can do this with
> RServe or any of the HTTP-based systems like Rapache.

   An organization I know that encrypted R code started with making 
it available only on their servers.  This was maybe four years ago.  I'm 
not sure what they do now, but I think they have since lost their major 
proponents of R internally and have probably translated all the code 
they wanted to sell into a compiled language in a way that didn't 
require R at all.


   Spencer

>
> Barry
>
> [1] Except of course servers can be hacked or socially-engineered
> into. For total security, disconnect your machine from the network and
> from any power supply.
>
>







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] wavelets

2011-07-04 Thread Mike Marchywka

> From: jdnew...@dcn.davis.ca.us
> Date: Mon, 4 Jul 2011 00:45:41 -0700
> To: tyagi...@gmail.com; r-help@r-project.org
> Subject: Re: [R] wavelets
>
> Study the topic more carefully, I suppose. My understanding is that wavelets 
> do not in themselves compress anything, but because they sort out the 
> interesting data from the uninteresting data, it can be easy to toss the 
> uninteresting data (lossy data compression). Perhaps you should understand 
> better what your Matlab library is doing.
> ---
> Jeff Newmiller, DCN
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> user123  wrote:
>
> I'm new to the topic of wavelets. When I tried to use the mra function in the
> wavelets package, the data is not getting compressed; e.g. if the original
> data has 500 values, the output data also has 500.
> However, in MATLAB, depending on the level of decomposition, the data gets
> compressed.
> How do I implement this in R?



Can you post some code? You can always compress into one value, of course, by 
turning
the bytes into a single char string; what you want is entropy. I posted some
example code before, and I remember it took effort to *not* get the subsampling.
mra is probably multi-resolution analysis, and I'd suppose you want all the 
samples.
You probably need paper and pencil at this point, however. 
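A minimal sketch of the distinction (assuming the "wavelets" package is the one
in use): mra() returns detail series that are deliberately the same length as
the input, while dwt() returns the decimated coefficients, whose counts halve
at each level - which is the "compression" MATLAB shows:

library(wavelets)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 512))
w <- dwt(x, filter = "la8", n.levels = 4)
sapply(w@W, length)    # decimated details: 256 128 64 32
m <- mra(x, filter = "la8", n.levels = 4)
sapply(m@D, length)    # MRA details: every level is full length (512)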
 




>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html
> Sent from the R help mailing list archive at Nabble.com.
>
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to fit ARMA model

2011-07-01 Thread Mike Marchywka




> From: bbol...@gmail.com
> Date: Fri, 1 Jul 2011 13:16:45 +
> Subject: Re: [R] How to fit ARMA model
> 
> UnitRoot  gmail.com> writes:
> 
> > 
> > Hello,
> > I am having some problems with fitting an ARMA model to my time series data
> > (randomly generated numbers). The thing is I have tried many packages
> > [tseries, fseries, FitARMA etc.] and all of them giving very different
> > results. I would appreciate if someone could post here what the best package
> > is for my purpose. 
> > Also, after having done the fitting, I would like to check for the model's
> > adequacy. How can I do this?
> > Thanks.
> 
>   It's hard to say without more detail -- we don't know what your
> purpose is (beyond the general one of fitting an ARMA model to data).
> I would say that when in doubt, if there is functionality in 'core' R --
> the base and recommended packages -- that it is likely to be the most
> stable and well tested.  (Not always true -- there are some very good
> contributed packages, and sometimes the functions in R are missing
> some advanced features -- but a good rule of thumb.)  So try ?arima.
> 
>  Another rule of thumb is that Venables and Ripley 2002 is a good
> starting point (although again not necessarily as extensive as specialized
> topics) for "how do I do xxx in R"?  See Chapter 14.
> 

I had to check my memory, but IIRC wiki or another non-authoritative source
suggested that this is not much different from trying to design or analyze
an IIR (infinite impulse response) filter that is presumed excited with white 
noise. In such a case, the
power spectrum could be interpreted in terms of the zeros and poles (zeros
of the denominator) of the transfer function, giving you some graphical info
to help develop intuition. 


It is not hard to make your own tests and sanity checks if you are just getting
started. 
( hand typed, as my 'dohs machine is busy running malware scans and this one R 
install
has a few issues ) 
x<-runif(8192)
x1<-c(x[-1],x[1])
x2<-c(x1[-1],x1[1])

y=x+x1+x2
plot(log(abs(fft(y))))

This makes a Bode plot ( if I have the name right ) with zeros at the roots. 
Extension to
the IIR case should be obvious. 
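For the fitting itself, a quick base-R sanity check along the lines Ben
suggested (simulate a known ARMA process, then see whether arima() recovers
the coefficients; everything here is in the stats package):

set.seed(42)
y <- arima.sim(model = list(ar = c(0.5, -0.3), ma = 0.4), n = 2000)
fit <- arima(y, order = c(2, 0, 1))   # the ?arima route in core R
fit          # estimates should land near 0.5, -0.3, 0.4
tsdiag(fit)  # residual diagnostics, per ?tsdiag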







>   As suggested in the help for ?arima, ?tsdiag is "a generic function
> to plot time series diagnostics"
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running R from windows command prompt

2011-06-28 Thread Mike Marchywka




> To: lig...@statistik.tu-dortmund.de
> CC: r-help@r-project.org
> Subject: Re: [R] Running R from windows command prompt
> 
> Thanks for your help.
> 
> I tried the way you mentioned for my first question. But I am not getting
> any results.
> Can you please explain in detail the process through which I can run a R
> code from windows command prompt.

While your problem likely has nothing to do with windoh's, I would suggest
you go get Cygwin ( see Google ) and use that. I, and probably others,
have lots of scripts that work there and on Linux. Bash scripts are
a better way to proceed if you want to do more than a test case, and
they integrate with lots of other existing things. 
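To make Uwe's suggestion below concrete, a minimal sketch (the file name and
f1()/f2() mirror the poster's own setup; the library is a placeholder): put
everything, including the library() calls, in one script and run it
non-interactively.

## foo.R -- everything the run needs, in one file
library(MASS)              # whatever libraries f1(), f2() need
source("functions.R")      # defines f1(), f2()
f1(); f2()

## from the Windows command prompt (with R's bin directory on the PATH):
##   R CMD BATCH foo.R foo.Rout
## or, to see output directly on the console:
##   Rscript foo.R

On the second question, scan() does read strings:
scan("names.txt", what = character())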


> 
> 2011/6/28 Uwe Ligges 
> 
> >
> >
> > On 28.06.2011 11:54, siddharth arun wrote:
> >
> >> 1. I have a R program in a file say "functions.R".
> >> I load the "functions.R" file the R using source("function.R") and then
> >> call
> >> functionsf1(), f2() etc. which are declared and defined within
> >> "function.R"
> >>  file.
> >> I also need to load a couple of R libraries using library() before I can
> >> use f1(), f2() etc.
> >>
> >> My question is can I acheive all this (i.e. calling function f1() and
> >> f2())
> >> from the windows prompt without opening the R environment ? If yes then
> >> how?
> >>
> >
> >
> > Put all the code into a file, e.g. "foo.R", and run
> >
> > R CMD BATCH foo.R from the windows command shell.
> >
> >
> >
> >  2. Also, Is there any way to scan strings directly. Like scan() function
> >> only scans numerical values. Is there any way to scan strings?
> >>
> >
> > Yes, it is called scan() which is for arbitrary data, including
> > character(). See ?scan. Otherwise, you may want to look into ?readLines
> >
> > Uwe Ligges
> >
> >
> 
> 
> -- 
> Siddharth Arun,
> 4th Year Undergraduate student
> Industrial Engineering and Management,
> IIT Kharagpur
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help on a R script part

2011-06-25 Thread Mike Miller
I'm curious about what would cause this (see below), if it isn't a joke. 
Is it possible that it didn't look ridiculous in the deleted HTML but the 
text looked bad?  It's almost unreadable.  I guess the HTML gets deleted 
because it is a waste of space, but I received a 14 MB message from this 
list the other day.


Mike


On Fri, 24 Jun 2011, Lusk Aris wrote:


Hi all,
I need all your help on this. I have the next part of code:

e1=x1-mean(x1)
e2=x2-mean(x2)
n1=length(x1)
n2=length(x2)
N=(n1 + n2)
nu2=sum( c( ( x1-mean(x1) )^2 , ( x2-mean(x2) )^2 ) )/N
ss=c(e1,e2)
b3= N*sum(ss^4)/ (sum( ss^2)^2)

What do lines 6-8 do (in mathematical notation)? Also, what does this part
mean?

for(j in 1:B){
  ss11=sample(x1, n1, replace=TRUE)
  ss12=sample(x2, n2, replace=TRUE)
}

I would appreciate it if you could help with my inquiry, and I am awaiting
your answer. Thx in advance, Lusk


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Access R functions from web

2011-06-25 Thread Mike Marchywka




> From: orvaq...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Access R functions from web
> 
> I need a way to send R objects and call R functions from web. Is there any
> project close or similar to that?
> 
> I want to be able to send an HTTP rquest from an existing application with
> some data. And obtain a plot from R.

This should be a FAQ, but it can take a while to find; see Rserve and Rapache.
I have been using Rapache now on Red Hat and Debian, and it works nicely.
Also, the Google Visualization API works in some limited testing, but apparently
a lot of that requires Flash ( which I either did not have or had turned off 
LOL ).
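As a minimal Rserve sketch (the RSclient package and default port are my
assumptions, not something the poster specified):

## server side: start Rserve so other processes can connect over TCP
library(Rserve)
Rserve(args = "--no-save")

## client side, from another R process, using the RSclient package:
library(RSclient)
con <- RS.connect("localhost")   # Rserve's default port is 6311
RS.eval(con, sum(1:10))          # evaluate remotely, get the value back
RS.close(con)

For the plot-over-HTTP case, Rapache is the more direct fit: an R handler
writes a PNG to the response.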




> 
> Thanks in advance
> Caveman
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-25 Thread Mike Marchywka




> Date: Thu, 23 Jun 2011 15:41:25 -0700
> From: jmo...@student.canterbury.ac.nz
> To: r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> 
> Mike Marchywka wrote:
> > 
> >> I discovered a way to do repetitive tasks that can be concisely specified
> >> using
> >> something called a computer.  
> 
> Now that's funny :)

Well, there is a point to that, and it is that with cheap computation you can
do different analyses than you did in the past. 


> 
> There were no controlled tests. It was a field experiment testing the
> effects that various pavement designs have on underlying soil moisture. Two
> designs incorporated a porous pavement surface course, while two others were
> based on standard impervious concrete pavement...the control was just bare,
> exposed soil. 
> 
> As you can see from the graph, the control responds quickly to rainfall
> events, but dries out quickly as well due to evaporation. The porous
> pavement allows for quick infiltration of precipitation, while the
> impervious pavement eventually allows infiltration of rainfall, but it's
> delayed. 
> 
> My objective is to be able to differentiate between the pavement treatments,
> such that I can state with statistical confidence that porous pavements
> affects underlying soil moisture differently than impervious pavements. 
> 
> I think this is obvious just looking at it, but I wanted to be able to back
> it up with stats. What I'd done previously is to average by week. But as  I
> mentioned, I thought that an anova table with 104 rows relating to each week
> was a poor way of analyzing the data. But that being said, it effectively
> allows me to check for treatment-related differences. 

I don't think we've mentioned R in the past few posts, but I guess pointing
people to useful things that R can do is not too big a problem, and
if you have ever dealt with "analysis for the sake of rationalization"
you can appreciate that it is a huge problem :)  Generally you'd
like to have reproducible results, and if you don't have IID data ( stationary
parameters of the population you wish to characterize ) you
are not even asking a good question about the system.  It may
be helpful as a quick check of something but otherwise difficult to
interpret - do your results mean these things are "different" in the
desert? You appear to have data points from a bunch of different situations.
After-the-fact selection is often helpful, but generally stats people frown
on that as "backing up" anything ( unless it supports the sponsor's opinion 
LOL ). 

http://www.itl.nist.gov/div898/handbook/prc/section4/prc432.htm


I guess I'd either go with a dynamic model or convert into dollars and then
see if you have clinically and statistically significant differences in things 
of relevance.



> 
> Thanks for the suggestions to date. Maybe the more I explain what I'm trying
> to achieve, the more focussed the suggestions will be. The vaguer the
> question, the broader the response, right?
> 
> Thanks again,
> Justin
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3621179.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gaoshan

2011-06-24 Thread Mike Miller

On Fri, 24 Jun 2011, wang peter wrote:


aa file is:
x  1 NA  2
y  1 NA  3

and r program is

aa<-read.table("aa",row.names=1)
bb<-cor(t(aa),method = "pearson",use="pairwise.complete.obs")
bb

 x y
x 1 1
y 1 1

I am confused why the Pearson correlation coefficient between x and y is 1.



You have two paired observations.  Plot the points and draw the regression 
line.  Every point is on the line (there are only two points).  The 
correlation has to be 1.
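A two-line demonstration (any two complete pairs give a perfect correlation,
with the sign set by the pairing):

cor(c(1, 2), c(1, 3))   #  1: two points always lie exactly on their own line
cor(c(1, 2), c(3, 1))   # -1: same thing with the slope reversed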


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help

2011-06-24 Thread Mike Marchywka


I'm surprised the PDF made it through the mailing list, but many will not open
these, and I have just cleaned malware off my 'dohs system. Can you explain
what you want to do in simple text?



Subject: [R] R-help

Hi
 
Please assist me to code the attached pdf in R.
 
Your help will be greatly appreciated.
 
Edward
Actuarial Science Student

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is R a solution to my problem?

2011-06-24 Thread Mike Marchywka




> From: jdnew...@dcn.davis.ca.us
> Date: Fri, 24 Jun 2011 02:13:29 -0700
> To: ikoro...@gmail.com; R-help@r-project.org
> Subject: Re: [R] Is R a solution to my problem?
> 
> Almost any tool can be bent to perform almost any task, but that combination 
> sounds like a particularly difficult way to get started. There are a variety 
> of purpose-built software development environments for serving 
> dynamically-constructed web pages, including C++ libraries.


I guess Rapache and Rserve would be worth looking at if you think it is a match 
with R.



> 
> However, discussions of such topics do not belong on this mailing list. Give 
> Google a chance.
> ---
> Jeff Newmiller, DCN
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> --- 
> Sent from my phone. Please excuse my brevity.
> 
> Igor Korot  wrote:
> 
> Hi, ALL,
> My name is Igor and I am a Software Developer.
> Recently I wrote a program for a company using C++.
> 
> The problem I am facing right now is that this program (demo) has to
> run using a web interface. I.e., the potential user goes to the website,
> clicks on the link, and the program executes somewhere on the server.
> 
> Would that be possible with the R extension / add-on?
> 
> Thank you.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-23 Thread Mike Marchywka









> From: rvarad...@jhmi.edu
> To: marchy...@hotmail.com; jmo...@student.canterbury.ac.nz; 
> r-help@r-project.org
> Subject: RE: [R] Time-series analysis with treatment effects - statistical 
> approach
> Date: Thu, 23 Jun 2011 02:59:19 +
> 
> If you have any specific features of the time series of soil moisture, you 
> could either model that or directly estimate it and test for differences in 
> the 4 treatments.  If you do not have any such specific considerations, you 
> might want to consider some nonparametric approaches such as functional data 
> analysis, in particular  functional principal components analysis (fPCA) 
> might be relevant.  You could also consider semiparametric methods. For 
> example, take a look at the "SemiPar" package.  
> 
> Ravi.

I guess, just playing with it while waiting for other code to finish, I'd be 
curious if you had
any controlled tests, such as impulse response - what did the treatment do 
when you held
it at constant temperature, humidity, and illumination in still air after a 
single burst of rain? 
If you were pursuing the model approach, a quick look suggests qualitative 
rather than
just quantitative effects - in one case it looks like linear or biphasic 
dry-out dynamics; others
seem to just fall off a cliff. 

Objective of course matters too; if you are trying to sell this to farmers, 
maybe a plot of
moisture for each treatment against the control would help. I just did that 
after averaging over sensors,
and it may be a reasonable analysis for cost effectiveness if you can translate 
moisture into
dollars. Now you would still need to put error bars on comparisons and use 
words carefully etc.,
but that approach may be more important than getting at the dynamics. I dunno.
Consider that in fact maybe all you care about is the peaks: if being too dry 
for one day
kills the crop, then that is what you want to focus the analysis on, etc.






> 
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
> of Mike Marchywka [marchy...@hotmail.com]
> Sent: Wednesday, June 22, 2011 9:31 PM
> To: jmo...@student.canterbury.ac.nz; r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> > Date: Wed, 22 Jun 2011 17:21:52 -0700
> > From: jmo...@student.canterbury.ac.nz
> > To: r-help@r-project.org
> > Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> > approach
> >
> > Hi Mike, here's a sample of my data so that you get an idea what I'm working
> > with.
> 
> Thanks, data helps make statements easier to test :)  I'm quite
> busy at the moment, but I will try to look during dead time.
> 
> >
> > http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt
> > SampleDataSet.txt
> >
> > Also, I've uploaded an image showing a sample graph of daily soil moisture
> > by treatment. The legend shows IP, IP+, PP, PP+ which are the 4 treatments.
> > Also, I've included precipitation to show the soil moisture response to
> > precip.
> 
> Personally I'd try to write a simple physical model or two and see which 
> one(s)
> fit best. It shouldn't be too hard to find sources and sinks of water and 
> write
> a differential equation with a few parameters.  There are probably online
> lecture notes that cover this or related examples. You probably suspect a
> mode of action for the treatments; see if that is consistent with the 
> observed dynamics.
> You may need to go get temperature and cloud data but it may or may not
> be worth it.
> 
> >
> > http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg
> >
> > I have used ANOVA previously, but I don't like it for 2 reasons. The first
> > is that I have to average away all of the interesting variation. But mainly,
> 
> There are  a number of assumptions that go into that to make it useful. If
> you are just drawing samples from populations of identical independent things
> great but here I would look at things related to non-stationary statistics of
> time series.
> 
> > it becomes quite cumbersome to do a separate ANOVA for each day (700+ days)
> > or even each week (104 weeks).
> 
> I discovered a way to do repetitive tasks that can be concisely specified 
> using
> something called a computer.  Writing loops is pretty easy, don't give up
> due to cumbersomeness. Also, you could try a few simple things like plotting
> difference charts ( plot treatment minus control for example).
> 
> If you approach this purely empirically, there are time series packages
> and maybe the econ/quant financial analysts would have some thoughts
> that wouldn't be well known in your field.

Re: [R] Hardy Weinberg

2011-06-22 Thread Mike Miller

On Wed, 22 Jun 2011, Jim Silverton wrote:

I am generating 1,000 replicates of 10,000 of these 2 x 3 tables, but R 
cannot seem to save it. It's over 1 Gig. Any ideas on how I can store 
this large amount of data? Should I use a list or a matrix?


Is English your first language?  If so, you can probably understand our 
questions and answer them.


You might consider storing a 2x3 table as one row of length 6 in a matrix 
with 10,000 rows.  Store that as one file -- it's only 60,000 elements, 
all of them integers.  Store 1,000 files.
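A back-of-the-envelope sketch of that layout (numbers illustrative only):

tabs <- matrix(0L, nrow = 10000, ncol = 6)  # one 2x3 table per row, as integers
object.size(tabs)                           # ~240 KB per replicate, not GBs
## one file per replicate, e.g. for replicate j:
## write.table(tabs, sprintf("rep%04d.txt", j),
##             row.names = FALSE, col.names = FALSE)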


You aren't being clear about what you are doing.

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hardy Weinberg

2011-06-22 Thread Mike Miller

On Thu, 23 Jun 2011, David Duffy wrote:

I am interested in simulating 10,000 2 x 3 tables for SNPs data with 
the Hardy Weinberg formulation. Is there a quick way to do this? I am 
assuming that the minor allele frequency is uniform in (0.05, 0.25).


rmultinom() with HWE expectations



I'm also unsure of what the 2x3 table is (what are the rows?), and David 
gave you a good answer for how to make tabular data, but I just thought 
I'd add some info for those who may be interested...


If you want to make genotype data in the form of minor allele counts under 
HW, you can use rbinom() with size=2 and prob=MAF.


Suppose variable "n" is the number of subjects for whom data is to be 
generated, "maf" is a vector of minor allele frequencies, and "m" is the 
length of maf (i.e., the number of markers to be generated).  Then this 
will produce a data matrix (subjects by markers):



maf <- c( .1, .2, .5 )
n <- 7
m <- length( maf )
t( matrix( rbinom( n*m, 2, maf ), m, n ) )

     [,1] [,2] [,3]
[1,]    0    0    2
[2,]    0    1    1
[3,]    1    0    2
[4,]    0    2    1
[5,]    1    1    1
[6,]    0    1    2
[7,]    0    0    1

Of course, when data are produced that way, the markers are in linkage 
equilibrium (an unfortunate term meaning that the genotypes for pairs of 
markers are not associated).
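And for completeness, a one-marker sketch of David's rmultinom() suggestion
(genotype counts, rather than per-subject genotypes, under HWE; the sample
size of 100 is arbitrary):

p <- runif(1, 0.05, 0.25)                 # minor allele frequency
rmultinom(1, size = 100, prob = c(p^2, 2*p*(1-p), (1-p)^2))
## rows are the counts of the aa, Aa, and AA genotypes among 100 subjects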


Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-22 Thread Mike Marchywka




> Date: Wed, 22 Jun 2011 17:21:52 -0700
> From: jmo...@student.canterbury.ac.nz
> To: r-help@r-project.org
> Subject: Re: [R] Time-series analysis with treatment effects - statistical 
> approach
> 
> Hi Mike, here's a sample of my data so that you get an idea what I'm working
> with. 

Thanks, data helps make statements easier to test :)  I'm quite
busy at the moment, but I will try to look during dead time.

> 
> http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt
> SampleDataSet.txt 
> 
> Also, I've uploaded an image showing a sample graph of daily soil moisture
> by treatment. The legend shows IP, IP+, PP, PP+ which are the 4 treatments.
> Also, I've included precipitation to show the soil moisture response to
> precip.

Personally, I'd try to write a simple physical model or two and see which one(s)
fit best. It shouldn't be too hard to find sources and sinks of water and write
a differential equation with a few parameters.  There are probably online
lecture notes that cover this or related examples. You probably suspect a
mode of action for the treatments; see if that is consistent with the observed 
dynamics.
You may need to go get temperature and cloud data, but it may or may not
be worth it. 

> 
> http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg
>  
> 
> I have used ANOVA previously, but I don't like it for 2 reasons. The first
> is that I have to average away all of the interesting variation. But mainly,

There are a number of assumptions that go into that to make it useful. If
you are just drawing samples from populations of identical, independent things,
great; but here I would look at things related to the non-stationary statistics
of time series. 

> it becomes quite cumbersome to do a separate ANOVA for each day (700+ days)
> or even each week (104 weeks). 

I discovered a way to do repetitive tasks that can be concisely specified using
something called a computer.  Writing loops is pretty easy; don't give up
due to cumbersomeness. Also, you could try a few simple things like plotting
difference charts ( plot treatment minus control, for example ). 

If you approach this purely empirically, there are time series packages 
and maybe the econ/quant financial analysts would have some thoughts
that wouldn't be well known in your field. 
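A sketch of the per-week loop (the column names week / treatment / moisture
are made up; your own data frame would replace the simulated d):

d <- data.frame(week      = rep(1:104, each = 20),
                treatment = rep(c("ctrl", "IP", "IP+", "PP", "PP+"), 104 * 4),
                moisture  = runif(104 * 20))
pvals <- sapply(1:104, function(w) {
  dw <- subset(d, week == w)
  anova(aov(moisture ~ treatment, data = dw))[["Pr(>F)"]][1]
})
plot(pvals, type = "h")   # one ANOVA p-value per week, at a glance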


> 
> Thanks for your help,
> -Justin
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3618615.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re; Getting SNPS from PLINK to R

2011-06-22 Thread Mike Miller

Resending to correct bad subject line...


On Mon, 20 Jun 2011, Jim Silverton wrote:

I am using plink on a large SNP dataset with a .map and .ped file. I want 
to get some sort of file, say a list of all the SNPs that plink is saying 
that I have. Any ideas on how to do this?



All the SNPs you have are listed in the .map file.  An easy way to put the 
data into R, if there isn't too much, is to do this:


plink --file whatever --out whatever --recodeA

That will make a file called whatever.raw, single space delimited, 
consisting of minor allele counts (0, 1, 2, NA) that you can bring into R 
like this:


data <- read.table("whatever.raw", sep=" ", header=T)

If you have tons of data, you'll want to work with the compact binary 
format (four genotypes per byte):


plink --file whatever --out whatever --make-bed

Then see David Duffy's reply.  However, I'm not sure if R can work with 
the compact format in memory.  It might expand those genotypes (minor 
allele counts) from two-bit integers to double-precision floats.  What 
does read.plink() create in memory?


There is another package I've been meaning to look at that is supposed to 
help with the memory management problem for large genotype files:


http://cran.r-project.org/web/packages/ff/

I haven't used it yet, but I am hopeful.  Maybe David Duffy or someone 
else here will know more about it.


If you have a lot of data, also consider chopping the data into pieces 
before loading it into R.  That's what we do.  With a 100 core system, I 
break the data into 100 files (I use the GNU/Linux "split" command and a 
few other tricks) and have all 100 cores run at once to analyze the data.


When I work with genotype data as allele counts using Octave, I store the 
data, both in files and in memory, as unsigned 8-bit integers, using 3 as 
the missing value.  That's still inefficient compared to the PLINK system, 
but it is way better than using doubles.


Best,
Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] several messages

2011-06-22 Thread Mike Miller

On Mon, 20 Jun 2011, Jim Silverton wrote:

I am using plink on a large SNP dataset with a .map and .ped file. I want 
to get some sort of file, say a list of all the SNPs that plink is saying 
that I have. Any ideas on how to do this?



All the SNPs you have are listed in the .map file.  An easy way to put the 
data into R, if there isn't too much, is to do this:


plink --file whatever --out whatever --recodeA

That will make a file called whatever.raw, single space delimited, 
consisting of minor allele counts (0, 1, 2, NA) that you can bring into R 
like this:


data <- read.table("whatever.raw", sep=" ", header=T)

If you have tons of data, you'll want to work with the compact binary 
format (four genotypes per byte):


plink --file whatever --out whatever --make-bed

Then see David Duffy's reply.  However, I'm not sure if R can work with 
the compact format in memory.  It might expand those genotypes (minor 
allele counts) from two-bit integers to double-precision floats.  What 
does read.plink() create in memory?


There is another package I've been meaning to look at that is supposed to 
help with the memory management problem for large genotype files:


http://cran.r-project.org/web/packages/ff/

I haven't used it yet, but I am hopeful.  Maybe David Duffy or someone 
else here will know more about it.


If you have a lot of data, also consider chopping the data into pieces 
before loading it into R.  That's what we do.  With a 100 core system, I 
break the data into 100 files (I use the GNU/Linux "split" command and a 
few other tricks) and have all 100 cores run at once to analyze the data.


When I work with genotype data as allele counts using Octave, I store the 
data, both in files and in memory, as unsigned 8-bit integers, using 3 as 
the missing value.  That's still inefficient compared to the PLINK system, 
but it is way better than using doubles.


Best,
Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time-series analysis with treatment effects - statistical approach

2011-06-22 Thread Mike Marchywka


> Subject: [R] Time-series analysis with treatment effects - statistical
> approach
> 
> Hello all R listers,
> I'm struggling to select an appropriate statistical method for my data set.
> I have collected soil moisture measurements every hour for 2 years. There
> are 75 sensors taking these automated measurements, spread evenly across 4
> treatments and a control. I'm not interested in being able to predict soil
> future soil moisture trends, but rather in knowing whether the treatment
> affected soil moisture response overall.  In particular, it would be
> interesting to inspect treatment related response within defined periods.
> For example, a visual inspection of my data suggests that soil moisture is
> equivalent across treatments during wet winter months, but during the dry
> summer months, a treatment effect appears. 
> 
> Any help on this topic would be very appreciated. I've looked far and wide
> through academic literature for similar experimental designs, but have not
> had any success as yet.


Can you post your data? This makes it more interesting for potential responders
and makes it more likely you get a relevant answer or observation. 

I guess the stats people would just suggest something like anova and
you could just compare moisture means among the treatments and controls
but presumably things are not stationary and it may be easy to get confusing
results. 

http://en.wikipedia.org/wiki/Analysis_of_variance
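A hedged sketch of that comparison (the treatment labels and sample sizes are
invented; real data would replace the simulated moisture column):

d <- data.frame(treatment = gl(5, 30, labels = c("ctrl","T1","T2","T3","T4")),
                moisture  = rnorm(150, mean = 0.25, sd = 0.05))
summary(aov(moisture ~ treatment, data = d))   # one-way comparison of means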



Usually there is some analysis plan made up a priori, I would imagine, or at 
least
there is prior literature in your field. I guess citeseer or google 
scholar/books may
be useful if you have some key words or author names. The data you have
sounds to an engineer just like any other discrete-time signal, and maybe the 
digital
signal processing literature would be of interest. Presumably you have some 
models to test, and more details depend on the specifics of your competing
hypotheses. Kalman filtering maybe? 



> 
> Cheers,
> Justin
> 
> Dr. Justin Morgenroth
> New Zealand School of Forestry
> Christchurch, New Zealand
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3615856.html
> Sent from the R help mailing list archive at Nabble.com.
> 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] different results from nls in 2.10.1 and 2.11.1

2011-06-20 Thread Mike Marchywka

> Date: Mon, 20 Jun 2011 11:25:54 +0200
> From: lig...@statistik.tu-dortmund.de
> To: p.e.b...@dunelm.org.uk
> CC: r-help@r-project.org; pb...@astro.uni-bonn.de
> Subject: Re: [R] different results from nls in 2.10.1 and 2.11.1
> 
> Since one is a 32-bit and the other one a 64-bit, and therefore the 
> compiler is also different, you can get different numerical results 
> easily, even with identical versions of R (and most of us do not have 
> outdated R installations around).
> 
> I just tried your example on 3 different systems with R-2.13.0 and all 
> told me "singular convergence"...
> 
> Uwe Ligges
> 
> 
> 
> 
> 
> On 18.06.2011 15:44, Philip Bett wrote:
> > Hi,
> > I've noticed I get different results fitting a function to some data on
> > my laptop to when I do it on my computer at work.

I guess the best all-around approach is to dump the results from your
FisherAvgdPdf function and get some idea what trajectory it takes
in the different cases. This is presumably not just an issue with R
and could have something to tell you about your data vs the model.
Even if it converged without incident, you'd probably want to
look at more details. You apparently are trying to fit a histogram to
a pdf, and it is not too hard to just plot the fit over the histogram and
spot-check parameter space.
 
Consider for example, 
 
plot(h$mids,h$density,type="b")
points(h$mids,FisherAvgdPdf(acos(h$mids),3.527e-8,3.198),pch=20)
points(h$mids,FisherAvgdPdf(acos(h$mids),3.527e-8,3.298),pch=13)
points(h$mids,FisherAvgdPdf(acos(h$mids),4.527e-8,3.198),pch=14)

and then find various measures of how good or bad these fits are, etc.:
 
e<-h$density-FisherAvgdPdf(acos(h$mids),3.527e-8,3.198)
sum(e*e)
e<-h$density-FisherAvgdPdf(acos(h$mids),3.527e-8,3.298)
sum(e*e)

The error surface as a function of the parameters seems smooth, etc.:
 
foo <- function (x, y) { e <- h$density - FisherAvgdPdf(acos(h$mids), x, y); sum(e*e) }

x <- (2 + (1:100) * .02) / 1e8
y <- 3 + ((1:100) / 100)

library(scatterplot3d)   # needed for the 3-D plot below
df <- data.frame(x = 0, y = 0, z = 0)
for (i in x) for (j in y) df <- rbind(df, data.frame(x = i, y = j, z = foo(i, j)))
sel <- which(df$x != 0)  # drop the dummy first row
scatterplot3d(df$x[sel], df$y[sel], df$z[sel])

 
 
 
> >
> > Here's a code snippet of what I do:
> >
> > ##--
> > require(circular) ## for Bessel function I.0
> >
> > ## Data:
> > dd <- c(0.9975948929787, 0.9093316197395, 0.7838819026947,
> > 0.9096108675003, 0.8901804089546, 0.2995955049992, 0.9461286067963,
> > 0.8248071670532, 0.2442084848881, 0.2836948633194, 0.7353935241699,
> > 0.5812761187553, 0.8705610632896, 0.8744471669197, 0.7490273118019,
> > 0.9947383403778, 0.9154829382896, 0.8659985661507, 0.6448246836662,
> > 0.8588128685951, 0.7347437739372, -0.1645197421312, 0.970999121666,
> > 0.8038327097893, 0.9558997154236, 0.6846113204956, 0.6286814808846,
> > 0.9201356172562, 0.9422197341919, 0.3470877110958, 0.4154576957226,
> > 0.0721184238791, 0.14151956141, -0.6142936348915, -0.4688512086868,
> > 0.6805665493011, 0.3594025671482, 0.8991097211838, 0.7656877636909,
> > 0.9282909035683, 0.9454715847969, 0.9766132831573, 0.4316343963146,
> > 0.62679708004, 0.2093886137009, 0.3937581181526, 0.4254160523415,
> > 0.8684504628181, 0.3844584524632, 0.9578431844711, 0.956972181797,
> > 0.4456568360329, 0.9793710708618, 0.5825698971748, 0.929228246212,
> > 0.9211971759796, 0.9407976865768, 0.821156680584, 0.2048042863607,
> > 0.6473184227943, 0.9456319212914, 0.7021154165268, 0.9761978387833,
> > 0.1485801786184, 0.2195029109716, 0.5378785729408, 0.8304615020752,
> > 0.8596342802048, 0.950027525425, 0.9102076888084, 0.5108731985092,
> > 0.7200184464455, 0.3571084141731, 0.9765330553055, -0.143017962575,
> > 0.8576183915138, 0.1283493340015, -0.3226098418236, 0.7031792402267,
> > 0.8708582520485, 0.56754809618, 0.060470353812, 0.8015220761299,
> > 0.7363410592079, 0.671902179718, 0.8082517385483, 0.9468197822571,
> > 0.9729647636414, 0.7919752597809, 0.9539568424225, 0.4840737581253,
> > 0.850653231144, 0.5909016132355, 0.8414449691772, 0.9699150323868)
> >
> > xlims <- c(-1,1)
> > bw <- 0.05
> > b <- seq(xlims[1],xlims[2],by=bw) ; nb <- length(b)
> > h <- hist( dd, breaks=b, plot=FALSE)
> >
> > FisherAvgdPdf <- function(theta,theta0,kappa){
> > A <- kappa/(2*sinh(kappa))
> > A * I.0( kappa*sin(theta)*sin(theta0) ) * exp(
> > kappa*cos(theta)*cos(theta0) )
> > }
> >
> >
> > nls(dens ~ FisherAvgdPdf(theta,theta0,kappa),
> > data = data.frame( theta=acos(h$mids), dens=h$density ),
> > start=c( theta0=0.5, kappa=4.0 ),
> > algorithm="port", lower=c(0,0), upper=c(acos(xlims[1]),500),
> > control=list(warnOnly=TRUE) )
> >
> > ##--
> >
> > On one machine, nls converges, and on the other it doesn't. Any ideas
> > why, and which is "right"? I can't see anything in R News that could be
> > relevant.
> >
> >
> > The different R versions (and computers) are:
> > > R.version
> > _
> > platform i686-pc-linux-gnu
> > arch 

Re: [R] Multivariate HPD credible volume -- is it computable in R?

2011-06-19 Thread Mike Marchywka






> Date: Sun, 19 Jun 2011 19:06:20 +1200
> From: agw1...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Multivariate HPD credible volume -- is it computable in R?
> 
> Hi all,
> 
> I'm new to the list and am hoping to get some advice. I have a set of
> multivariate data and would like to find the densest part of the data cloud
> containing 95% of the data, like a 95% HPD credible volume. Is there any R
> code available to compute that?

It looks like the LaplacesDemon pkg was just updated FWIW,

http://www.google.com/search?hl=en&q=hpd+credible+volume+cran

If you just want to find the density of your data in some n-dimensional
space, that sounds like multidimensional binning or a histogram would
work (you can check Google, but IIRC there is no general n-dimensional
binning package. I also mentioned an approach based on a density field
associated with each point, which can be summed over all points to get
the density at an arbitrary point, but you need to implement it in some
intelligent way for it to be fast). I guess you could bin using
aggregate(); see if this example works:

df <- data.frame(x = runif(100) * 10, y = runif(100) * 5, c = 1)
d  <- aggregate(c ~ floor(x) + floor(y), df, FUN = "sum")
d
which(d$c == max(d$c))

Now, at this point, you can imagine that you may be able to find a surface
that encloses a certain fraction of your data.
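Continuing the sketch above, one crude way to get at a 95% highest-density
region from the binned counts is to keep the densest cells until they hold
95% of the points:

d2   <- d[order(-d$c), ]                        # bins, densest first
keep <- d2[cumsum(d2$c) <= 0.95 * sum(d2$c), ]  # cells covering ~95% of data
keep

This inherits all the caveats of the binning (bin width, dimensionality),
so treat it as a starting point, not a proper HPD computation.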



> 
> Thank you very much! Your help and patience are much appreciated.
> 
> G.S.
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Server question

2011-06-18 Thread Mike Marchywka






 

> From: oliver.jo...@digred.com
> To: r-help@r-project.org
> Date: Fri, 17 Jun 2011 15:35:38 +0100
> CC: rebeccahard...@deltaeconomics.com
> Subject: [R] Server question
> 
> Hi
> 
> 
> 
> A client of mine has asked me to investigate the installation of R-software.
> 
> 
> 
> Could anyone tell me whether the software works only on a client machine or
> whether it sits on a server with clients attaching to it?

 
Did anyone answer this yet? See Rserve and Rapache on google. Both could
be useful to you or get you started.
 
 
 
> 
> 
> 
> Not immediately clear from the docs.
> 
> 
> 
> Best
> 
> 
> 
> Oliver
> 
> --
> 
> Oliver Jones
> 
> T: 01845 595911
> M: 07977 122089
> 
> DIG*RED Web Production |  www.digred.com
> 2 Vyner's Yard
> Rainton
> North Yorkshire
> 
> YO7 3PH
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About 'hazard ratio', urgent~~~

2011-06-14 Thread Mike Marchywka











> Date: Tue, 14 Jun 2011 09:11:25 -0700
> From: dr.jz...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] About 'hazard ratio', urgent~~~
>
> Thanks a lot for the great help~


well, if you are referring to my post, as I indicated I made a lot of blunders 
to get that out the
door and posted the first thing that made sense. Note also that by
making the one hazard rate "1" it wasn't clear that the column
I cited was divided by "1" etc. Caveat emptor. I usually
just answer questions like this when I "meant to look at that
anyway" and I have a chance of putting forth an answer which
I and others can verify. In any case, if you are using
some new code it always helps to run known test cases through
it, even if it is just new to you LOL.





>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3597025.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need script to create new waypoint

2011-06-14 Thread Mike Marchywka

 
 
> Now I need to change this data set into one with waypoints at regular 
> intervals: for example 2 
 
I guess your question is about algorithm ideas for making up data to meet
an unspecified perceived need. Anything you can specify completely should
be easy to code in R, a bash script, or C++ or Java; I'm not sure Excel
was intended for arbitrary code. Generally, the choice of how you
manufacture data is dictated by the perceived need, so if you can
elaborate on the "need," maybe someone can help you pick an approach to
making things up.
 
There is a sig list for geo problems that may know your requirement better,
 
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
 
and of course if your need is to replicate prior work in the field, you
can contact authors directly for questions about what they did.
 
If your question is about how R can handle time series with pre-packaged
tools, there is quite a bit, notably "zoo" and "ts", which should come up
with "?zoo" and "?ts":

http://www.google.com/search?hl=en&safe=off&q=R+ts+regular+time+intervals

Also try an "interpolation" search.
 
> library(zoo)
> ?zoo
> ?ts
>
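If linear interpolation at regular times is all that is needed (option 1
in the post below), approx() already does it; here is a sketch with
made-up numbers standing in for the cumulative-seconds and coordinate
columns:

secs <- c(0, 23, 73, 153, 156, 168, 513)     # summed time intervals
x    <- c(10, 12, 15, 20, 21, 23, 25)        # hypothetical coordinates
y    <- c( 5,  6,  8, 11, 11, 12, 14)
tt   <- seq(0, max(secs), by = 120)          # every 2 minutes
data.frame(t = tt,
           x = approx(secs, x, xout = tt)$y, # constant-speed assumption
           y = approx(secs, y, xout = tt)$y)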
 
 


> From: anti...@hotmail.com
> To: r-help@r-project.org
> Date: Tue, 14 Jun 2011 11:55:57 +
> Subject: [R] Need script to create new waypoint
>
>
> Dear help-list members,
>
> I am a student at Durham University (UK) conducting a PhD on spatial 
> representation in baboons. Currently, I'm analysing the effect of sampling 
> interval on home range calculations.
>
> I have followed the baboons for 234 days in the field, each day is 
> represented by about 1000 waypoints (x,y coordinates) recorded at irregular 
> time intervals. Consecutive waypoints in this raw data set have an average 
> distance interval of 5 meters an average time interval of 23 seconds (but 
> when baboons were stationary the time interval could be much larger - e.g. 
> waypoint 7 below).This raw data set needs to become a data set with waypoints 
> at regular intervals and thus, 'new' waypoints have to be 'created'. 
> Eventually, I want to use seven different time intervals: 2, 5, 10, 15, 30, 
> 45 and 60 minute intervals. I have tried in Excel, but I am not managing it. 
> I have some experience with R, and although I can 'read' quite complicated 
> scripts, I am unable to write them, so I would very much appreciate any help 
> anybody would be willing to offer me.
>
>
>
> My current data set has 9 columns (in csv / excel file):
> x coordinate, y coordinate, year, month, day, record, time interval (duration 
> between this waypoint and the previous) (hh:dd:ss), summed time intervals, 
> distance interval (m)
>
>
> EXAMPLE (24th of april 2007)
>
> (wp1) x1, y1, 2007, 7, 24, 1, 00:00:00, 00:00:00, 0
> (wp2) x2, y2, 2007, 7, 24, 2, 00:00:23, 00:00:23, 2
> (wp3) x3, y3, 2007, 7, 24, 3, 00:00:50, 00:00:73, 3
> (wp4) x4, y4, 2007, 7, 24, 4, 00:01:20, 00:02:33, 5
> (wp5) x5, y5, 2007, 7, 24, 5, 00:00:03, 00:02:36, 1
> (wp6) x6, y6, 2007, 7, 24, 6, 00:00:12, 00:02:48, 2
> (wp7) x7, y7, 2007, 7, 24, 7, 00:05:45, 00:08:33, 2
>
> Now I need to change this data set into one with waypoints at regular 
> intervals: for example 2 minutes = 120 seconds >> 2 minutes after the first 
> waypoint (x1,y1) the baboons would be somewhere between WP3 and WP4 (at WP3 
> sum duration is 73 seconds and after WP4 sum duration is 153 seconds), and so 
> this is where I would like a new waypoint created. Note that there are time 
> intervals which will be so large that multiple 'new' waypoints have to be 
> made / copied (e.g. WP 7 for a 2 minute interval).
>
> Three ways of calculating the new coordinates for this new waypoint from very 
> precise to not so precise (in order of preference) are:
>
>
>
> 1) Basing the new waypoint coordinates on the relative 'time distance' to 
> each waypoint of the two surrounding waypoints (therewith assuming a constant 
> movement during the time interval). The 'time distance' bewteen WP3 and WPnew 
> is (120-73) 47 seconds and the 'time distance' between WPnew and WP4 is 
> (153-120) 33 seconds (whereas total time interval between WP3 and WP4 is 80 
> seconds). WPnew (with coordinates Xnew,Ynew) should then be located at 
> 80/33*100=41.25% from WP3: Xnew = X3 + (X4-X3)*41.25% and Ynew= Y3 + 
> (Y4-Y3)*41.25%
>
>
> 2) Calculate the average location (average of x3,y3 and x4,y4), at which to 
> create a new waypoint at 2 minutes.
>
>
> 3) A simpler alternative is that the location of the 'closer waypoint in 
> time', in this example WP 4, (WP4: 153-120=33 versus WP3: 120-73=47) could be 
> copied as being the new location.
>
>
>
> I hope I explained my query so that it makes sense to you. I realize I am 
> asking a 'big' question and apologize for this - the software programs I have 
> thorough knowledge of (ArcGIS, excel, Mapsource), are all unable to solve 
> this problem. I would be very grateful for any advice or suggestions.
>
> Best wishes,
> Louise
>
> 

Re: [R] About 'hazard ratio', urgent~~~

2011-06-14 Thread Mike Marchywka













> Date: Mon, 13 Jun 2011 19:44:15 -0700
> From: dr.jz...@gmail.com
> To: r-help@r-project.org
> Subject: [R] About 'hazard ratio', urgent~~~
>
> Hi,
>
> I am new to R.
>
> My question is: how to get the 'hazard ratio' using the 'coxph' function in
> 'survival' package?

You can probably search the docs for hazard terms, for example, 

http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf


and try running known test data through to verify. For example, it does
seem that the "exp(coef)" column contains a decent estimate of the hazard
ratio in simple cases (not quite right to 3 significant figures, but I
haven't given this a lot of thought and it is still early here; hoping
for input from someone who can explain it better):

> ?rexp

> ns=10
> df<-data.frame(gr<-c(rep(0,ns),rep(1,ns)),t=c(rexp(ns,1),rexp(ns,3)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)   z p
gr     1.09      2.98  0.00503 217 0
Likelihood ratio test=47382  on 1 df, p=0  n= 20, number of events= 2e+05


> df<-data.frame(gr<-c(rep(0,100),rep(1,100)),t=c(rexp(100,1),rexp(100,2)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)    z       p
gr    0.658      1.93    0.148 4.44 8.8e-06
Likelihood ratio test=19.6  on 1 df, p=9.5e-06  n= 200, number of events= 200
> df<-data.frame(gr<-c(rep(0,100),rep(1,100)),t=c(rexp(100,1),rexp(100,1)))
> coxph(Surv(t)~gr,df)
Call:
coxph(formula = Surv(t) ~ gr, data = df)

       coef exp(coef) se(coef)      z    p
gr  -0.0266     0.974    0.142 -0.187 0.85
Likelihood ratio test=0.03  on 1 df, p=0.852  n= 200, number of events= 200
>

>
> thanks,
>
> karena
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3595527.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] smoothScatter function (color density question) and adding a legend

2011-06-13 Thread Mike Marchywka










> Date: Sat, 11 Jun 2011 21:52:21 -0400
> From: ccoll...@purdue.edu
> To: r-help@r-project.org
> Subject: [R] smoothScatter function (color density question) and adding a 
> legend
>
> Dear R experts,
>
> I am resending my questions below one more time
> just in case someone out there could help
> but missed my email.

Thanks, I was curious about this, so I installed the geneplot package and
that went fine, but I had another issue and need to take care of that.
First, see if "?legend" helps at all (no idea; I have not used this stuff
much). Also, source code should be available for the package you care
about.

On a quick read, it sounds like this uses a binning system for colors. If
you want something slightly different and much slower, I have some
examples (IIRC) that calculate densities in R using something similar to
the electric field calculation around point charges (again, I'm in a
hurry and pulling this from an archive, so caveat emptor: this plot may
not correspond exactly to the example script below, etc.),


http://98.129.232.232/coloumb1.pdf


library(scatterplot3d)   # needed for the 3-D view below

mydensity <- function(x1, x2)
{
  len <- length(x1)
  z <- 1:len
  for (y in 1:len)
  {
    # indices of all the other points
    nx <- if (y == 1) 2:len
          else if (y == len) 1:(len - 1)
          else c(1:(y - 1), (y + 1):len)
    # coulomb is a bit much with overlapping data points, so limit it a bit
    z[y] <- sum(1.0 / (.001 + (x1[nx] - x1[y])^2 + (x2[nx] - x2[y])^2))
  } # for
  print(max(z))
  print(min(z))
  z <- z - min(z)
  z <- 100 * z / max(z)
  hist(z)
  color <- rainbow(100)
  # color <- heat.colors(10)
  tmap <- color[pmin(floor(z) + 1, 100)]   # pmin keeps the max point in range
  scatterplot3d(x1, x2, z, color = tmap)
  plot(x2, x1, col = tmap, cex = .5)
  # library("VecStatGraphs2D")
  # DrawDensityMap(x1, x2, PaintPoint = TRUE)
  # color <- terrain.colors(26)
  # color <- heat.colors(26)
}
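A quick way to try it on synthetic data (any two equal-length numeric
vectors will do):

mydensity(rnorm(500), rnorm(500))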





>
> I don't think my questions are too hard.
>
> I am most concerned about the transformation function.
> See below.
>
> Thanks,
> Clayton
>
> Hello,
>
> I have a few questions, regarding the smoothScatter function.
>
>
> I have a scatter plot with more than 500,000 data points for
> two samples. So, I am wanting to display the density in colors
> to convince people that my good correlation coefficient is not
> due to an "influential point effect" and plus, I also want to make
> my scatter plot look pretty. Anyway ...
>
> I have been able to make the plot, but I have a couple of questions about it.
> I also want to make a legend for it, which I don't know how to do.
> I have only been playing around with R for a few weeks now off and on.
> So, I am definitely a beginner.
>
>
> 1.
> I have 10 colors for my plot: white representing zero density and dark red 
> representing the maximum density (I presume).
> According to the R documentation, the transformation argument represents the 
> "function mapping the density scale to the color scale."
> Note that in my R code below, transformation = function(x) x^0.14.
> I was wondering how exactly or mathematically that this function relates the 
> density to color.
> I believe the colorRampPalette ramps colors between 0 and 1. I am not sure if 
> x represents the color or the
> density. Since I have 10 colors, I suppose the colorRampPalette would assign 
> values of
> 0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark 
> red. I am not sure though.
> Does anyone know how this works? I am sure its not too too complicated.
>
>
> 2.
> In a related issue, I also would like to make a legend for this plot. Then, I 
> would be able to see the transformation
> function's effects on the color-density relationship. Could someone help me 
> in making a legend for my smoothScatter plot? I would like to place it 
> immediately to the right of my plot as a vertical bar, matching the vertical 
> length of the plot as is often convention.
>
> I really like the smoothScatter function. It is easy to use, and I believe 
> it's relatively new.
>
> Thanks in advance.
>
> -Clayton
>
> > clayton <- c("white", "lightblue", "blue", "darkgreen", "green", "yellow", 
> > "orange", "darkorange", "red", "darkred")
> > x <- read.table("c:/users/ccolling/log_u1m1.txt", sep ="\t")
> > smoothScatter(x$V8~x$V7,
> nbin=1000,
> colramp = colorRampPalette(c(clayton)),
> nrpoints=Inf,
> pch="",
> cex=.7,
> transformation = function(x) x^.14,
> col="black",
> main="M1 vs. M3 (r = 0.92)",
> xlab="Normalized M1",
> ylab="Normalized M2",
> cex.axis = 1.75,
> cex.lab = 1.25,
> cex.main = 2)
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka












> Date: Sat, 11 Jun 2011 19:57:47 +0200
> Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing
> From: lui.r.proj...@googlemail.com
> To: marchy...@hotmail.com
> CC: r-help@r-project.org
>
> Hello Mike,
>
[[elided Hotmail spam]]
> Best to my knowledge the sort algorithm implemented in R is already
> "backed by C++" code and not natively written in R. Writing the code
> in C++ is not really an option either (i think rGenoud is also written
> in C++). I am not sure whether there really is a "bottleneck" with
> respect to the computer - I/O is pretty low, plenty of RAM left etc.
> It really seems to me as if parallelizing is not easily possible or
> only at high costs so that the benefits diminish through all the
> coordination and handling needed...
> Did anybody use rGenoud in "cluster mode" an experience sth similar?
> Are quicksort packages available using multiple processors efficiently
> (I didnt find any... :-( ).

I'm no expert, but these don't seem to be terribly subtle problems
in most cases. Sure, if the task is not suited to parallelism and
you force it to be parallel and it spends all its time syncing up,
that can be a problem. Just making more tasks to fight over
the bottleneck (memory, CPU, locks) can easily make things worse.
I think I posted a link earlier to an IEEE blurb showing
how easy it is for many cores to make things worse on non-contrived
benchmarks.




>
> I am by no means an expert on parallel processing, but is it possible,
> that benefits from parallelizing a process greatly diminish if a large
> set of variables/functions need to be made available and the actual
> function (in this case sorting a few hundred entries) is quite short
> whereas the number of times the function is called is very high!? It
> was quite striking that the "first run" usually took several hours
> (instead of half an hour) and the subsequent runs were much much
> faster..
>
> There is so much happening "behind the scenes" that it is a little
> hard for me to tell what might help - and what will not...
>
> Help appreciated :-)
> Thank you
>
> Lui
>
> On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka  wrote:
> >
> >
> >
> >
> > 
> >> Date: Sat, 11 Jun 2011 13:03:10 +0200
> >> From: lui.r.proj...@googlemail.com
> >> To: r-help@r-project.org
> >> Subject: [R] Amazon AWS, RGenoud, Parallel Computing
> >>
> >> Dear R group,
> >>
> >>
> > [...]
> >
> >> I am a little bit puzzled now about what I could do... It seems like
> >> there are only very limited options for me to increase the
> >> performance. Does anybody have experience with parallel computations
> >> with rGenoud or parallelized sorting algorithms? I think one major
> >> problem is that the sorting happens rather quick (only a few hundred
> >> entries to sort), but needs to be done very frequently (population
> >> size >2000, iterations >500), so I guess the problem with the
> >> "housekeeping" of the parallel computation deminishes all benefits.
> >>
> > Is your sort part of the algorithm, or do you have to sort results after
> > getting them back out of order from async processes? One of
> > my favorite anecdotes is how I used a bash sort on a huge data
> > file to make a program run faster (from impractical zero percent CPU
> > to very fast with full CPU usage, and you complain about exactly
> > a lack of CPU saturation). I guess a couple of comments. First,
> > if you have specialized apps you need optimized, you may want
> > to write dedicated C++ code. However, this won't help if
> > you don't find the bottleneck. Lack of CPU saturation could
> > easily be due to "waiting for stuff" like disk IO or VM
> > swap. You really ought to find the bottleneck first; it
> > could be anything (except the CPU, maybe, LOL). The sort
> > that I used prevented VM thrashing with no change to the app
> > code: the app got sorted data and so VM paging became infrequent.
> > If you can specify the problem precisely you may be able to find
> > a simple solution.
> >
> >
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka





> Date: Sat, 11 Jun 2011 13:03:10 +0200
> From: lui.r.proj...@googlemail.com
> To: r-help@r-project.org
> Subject: [R] Amazon AWS, RGenoud, Parallel Computing
>
> Dear R group,
>
>
[...]

> I am a little bit puzzled now about what I could do... It seems like
> there are only very limited options for me to increase the
> performance. Does anybody have experience with parallel computations
> with rGenoud or parallelized sorting algorithms? I think one major
> problem is that the sorting happens rather quick (only a few hundred
> entries to sort), but needs to be done very frequently (population
> size >2000, iterations >500), so I guess the problem with the
> "housekeeping" of the parallel computation deminishes all benefits.
>
Is your sort part of the algorithm, or do you have to sort results after
getting them back out of order from async processes? One of
my favorite anecdotes is how I used a bash sort on a huge data
file to make a program run faster (from impractical zero percent CPU
to very fast with full CPU usage, and you complain about exactly
a lack of CPU saturation). I guess a couple of comments. First,
if you have specialized apps you need optimized, you may want
to write dedicated C++ code. However, this won't help if
you don't find the bottleneck. Lack of CPU saturation could
easily be due to "waiting for stuff" like disk IO or VM
swap. You really ought to find the bottleneck first; it
could be anything (except the CPU, maybe, LOL). The sort
that I used prevented VM thrashing with no change to the app
code: the app got sorted data and so VM paging became infrequent.
If you can specify the problem precisely you may be able to find
a simple solution.

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RES: Linear multivariate regression with Robust error

2011-06-11 Thread Mike Marchywka















> Date: Fri, 10 Jun 2011 16:50:24 -0300
> From: filipe.bote...@vpar.com.br
> To: frien...@yorku.ca; bkkoc...@gmail.com
> CC: r-help@r-project.org
> Subject: [R] RES: Linear multivariate regression with Robust error
>
>
>
> --Forwarded Message Attachment--
> Subject: RES: [R] Linear multivariate regression with Robust error
> Date: Fri, 10 Jun 2011 16:50:24 -0300
> From: filipe.bote...@vpar.com.br
> To: frien...@yorku.ca; bkkoc...@gmail.com
> CC: r-help@r-project.org
>
>
> Hi Barjesh,
>
> I am not sure which data you analyze, but once I had a similar situation
> and it was a multicolinearity issue. I realized this after finding a
> huge correlation between the independent variables, then I dropped one
> of them and signs of slopes made sense.
>
> Beforehand, have a glance at your correlation matrix of independent
> variables to see the relationship between them.

I guess "look at the data" ( LATFD ) seems to be the recurring solution.
Certainly with polynomial regression, there's a tendency to think that
the fit parameter is a pure property of the system being examined. With
something like a Taylor series, and you have a slope at a specific point,
maybe you could think about it that way but here the coefficient
is just "whatever optimizes your (arbitrary ) error function for the data
you have." A linear approaximation to a non-line could be made at
one point and maybe that property should remain constant but 
you have people claiming " past authors samples some curve near x=a
and got a different slope than my work largely smapling f(x) around
x=b what is wrong with my result?" It may be interesting to try to
write a taylor series around some point and see how those coefficients
vary with data sets for example( you still need arbitrary way to
estimate slopes and simply differencing two points may be a bit noisy LOL 
but you could play with some wavelet families maybe ). 

If you try to describe a fruit as a linear combination of vegetables,
you may get confusing but possibly useful results even if they don't correspond 
to
properties of the fruits so much as a specific need you have. For example, if 
you
are compressing images of fruits and your decoder already has a dictionary
of vegetables, it may make sense to do this.  This is not much different from 
trying to compress non-vocal music with an ACELP codec that attempts to fit the 
sounds
to models of human vocal tract. Sometimes this may be even be informative
about how a given sound was produced even if it sounds silly. 
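A two-line illustration of the region-dependence point (f is an arbitrary
made-up curve; a and b are two hypothetical sampling regions):

f <- function(x) x - x^2
a <- seq(0, 1, length.out = 50)
b <- seq(2, 3, length.out = 50)
coef(lm(f(a) ~ a))[2]   # "slope" of the curve sampled near x = 0.5
coef(lm(f(b) ~ b))[2]   # very different "slope" sampled near x = 2.5

Both fits are "right" for their own data; neither is a property of f alone.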












>
> Not sure how helpful my advice will be, but it did the trick for me back
> then ;)
>
> Cheers,
> Filipe Botelho
>
> -Mensagem original-
> De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> Em nome de Michael Friendly
> Enviada em: sexta-feira, 10 de junho de 2011 12:51
> Para: Barjesh Kochar
> Cc: r-help@r-project.org
> Assunto: Re: [R] Linear multivariate regression with Robust error
>
> On 6/10/2011 12:23 AM, Barjesh Kochar wrote:
> > Dear all,
> >
> > i am doing linear regression with robust error to know the effect of
> > a (x) variable on (y)other if i execute the command i found positive
> > trend.
> > But if i check the effect of number of (x.x1,x2,x3)variables
> > on same (y)variable then the positive effect shwon by x variable turns
> > to negative. so plz help me in this situation.
> >
> > Barjesh Kochar
> > Research scholar
> >
> You don't give any data or provide any code (as the posting guide
> requests) , so I have to guess that you
> have just rediscovered Simpson's paradox -- that the coefficient of a
> variable in a marginal regression can have an opposite sign to that in
> a joint model with other predictors. I have no idea what you mean
> by 'robust error'.
>
> One remedy is an added-variable plot which will show you the partial
> contributions of each predictor in the joint model, as well as whether
> there are any influential observations that are driving the estimated
> coefficients.
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street Web: http://www.datavis.ca
> Toronto, ONT M3J 1P3 CANADA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> "This message and its attachments may contain confidential and/or
> privileged information. If you are not the addressee, please, advise
> the sender immediately by replying to the e-mail and delete this
> message." "Este mensaje y sus anexos pueden contener información
> confidencial o privilegiada. Si ha recibido este e-mail por error por
> favor bórrelo y envíe un mensaje al remitente." "Esta mensagem e seus
> anexos podem conter inf

Re: [R] Linear multivariate regression with Robust error

2011-06-10 Thread Mike Marchywka













> Date: Fri, 10 Jun 2011 09:53:20 +0530
> From: bkkoc...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Linear multivariate regression with Robust error
>
> Dear all,
>
> i am doing linear regression with robust error to know the effect of
> a (x) variable on (y)other if i execute the command i found positive
> trend.
> But if i check the effect of number of (x.x1,x2,x3)variables
> on same (y)variable then the positive effect shwon by x variable turns
> to negative. so plz help me in this situation.

Take y as "goodness," and let x and x1 have something to do with a
product. The first analysis is from company A, the second is from company
B, and the underlying relationship is given with some noise, LOL (I'm
still on my first cup of coffee; this was the first example to come to
mind, as these questions keep coming up here every day):

> x=1:100
> x1=x*x
> y<-x-x1+runif(100)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       1718         -100

> lm(y~x+x1)

Call:
lm(formula = y ~ x + x1)

Coefficients:
(Intercept)            x           x1
     0.5253       1.0024      -1.0000

>











>
> Barjesh Kochar
> Research scholar
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error with geomap in googleVis

2011-06-10 Thread Mike Marchywka




> Subject: RE: [R] error with geomap in googleVis
> Date: Fri, 10 Jun 2011 11:06:47 +0100
> From: markus.gesm...@lloyds.com
> To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch
>
> Hi Mike,
>
> I believe you are trying to put two charts on the same page.
> The demo(googleVis) provides you with an example for this.
> So applying the same approach to your data sets would require:


Thanks, I thought however I tried to different examples the first
being my own which failed and  then the "Fruits" example as a second
test. I was lazy and used install package from probably wrong mirror
as in the past on 'dohs I had problems with CMD INSTALL. I reloaded
as per your suggestion and tried first example. The output did
change but it is still lacking a chart. Do I need API key or something?
Ive never used google visualization before and was just reacting to OP
as I had wanted to try it. Process is below, final result may
be here, http://98.129.232.232/xxx.html , but I will be varying as
I get time. Thanks again.

--2011-06-10 05:17:47--  http://cran.r-project.org/src/contrib/googleVis_0.2.5.tar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 10 Jun 2011 10:17:47 GMT
  Server: Apache/2.2.9 (Debian)
  Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
  ETag: "a83451-b36e9-4a5196ea64b00"
  Accept-Ranges: bytes
  Content-Length: 734953
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'

100%[==>] 734,953  527K/s   in 1.4s

2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R
** data
**  moving datasets to lazyload DB
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded

* DONE (googleVis)
[marchywka@351915-www1 downloads]$ R

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(googleVis)
Loading required package: RJSONIO
To suppress the following message use the statement:
suppressPackageStartupMessages(library(googleVis))

Welcome to googleVis version 0.2.5

Type ?googleVis to access the overall documentation and
vignette('googleVis') for the package vignette.
You can execute the demo of the package via: demo(googleVis)

More information is available on the googleVis project web-site:
http://code.google.com/p/google-motion-charts-with-r/

Please read also the Google Visualisation API Terms of Use:
http://code.google.com/apis/visualization/terms.html

Feel free to send us an email
if you would like to be keept informed of new versions,
or if you have any feedback, ideas, suggestions or would
like to collaborate.

> df<-data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
>
> map1<-gvisGeoMap(df,locationvar='foo',numvar='bar')
> cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 1 (type 'list') cannot be handled by 'cat'
> cat(map1$html$header)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>GeoMapID23b504e5</title>
  <meta http-equiv="content-type" content="text/html;charset=utf-8" />
  <style type="text/css">
    body {
      color: #44;
      font-family: Arial,Helvetica,sans-serif;
      font-size: 75%;
    }
    a {
      color: #4D87C7;
      text-decoration: none;
    }
  </style>
</head>
<body>
> cat(map1$html$header,file="xxx.html",appeand=F)
> cat(map1$html$chart,file="xxx.html",appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 2 (type 'closure') cannot be handled by 'cat'
> str(map1$html)
List of 4
 $ header : chr "http://www.w3.org/TR/xhtml1/DT

Re: [R] error with geomap in googleVis

2011-06-09 Thread Mike Marchywka


I still got blanks with Firefox with the two examples below, I put html
up here if you want to look at it,

http://98.129.232.232/xxx.html

I just downloaded googlevis from mirror 68 and it claimed it
was 0.2.5 ( I thought, but maybe I should check again). 


install.packages("googleVis", dep = TRUE)
library(googleVis)
df <- data.frame(foo = c("Brazil", "Canada"), bar = c(123, 456))
map1 <- gvisGeoMap(df, locationvar = 'foo', numvar = 'bar')

cat(map1$html$header,  file = "xxx.html", append = FALSE)
cat(map1$html$chart,   file = "xxx.html", append = TRUE)
cat(map1$html$caption, file = "xxx.html", append = TRUE)
cat(map1$html$footer,  file = "xxx.html", append = TRUE)

m <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
str(m)
cat(m$html$header,  file = "xxx.html", append = FALSE)
cat(m$html$chart,   file = "xxx.html", append = TRUE)
cat(m$html$caption, file = "xxx.html", append = TRUE)
cat(m$html$footer,  file = "xxx.html", append = TRUE)




> Subject: RE: [R] error with geomap in googleVis
> Date: Thu, 9 Jun 2011 14:06:22 +0100
> From: markus.gesm...@lloyds.com
> To: marchy...@hotmail.com; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
>
> Hi all,
>
> This issue occurs with googleVis 0.2.4 and RJSONIO > 0.7.1.
> Version 0.2.5 of the googleVis package has been uploaded to CRAN two
> days ago and should have fixed this issue.
> Can you please try to update to that version, e.g. from
> http://cran.r-project.org/web/packages/googleVis/
>
> Further version 0.2.5 provides new interfaces to more interactive Google
> charts:
> - gvisLineChart
> - gvisBarChart
> - gvisColumnChart
> - gvisAreaChart
> - gvisScatterChart
> - gvisPieChart
> - gvisGauge
> - gvisOrgChart
> - gvisIntensityMap
>
> Additionally a new demo 'AnimatedGeoMap' has been added which shows how
> a Geo Map can be animated with additional JavaScript. Thanks to Manoj
> Ananthapadmanabhan and Anand Ramalingam, who provided the idea and
> initial code.
>
> For more information and examples see:
> http://code.google.com/p/google-motion-charts-with-r/
>
> I hope this helps
>
> Markus
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Mike Marchywka
> Sent: 09 June 2011 11:19
> To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
> Subject: Re: [R] error with geomap in googleVis
>
>
>
> 
> > To: r-h...@stat.math.ethz.ch
> > From: mjphi...@tpg.com.au
> > Date: Wed, 8 Jun 2011 10:14:01 +
> > Subject: Re: [R] error with geomap in googleVis
> >
> > SNV Krishna primps.com.sg> writes:
> >
> > >
> > > Hi All,
> > >
> > > I am unable to get the plot geomap in googleVis package. data is as
> > > follows
> > >
> > > > head(index.ret)
> > > country ytd
> > > 1 Argentina -10.18
> > > 2 Australia -3.42
> > > 3 Austria -2.70
> > > 4 Belgium 1.94
> > > 5 Brazil -7.16
> > > 6 Canada 0.56
> > >
> > > > map1 = gvisGeoMap(index.ret,locationvar = 'country', numvar =
> > > > 'ytd')
> > > > plot(map1)
> > >
> > > But it just displays a blank page, showing an error symbol at the
> > > right bottom corner. I tried demo(googleVis), it also had a similar
> > > problem. The demo showed all other plots/maps except for those
> > > geomaps. Could any one please hint me what/where could be the
> > > problem? Many thanks for the idea and support.
>
>
> I had never used this until yesterday but it seems to generate html.
> I didn't manage to get a chart to display but if you are familiar with
> this package and html perhaps you could look at map1$html and see if
> anything is obvious. One great thing about html/js is that it is human
> readable and you can integrate it well with other page material without
> much in the way of special tools.
>
>
>
>
>
>
>
>
> > >
> > > Regards,
> > >
> > > SNV Krishna
> > >
> > > [[alternative HTML version deleted]]
> > >
> > >
> >
> > Hi All,
> >
> > I have also encountered this problem. I have tested the problem in

>
> 3.0. I
> > have latest java and flash and I have tried both Firefox and IE (both
> > latest
>
> k just
> > fine.
> >
> > I too would like to know how to solve this problem.
> >
> > Kind regards,
> >
> > Michael Phipps
&

Re: [R] any documents

2011-06-09 Thread Mike Marchywka













> Date: Thu, 9 Jun 2011 02:21:21 -0700
> From: wa7@gmail.com
> To: r-help@r-project.org
> Subject: [R] any documents
>
> Hi,
>
> I'm doing a textual analysis of several articles discussing the evolution of
> prices in order to give a forecast. if someone can give me a clear approach
> to this knowing that I work on the package tm.

LOL, are you talking about the computer-generated "analysis" such as the
thin text platitudes around bandwagon stats, such as "is trading xx above
the 30-day moving average," etc.? This sounds funny but is actually an
interesting test case, as the hidden structured nature of the documents
should be easier to analyse than, say, poetry. The field is very much
research-grade AFAIK, and you will need to define an algorithm or do a
literature search to get much in the way of a helpful response beyond
"?tm". Absent that, you are almost asking for someone to invent an
algorithm. I've referred many posters to terms like "computational
linguistics," but people who have used these kinds of things don't seem
to post here much. If you can give us more details or a source article,
maybe someone can point you in a useful direction.
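For what it's worth, the mechanical starting point with tm is just
building a document-term matrix; everything beyond that (the forecasting
part) is the research question. A bare-bones sketch on made-up documents:

library(tm)
docs <- c("prices rose on strong demand", "prices fell as supply grew")
corp <- Corpus(VectorSource(docs))
dtm  <- DocumentTermMatrix(corp)
inspect(dtm)   # term counts per document; the input to whatever comes next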





>
> Thank you very much
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/any-documents-tp3584961p3584961.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error with geomap in googleVis

2011-06-09 Thread Mike Marchywka



> To: r-h...@stat.math.ethz.ch
> From: mjphi...@tpg.com.au
> Date: Wed, 8 Jun 2011 10:14:01 +
> Subject: Re: [R] error with geomap in googleVis
>
> SNV Krishna  primps.com.sg> writes:
>
> >
> > Hi All,
> >
> > I am unable to get the plot geomap in googleVis package. data is as follows
> >
> > > head(index.ret)
> > country ytd
> > 1 Argentina -10.18
> > 2 Australia -3.42
> > 3 Austria -2.70
> > 4 Belgium 1.94
> > 5 Brazil -7.16
> > 6 Canada 0.56
> >
> > > map1 = gvisGeoMap(index.ret,locationvar = 'country', numvar = 'ytd')
> > > plot(map1)
> >
> > But it just displays a blank page, showing an error symbol at the right
> > bottom corner. I tried demo(googleVis), it also had a similar problem. The
> > demo showed all other plots/maps except for those geomaps. Could any one
> > please hint me what/where could be the problem? Many thanks for the idea and
> > support.


I had never used this until yesterday but it seems to generate html.
I didn't manage to get a chart to display but if you are familiar
with this package and html perhaps you could look at map1$html
and see if anything is obvious. One great thing about html/js
is that it is human readable and you can integrate it
well with other page material without much in the way of special
tools. 








> >
> > Regards,
> >
> > SNV Krishna
> >
> > [[alternative HTML version deleted]]
> >
> >
>
> Hi All,
>
> I have also encountered this problem. I have tested the problem in Windows XP

3.0. I
> have latest java and flash and I have tried both Firefox and IE (both latest

k just
> fine.
>
> I too would like to know how to solve this problem.
>
> Kind regards,
>
> Michael Phipps
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Resources for utilizing multiple processors

2011-06-09 Thread Mike Marchywka










> From: rjeffr...@ucla.edu
> Date: Wed, 8 Jun 2011 20:54:45 -0700
> To: r-help@r-project.org
> Subject: [R] Resources for utilizing multiple processors
>
> Hello,
>
> I know of some various methods out there to utilize multiple processors but
> am not sure what the best solution would be. First some things to note:
> I'm running dependent simulations, so direct parallel coding is out
> (multicore, doSnow, etc).

the
> *nix languages.

Well, for the situation below you seem to want a function
server. You could consider Rapache and just write this like a big
web application. A web server, like a DB, is not the first thing
you think of for high-performance computing, but if your computationally
intensive tasks are in native code, this could be a reasonable
overhead that requires little learning.

If you literally mean cores instead of machines, keep in mind
that cores can end up fighting over resources, like memory
(this link cites an IEEE article with cores making things worse
in a non-contrived case):

http://lists.boost.org/boost-users/2008/11/42263.php


I think people have mentioned some packages like bigmemory (I forget
the names exactly) that let you handle larger objects. Launching a bunch
of threads and letting the VM thrash can easily make things slower.

I guess a better approach would be to get an implementation that is
block-oriented; you can do the memory/file staging in R until there is a
data frame that uses disk transparently, with hints on
expected access patterns (prefetch, etc.).
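The file-backed idea, as I understand the bigmemory docs (a sketch only;
the file names here are placeholders, so check the docs before relying on
it):

library(bigmemory)
bm <- filebacked.big.matrix(1e6, 10, type = "double",
                            backingfile    = "sim.bin",
                            descriptorfile = "sim.desc")
bm[1, ] <- rnorm(10)   # rows live on disk, not in R's heap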



>
> My main concern deals with Multiple analyses on large data sets. By large I
> mean that when I'm done running 2 simulations R is using ~3G of RAM, the
> remaining ~3G is chewed up when I try to create the Gelman-Rubin statistic
> to compare the two resulting samples, grinding the process to a halt. I'd
> like to have separate cores simultaneously run each analysis. That will save
> on time and I'll have to ponder the BGR calculation problem another way. Can
> R temporarily use HD space to write calculations to instead of RAM?
>
> The second concern boils down to whether or not there is a way to split up
> dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1
> to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
> b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and
[[elided Hotmail spam]]
>
>
> So if anyone has any suggestions as to a direction I can look into, it would
> be appreciated.
>
>
> Robin Jeffries
> MS, DrPH Candidate
> Department of Biostatistics
> UCLA
> 530-633-STAT(7828)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RgoogleMaps Axes

2011-06-08 Thread Mike Marchywka














> Date: Tue, 7 Jun 2011 09:50:10 -0700
> From: egregory2...@yahoo.com
> To: r-help@r-project.org
> Subject: [R] RgoogleMaps Axes
>
> R Help,
>
>
> I posted a question on StackOverflow yesterday regarding an issue I've been 
> having with the RgoogleMaps packages' displaying of axes. Here is the text of 
> that submission:
> http://stackoverflow.com/questions/6258408/rgooglemaps-axes
>
> "I can't find any documentation of the following problem I'm having with the 
> axis labels in RGoogleMaps:
> library(RgoogleMaps)
> datas <-structure(list(LAT =c(37.875,37.925,37.775,37.875,37.875),
> LON =c(-122.225,-122.225,-122.075,-122.075,-122.025)),
> .Names=c("LAT","LON"),class="data.frame",
> row.names =c(1418L,1419L,1536L,1538L,1578L))
> # Get bounding box.
> boxt <-qbbox(lat =datas$LAT,lon =datas$LON)
> MyMap<-GetMap.bbox(boxt$lonR,boxt$latR,destfile ="Arvin12Map.png",
> maptype ="mobile")
> PlotOnStaticMap(MyMap,lat =datas$LAT,lon =datas$LON,
> axes =TRUE,mar =rep(4,4))
>

I haven't gotten too far, as I had to download and install PROJ.4 and
rgdal, but I did get a transform-related warning. Did you get warnings?


> MyMap<-GetMap.bbox(boxt$lonR,boxt$latR,destfile ="Arvin12Map.png",maptype ="mobile")
[1] "http://maps.google.com/maps/api/staticmap?center=37.85,-122.125&zoom=12&size=640x640&maptype=mobile&format=png32&sensor=true"
Loading required package: rgdal
Loading required package: sp
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.8.0, released 2011/01/12
Path to GDAL shared files: /usr/local/share/gdal
Loaded PROJ.4 runtime: Rel. 4.7.1, 23 September 2009
Path to PROJ.4 shared files: (autodetected)
Warning message:
In readGDAL(destfile, silent = TRUE) : GeoTransform values not available
>


> PlotOnStaticMap(MyMap,lat =datas$LAT,lon =datas$LON,
+  axes =TRUE,mar =rep(4,4))
List of 6
 $ lat.center: num 37.8
 $ lon.center: num -122
 $ zoom  : num 12
 $ myTile: int [1:640, 1:640] 968 853 855 969 1033 888 855 884 888 995 ...
  ..- attr(*, "COL")= chr [1:1132] "#00" "#020201" "#020202" "#030302" ...
  ..- attr(*, "type")= chr "rgb"
 $ BBOX  :List of 2
  ..$ ll: num [1, 1:2] 37.8 -122.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
  ..$ ur: num [1, 1:2] 37.9 -122
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
 $ url   : chr "google"
NULL
[1] -291.2711  291.2711
[1] -276.5158  276.7972



> When I run this on my computer the horizontal axis ranges from 300W to 60E, 
> but the ticks in between aren't linearly spaced (300W, 200W, 100W, 0, 100E, 
> 160W, 60W). Also, the vertical axis moves linearly from 300S to 300N. It 
> seems that no matter what data I supply for datas, the axes are always 
> labeled this way.
> My question is:
> 1. Does this problem occur on other machines using this code?
> 2. Does anyone have an explanation for it?
> and
> 3. Can anybody suggest a way to get the correct axes labels (assuming these 
> are "incorrect", but maybe i'm somehow misinterpreting the plot!)?
> Thank you for your time."
>
> There has been no answer other than that I ought to contact the package 
> maintainer. Since it would be nice to have a publicly displayed solution, I 
> opted to post here first before doing so. Does anyone have any insight to 
> share?
> Thank you very much for your time,
>
> -Erik Gregory
> CSU Sacramento, Mathematics
> Student Assistant, California Environmental Protection Agency
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can we prepare a questionaire in R

2011-06-08 Thread Mike Marchywka







> Date: Wed, 8 Jun 2011 12:37:33 +0530
> From: ammasamri...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Can we prepare a questionaire in R
>
> Is there a way to prepare a questionnaire in R like html forms whose data
> can be directly populated into R?
>

I've started to use Rapache, although Rserve would also be an option.
When it is installed on the server, you can point your html form to an
rhtml page and get the form variables in R just as with other web
languages. For writing html output, I have been using R2HTML (see the
code excerpt below from an rhtml page). I also found that Cairo works ok
if you don't need X11 for anything.

For your specific situation, however, it may be easier to just use
whatever you already have and use R only for the data analysis: when a
request for a results report is made, send that to Rapache, for example.
I would mention that I have gone to running two versions of Apache, one
with R and one with PHP, to allow for security and easier development (I
can write the PHP to fail nicely if the R Apache server is not up, and no
new security issues are exposed).


library("Cairo")
library("R2HTML")
library("RColorBrewer")


You can of course also generate html from normal R commands, or for that
matter bash scripts etc.
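
As a minimal sketch of the handler side under Rapache (from memory, so
treat the details as assumptions: rapache exposes the form variables as
GET/POST lists and provides setContentType(); the field name "answer1"
is invented for illustration):

# e.g. served via  SetHandler r-handler  in the Apache config
setContentType("text/html")            # response type sent back to the browser
ans <- POST$answer1                    # form variable posted by the html form
if (is.null(ans)) ans <- "(no answer)"
cat("<html><body><p>You answered: ", ans, "</p></body></html>")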

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Populating values from html

2011-06-07 Thread Mike Marchywka









> Date: Tue, 7 Jun 2011 03:35:46 -0700
> From: ammasamri...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Populating values from html
>
> can we populate values into an excel sheet from html forms that has to be
> used in R for data analysis
>
> Can we directly retireve the values from html forms into R fro analysis


Judging from the way many websites offer data, you'd think that jpg is
the best means for getting it LOL. html, pdf, and image formats
are designed for human consumption of limited aspects of a data set.
Normally you would prefer something closer to raw data, like csv.
After having visited yet another public website that offers
data in pdf (YAPWTODIP) I would suggest you first contact the
site and ask them to present the data in a form which allows it to
be easily examined in an open source environment (note this criterion does not
make Excel a preferred choice either). If you have to scrape web pages,
there are some R facilities, but depending on the page you
may need to do a lot of work, and it will not survive if the page
is redone for artistic reasons etc. See any of these results for example,

http://www.google.com/search?hl=en&q=cran+page+scraping
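
For instance, a minimal scraping sketch using the XML package (the URL is
a placeholder; readHTMLTable only helps when the page actually contains
html tables):

library(XML)
# pull every <table> on the page into a list of data.frames
tbls <- readHTMLTable("http://example.com/some-data-page.html",
                      stringsAsFactors = FALSE)
str(tbls)         # inspect what came back and pick the table you need
df <- tbls[[1]]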








>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2011-06-07 Thread Mike Marchywka





> Date: Tue, 7 Jun 2011 01:38:32 -0700
> From: farah.farid@student.aku.edu
> To: r-help@r-project.org
> Subject: [R] Logistic Regression
>
> I am working on my thesis in which i have couple of independent variables
> that are categorical in nature and the depndent variable is dichotomus.
> Initially I run univariate analysis and added the variables with significant
> p-values (p<0.25) in my full model.
> I have three confusions. Firstly, I am looking for confounding variables by

I'm not sure what your thesis is about, some system that you are studying
via statistics or maybe the thesis is about statistics, but
according to this disputed wikipedia entry,

http://en.wikipedia.org/wiki/Confounding

whether a variable is confounding or extraneous is determined by the reality
of your system. It may help to consider factors related to that and use the
statistics to avoid fooling yourself. Look at the pictures (a non-pompous way
of saying look at graphs and scatter plots for some ideas to test) and then
test various ideas. You see bad cause/effect inferences all the time in many
fields, from econ to biotech (although anecdotes suggest these mistakes
usually favour the sponsors LOL). Consider some mundane "known" examples of
what your data would look like and see if that relates to what you have.
If you were naively measuring car velocity at a single point in front of a
traffic light, along with the color of the light, what might you observe?
(Much like with an earlier example on iron in patients, there are a number
of more precisely defined measurements you could take on a given "thing.")

If your concern is that "I ran test A and it said B, but test C said D, and
D seems inconsistent with B", it generally helps to look at the assumptions
and detailed equations for each model and explore what those mean with your
data. With continuous variables anyway, non-monotonic relationships can
easily destroy a correlation even with strong causality. 
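
A quick illustration of that last point (contrived numbers):

x <- seq(-3, 3, by = 0.1)
y <- x^2                 # perfectly deterministic, but non-monotonic
cor(x, y)                # essentially 0: correlation misses the relationship
plot(x, y)               # while a scatter plot shows it immediately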


> using formula "(crude beta-cofficient - adjusted beta-cofficient)/ crude
> beta-cofficient x 100" as per rule if the percentage of any variable is >10%
> than I have considered that as confounder. I wanted to know that from
> initial model i have deducted one variable with insignificant p-value to
> form adjusted model. Now how will i know if the variable that i deducted
> from initial model was confounder or not?
> Secondly, I wanted to know if the percentage comes in negative like
> (-17.84%) than will it be considered as confounder or not? I also wanted to
> know that confounders should be removed from model? or should be kept in
> model?
> Lastly, I wanted to know that I am running likelihood ratio test to identify
> if the value is falling in critical region or not. So if the value doesnot
> fall in critical region than what does it show? what should I do in this
> case? In my final reduced model all p-values are significant but still the
> value identified via likelihood ratio test is not falling in critical
> region. So what does that show?
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict with eha package

2011-06-02 Thread Mike Harwood
Hello list, and thank you in advance.

I'm unable to generate predicted values when specifying newdata with
phreg and aftreg models in the eha package, but I do not have the same
problem with a proportional hazards model from coxph.  Without the newdata
argument the predicted values are returned for all three, but with
"newdata=" coxph is fine while both aftreg and phreg models return an
"Error in predict.coxph(f.ph.eha, newdata = mort, type = "lp") :  Data is
not the same size as it was in the original fit" message.  Since I
ultimately want a parametric model and the "real" data is left-truncated
and right-censored, I think the aftreg function in the eha package is what
I must use.  Following is my sample code, without the output.

#~ Setup ---------------------------------------------------
library(eha)      # provides phreg, aftreg, and the mort dataset
library(survival) # provides coxph and Surv
data(mort)

#~ All models generated successfully -----------------------
f.ph <- coxph(Surv(enter, exit, event) ~ ses, data = mort)
f.ph.eha <- phreg(Surv(enter, exit, event) ~ ses, data = mort)
f.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)

#~ All fits generated successfully -------------------------
f.ph.fit <- predict(f.ph, type='lp')
f.ph.eha.fit <- predict(f.ph.eha, type='lp')
f.aft.fit <- predict(f.aft, type='lp')

#~ First fit generated successfully, others output error ---
f.ph.fit <- predict(f.ph, newdata=mort, type='lp')
f.ph.eha.fit <- predict(f.ph.eha, newdata=mort, type='lp')
f.aft.fit <- predict(f.aft, newdata=mort, type='lp')


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in model specification for cfa with lavaan-package

2011-06-01 Thread Mike Cheung
Dear Alain,

You may speed up the analysis by using the sample covariance matrix based on
a listwise deletion:
cov.cfa <- cov(your.raw.data, use="complete.obs")

Since you have 36671 cases, the results should be similar to those based on
the raw data unless you have lots of missing data and/or the data are not
missing completely at random (listwise deletion is only safe under MCAR).
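
e.g., a sketch of fitting from those summary statistics (sample.cov and
sample.nobs are lavaan's arguments for this; your.raw.data and cfa.model
stand in for your own objects):

library(lavaan)
cov.cfa <- cov(your.raw.data, use = "complete.obs")  # listwise-deletion covariances
n.complete <- sum(complete.cases(your.raw.data))     # matching sample size
fit <- cfa(cfa.model, sample.cov = cov.cfa, sample.nobs = n.complete)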

By the way, if your questions are directly related to SEM, you may get more
responses from SEMNET (http://alabamamaps.ua.edu/archives/semnet.html).

Hope it helps.

Regards,
Mike
-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-----

On Thu, Jun 2, 2011 at 1:21 AM, yrosseel  wrote:

> On 06/01/2011 05:39 PM, D. Alain wrote:
>
>> Thank you Yves and Mike,
>>
>> your comments make sense, however they have not resolved my problem: the p
>> < N is the result of my poor attempt to give a reproducible example. My
>> "real" dataframe has a dim of 36671 cases and 41 variables.
>> Following your advice, Yves, I passed my model without lavaanification
>> (just "cfa(cfa.model, data=df.cfa,missing="ml")"), but now R is working
>> for hours without printing any results...
>>
>
> To monitor progress, you can use the verbose=TRUE argument. If you are not
> fitting your final model, you may want to use se="none" (no standard errors)
> and test="none" (no test statistic) to speed things up.
>
> I've run analyses that took multiple days, using both lavaan and commercial
> software. It is annoying (the wait), but not unusual. The time to fit is a
> function of the number of missing patterns you have in your data, and the
> number of variables.
>
> Yves.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Identifying sequences

2011-06-01 Thread Mike Marchywka







> Date: Wed, 1 Jun 2011 17:12:29 +0200
> From: cjp...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] Identifying sequences
>
> Thanks to David, Thierry and Jonathan for your help.
> I have been able to put this function together
>
> a=1:10
> b=20:30
> c=40:50
> x=c(a,b,c)
>
> seq.matrix <- function(x){
> lower<- x[which(diff(x) != 1)]
> upper <- x[which(diff(x) != 1)+1]
> extremities <- c(1,lower, upper,x[length(x)])
> m <- 
> data.frame(matrix(extremities[order(extremities)],ncol=2,byrow=TRUE,dimnames=list(rows=paste("group",1:(length(lower)+1),sep=""),cols=c("lower","upper"))))
> m$length=m$upper-m$lower+1
> m
> }
>
> s.m=seq.matrix(x)
> s.m
>        lower upper length
> group1     1    10     10
> group2    20    30     11
> group3    40    50     11
>
> One can then make a test to see if a certain value (say 9) falls
> within one of the groups and use that to find the group name or lower
> or upper border

As I understand it, you are looking for large derivatives
or approximate discontinuities against a smooth signal.
This seems like a natural application for wavelets;
try the Haar wavelet via the wavelets package,

> library(wavelets)
> f=wt.filter(c(-1,1),modwt=T)
> z<-modwt(X=as.numeric(x),filter=f,n.levels=1)
> z@W
$W1
  [,1]
 [1,]   49
 [2,]   -1
 [3,]   -1
 [4,]   -1
 [5,]   -1
 [6,]   -1
 [7,]   -1
 [8,]   -1
 [9,]   -1
[10,]   -1
[11,]  -10
[12,]   -1
[13,]   -1
[14,]   -1
[15,]   -1
[16,]   -1
[17,]   -1
[18,]   -1
[19,]   -1
[20,]   -1
[21,]   -1
[22,]  -10
[23,]   -1
[24,]   -1
[25,]   -1
[26,]   -1
[27,]   -1
[28,]   -1
[29,]   -1
[30,]   -1
[31,]   -1
[32,]   -1







>
> s.m.test=function(s.m,i){which(s.m[,1] <= i & s.m[,2] >= i)}
> s.m.test(s.m,i=9)
> [1] 1
> e.g.
> row.names(s.m)[s.m.test(s.m,i=9)]
> [1] "group1"
>
> Cheers
> Christiaan
>
> On 1 June 2011 14:31, Jonathan Daily wrote:
> >
> > I am assuming in this case that you are looking for continuity along
> > integers, so if you expect noninteger values this will not work.
> >
> > You can get the index of where breaks can be found in your example using
> >
> > which(diff(x) > 1)
> >
> > On Wed, Jun 1, 2011 at 6:27 AM, christiaan pauw wrote:
> > > Hallo Everybody
> > >
> > > Consider the following vector
> > >
> > > a=1:10
> > > b=20:30
> > > c=40:50
> > > x=c(a,b,c)
> > >
> > > I need a function that can tell me that there are three set of continuos
> > > sequences and that the first is from 1:10, the second from 20:30 and the
> > > third from 40:50. In other words: a,b, and c.
> > >
> > > regards
> > > Christiaan
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > ===
> > Jon Daily
> > Technician
> > ===
> > #!/usr/bin/env outside
> > # It's great, trust me.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in model specification for cfa with lavaan-package

2011-06-01 Thread Mike Cheung
Dear Alain,

There are 16 variables but only 10 cases, some with missing values. The
sample covariance matrix is not positive definite. It has nothing to do with
lavaan. You need more cases before you can fit a CFA with 16 variables.

Regards,
Mike

-- 
-----
 Mike W.L. Cheung   Phone: (65) 6516-3702
 Department of Psychology   Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-----

On Wed, Jun 1, 2011 at 6:31 PM, D. Alain  wrote:

> Dear R-List,
>
> (I am not sure whether this list is the right place for my question...)
>
> I have a dataframe df.cfa
>
>
> df.cfa<-data.frame(x1=c(5,4,1,5,5,NA,4,NA,NA,5),x2=c(2,3,3,3,NA,1,2,1,2,1),x3=c(5,3,4,1,5,5,5,5,5,5),x4=c(5,3,4,5,5,5,5,5,5,5),x5=c(5,4,3,3,4,4,4,5,NA,5),x6=c(3,5,2,1,4,NA,NA,5,3,4),x7=c(5,4,3,4,4,3,4,NA,NA,4),x8=c(5,5,3,4,5,4,5,4,5,5),x9=c(5,1,1,1,NA,3,3,2,2,3),x10=c(5,5,2,3,4,3,1,3,2,5),x11=c(5,4,4,5,5,4,5,4,5,5),x12=c(5,4,5,5,5,4,5,5,5,5),x13=c(5,4,4,5,5,5,5,5,5,5),x14=c(5,3,3,NA,5,4,5,3,NA,5),x15=c(4,4,4,3,4,5,2,4,5,5),x16=c(5,4,3,3,4,4,3,3,4,5))
>
>
> and I want to carry out a confirmatory factor analysis using the "cfa"
> function (lavaan).
>
> library(lavaan)
>
> then I specify my model
>
> cfa.model<-'  f1=~x1+x2+x3+x4+x5
> f2=~x6+x7+x8+x9+x10+x11
> f3=~x12+x13+x14+x15+x16 '
>
> and run the cfa routine
>
> fit<-cfa(cfa.model,data=df.cfa,missing="ml")
>
> The output is an error message (here partly in german)
>
> "Error in chol.default(S) : der führende Minor der Ordnung 4 ist nicht
> positiv definit
> Fehler in Sample(data = data, group = group, sample.cov = sample.cov,
> sample.mean = sample.mean,  : sample covariance can not be inverted"
>
> Then I tried to "lavaanify" my model specification first
>
> cfa.model<- lavaanify(cfa.model)
>
>    id lhs op rhs user group free ustart fixed.x   label eq.id free.uncon
> 1   1  f1 =~  x1    1     1    1     NA       0  f1=~x1     0          1
> 2   2  f1 =~  x2    1     1    2     NA       0  f1=~x2     0          2
> 3   3  f1 =~  x3    1     1    3     NA       0  f1=~x3     0          3
> 4   4  f1 =~  x4    1     1    4     NA       0  f1=~x4     0          4
> 5   5  f1 =~  x5    1     1    5     NA       0  f1=~x5     0          5
> 6   6  f2 =~  x6    1     1    6     NA       0  f2=~x6     0          6
> 7   7  f2 =~  x7    1     1    7     NA       0  f2=~x7     0          7
> 8   8  f2 =~  x8    1     1    8     NA       0  f2=~x8     0          8
> 9   9  f2 =~  x9    1     1    9     NA       0  f2=~x9     0          9
> 10 10  f2 =~ x10    1     1   10     NA       0 f2=~x10     0         10
> 11 11  f2 =~ x11    1     1   11     NA       0 f2=~x11     0         11
> 12 12  f3 =~ x12    1     1   12     NA       0 f3=~x12     0         12
> 13 13  f3 =~ x13    1     1   13     NA       0 f3=~x13     0         13
> 14 14  f3 =~ x14    1     1   14     NA       0 f3=~x14     0         14
> 15 15  f3 =~ x15    1     1   15     NA       0 f3=~x15     0         15
> 16 16  f3 =~ x16    1     1   16     NA       0 f3=~x16     0         16
>
> I run cfa again
>
> fit<-cfa(cfa.model,data=df.cfa,missing="ml")
>
> And this time I get another error information
>
> "Error in start.idx[i]:end.idx[i] : NA/NaN Argument"
>
> I must admit that I am stuck, can anyone help?
>
> Thanks, Alain
>[[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] newbie: fourier series for time series data

2011-05-31 Thread Mike Marchywka

( hotmail won't mark quoted text so I'm top posting... )
 
Can you post the data? Personally I'd just plot abs(fft(x))
and see what you see, as well as looking at Im(fft(x))/Re(fft(x)),
the phase spectrum. Now, presumably you are nominally looking
for something with a period of 1 year; that part could
be expressed as a harmonic spectrum as suggested below, but you'd
also be looking for trends and noises of various types: additive
gaussian, amplitude modulation, maybe even frequency modulation, etc.
I guess you could remove a few power terms (average, linear, etc.)
just to simplify (often you get a spectrum where a huge uninformative
DC, or zero frequency, component just gets in the way). You can probably
find books online dealing with signal processing and RF (this is where
I'm used to seeing these things). It would of course be helpful to
then examine known simple cases and see if you can tell them apart.
Create fft( sin(t)*(1+a*sin(epsilon*t)) + b*t ) for example, as in the
sketch below.

I guess if you want to look at writing a model, you could look at the
phase portrait (plot derivative versus value) to again get some idea what
you may have that makes sense as a model to fit.
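
A minimal sketch of that exploration on a synthetic monthly series (all
the numbers here are invented for illustration):

t <- 1:(12*29)                    # 29 years of monthly samples
x <- sin(2*pi*t/12)*(1 + 0.2*sin(2*pi*t/120)) + 0.001*t + rnorm(length(t), sd=0.3)
x <- x - mean(x)                  # drop the uninformative DC term
spec <- abs(fft(x))               # amplitude spectrum
plot(spec[1:(length(x)/2)], type="h",
     xlab="frequency index", ylab="|fft|")   # annual peak near index length(x)/12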




Date: Tue, 31 May 2011 10:35:16 -0700
From: spencer.gra...@structuremonitoring.com
To: eddie...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] newbie: fourier series for time series data


On 5/31/2011 5:12 AM, eddie smith wrote:
> Hi Guys,
>
> I had a monthly time series's data of land temperature from 1980 to 2008.
> After plotting a scatter diagram, it seems that annually, there is a semi
> sinusoidal cycle. How do I run Fourier's series to the data so that I can
> fit model on it?

There are several methods.


1. The simplest would be to select the number of terms you
want, put the data into a data.frame, and use lm(y ~ sin(t/period) +
cos(t/period) + sin(2*t/period) + cos(2*t/period) + ..., data),
including as many terms as you want in the series. This is not
recommended, because it ignores the time series effects and does not
apply a smoothness penalty to the Fourier approximation.


2. A second is to use the 'fda' package. Examples are
provided (even indexed) in Ramsay, Hooker and Graves (2009) Functional
Data Analysis with R and Matlab (Springer). This is probably what
Ramsay and Hooker would do, but I wouldn't, because it doesn't treat the
time series as a time series. It also requires more work on your part.


3. A third general class of approaches uses Kalman
filtering, also called dynamic linear models or state space models.
This would allow you to estimate a differential equation model, whose
solution could be a damped sinusoid. It would also allow you to
estimate regression coefficients of a finite Fourier series but without
the smoothness penalty you would get with 'fda'. For this, I recommend
the 'dlm' package with its vignette and companion book, Petris, Petrone
and Campagnoli (2009) Dynamic Linear Models with R (Springer).


If you want something quick and dirty, you might want option 1.
For that, I might use option 2, because I know and understand it
moderately well (being third author on the book). However, if you
really want to understand time series, I recommend option 3. That has
the additional advantage that I think it would have the greatest chances
of acceptance in a refereed academic journal of the three approaches.


> I am really sorry for my question sound stupid, but I just don't know where
> to start.

There also are specialized email lists that you might consider
for a future post. Go to www.r-project.org 
-> "Mailing Lists". In particular, you might be most interested in
R-sig-ecology.


Hope this helps.
Spencer Graves

> I am desperately looking for help from you guys.
>
> Thanks in advance.
>
> Eddie
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567


[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.  
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compiling C-code in Windows

2011-05-31 Thread Mike Marchywka





> Date: Tue, 31 May 2011 11:37:56 -0400
> From: murdoch.dun...@gmail.com
> To: tom.osb...@iinet.net.au
> CC: r-h...@stat.math.ethz.ch; pmi...@ff.uns.ac.rs

>
> On 31/05/2011 7:50 AM, Tom Osborn wrote:
> > You could use cygwin's cc/gcc, or the Watcom opensource compiler.
>
> Neither of those is supported. Use what the R Admin manual suggests, or
> you're on your own.

I have had fairly good luck with cygwin or mingw, but you may need
a few conditionals etc. Not sure what R suggests, but cygwin should
work.
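
For what it's worth, a minimal sketch of the R side once the shared library
builds (the file name trial and the C function double_me are made up; it
assumes trial.c defines  void double_me(int *x) { *x = 2 * *x; }  and was
compiled with  R CMD SHLIB trial.c ):

dll <- paste("trial", .Platform$dynlib.ext, sep = "")  # trial.dll on Windows, trial.so on Linux
dyn.load(dll)                                 # load the compiled routine
res <- .C("double_me", x = as.integer(21))    # call it through the .C interface
res$x                                         # 42
dyn.unload(dll)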


>
> Duncan Murdoch
>
> > [Watcom used to be a commercial compiler which ceased and has
> > become an open project].
> >
> > But listen to people who've experienced the options. [Not I].
> >
> > Cheers, Tom.
> > - Original Message -
> > From: Petar Milin
> > To: R-HELP
> > Sent: Tuesday, May 31, 2011 9:43 PM

> >
> >
> > Hello ALL!
> > I am an Linux user (Debian testing i386), with very dusty
> > Win-experience. Nevertheless, my colleagues and I are making some
> > package in R, and I built C-routines to speed up things.
> > I followed instruction how to compile C for R (very useful link:
> > http://www.stat.lsa.umich.edu/~yizwang/software/maxLinear/noteonR.html,
> > and links listed there). Everything works like a charm in Linux. I have
> > *.so and wrapper function from R is doing a right call.
> > However, I wanted to make *.dll library for Win-users. Now, I used my
> > colleague's computer with Win XP on it, and with the latest R. In an MS-DOS
> > console, I positioned the prompt in 'C:\Program Files\R\R-2.13.0\bin\i386\',
> > and then I run: 'R CMD SHLIB C:\trial.c'. However, nothing happened, no
> > trial.dll, nothing. Then, I tried with: 'R CMD SHLIB --output=trial.dll
> > C:\trial.c', but no luck, again.
> > Please, can anyone help me with this? Can I use: 'R CMD SHLIB
> > --output=trial.dll C:\trial.c' under Linux, and expecting working DLL?
> >
> > Best,
> > PM
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Value of 'pi'

2011-05-30 Thread Mike Marchywka












> Date: Sun, 29 May 2011 23:09:47 -0700
> From: jwiley.ps...@gmail.com
> To: vincy_p...@yahoo.ca
> CC: r-help@r-project.org
> Subject: Re: [R] Value of 'pi'
>
> Dear Vincy,
>
> I hope that in school you also learned that 22/7 is an approximation.
> Please consult your local mathematician for a proof that pi != 22/7.
> A quick search will provide you with volumes of information on what pi
> is, how it may be calculated, and calculations out to thousands of
> digits.
>
> Cheers,
>
> Josh
>
> On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne wrote:
> > Dear R helpers,
> >
> > I have one basic doubt about the value of pi. In school, we have learned 
> > that
> >
> > pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 
> > 3.141593. So which value of pi should be considered?

You could do this if you trust your trig functions, since
that is presumably what you want a value of pi for:

> atan(1)
[1] 0.7853982
> atan(1)*4
[1] 3.141593

> atan(1)*4*7-22
[1] -0.008851425

> atan(1)*4-pi
[1] 0





> >
> > Regards
> >
> > Vincy
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested design

2011-05-28 Thread Mike Marchywka












> Date: Sat, 28 May 2011 09:33:03 -0700
> From: jwiley.ps...@gmail.com
> To: bjorn.robr...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Nested design
>
> Hi,
>
> If you are not asking for stats help, then do you understand the model
> and are just confused by how R labels it? We can help match R's
> labels to the ones you are used to, if you tell us what you are used
> to.


I would not suggest as a rule using a tool to validate itself, but
you can use R to make sure your interpretation of other R output is right by
giving contrived datasets to the analysis package and seeing what you get
back. Comparison can be against textbook examples or your own paper and
pencil analysis. This is also a good way to learn things, from basic terms to
things like sign or unit conventions in different fields etc. 

You can generate samples from a normal distribution with known structure and
feed them to the package in question to see what comes back, as in the
sketch below. 
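
A minimal sketch of that idea for a nested design (all effect sizes and
names are contrived):

set.seed(1)
site <- factor(rep(1:4, each = 15))          # 4 sites
plot_in_site <- factor(rep(1:12, each = 5))  # 3 plots nested in each site
y <- rnorm(60, mean = as.numeric(site))      # known site effect, no plot effect
fit <- aov(y ~ site + Error(plot_in_site))   # nested ANOVA with plot as stratum
summary(fit)   # match the labels/strata against the effects you built in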


>
> Cheers,
>
> Josh
>
> On Sat, May 28, 2011 at 6:54 AM, unpeatable  wrote:
> > Dear Dennis,
> > In my opinion I am not at all asking for any stats help, just a question how
> > to read this output.
> > Thanks, Bjorn
> >
> > -
> > Dr. Bjorn JM Robroek
> > Ecology and Biodiversity Group
> > Institute of Environmental Biology, Utrecht University
> > Padualaan 8, 3584 CH Utrecht, The Netherlands
> > Email address: b.j.m.robr...@uu.nl
> > http://www.researcherid.com/rid/C-4379-2008
> > Tel: +31-(0)30-253 6091
> > --
> > View this message in context: 
> > http://r.789695.n4.nabble.com/Nested-design-tp3557404p3557472.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predictive accuracy

2011-05-26 Thread Mike Marchywka







> Date: Thu, 26 May 2011 13:50:15 -0700
> From: gunter.ber...@gene.com
> To: ahmed.el-taht...@pfizer.com
> CC: r-help@r-project.org
> Subject: Re: [R] predictive accuracy
>
> 1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little bit of
generic "how do I REALLY use this tool" discussion may be of
benefit for all here; a mailing list for a certain brand of hammer
may discuss various uses and types of nails etc.

Personally I have an interest in this; if the OP will
post the data it may be possible to explore some analysis
options. 


>
> 2. You are wading in an alligator infested swamp. Get help from
> (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. 
We don't know the situation: an intern, someone looking for outside ideas
after exhausting internal ones, specific issues with internal peers, a
summer student not wishing to bother everyone there for details, etc. 

>
> Best,
> Bert
>
> P.S. The answer to all your questions is "no" (imho).



>
>
>
> On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed
>  wrote:
> > The strong predictor is the country/region where the study was
> > conducted. So it is not important/useful for a clinician to use it (as
> > long he/she is in USA or Europe).
> > Excluding that predictor will make another 2 insignificant predictors to
> > become significant!!  Can the new model have a reliable predictive
> > accuracy? I thought of excluding all patients from other countries and
> > develop the model accordingly- is the exclusion of a lot of patients and
> > compromise of the power is more acceptable??

LOL, quite the contrary, post hoc selection increases the power to find
whatever you or the sponsor desire... 

Presuming your general interest is in finding out the attributes of a given
drug under various conditions, you would probably want to combine
the observations with tentative thoughts on causality and see
what makes the best story. 

Statistical significance in isolation is a function of the data and the
analysis method; it doesn't really have anything specific to do with the
underlying system.

In this case, if you have other continuous prognostic factors (say
age, LDH, hemoglobin come to mind), you may be able to find that you
have non-monotonic relations between prognostic factor and outcome.
Further, say you have enough patients that you could in fact map
dose response curves. It may turn out that this curve is itself non-monotonic,
with parameters non-monotonic in a prognostic factor. Consider 

avg_survival = a + b*d - c*d^2

where d is the dose. For small d the drug seems to help, but for larger doses
it makes things worse. Now suppose that "c" is a complicated function
of hematocrit; it may not be hard to imagine that anemic and siderositic (is
that a word LOL?) patients have some underlying problems dealing with your
drug. These may be distributed geographically etc etc etc.

This is all stuff you can simulate in R or even on paper, as sketched below. 
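
A minimal sketch of that simulation (every coefficient here is invented):

set.seed(2)
d <- runif(200, 0, 10)                        # dose
hct <- runif(200, 30, 50)                     # hematocrit, a contrived covariate
c_coef <- 0.02 + 0.002*(hct - 40)^2           # "c" depends non-trivially on hct
surv <- 5 + 0.8*d - c_coef*d^2 + rnorm(200)   # non-monotonic dose response
cor(d, surv)                                  # a single correlation understates this
plot(d, surv)                                 # the curvature shows up in the picture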


It sounds like you are already trying to write a label, which may
be a bit premature ( although I defer to the guy from DNA for that LOL).
" indicated for use in patients in Western Hemisphere with  " 

You may have decent luck looking at FDA panel discussion transcripts, search for
related general stats terms confined to "site:fda.gov"


> > Thanks for your help...
> > Al
> >
> > -Original Message-
> > From: Marc Schwartz [mailto:marc_schwa...@me.com]
> > Sent: Thursday, May 26, 2011 10:54 AM
> > To: El-Tahtawy, Ahmed
> > Cc: r-help@r-project.org
> > Subject: Re: [R] predictive accuracy
> >
> >
> > On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote:
> >
> >> I am trying to develop a prognostic model using logistic regression.
> > I
> >> built a full , approximate models with the use of penalization -
> > design
> >> package. Also, I tried Chi-square criteria, step-down techniques. Used
> >> BS for model validation.
> >>
> >> > The main purpose is to develop a predictive model for future patient
> >> population.   One of the strong predictor pertains to the study design
> >> and would not mean much for a clinician/investigator in real clinical
> >> situation and have been asked to remove it.
> >> > Can I propose a model and nomogram without that strong -irrelevant
> >> predictor?? If yes, do I need to redo model calibration,
> > discrimination,
> >> validation, etc...?? or just have 5 predictors instead of 6 in the
> >> prognostic model??
> >>
> >>
> >>
> >> Thanks for your help
> >>
> >> Al
> >
> >
> > Is it that the study design characteristic would not make sense to a
> > clinician but is relevant to future samples, or that the study design
> > characteristic is unique to the sample upon which the model was
> > developed and is not relevant to future samples because they will not be
> > in the same or a similar study?
> >
> > Is the study design characteristic a surrogate for other factors that
> > would be relevant t

Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka









> Date: Wed, 25 May 2011 12:32:37 -0400
> Subject: Re: [R] Processing large datasets
> From: mailinglist.honey...@gmail.com
> To: marchy...@hotmail.com
> CC: ro...@bestroman.com; r-help@r-project.org
>
> Hi,
>
> On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka  wrote:
> [snip]
> >> > If your datasets are *really* huge, check out some packages listed
> >> > under the "Large memory and out-of-memory data" section of the
> >> > "HighPerformanceComputing" task view at CRAN:
> >>
> >> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> >
> > Does this have any specific limitations ? It sounds offhand like it
> > does paging and all the needed buffering for arbitrary size
> > data. Does it work with everything?
>
> I'm not sure what limitations ... I know the bigmemory (and ff)
> packages try hard to make using out-of-memory datasets as
> "transparent" as possible.
>
> That having been said, I guess you will have to port "more advanced"
> methods to use such packages, hence the existence of the biglm,
> biganalytics, bigtabulate packages do.
>
> > I seem to recall bigmemory came up
> > before in this context and there was some problem.
>
> Well -- I don't often see emails on this list complaining about their
> functionality. That doesn't mean they're flawless (I also don't
> scrutinize the list traffic too closely). It could be that not too
> many people use them, or that people give up before they come knocking
> when there is a problem.
>
> Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else.
But this question does come up a bit, usually as
"I tried to read a huge file into a data frame and wanted to pass
it to something with predictable memory access patterns and it
ran out of memory. What can I do?" I guess I also stopped reading
anything after "use a DB", as this is generally not a replacement
for a data structure. I'll take a look when I have a big dataset that
I can't condense easily. 






>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka




> Date: Wed, 25 May 2011 10:18:48 -0400
> From: ro...@bestroman.com
> To: mailinglist.honey...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> > Hi,
> > If your datasets are *really* huge, check out some packages listed
> > under the "Large memory and out-of-memory data" section of the
> > "HighPerformanceComputing" task view at CRAN:
>
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it
does paging and all the needed buffering for arbitrarily sized
data. Does it work with everything? I seem to recall bigmemory came up
before in this context and there was some problem.
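
For concreteness, a minimal sketch of the file-backed style these packages
offer (assuming the bigmemory package; the file names are illustrative):

library(bigmemory)
# a matrix backed by a file on disk rather than held in RAM
x <- big.matrix(nrow = 1e6, ncol = 3, type = "double",
                backingfile = "quotes.bin", descriptorfile = "quotes.desc")
x[1:5, 1] <- rnorm(5)   # indexed access pages data in as needed
mean(x[, 1])            # small slices come back as ordinary R vectors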

Thanks.



>
> > Also, if you find yourself needing to do lots of
> > "grouping/summarizing" type of calculations over large data
> > frame-like objects, you might want to check out the data.table package:
>
> > http://cran.r-project.org/web/packages/data.table/index.html
>
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> > | Memorial Sloan-Kettering Cancer Center
> > | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> I don't think data.table is fundamentally different from data.frame type, but 
> thanks for the suggestion.
>
> http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
> "Just like data.frames, data.tables must fit inside RAM"
>
> The ff package by Adler, listed in "Large memory and out-of-memory data" is 
> probably most interesting.
>
> --Roman Naumenko
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.

2011-05-25 Thread Mike Marchywka






> Date: Wed, 25 May 2011 09:49:00 -0400
> From: ro...@bestroman.com
> To: biomathjda...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> Thanks Jonathan.
>
> I'm already using RMySQL to load data for couple of days.
> I wanted to know what are the relevant R capabilities if I want to process 
> much bigger tables.
>
> R always reads the whole set into memory and this might be a limitation in 
> case of big tables, correct?

Ok, now I ask: perhaps for my first R effort I will try to find the source
code for data.frame and make a paging or streaming derivative. That is, at
least for fixed size things, it could supply things like the total number of
rows but have facilities for paging in and out of memory. Presumably all
users of data.frame have to work through a limited interface, which I guess
could be expanded with various hints like "prefetch this", for example. I
haven't looked at this idea in a while but the issue keeps coming up; the
dev list, maybe?

Anyway, for your immediate issue of computing a few statistics, you could
probably write a simple C++ program that ultimately becomes part of an R
package, or stream the file through R in chunks as sketched below. It is a
good idea to see what is available, but these questions come up here a lot
and the normal suggestion is "use a DB", which is exactly the opposite of
what you want if you have predictable access patterns (although even here
prefetch could probably be implemented).
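
A minimal sketch of chunked processing in plain R (the file name and field
layout are made up to resemble the quote records quoted below):

con <- file("quotes.txt", open = "r")
n <- 0; s <- 0
repeat {
  lines <- readLines(con, n = 10000)   # next chunk only, not the whole file
  if (length(lines) == 0) break
  px <- as.numeric(sapply(strsplit(lines, "[ \t]+"), `[`, 7))  # 7th field = price (assumed)
  s <- s + sum(px, na.rm = TRUE)
  n <- n + sum(!is.na(px))
}
close(con)
s / n   # running mean price without ever holding the file in memory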






> Doesn't it use temporary files or something similar to deal such amount of 
> data?
>
> As an example I know that SAS handles sas7bdat files up to 1TB on a box with 
> 76GB memory, without noticeable issues.
>
> --Roman
>
> - Original Message -
>
> > In cases where I have to parse through large datasets that will not
> > fit into R's memory, I will grab relevant data using SQL and then
> > analyze said data using R. There are several packages designed to do
> > this, like [1] and [2] below, that allow you to query a database
> > using
> > SQL and end up with that data in an R data.frame.
>
> > [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
> > [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html
>
> > On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko
> >  wrote:
> > > Hi R list,
> > >
> > > I'm new to R software, so I'd like to ask about it is capabilities.
> > > What I'm looking to do is to run some statistical tests on quite
> > > big
> > > tables which are aggregated quotes from a market feed.
> > >
> > > This is a typical set of data.
> > > Each day contains millions of records (up to 10 non filtered).
> > >
> > > 2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
> > > 2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
> > > 2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0
> > >
> > > I'll need to filter it out first based on some criteria.
> > > Since I keep it mysql database, it can be done through by query.
> > > Not
> > > super efficient, checked it already.
> > >
> > > Then I need to aggregate dataset into different time frames (time
> > > is
> > > represented in ms from midnight, like 35482391).
> > > Again, can be done through a databases query, not sure what gonna
> > > be faster.
> > > Aggregated tables going to be much smaller, like thousands rows per
> > > observation day.
> > >
> > > Then calculate basic statistic: mean, standard deviation, sums etc.
> > > After stats are calculated, I need to perform some statistical
> > > hypothesis tests.
> > >
> > > So, my question is: what tool faster for data aggregation and
> > > filtration
> > > on big datasets: mysql or R?
> > >
> > > Thanks,
> > > --Roman N.
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
>
> > --
> > ===
> > Jon Daily
> > Technician
> > ===
> > #!/usr/bin/env outside
> > # It's great, trust me.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RGL package installation problem on Centos

2011-05-23 Thread Mike Marchywka












Date: Mon, 23 May 2011 14:43:59 +0100
From: arraystrugg...@gmail.com
To: r-help@r-project.org
Subject: [R] RGL package installation problem on Centos


Dear R users,
I have installed the latest version of R from source on Centos (using
configure and make install).
This seemed to work fine, with no Errors reported and R at the command line
starts R.

However, if I try and installed the package rgl using;
install.packages("rgl")
I get the following error;

installing to /usr/local/lib64/R/library/rgl/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
 *** caught segfault ***
address (nil), cause 'memory not mapped'

I just did an install of R from source, built with various
options to support Rapache, and tried to load rgl.
First it complained about no display, so I went back
to bash, did "export DISPLAY=:0", and it seemed
to load ok. Do you have X running and a display set?
Not sure what happens if you have R without X11 support,
for example. 

I probably installed with dep=TRUE, and only on cygwin do I recall
some issues with missing dependencies. Try setting dependencies to
true and see if that helps.
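
i.e., something along these lines (a sketch; the display value is whatever
your X server actually uses):

Sys.setenv(DISPLAY = ":0")                     # or export DISPLAY=:0 in the shell
install.packages("rgl", dependencies = TRUE)   # pull in any missing dependencies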











aborting ...
sh: line 1: 23732 Segmentation fault  '/usr/local/lib64/R/bin/R'
--no-save --slave < /tmp/RtmpkvIjOb/file6d97876
ERROR: loading failed
* removing '/usr/local/lib64/R/library/rgl'
The downloaded packages are in
'/tmp/Rtmp5OaGuQ/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning message:
In install.packages("rgl") :
  installation of package 'rgl' had non-zero exit status
I read that Open GL header files have to be present and are in
/usr/include/GL.
I also read about different graphics cards causing problems but I don't know
how to find this info out.

Any help appreciated and full error message included below.

Thanks,

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

##### full error #####
> install.packages("rgl")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.ma.imperial.ac.uk/src/contrib/rgl_0.92.798.tar.gz'
Content type 'application/x-gzip' length 162 bytes (1.6 Mb)
opened URL
==
downloaded 1.6 Mb
* installing *source* package 'rgl' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for gcc... (cached) gcc -std=gnu99
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc -std=gnu99 accepts -g... (cached) yes
checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
checking for libpng-config... yes
configure: using libpng-config
configure: using libpng dynamic linkage
checking for X... libraries , headers
checking GL/gl.h usability... yes
checking GL/gl.h presence... yes
checking for GL/gl.h... yes
checking GL/glu.h usability... yes
checking GL/glu.h presence... yes
checking for GL/glu.h... yes
checking for glEnd in -lGL... yes
checking for gluProject in -lGLU... yes
checking for freetype-config... yes
configure: using Freetype and FTGL
configure: creating ./config.status
config.status: creating src/Makevars
** libs
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c BBoxDeco.cpp -o BBoxDeco.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c Background.cpp -o
Background.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic  -g -O2 -c Color.cpp -o Color.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12
-DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext
-I/usr/local/include   -g -O2 -fpic 

[R] so current status of interactive PDF creation is? trying to explore econometric convergence thread

2011-05-23 Thread Mike Marchywka


I think I just saw a thread go by on " how do I make interactive PDF from
rgl output" but I ignored it at the time and can't seem to find consensus result
on google. Essentially I wanted to create pdf or other output to 
share a 3D plot and let viewers interact with it but it still wasn't
clear on easiest way to do this. 
Is this 2009 page the best reference?

http://cran.r-project.org/web/views/Graphics.html

Thanks.
Specific case of interest is below, 

I found rgl to be quite useful in regards to this,

http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-td3502516.html#a3512807


I generated some data points using this script
(which itself uses this data file http://98.129.232.234/temp/share/em.txt ),

http://98.129.232.234/temp/share/em.R.txt

After some editing with sed, the output of the above script
made this data file showing an optimization's
"trajectory" according to my variables of interest,

http://98.129.232.234/temp/share/emx.dat.txt

df<-read.table("emx.dat",header=F,sep=" ")




I subsequently found a few ways to plot the data,
notably,

png("em.png")

plot(log(df$V1),df$V2)

dev.off()



http://98.129.232.234/temp/share/em.png



png("em2.png")

plot(log(df$V1),df$V2,cex=1+.2*(log(df$V3)-min(log(df$V3))))

dev.off()


http://98.129.232.234/temp/share/em2.png


But what I'd like to do is publish an interactive 3D
plot of this data similar to the rgl output
of this,

df<-read.table("emx.dat",header=F,sep=" ")
library(rgl)
rgl.points(log(df$V1),df$V2,log(df$V3))

A quick google search and ?rgl didn't seem
to provide an immediate answer.

Is there a way to publish interactive plots?
Thanks.








  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-22 Thread Mike Marchywka
> > csd <- function(fn, x, ...) {
> > h <- .Machine$double.eps
> > n <- length(x)
> > h0 <- g <- rep(0, n)
> > for (i in 1:n) {
> > h0[i] <- h * 1i
> > g[i] <- Im(fn(x+h0, ...))/h
> > h0[i] <- 0
> > }
> > g
> > }
> >
> > gr.csd <- function(theta,y1, y2, x1, x2, x3) {
> > csd(lnl, theta, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
> > }
> >
> > # exact solution as the starting value
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> > p1 <- optimx(start, lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # numerical gradient
> > p2 <- optimx(rep(0,8), lnl, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # exact gradient
> > p2g <- optimx(rep(0,8), lnl, gr.csd, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # comparing p2 and p2g, we see the dramatic improvement in BFGS when exact 
> > gradient is used, we also see a major difference for L-BFGS-B
> > # Exact gradient did not affect the gradient methods, CG and spg, much. 
> > However, convergence of Rcgmin improved when exact gradient was used
> >
> > x1s <- scale(x1)
> > x2s <- scale(x2)
> > x3s <- scale(x3)
> >
> > p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # both scaling and exact gradient
> > p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> > control=list(all.methods=TRUE, maxit=1500))
> >
> > # Comparing p3 and p3g, use of exact gradient improved spg 
> > dramatically.
> >
> > # Of course, derivative-free methods newuoa, and Nelder-Mead are improved 
> > by scaling, but not by the availability of exact gradients. I don't know 
> > what is wrong with bobyqa in this example.
> >
> > In short, even with scaling and exact gradients, this optimization problem 
> > is recalcitrant.
> >
> > Best,
> > Ravi.
> >
> > 
> > From: Mike Marchywka [marchy...@hotmail.com]
> > Sent: Thursday, May 12, 2011 8:30 AM
> > To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
> > Cc: r-help@r-project.org
> > Subject: RE: [R] maximum likelihood convergence reproducing Anderson 
> > Blundell 1982 Econometrica R vs Stata
> >
> > So what was the final verdict on this discussion? I kind of
> > lost track if anyone has a minute to summarize and critique my summary 
> > below.
> >
> >
> > Apparently there were two issues, the comparison between R and Stata
> > was one issue and the "optimum" solution another. As I understand it,
> > there was some question about R numerical gradient calculation. This would
> > suggest some features of the function may be of interest to consider.
> >
> > The function to be optimized appears to be, as OP stated,
> > some function of residuals of two ( unrelated ) fits.
> > The residual vectors e1 and e2 are dotted in various combinations
> > creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
> > is the result to be minimized by choice of theta. Theta it seems is
> > an 8 component vector, 4 components determine e1 and the other 4 e2.
> > Presumably a unique solution would require that e1 and e2, both n-component 
> > vectors,
> > point in different directions, or else both could become arbitrarily large
> > while keeping the error signal at zero. For fixed magnitudes, collinearity
> > would reduce the "Error." The intent would appear to be to
> > keep the residuals distributed similarly in the two ( unrelated) fits.
> > I guess my question is,
> > " did anyone determine that there is a unique solution?" or
> > am I totally wrong here ( I haven't used these myself to any
> > extent and just try to run some simple teaching examples, asking
> > for my own clarification as much as anything).
> >
> > Thanks.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 
> > > From: rvarad...@jhmi.edu
> > > To: pda...@gmail.com; alex.ols...@gmail.com
> > > Date: Sat, 7 May 2011 11:51:56 -0400
> > > CC: r-help@r-project.org
> > > Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
> > > Blundell 1982 Econometrica R vs Stata
> > >
> > > There is something strange in this

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-21 Thread Mike Marchywka
> p3 <- optimx(rep(0,8),lnl, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # both scaling and exact gradient
> p3g <- optimx(rep(0,8),lnl, gr.csd, y1=y1, y2=y2, x1=x1s, x2=x2s, x3=x3s, 
> control=list(all.methods=TRUE, maxit=1500))
>
> # Comparing p3 and p3g, use of exact gradient improved spg 
> dramatically.
>
> # Of course, derivative-free methods newuoa, and Nelder-Mead are improved by 
> scaling, but not by the availability of exact gradients. I don't know what is 
> wrong with bobyqa in this example.
>
> In short, even with scaling and exact gradients, this optimization problem is 
> recalcitrant.
>
> Best,
> Ravi.
>
> 
> From: Mike Marchywka [marchy...@hotmail.com]
> Sent: Thursday, May 12, 2011 8:30 AM
> To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
> Cc: r-help@r-project.org
> Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> So what was the final verdict on this discussion? I kind of
> lost track if anyone has a minute to summarize and critique my summary below.
>
>
> Apparently there were two issues, the comparison between R and Stata
> was one issue and the "optimum" solution another. As I understand it,
> there was some question about R numerical gradient calculation. This would
> suggest some features of the function may be of interest to consider.
>
> The function to be optimized appears to be, as OP stated,
> some function of residuals of two ( unrelated ) fits.
> The residual vectors e1 and e2 are dotted in various combinations
> creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
> is the result to be minimized by choice of theta. Theta it seems is
> an 8 component vector, 4 components determine e1 and the other 4 e2.
> Presumably a unique solution would require that e1 and e2, both n-component 
> vectors,
> point in different directions, or else both could become arbitrarily large
> while keeping the error signal at zero. For fixed magnitudes, collinearity
> would reduce the "Error." The intent would appear to be to
> keep the residuals distributed similarly in the two ( unrelated) fits.
> I guess my question is,
> " did anyone determine that there is a unique solution?" or
> am I totally wrong here ( I haven't used these myself to any
> extent and just try to run some simple teaching examples, asking
> for my own clarification as much as anything).
>
> Thanks.
>
> 
> > From: rvarad...@jhmi.edu
> > To: pda...@gmail.com; alex.ols...@gmail.com
> > Date: Sat, 7 May 2011 11:51:56 -0400
> > CC: r-help@r-project.org
> > Subject: Re: [R] maximum likelihood convergence reproducing Anderson 
> > Blundell 1982 Econometrica R vs Stata
> >
> > There is something strange in this problem. I think the log-likelihood is 
> > incorrect. See the results below from "optimx". You can get much larger 
> > log-likelihood values than for the exact solution that Peter provided.
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)%*%e
> > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> > something wrong here
> > return(-logl)
> > }
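
(A minimal sketch for reference: if the flagged line is indeed the problem,
one common reading is that the 1/n is missing from the covariance estimate,
i.e. sigma-hat = E'E/n rather than E'E. Names follow the quoted code; this is
a guess at the fix, not the paper's definition.)

lnl2 <- function(theta, y1, y2, x1, x2, x3) {
  n  <- length(y1)
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e  <- cbind(e1, e2)
  sigma.hat <- crossprod(e) / n   # concentrated MLE of Sigma divides by n
  logl <- -n/2 * (2*(1 + log(2*pi)) + log(det(sigma.hat)))
  return(-logl)                   # negated because optim/optimx minimize
}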
> >
> > data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
> >
> > attach(data)
> >
> > require(optimx)
> >
> > start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >
> > # the warnings can be safely ignored in the "optimx" calls
> > p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> > + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
> >
> > Ravi.
> > 
> > From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> > Of peter dalgaard [pda...@gmail.com]
> > Sent: Saturday, May 07, 2011 4:46 AM
> > To: Alex Olssen
> > Cc: r-help@r-project.org
> >

[R] predict 'expected' with eha package

2011-05-21 Thread Mike Harwood
I am unsure what is being returned, and what is supposed to be
returned, when using 'predict' with "type='expected'" for an aftreg
survival model.  The code below first generates a Weibull model, then
uses predict to create a vector of the linear predictors, then
attempts to create the 'expected' vector, which is empty.  The final
two steps in the code generate a lognormal model with the same data,
and the same empty 'expected' vector.

My expectation had been that 'expected' would generate the same
transformed dependent variable output as predict with a survreg model
using type='response'.  Since my 'real' data is left-truncated and
right-censored I cannot use survreg, and I wanted to investigate the
output from eha.

Thanks in advance!

Mike

> data(mort)
> aftreg(Surv(enter, exit, event) ~ ses, data = mort)
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.348     0.706     0.089     0.000

log(scale)                    3.603    36.698     0.065     0.000
log(shape)                    0.331     1.392     0.058     0.000

Events                    276
Total time at risk      17038
Max. log. likelihood  -1391.3
LR test statistic        16.1
Degrees of freedom          1
Overall p-value   5.91578e-05
> m1 <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)
> head(predict(m1, type='lp')) ## produces output
1 2 3 4 5 6
-0.347853  0.00 -0.347853  0.00  0.00  0.00
> head(predict(m1, type='expected')) ## is this correct?
numeric(0)
> m2 <- aftreg(Surv(enter, exit, event) ~ ses, dist='lognormal', data = mort)
> head(predict(m2, type='expected')) ## is this correct?
numeric(0)


from eha (the survival and rms packages are not an option for my
'real' question, since I have left-truncated, right-censored data).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help, please

2011-05-19 Thread Mike Marchywka

> From: dwinsem...@comcast.net
> To: julio.flo...@spss.com.mx
> Date: Thu, 19 May 2011 10:40:08 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] Help, please
>
>
> On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote:
>
> > Hi,
> >
> > I am using R 2.10.1 and I have a doubt. Do you know how many cases
> > can R
> > handle?
>
> I was able to handle (meaning do Cox proportional hazards work with
> the 'rms' package which adds extra memory overhead with a datadist
> object) a 5.5 million rows by 100 columns dataframe without
> difficulty using 24 GB on a Mac (BSD UNIX kernel). I was running into
> performance slow downs related to paging out to virtual memory at 150
> columns, but after expanding to 32 GB can now handle 5.5 MM records
> with 200 columns without paging.
>
> >
> > I want to use the library npmc but if I have more than 4,500 cases I
> > get an
> > error message. If I use less than 4500 cases I don´t have problems
> > with this
> > library.
> >
> > Is there any way to increase the number of cases in order to use this
> > library.
>
> 64 bit OS, 64 bit R, and more memory.
>
The longer-term solution is an implementation and algorithm designed to
increase the coherence of memory accesses (Firefox is doing this to me now,
dropping every few characters and getting many behind as it thrashes with a
memory leak, LOL).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] text mining analysis and word visualization of pdfs

2011-05-19 Thread Mike Marchywka

Date: Wed, 18 May 2011 15:24:49 +0530
From: ashimkap...@gmail.com
To: k...@huftis.org
CC: r-h...@stat.math.ethz.ch
Subject: Re: [R] text mining analysis and word visualization of pdfs


On Wed, May 18, 2011 at 1:44 PM, Karl Ove Hufthammer wrote:

> Ajay Ohri wrote:
>
> > What is the appropriate software package for dumping say 20 PDFS in a
> > folder, then creating data visualization with frequency counts of
> > certain words as well as measure correlation within each file for
> > certain key relationships or key words.
>
> pdftotext + Unix™ for Poets + R (ggplot2)
>
> What about the tm package ? I am a beginner and I don't know much about
this but I recall that it does have the ability to handle PDF's. A few words
from the experts would be nice.

I don't know if I'm an expert, I can't even get a browser that echoes
keystrokes in a reasonable time with a 4-core CPU on 'dohs, but PDF
could mean just about anything in terms of how text is represented. Whatever
R packages do, they will not be able to read the mind of the author.
Even with pdftotext, there are many options, and even simple things like
US IRS instruction forms can be almost impossible to extract in a coherent
manner. Many authors couldn't care less about the information as long as the
thing looks like paper copy. If you are stuck with PDF, I'd be looking
for more tools first, as you will probably want to know how they are
constructed.

I would just reiterate that the best approach for many data analysts would
be to contact the data source explaining problems with improperly authored PDF
or other specialized file formats that are only supported by limited
proprietary tools or that obfuscate information of interest.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-13 Thread Mike Marchywka


> Date: Fri, 13 May 2011 12:38:51 +0200
> From: haenl...@escpeurope.eu
> To: r-help@r-project.org
> Subject: [R] Powerful PC to run R
>
> Dear all,
>
> I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
> i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
> calculations run for several days sometimes even weeks (mainly simulations
> over a large parameter space). Depending on the external conditions, my
> laptop sometimes shuts down due to overheating.
>
> I'm now thinking about buying a more powerful desktop PC or laptop. Can
> anybody advise me on the best configuration to run R as fast as possible? I
> will use this PC exclusively for R so any other factors are of limited
> importance.

(I think my laptop is overheating with firefox trying to execute whatever
stupid code hotmail is using; ssh to a remote server echoes keys faster LOL)
The point of the above is that it really depends what you
are doing. Heat can come from the disk drive as well as the silicon. Generally
you'd want to consider algorithm and implementation and get profiling info
before just buying a bigger hammer. If you are thrashing VM, sorting
some data may help, for example. If the task is suited to parallelism, you could
even try to distribute it over several cheaper computers; hard to know.
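
A minimal profiling sketch in base R (the workload here is a made-up
stand-in; substitute your own simulation):

Rprof("prof.out")                              # start the sampling profiler
res <- replicate(20, sum(sort(runif(1e5))))    # hypothetical workload
Rprof(NULL)                                    # stop profiling
head(summaryRprof("prof.out")$by.self)         # see where the time goes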


>
> Thanks,

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power Spectrum from STFT (e1071)?

2011-05-13 Thread Mike Marchywka


> Date: Tue, 10 May 2011 11:55:34 -0700
> From: ajf...@psu.edu
> To: r-help@r-project.org
> Subject: [R] Power Spectrum from STFT (e1071)?
>
> Hello.
>
> Does anyone know how to generate a power spectrum from the STFT function in
> package e1071? The output for this function supplies the Fourier
> coefficients, but I do not know how to relate these back to the power
> spectrum.
>
> Alternatively, if anyone is aware of other packages/functions that perform
> short time Fourier transforms, that would also be helpful.

What exactly are you trying to do? From what I can recall (quick look on
Wikipedia, and I've done this in the past for audio envelope/carrier separation),
the short-term modifier just means you need to window the input signal.
I did just do something like this in R, to deconvolve histograms with
an impulse response, and it is not hard to create your own window function
and multiply it with the signal. The power spectrum is just the squared magnitude,
but you may also want to invert and see that the output is real (Re >> Im) to
verify nothing went wrong in processing. Probably this returns complex numbers;
I think you can use abs() or Re() and Im() on them, and also str(), but post some
code and try "?fft" for example.
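
A minimal windowed power-spectrum sketch in base R (toy signal; this is
essentially one frame of an STFT, so it should be comparable to one column
of the e1071 output):

N <- 256
x <- sin(2*pi*0.05*(1:N)) + rnorm(N, sd = 0.1)   # toy signal
w <- 0.5 - 0.5*cos(2*pi*(0:(N-1))/(N-1))         # Hanning window
X <- fft(x * w)                                  # complex Fourier coefficients
pow <- Mod(X)^2                                  # power spectrum = |X|^2
plot(pow[1:(N/2)], type = "l")                   # frequencies up to Nyquist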



>
> Thanks.
>

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-12 Thread Mike Marchywka




So what was the final verdict on this discussion? I kind of 
lost track if anyone has a minute to summarize and critique my summary below.


Apparently there were two issues, the comparison between R and Stata
was one issue and the "optimum" solution another. As I understand it,
there was some question about R numerical gradient calculation. This would
suggest some features of the function may be of interest to consider. 

The function to be optimized appears to be, as OP stated, 
some function of residuals of two ( unrelated ) fits. 
The residual vectors e1 and e2 are dotted in various combinations
creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2 which
is the result to be minimized by choice of theta. Theta it seems is
an 8 component vector, 4 components determine e1 and the other 4 e2.
Presumably a unique solution would require that e1 and e2, both n-component 
vectors,
 point in different directions or else both could become arbitrarily large
while keeping the error signal at zero. For fixed magnitudes, collinearity
would reduce the "Error."  The intent would appear to be to 
keep the residuals distributed similarly in the two ( unrelated) fits. 
 I guess my question is,
" did anyone determine that there is a unique solution?" or
am I totally wrong here ( I haven't used these myself to any
extent and just try to run some simple teaching examples, asking
for my own clarification as much as anything).

Thanks.

> From: rvarad...@jhmi.edu
> To: pda...@gmail.com; alex.ols...@gmail.com
> Date: Sat, 7 May 2011 11:51:56 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> There is something strange in this problem. I think the log-likelihood is 
> incorrect. See the results below from "optimx". You can get much larger 
> log-likelihood values than for the exact solution that Peter provided.
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> something wrong here
> return(-logl)
> }
>
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
>
> attach(data)
>
> require(optimx)
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> Ravi.
> 
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> Of peter dalgaard [pda...@gmail.com]
> Sent: Saturday, May 07, 2011 4:46 AM
> To: Alex Olssen
> Cc: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> On May 6, 2011, at 14:29 , Alex Olssen wrote:
>
> > Dear R-help,
> >
> > I am trying to reproduce some results presented in a paper by Anderson
> > and Blundell in 1982 in Econometrica using R.
> > The estimation I want to reproduce concerns maximum likelihood
> > estimation of a singular equation system.
> > I can estimate the static model successfully in Stata but for the
> > dynamic models I have difficulty getting convergence.
> > My R program which uses the same likelihood function as in Stata has
> > convergence properties even for the static case.
> >
> > I have copied my R program and the data below. I realise the code
> > could be made more elegant - but it is short enough.
> >
> > Any ideas would be highly appreciated.
>
> Better starting values would help. In this case, almost too good values are 
> available:
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> which appears to be the _exact_ solution.
>
> Apart from that, it seems that the conjugate gradient methods have 
> difficulties with this likelihood, for some less than obvious reason. 
> Increasing the maxit gets you closer but still not satisfactory.
>
> I would suggest trying out the experimental optimx package. Apparently, some 
> of the algorithms in there are much better at handling this likelihood, 
> notably "nlm" and "nlminb".
>
>
>
>
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)

Re: [R] How to extract information from the following dataset?

2011-05-12 Thread Mike Marchywka

> Date: Thu, 12 May 2011 10:43:59 +0200
> From: jose-marcio.mart...@mines-paristech.fr
> To: xzhan...@ucr.edu
> CC: r-help@r-project.org
> Subject: Re: [R] How to extract information from the following dataset?
>
> Xin Zhang wrote:
> > Hi all,
> >
> > I have never worked with this kind of data before, so Please help me out
> > with it.
> > I have the following data set, in a csv file, looks like the following:
> >
> > Jan 27, 2010 16:01:24,000 125 - - -
> > Jan 27, 2010 16:06:24,000 125 - - -
> > Jan 27, 2010 16:11:24,000 176 - - -
> > Jan 27, 2010 16:16:25,000 159 - - -
> > Jan 27, 2010 16:21:25,000 142 - - -
> > Jan 27, 2010 16:26:24,000 142 - - -
> > Jan 27, 2010 16:31:24,000 125 - - -
> > Jan 27, 2010 16:36:24,000 125 - - -
> > Jan 27, 2010 16:41:24,000 125 - - -
> > Jan 27, 2010 16:46:24,000 125 - - -
> > Jan 27, 2010 16:51:24,000 125 - - -
> > Jan 27, 2010 16:56:24,000 125 - - -
> > Jan 27, 2010 17:01:24,000 157 - - -
> > Jan 27, 2010 17:06:24,000 172 - - -
> > Jan 27, 2010 17:11:25,000 142 - - -
> > Jan 27, 2010 17:16:24,000 125 - - -
> > Jan 27, 2010 17:21:24,000 125 - - -
> > Jan 27, 2010 17:26:24,000 125 - - -
> > Jan 27, 2010 17:31:24,000 125 - - -
> > Jan 27, 2010 17:36:24,000 125 - - -
> > Jan 27, 2010 17:41:24,000 125 - - -
> > Jan 27, 2010 17:46:24,000 125 - - -
> > Jan 27, 2010 17:51:24,000 125 - - -
> > ..
> >
> > The first few columns are month, day, year, time with OS3 accuracy. And the
> > last number is the measurement I need to extract.
> > I wonder if there is a easy way to just take out the measurements only from
> > a specific day and hour, i.e. if I want measurements from Jan 27 2010
> > 16:--:--
> > then I get 125,125,176,159,142,142,125,125,125,125,125,125.
> > Many thanks!!
>
> The easiest is in the shell, if you're using some flavour of unix :
>
> grep "Jan 27, 2010 16" filein.txt | awk '{print $5}' > fileout.txt
>
> and use fileout which will contain only the column of data you want.
>
Normally that is what I do, but the R POSIXct features work pretty easily.
I guess I'd use bash text-processing commands to put the data into a
form you like, perhaps "y-mo-day time", and then read it in as a data frame.
Usually I convert everything to "time since epoch began" because I like integers,
but there are some facilities here like "round" that work well with date-times.

> dx<-as.POSIXct("2011-04-03 13:14:15")
> dx
[1] "2011-04-03 13:14:15 CDT"
> round(dx,"hour")
[1] "2011-04-03 13:00:00 CDT"
> as.integer(dx)
[1] 1301854455
>
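
So for the original question, a minimal sketch (column names are made up),
assuming the timestamps were parsed into a POSIXct column and reusing dx from
above:

df <- data.frame(when = dx + (0:11)*300, val = rep(c(125, 142, 159), 4))
df$val[format(df$when, "%Y-%m-%d %H") == "2011-04-03 13"]  # readings in that hour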

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-09 Thread Mike Marchywka

> Date: Mon, 9 May 2011 22:06:38 +1200
> From: alex.ols...@gmail.com
> To: pda...@gmail.com
> CC: r-help@r-project.org; da...@otter-rsch.com
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> Peter said
>
> "Ahem! You might get us interested in your problem, but not to the
> level that we are going to install Stata and Tsp and actually dig out
> and study the scientific paper you are talking about. Please cite the
> results and explain the differences."
>
> Apologies Peter, will do,
>
> The results which I can emulate in Stata but not (yet) in R are reported 
> below.

did you actually cut/paste code anywhere and is your first coefficient -.19 or 
-.019?
Presumably typos would be one possible problem. 

> They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569


is this it, page 1559?

http://www.jstor.org/pss/1913396

generally it helps if we could at least see the equations to check your
code against typos (note the page number?) in lnl; that may fix part of the
mystery.  Is the full text available
on the author's site? It doesn't come up on CiteSeer AFAICT:


http://citeseerx.ist.psu.edu/search?q=blundell+1982&sort=ascdate

I guess one question would be "what is beta in lnl supposed to be?" -
it isn't used anywhere. But I will also mention I'm not that familiar
with the R code (I'm trying to work through this to learn R and the
optimizers).

maybe some words would help: is sigma supposed to be 2x2 or 8x8, and what are
e1 and e2 supposed to be?




>
> TABLE II - model 18s
>
> coef std err
> p10 -0.19 0.078
> p11 0.220 0.019
> p12 -0.148 0.021
> p13 -0.072
> p20 0.893 0.072
> p21 -0.148
> p22 0.050 0.035
> p23 0.098
>
> The results which I produced in Stata are reported below.
> I spent the last hour rewriting the code to reproduce this - since I
> am now at home and not at work :(
> My results are "identical" to those published. The estimates are for
> a 3 equation symmetrical singular system.
> I have not bothered to report symmetrical results and have backed out
> an extra estimate using adding up constraints.
> I have also backed out all standard errors using the delta method.
>
> . ereturn display
> ------------------------------------------------------------------------------
>      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -----+------------------------------------------------------------------------
> a    |
>   a1 |  -.0188115   .0767759    -0.25   0.806    -.1692895    .1316664
>   a2 |   .8926598   .0704068    12.68   0.000     .7546651    1.030655
>   a3 |   .1261517   .0590193     2.14   0.033      .010476    .2418275
> -----+------------------------------------------------------------------------
> g    |
>  g11 |   .2199442   .0184075    11.95   0.000      .183866    .2560223
>  g12 |  -.1476856   .0211982    -6.97   0.000    -.1892334   -.1061378
>  g13 |  -.0722586   .0145154    -4.98   0.000    -.1007082   -.0438089
>  g22 |   .0496865   .0348052     1.43   0.153    -.0185305    .1179034
>  g23 |   .0979991   .0174397     5.62   0.000     .0638179    .1321803
>  g33 |  -.0257405   .0113869    -2.26   0.024    -.0480584   -.0034226
> ------------------------------------------------------------------------------
>
> In R I cannot get results like this - I think it is probably to do
> with my inability at using the optimisers well.
> Any pointers would be appreciated.
>
> Peter said "Are we maximizing over the same parameter space? You say
> that the estimates from the paper gives a log-likelihood of 54.04, but
> the exact solution clocked in at 76.74, which in my book is rather
> larger."
>
> I meant +54.04 > -76.74. It is quite common to get positive
> log-likelihoods in these system estimations.
>
> Kind regards,
>
> Alex
>
> On 9 May 2011 19:04, peter dalgaard  wrote:
> >
> > On May 9, 2011, at 06:07 , Alex Olssen wrote:
> >
> >> Thank you all for your input.
> >>
> >> Unfortunately my problem is not yet resolved.  Before I respond to
> >> individual comments I make a clarification:
> >>
> >> In Stata, using the same likelihood function as above, I can reproduce
> >> EXACTLY (to 3 decimal places or more, which is exactly considering I
> >> am using different software) the results from model 8 of the paper.
> >>
> >> I take this as an indication that I am using the same likelihood
> >> function as the authors, and that it does indeed work.
> >> The reason I am trying to estimate the model in R is because while
> >> Stata reproduces model 8 perfectly it has convergence
> >> difficulties for some of the other models.
> >>
> >> Peter Dalgaard,
> >>
> >> "Better starting values would help. In this case, almost too good
> >> values are available:
> >>
> >> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
> >>
> >> which appears to be the _exact_ solution."
> >>
> >> Thanks for the suggestion.  Using these starting values produces the
> >> exact estimate that Dave Fournier emailed me.
> >> If these are the exact solution then why did the author publish
> >> different answers which are completely reproducible in
> >> Stata and 

Re: [R] Confidence intervals and polynomial fits

2011-05-08 Thread Mike Marchywka




> From: pda...@gmail.com
> Date: Sun, 8 May 2011 09:33:23 +0200
> To: rh...@sticksoftware.com
> CC: r-help@r-project.org
> Subject: Re: [R] Confidence intervals and polynomial fits
>
>
> On May 7, 2011, at 16:15 , Ben Haller wrote:
>
> > On May 6, 2011, at 4:27 PM, David Winsemius wrote:
> >
> >> On May 6, 2011, at 4:16 PM, Ben Haller wrote:
> >>>
> >>
> >>> As for correlated coefficients: x, x^2, x^3 etc. would obviously be 
> >>> highly correlated, for values close to zero.
> >>
> >> Not just for x close to zero:
> >>
> >>> cor( (10:20)^2, (10:20)^3 )
> >> [1] 0.9961938
> >>> cor( (100:200)^2, (100:200)^3 )
> >> [1] 0.9966219
> >
> > Wow, that's very interesting. Quite unexpected, for me. Food for thought. 
> > Thanks!
> >
>
> Notice that because of the high correlations between the x^k, their parameter 
> estimates will be correlated too. In practice, this means that the c.i. for 
> the quartic term contains values for which you can compensate with the other 
> coefficients and still have an acceptable fit to data. (Nothing strange about 
> that; already in simple linear regression, you allow the intercept to change 
> while varying the slope.)

I was trying to compose a longer message, but at least for even/odd it isn't
hard to find a set of values for which cor is zero, or to find a set of points
that make sines of different frequencies have non-zero correlation - this
highlights the fact that the computer isn't magic and it needs data to make
basis functions different from each other.
For background, you probably want to look up "Taylor Series" and "Orthogonal
Basis."
I would also suggest using R to add noise to your input and see what that does
to your predictions, or for that matter take simple known data and add noise,
although I think in principle you can do this analytically. You can always
project a signal onto some subspace and get an estimate of how good your
estimate is, but that is different from asking how well you can reconstruct
your signal from a bunch of projections.
If you want to know "what can I infer about the slope of my thing at x=a,"
that is a specific question about one coefficient, but at this point
statisticians can elaborate on various issues with the other things you
ignore. Also, I think you said something about correlation at x=0, but you
can change your origin, (x-a)^n, and expand this in a finite series in x^m
to see what happens here.
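
A quick illustration of both points, using the orthogonal polynomial basis
that R builds with poly():

x <- 10:20
cor(x^2, x^3)                           # raw powers: correlation near 1
round(cor(poly(x, 3)), 10)              # orthogonal basis: identity matrix
cor((x - mean(x))^2, (x - mean(x))^3)   # centering gives 0 for even vs odd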

Also, if you are using hotmail, don't think that a dot product is not html, LOL,
since hotmail knows you must mean html when you use a less-than even in a text
email...


>
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk Priv: pda...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Syntax for iter.max in rms

2011-05-08 Thread Mike Harwood
Hello,

I would like to increase the number of iterations for running a
Buckley-James regression model in the rms package, but something is
apparently syntactically wrong.  The following code initially is
exactly as it appears in the help page (which runs successfully), then
a "failure to converge" message (resulting from specifying an
'identity' link argument, the error message by adding both the
iter.max control and 'identity' link arguments, and finally, the error
message when testing just an iter.max argument.

This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04.

Thank you in advance, and all insights and criticisms are appreciated.

Mike

library(rms)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital,x=TRUE, y=TRUE)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',x=TRUE, 
> y=TRUE)

No convergence in 50 steps
Failure in bj.fit
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', 
> iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link =
"identity",  :
  unused argument(s) (iter.max = 200)
> f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE, 
> y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max =
200,  :
  unused argument(s) (iter.max = 200)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls problem with R

2011-05-05 Thread Mike Marchywka

> Date: Thu, 5 May 2011 01:20:33 -0700
> From: sterles...@hotmail.com
> To: r-help@r-project.org
> Subject: Re: [R] nls problem with R
>
> ID1 ID2 t V(t)
> 1 1 0 6.053078443
> 2 1 0.3403 5.56937391
> 3 1 0.4181 5.45484486
> 4 1 0.4986 5.193124598
> 5 1 0.7451 4.31386722
> 6 1 1.0069 3.645422269
> 7 1 1.5535 3.587710965
> 8 1 1.8049 3.740362689
> 9 1 2.4979 3.699837726
> 10 1 6.4903 2.908485019
> 11 1 13.5049 1.888179494
> 12 1 27.5049 1.176091259
> 13 1 41.5049 1.176091259
>
> The model
> (1) V(t)=V0[1-epi+ epi*exp(-c(t-t0))]

A = V0, B = V0*epi, C = exp(c*t0)
V(t) = A - B + B*C*exp(-c*t)

or further, with D = A - B and F = B*C,

V(t) = D + F*exp(-c*t)

this model only really has 3 attributes: initial value, final value,
and decay constant, yet you ask for 4 parameters. There is no
way to get a unique answer. For some reason this same form comes up
a lot here; I think this is about the third time I've seen this in the last
few weeks.

I guess when fishing or shopping for forms to fit, it is tempting to
throw a bunch of parameters into your model, but this can create intractable
ambiguities.
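
A minimal sketch of fitting the reduced, identifiable 3-parameter form to
the first 8 points above (starting values are rough guesses; Fc plays the
role of F, which is best avoided as a name in R since it aliases FALSE):

tdat <- c(0, 0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.5535, 1.8049)
Vdat <- c(6.0531, 5.5694, 5.4548, 5.1931, 4.3139, 3.6454, 3.5877, 3.7404)
fit  <- nls(Vdat ~ D + Fc*exp(-cl*tdat), start = list(D = 3, Fc = 3, cl = 1))
summary(fit)  # D = final value, D + Fc = initial value, cl = decay constant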

Indeed, if I just remove t0 and use your first 8 points I get this
(random starting values, but it converged easily; you still need to plot etc.)

[1] "1   v= 8.77181162126362  epi= 0.672516376478598  cl= 1.90973175223917 t0= 0.643481321167201"
> summary(nls2)

Formula: V2 ~ v0 * (1 - epi + epi * exp(-cl * (T2)))

Parameters:
    Estimate Std. Error t value Pr(>|t|)
v0    6.2901 0.3384  18.585  8.3e-06 ***
epi   0.5430 0.1373   3.955   0.0108 *
cl    0.9684 0.5491   1.763   0.1381
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3579 on 5 degrees of freedom

Number of iterations to convergence: 11
Achieved convergence tolerance: 4.057e-06



> (2) V(t)=V0{A*exp[-lambda1(t-t0)]+(1-A)*exp[-lambda2(t-t0)]}
>
> in formula (2) lambda1=0.5*{(c+delta)+[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
>
> lambda2=0.5*{(c+delta)-[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
> A=(epi*c-lambda2)/(lambda1-lambda2)
>
> The regression rule :
> for formula (1):(t<=2,that is) first 8 rows are used for non-linear
> regression
> epi,c,t0,V0 parameters are obtained
> for formula (2):all 13 rows of results are used for non-linear regression
> lambda1,lambda2,A (with these parameters, delta can be calculated from them)
>
> Thanks for help
> Ster Lesser
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls problem with R

2011-05-04 Thread Mike Marchywka



> Date: Wed, 4 May 2011 07:07:44 -0700
> From: sterles...@hotmail.com
> To: r-help@r-project.org
> Subject: Re: [R] nls problem with R
>
> Thanks Andrew.
> I am sorry for some typos - I omitted some numbers of T2.
> Based on your suggestion,I think the problem is in the initial values.
> And I will read more theory about the non-linear regression.

there is unlikely to be any magic involved, unlike getting hotmail to work.
As a tool for understanding your data, you should have some idea
of the qualitative properties of the model and data and the error
function you use to reconcile the two.

If you can post your full data set I may post an R example of some things
to try. I was looking for an excuse to play with nls, I'm not expert here,
and curious to see what I can do with your example for critique by others.
If you want to fully automate this for N continuous parameters, you
can take a shotgun approach, but I'm not sure it helps other than to
find gross problems in the model or data.
I actually wrote a loop to keep picking random parameter values
and calculate an SSE between predicted and real data. What you soon
find is that this is like trying to decode a good crypto algorithm
by guessing - you can do the math to see the problem LOL.
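
For concreteness, a self-contained sketch of such a shotgun loop (toy data
and bounds; a real problem would use your data in place of these):

set.seed(1)
tdat <- seq(0, 3, by = 0.25)                                # toy time points
Vdat <- 3.5 + 2.5*exp(-2*tdat) + rnorm(length(tdat), sd = 0.05)
sse  <- function(p) sum((Vdat - (p[1] + p[2]*exp(-p[3]*tdat)))^2)
best <- list(val = Inf, par = NULL)
for (i in 1:10000) {
  p <- runif(3, c(0, 0, 0), c(10, 10, 5))                   # random candidate
  v <- sse(p)
  if (v < best$val) best <- list(val = v, par = p)          # keep best so far
}
best   # compare with the true (3.5, 2.5, 2) to see how slowly this closes in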

>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ID parameter in model

2011-05-04 Thread Mike Harwood
Thank you, Goran.  Please see the package details below:

> packageDescription('eha')
Encoding: UTF-8
Package: eha
Version: 1.3-2
Date: 2011-03-01
Title: Event History Analysis
Description: A package for survival and event history analysis
License: GPL (>= 3)
Author: Göran Broström
Depends: R (>= 2.2.0), survival, graphics
Maintainer: Göran Broström 
Packaged: 2011-03-01 14:56:12 UTC; gb
Repository: CRAN
Date/Publication: 2011-03-01 15:50:52
Built: R 2.13.0; i386-pc-mingw32; 2011-04-15 08:22:36 UTC; windows

Mike


On Mon, May 2, 2011 at 10:38 AM, Mike Harwood  wrote:

> Hello,
>
> I am apparently confused about the use of an id parameter for an event
> history/survival model, and why the EHA documentation for aftreg does
> not specify one.  All assistance and insights are appreciated.
>
> Attempting to specify an id variable with the documentation example
> generates an "overlapping intervals" error, so I sorted the original
> mort dataframe and set each subsequent entry time for an id to the previous
> exit time + 0.0001.  This allowed me to see the effect of the id
> parameter on the coefficients and significance tests, and prompted my
> question.  The code I used is shown below, with the results at the
> bottom.  Thanks in advance!
>
> Mike
>
> head(mort) ## data clearly contains multiple entries for some of the
> dataframe ids
>
> no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
> Initial model
> id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
> ## overlapping intervals error
>
> mort.sort <- ## ensure records ordered
>mort[
>order(mort$id, mort$enter),]
>
> ## remove overlap
> for (i in 2:nrow(mort.sort)){
> if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
> mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001
>}
>
> no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data =
> mort.sort) ## initial model on modified df
> id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
> mort.sort) ## with id parameter
>
>
> #=== output ===#
> > no.id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>    lower            0.416     0         1           (reference)
>    upper            0.584    -0.347     0.707     0.089     0.000
>
> log(scale)                    3.603    36.704     0.065     0.000
> log(shape)                    0.331     1.393     0.058     0.000
>
> Events                    276
> Total time at risk      17045
> Max. log. likelihood  -1391.4
> LR test statistic        16.1
> Degrees of freedom          1
> Overall p-value   6.04394e-05
> > id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
>id = id)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>    lower            0.416     0         1           (reference)
>    upper            0.584    -0.364     0.695     0.090     0.000
>
> log(scale)                    3.588    36.171     0.065     0.000
> log(shape)                    0.338     1.402     0.058     0.000
>
> Events                    276
> Total time at risk      17045
> Max. log. likelihood  -1390.8
> LR test statistic        17.2
> Degrees of freedom          1
> Overall p-value   3.3091e-05
> >
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Tue, 3 May 2011, Christian Schulz wrote:


On Mon, 2 May 2011, P Ehlers wrote:


Use str_sub() in the stringr package:

require(stringr)  # install first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"



Thanks.  That's very close to what I'm looking for, but it seems to 
correspond to "cut -c", not to "cut -f".  Can it work with delimiters or 
only with character counts?


Mike



x <- "this is a string"
unlist(strsplit(x," "))[c(1,4)]



Thanks.  I did figure that one out a couple of messages back, but to get
it to behave like "cut -d' ' -f1,4", I had to add a paste command to
reassemble the parts:


paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ")

Then I wasn't sure if I could do this to every element of a vector of
strings without looping -- I have to think not.
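
Actually, sapply() over strsplit()'s list does cover the vector case; a
minimal sketch:

xs <- c("this is a string", "one two three four five")
sapply(strsplit(xs, " "), function(p) paste(p[c(1, 4)], collapse = " "))
## [1] "this string" "one four"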


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Mon, 2 May 2011, P Ehlers wrote:


Use str_sub() in the stringr package:

require(stringr)  # install first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"



Thanks.  That's very close to what I'm looking for, but it seems to 
correspond to "cut -c", not to "cut -f".  Can it work with delimiters or 
only with character counts?


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Mon, 2 May 2011, Gabor Grothendieck wrote:


On Mon, May 2, 2011 at 10:32 PM, Mike Miller  wrote:

On Tue, 3 May 2011, Andrew Robinson wrote:


try substr()


OK.  Apparently, it allows things like this...


substr("abcdef",2,4)


[1] "bcd"

...which is like this:

echo "abcdef" | cut -c2-4

But that doesn't use a delimiter, it only does character-based cutting, and
it is very limited.  With "cut -c" I can do stuff like this:

echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-

abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.
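
For reference, base R can mimic the multi-range form with the vectorized
substring() (the open-ended "17-" has to be spelled out, e.g. with nchar()):

s <- "abcdefghijklmnopqrstuvwxyz"
paste(substring(s, c(1, 12, 17), c(3, 15, nchar(s))), collapse = "")
## [1] "abclmnoqrstuvwxyz"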

That was a great tip, though, because it led me to strsplit, which can do
what I want, however somewhat awkwardly:


y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)


[1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward.  I guess I
don't quite get what I'm doing with lists.  I'm not clear on how this would
work with a vector of strings.



Try this:


read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), 
colClasses = c(NA, "NULL"))

  V1   V3 V5
1 abc lmno qrstuvwxyz



That gives me a few more functions to study.  Of course the new code 
(using read.fwf() and textConnection()) is not doing what was requested 
and it requires some work to compute the widths from the given numbers 
(c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).


Mike__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller

On Tue, 3 May 2011, Andrew Robinson wrote:


try substr()


OK.  Apparently, it allows things like this...


substr("abcdef",2,4)

[1] "bcd"

...which is like this:

echo "abcdef" | cut -c2-4

But that doesn't use a delimiter, it only does character-based cutting, 
and it is very limited.  With "cut -c" I can do stuff like this:


echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-

abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.

That was a great tip, though, because it led me to strsplit, which can do 
what I want, however somewhat awkwardly:



y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)

[1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward.  I guess I 
don't quite get what I'm doing with lists.  I'm not clear on how this 
would work with a vector of strings.


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] List of Data Frames

2011-05-02 Thread Mike Smith
I'm trying to create a list of Data Frames.  I have 17 data frames that I need 
to move through in a loop, but if I simply make a list of them, then they do 
not stay data frames, and I can't sort through them.  I tried to create an 
array, but the data frames can have anywhere from 14-16 rows, and I couldn't 
find a way to make a variable size array.  If you have any ideas, I would 
greatly appreciate any help, as I'm trying to learn R, and decided to apply it 
to a project that I have been working on.  My goal is splitting a sports season 
into games per week, and then do statistics on each week, but have an average 
running up to that point in the season.  Thus the list would be indexed by 
weeks, and then there's a data frame of the game and all relevant statistics. 
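
A minimal sketch of the list-of-data-frames approach (toy columns, made-up
names), showing that list elements stay data frames and support per-week
statistics plus a running average:

wk1 <- data.frame(team = c("A", "B"), score = c(3, 1))
wk2 <- data.frame(team = c("A", "C"), score = c(2, 2))
season <- list(wk1, wk2)             # index by week: season[[1]], season[[2]]
class(season[[1]])                   # still "data.frame"
wk.mean <- sapply(season, function(d) mean(d$score))  # one statistic per week
cumsum(wk.mean) / seq_along(wk.mean)                  # running season average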

Thank You,
Mike Smith


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] UNIX-like "cut" command in R

2011-05-02 Thread Mike Miller
The R "cut" command is entirely different from the UNIX "cut" command. 
The latter retains selected fields in a line of text.  I can do that kind 
of manipulation using sub() or gsub(), but it is tedious.  I assume there 
is an R function that will do this, but I don't know its name.  Can you 
tell me?


I'm also guessing that there is a web page somewhere that will tell me how 
to do a lot of common GNU/UNIX/Linux "text util" commmand-line kinds of 
things in R.  By that I mean by using R functions, not by making system 
calls.  Does anyone know of such a web page?


Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ID parameter in model

2011-05-02 Thread Mike Harwood
Hello,

I am apparently confused about the use of an id parameter for an event
history/survival model, and why the EHA documentation for aftreg does
not specify one.  All assistance and insights are appreciated.

Attempting to specify an id variable with the documentation example
generates an "overlapping intervals" error, so I sorted the original
mort dataframe and set each subsequent entry time for an id to the previous
exit time + 0.0001.  This allowed me to see the effect of the id
parameter on the coefficients and significance tests, and prompted my
question.  The code I used is shown below, with the results at the
bottom.  Thanks in advance!

Mike

head(mort) ## data clearly contains multiple entries for some of the
dataframe ids

no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
Initial model
id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
## overlapping intervals error

mort.sort <- ## ensure records ordered
mort[
order(mort$id, mort$enter),]

## remove overlap
for (i in 2:nrow(mort.sort)){
    if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
        mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001
}

no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data =
mort.sort) ## initial model on modified df
id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
mort.sort) ## with id parameter


#=== output ===#
> no.id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.347     0.707     0.089     0.000

log(scale)                    3.603    36.704     0.065     0.000
log(shape)                    0.331     1.393     0.058     0.000

Events                    276
Total time at risk      17045
Max. log. likelihood  -1391.4
LR test statistic        16.1
Degrees of freedom          1
Overall p-value   6.04394e-05
> id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
id = id)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
ses
   lower            0.416     0         1           (reference)
   upper            0.584    -0.364     0.695     0.090     0.000

log(scale)                    3.588    36.171     0.065     0.000
log(shape)                    0.338     1.402     0.058     0.000

Events                    276
Total time at risk      17045
Max. log. likelihood  -1390.8
LR test statistic        17.2
Degrees of freedom          1
Overall p-value   3.3091e-05
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regular expression in gsub() for strings with leading backslash

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, Duncan Murdoch wrote:


On 29/04/2011 7:41 PM, Miao wrote:


Can anyone help on gsub() in R?  I have a string like something below, and
wanted to delete all the strings with leading backslash, including "\xa0On",
"\023", "\xab", and many others.   How should I write a regular expression
pattern in gsub()?  I don't care how many characters follow the backslash.



If those are R strings, none of them contain a backslash.  In R, a backslash 
would always be printed as \\.


\x is the introduction to a hexadecimal encoding for a character; the next 
two characters show the hex digits.  So your first string contains a single 
character \xa0, the third one contains \xab, and so on.


The \023 is an octal encoding for a single character.



If we were dealing with a leading backslash, I guess this would do it:

gsub("^\\\\.*", "", txt)

R would display a double backslash, but I believe that represents a single 
backslash.  So if the string were saved using write.table, say, only a 
single backslash would be stored.



a <- "\\This is a string."
a

[1] "\\This is a string."

gsub("^\\\\", "", a)

[1] "This is a string."

a

[1] "\\This is a string."

gsub("^\\\\.*", "", a)

[1] ""

gsub("^\\\\.*", "", c(a,"Another string","\\more"))

[1] ""   "Another string" ""

write.table(a, file="a.txt", quote=F, row.names=F, col.names=F)


$ cat a.txt
\This is a string.

Apparently this is not what the OP really wanted.  The OP probably wanted 
to remove characters that were not from the regular ASCII set.



Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-29 Thread Mike Miller

Rob--

Your biostatistician has not disagreed with the rest of us about anything 
except for his preferred name for the test.  He wants to call it the 
Freeman-Halton test, some people call it the Fisher-Freeman-Halton test, 
but most people call it the Fisher Exact test -- all are the same test. 
When he was "adamant you could not do > 2x2", what he was being adamant 
about was the name you should use when referring to the test for tables 
larger than 2x2.  Why he was doing that, I don't know, but I think it is 
silly -- he confused you and the rest of us.


He goes on to tell you that to get the Freeman-Halton test in SAS, you use 
"tables a * b / fisher".  In other words, SAS calls the test "Fisher" 
instead of calling it Freeman-Halton.  R also calls it "Fisher" and not 
Freeman-Halton.  I'm like R and SAS and unlike your biostatistician, but 
to each his own.


You say that he is "exceptionally clear on this point," which may be true, 
but what is the point?  The point is that he prefers a different *name* 
for the test than the rest of us.  Everyone agrees on the math/stat.
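
In R, that test is reached through fisher.test(), which accepts r x c tables
directly; a minimal sketch with a made-up 4x2 table:

m <- matrix(c(8, 2, 4, 6, 3, 7, 5, 5), nrow = 4, byrow = TRUE)
fisher.test(m)   # Fisher-Freeman-Halton p-value for the 4x2 table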


Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota


On Fri, 29 Apr 2011, viostorm wrote:



After I shared comments form the forum yesterday with the biostatistician he
indicated this:

"Fisher's exact test is the non-parametric analog for the Chi-square
test for 2x2 comparisons. A version (or extension) of the Fisher's Exact
test, known as the Freeman-Halton test applies to comparisons for tables
greater than 2x2. SAS can calculate both statistics using the following
instructions.

 proc freq; tables a * b / fisher;"

Do people here still stand by the position that the Fisher exact test can be
used for RxC contingency tables?  Sorry to bother you all so much; it is just
important for a paper I am writing and planning to submit soon. (I have a 4x2
table but it does not meet the expected frequencies requirements for
chi-squared.)

I guess people here have suggested R implements the following, which
unfortunately are unavailable, at least easily, at my library, but at least
the titles indicate they extend it to RxC:

Mehta CR, Patel NR. A network algorithm for performing Fisher's exact test
in r x c contingency tables. Journal of the American Statistical Association
1983;78:427-34.

Mehta CR, Patel NR. Algorithm 643: FEXACT: A FORTRAN subroutine for Fisher's
exact test on unordered r x c contingency tables. ACM Transactions on
Mathematical Software 1986;12:154-61.

The only reason I ask again is he is exceptionally clear on this point.

Thanks again,

-Rob


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, David Winsemius wrote:


On Apr 29, 2011, at 1:29 PM, Mike Miller wrote:


On Fri, 29 Apr 2011, Giovanni Petris wrote:

Well, but the original poster also refers to 0.2 and 0.8 as "expected min 
and max", in which case we are back to a joke...


Well, he is a lot better with English than I am with Mandarin.  He seemed 
to like the truncated normal answers, so we'll let those be his answers.


It is possible to choose parameters for a normal distribution with 500 
observations such that the expected value of the maximum is .8 and the 
expected value of the minimum is .2.  Obviously, the mean would be .5, 
not 1, but what would the variance then have to be to provide the 
correct expected max and min?  That's another legitimate question.


You would need to specify an N since the expected first and last order 
statistic would decrease/increase with increasing N.


Right -- I chose N=500, as did the OP.  I think the order statistics for 
the normal are pretty complex, but it wouldn't be hard to use the density 
for order statistics for the uniform to compute the appropriate values for 
a standard normal, then rescale.


http://en.wikipedia.org/wiki/Order_statistic#The_order_statistics_of_the_uniform_distribution

You'd have to multiply the beta density times the inverse normal cdf and 
get the weighted average for a set of points.  It doesn't sound terribly 
difficult but I don't want to do it!  ;-)
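
For the record, a minimal sketch of that computation: since X_(k) is
distributed as qnorm(U_(k)) with U_(k) ~ Beta(k, n-k+1), the expected
maximum is a one-dimensional integral (integrate() copes with the mild
endpoint singularity here), and the rescaling follows from it:

n <- 500
Emax <- integrate(function(u) qnorm(u) * dbeta(u, n, 1), 0, 1)$value
s <- (0.8 - 0.5) / Emax   # sd putting the expected max at 0.8 around mean 0.5
c(mean = 0.5, sd = s)     # by symmetry the expected min is then 0.2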


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-29 Thread Mike Miller

On Fri, 29 Apr 2011, Giovanni Petris wrote:

Well, but the original poster also refers to 0.2 and 0.8 as "expected 
min and max", in which case we are back to a joke...


Well, he is a lot better with English than I am with Mandarin.  He seemed 
to like the truncated normal answers, so we'll let those be his answers.


It is possible to choose parameters for a normal distribution with 500 
observations such that the expected value of the maximum is .8 and the 
expected value of the minimum is .2.  Obviously, the mean would be .5, not 
1, but what would the variance then have to be to provide the correct 
expected max and min?  That's another legitimate question.


Mike



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mao Jianfeng
Sent: Thursday, April 28, 2011 12:02 PM
To: r-help@r-project.org
Subject: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

Dear all,

This is a simple probability problem. I want to know, How to generate 
a normal distribution with mean=1, min=0.2 and max=0.8?


I know how the generate a normal distribution of mean = 1 and sd = 1 
and with 500 data point.


rnorm(n=500, m=1, sd=1)

But, I am confused about how to generate a normal distribution with
expected min and max. I look forward to your directions.


Thanks in advance.

Best,
Jian-Feng,


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-28 Thread Mike Miller
I just realized that I had misread what was wanted -- the code I wrote was 
for mean=0, sd=1, not for mean=1.  So for mean=m, and sd=s, lower limit L 
and upper limit U, this approach will work:


n <- 1000

m <- 1
s <- 1

L <- .2
U <- .8

p_L <- pnorm(L, mean=m, sd=s)
p_U <- pnorm(U, mean=m, sd=s)


x <- qnorm(runif(n, p_L, p_U), mean=m, sd=s)

Or it could be written on one line:

x <- qnorm(runif(n, pnorm(L, mean=m, sd=s), pnorm(U, mean=m, sd=s)), mean=m, 
sd=s)

Mike


On Thu, 28 Apr 2011, Mike Miller wrote:

Good point.  It would be absurdly inefficient if the upper and lower 
limits on the interval of interest were, say, 0.2 and 0.201 instead of 
0.2 and 0.8. Here's what I think is probably the best general approach:


Compute the CDF for the upper and lower limits of the interval and 
generate uniform random numbers within that range of CDF values, then 
compute the inverse CDF of those values.  To be explicit:


n <- 1000
L <- .2
U <- .8

p_L <- pnorm(L)
p_U <- pnorm(U)

x <- qnorm(runif(n, p_L, p_U))

It is very fast and it always produces exactly the desired number ("n") 
of random normal values.


Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota



On Thu, 28 Apr 2011, Carl Witthoft wrote:

That method (creating lots of samples and throwing most of them away) is 
usually frowned upon :-).


Try this:  (I haven't, so it may well have syntax errors)

% n28<- dnorm(seq(.2,.8,by=.001),mean=1,sd=1)

% x <- sample(seq(.2,.8,by=.001), size=500,replace=TRUE, prob=n28)

And I guess in retrospect this  will get really ugly if you want, say, a 
sampling grid resolution of 1e-6 or so.


Anyone know what's applicable from the "sampling" package?


Carl

From: David Winsemius 
Date: Thu, 28 Apr 2011 13:06:21 -0400

On Apr 28, 2011, at 12:09 PM, Ravi Varadhan wrote:


Surely you must be joking, Mr. Jianfeng.



Perhaps not joking and perhaps not with correct statistical specification.

A truncated Normal could be simulated with:

set.seed(567)
x <- rnorm(n=5000, m=1, sd=1)        # need many draws; most fall outside [0.2, 0.8]
xtrunc <- x[x>=0.2 & x <=0.8]
require(logspline)
plot(logspline(xtrunc, lbound=0.2, ubound=0.8, nknots=7))

--
David.





-Original Message-
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Mao Jianfeng

Dear all,

This is a simple probability problem. I want to know: how can I generate a
normal distribution with mean=1, min=0.2 and max=0.8?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a normal distribution with mean=1, min=0.2, max=0.8

2011-04-28 Thread Mike Miller
Good point.  It would be absurdly inefficient if the upper and lower 
limits on the interval of interest were, say, 0.2 and 0.201 instead of 0.2 
and 0.8.  Here's what I think is probably the best general approach:


Compute the CDF for the upper and lower limits of the interval and 
generate uniform random numbers within that range of CDF values, then 
compute the inverse CDF of those values.  To be explicit:


n <- 1000
L <- .2
U <- .8

p_L <- pnorm(L)
p_U <- pnorm(U)

x <- qnorm(runif(n, p_L, p_U))

It is very fast and it always produces exactly the desired number ("n") of 
random normal values.


Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota



On Thu, 28 Apr 2011, Carl Witthoft wrote:

That method (creating lots of samples and throwing most of them away) is 
usually frowned upon :-).


Try this:  (I haven't, so it may well have syntax errors)

n28 <- dnorm(seq(.2, .8, by=.001), mean=1, sd=1)

x <- sample(seq(.2, .8, by=.001), size=500, replace=TRUE, prob=n28)

And I guess in retrospect this  will get really ugly if you want, say, a 
sampling grid resolution of 1e-6 or so.


Anyone know what's applicable from the "sampling" package?


Carl

From: David Winsemius 
Date: Thu, 28 Apr 2011 13:06:21 -0400

On Apr 28, 2011, at 12:09 PM, Ravi Varadhan wrote:


Surely you must be joking, Mr. Jianfeng.



Perhaps not joking and perhaps not with correct statistical specification.

A truncated Normal could be simulated with:

set.seed(567)
x <- rnorm(n=5000, m=1, sd=1)        # need many draws; most fall outside [0.2, 0.8]
xtrunc <- x[x>=0.2 & x <=0.8]
require(logspline)
plot(logspline(xtrunc, lbound=0.2, ubound=0.8, nknots=7))

--
David.





-Original Message-
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Mao Jianfeng

Dear all,

This is a simple probability problem. I want to know: how can I generate a
normal distribution with mean=1, min=0.2 and max=0.8?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Thu, 28 Apr 2011, viostorm wrote:


I have read the help page, or at least ?fisher.exact

I looked a bit on the Internet; I guess it is applicable to > 2x2. I had 
spoken to a biostatistician here, who is quite excellent and was adamant 
that I could not do > 2x2.


I found this:

http://mathworld.wolfram.com/FishersExactTest.html



That page shows that the Fisher Exact test can be implemented on tables 
with any numbers of rows and columns (so long as there are at least two 
rows and two columns).  Your biostatistician just didn't happen to know 
about this, but s/he shouldn't have been adamant when s/he was wrong. 
Show your biostatistician the MathWorld page.
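
For reference, a minimal sketch (made-up counts, not from the thread): 
fisher.test() in R accepts an r x c table directly.

tab <- matrix(c(8, 2,
                5, 5,
                3, 7,
                1, 9), nrow = 4, byrow = TRUE)
fisher.test(tab)   # exact test on a 4x2 table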


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Fri, 29 Apr 2011, Thomas Lumley wrote:


On Fri, Apr 29, 2011 at 8:01 AM, Mike Miller  wrote:


On Thu, 28 Apr 2011, viostorm wrote:


I'm using fisher.exact on a 4x2 table and it seems to work.

Does anyone know exactly what is going on?  I thought fisher.exact is 
only for 2x2 tables.



You were wrong.  I'm sure there's nothing wrong with the program.  You 
will find that with bigger tables and larger sample sizes the 
computational cost becomes quite enormous.


In fact, with large tables, roundoff error becomes significant before 
computational cost becomes prohibitive.



To avoid both of these problems one might use Monte Carlo resampling under 
the null, maybe 10,000 times or more.  I think independence_test() in the 
coin package will do this:


http://cran.r-project.org/web/packages/coin/

To estimate very small p-values properly, one must resample many more 
times.
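
As a footnote (mine, not from the thread), base R's fisher.test() exposes 
the same idea for r x c tables through its simulate.p.value argument:

tab <- matrix(c(8, 2, 5, 5, 3, 7, 1, 9), nrow = 4, byrow = TRUE)  # made-up counts
fisher.test(tab, simulate.p.value = TRUE, B = 10000)  # 10,000 Monte Carlo resamples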


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fisher exact for > 2x2 table

2011-04-28 Thread Mike Miller

On Thu, 28 Apr 2011, viostorm wrote:


I'm using fisher.exact on a 4x2 table and it seems to work.

Does anyone know exactly what is going on?  I thought fisher.exact is 
only for 2x2 tables.



You were wrong.  I'm sure there's nothing wrong with the program.  You 
will find that with bigger tables and larger sample sizes the 
computational cost becomes quite enormous.


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up plotting to MSWindows graphics window

2011-04-27 Thread Mike Marchywka

> Date: Wed, 27 Apr 2011 14:40:23 +0200
> From: jonat...@k-m-p.nl
> To: r-help@r-project.org
> Subject: Re: [R] Speed up plotting to MSWindows graphics window
>
>
> On 27/04/2011 13:18, Mike Marchywka wrote:
> >
> >> > Date: Wed, 27 Apr 2011 11:16:26 +0200
> >> > From:jonat...@k-m-p.nl
> >> > To:r-help@r-project.org
> >> > Subject: [R] Speed up plotting to MSWindows graphics window
> >> >
> >> > Hello,
> >> >
> >> > I am working on a project analysing the performance of motor-vehicles
> >> > through messages logged over a CAN bus.
> >> >

> >> >
> >> > I am currently plotting the data in R, overlaying 5 or more plots of
> >> > data, logged at 1kHz, (using plot.ts() and par(new = TRUE)).
> >> > The aim is to be able to pan, zoom in and out and get values from the
> >> > plotted graph using a custom Qt interface that is used as a front end to
> >> > R.exe (all this works).
> >> > The plot is drawn by R directly to the windows graphic device.
> >> >
> >> > The data is imported from a .csv file (typically around 100MB) to a 
> >> > matrix.
> >> > (timestamp, message ID, byte0, byte1, ..., byte7)
> >> > I then separate this matrix into several by message ID (dimensions are
> >> > in the order of 8cols, 10^6 rows)
> >> >
> >> > The panning is done by redrawing the plots, shifted by a small amount.
> >> > So as to view a window of data from a second to a minute long that can
> >> > travel the length of the logged data.
> >> >
> >> > My problem is that, the redrawing of the plots whilst panning is too
> >> > slow when dealing with this much data.
> >> > i.e.: I can see the last graphs being drawn to the screen in the
> >> > half-second following the view change.
> >> > I need a fluid change from one view to the next.
> >> >
> >> > My question is this:
> >> > Are there ways to speed up the plotting on the MSWindows display?
> >> > By reducing plotted point densities to *sensible* values?
> > Well, hard to know but it would help to know where all the time is going.
> > Usually people start complaining when VM thrashing is common but if you are
> > CPU limited you could try restricting the range of data you want to plot
> > rather than relying on the plot to just clip the largely irrelevant points
> > when you are zoomed in. It should not be too expensive to find the
> > limits either incrementally or with binary search on ordered time series.
> > Presumably subsetting is fast using foo[a:b,]
> >
> > One thing you may want to try for change of scale is wavelet or
> > multi-resolution analysis. You can make a tree ( increasing memory usage
> > but even VM here may not be a big penalty if coherence is high ) and
> > display the resolution appropriate for the current scale.
> >
> >
> I forgot to add, for plotting I use a command similar to:
>
> plot.ts(timestampVector, dataVector, xlim=c(a,b))
>
> a and b are timestamps from timestampVector
>
> Is the xlim parameter sufficient for limiting the scope of the plots?
> Or should I subset the timeseries each time I do a plot?

well, maybe time series knows the data to be ordered, I never use
that, but in general it has to go check each point and clip the out
of range ones. It could, I suppose, binary search for start/end points,
but I don't know. Based on what you said, it sounds like it does not.
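
Untested sketch of that subsetting, reusing the poster's timestampVector/
dataVector and window endpoints a and b (timestamps assumed sorted):

i   <- findInterval(c(a, b), timestampVector)  # binary search on ordered times
idx <- max(i[1], 1):i[2]                       # rows inside the visible window
plot(timestampVector[idx], dataVector[idx], type = "l")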


>
>
> The multi-resolution analysis looks interesting.
> I shall spend some time finding out how to use the wavelets package.
>
> Cheers!
>
> >
> >> > Using something other than plot.ts() - is the lattice package faster?
> >> > I don't need publication quality plots, they can be rougher...
> >> >
> >> > I have tried:
> >> > -Using matrices instead of dataframes - (works for calculations but not
> >> > enough for plots)
> >> > -increasing the max usable memory (max-mem-size) - (no change)
> >> > -increasing the size of the pointer protection stack (max-ppsize) - (no
> >> > change)
> >> > -deleting the unnecessary leftover matrices - (no change)
> >> > -I can't use lines() instead of plot() because of the very different
> >> > scales (rpm-1, flags -1to3)
> >> >
> >> > I am going to do some resampling of the logged data to reduce the vector
> >>

Re: [R] Speed up plotting to MSWindows graphics window

2011-04-27 Thread Mike Marchywka



> Date: Wed, 27 Apr 2011 11:16:26 +0200
> From: jonat...@k-m-p.nl
> To: r-help@r-project.org
> Subject: [R] Speed up plotting to MSWindows graphics window
>
> Hello,
>
> I am working on a project analysing the performance of motor-vehicles
> through messages logged over a CAN bus.
>
> I am using R 2.12 on Windows XP and 7
>
> I am currently plotting the data in R, overlaying 5 or more plots of
> data, logged at 1kHz, (using plot.ts() and par(new = TRUE)).
> The aim is to be able to pan, zoom in and out and get values from the
> plotted graph using a custom Qt interface that is used as a front end to
> R.exe (all this works).
> The plot is drawn by R directly to the windows graphic device.
>
> The data is imported from a .csv file (typically around 100MB) to a matrix.
> (timestamp, message ID, byte0, byte1, ..., byte7)
> I then separate this matrix into several by message ID (dimensions are
> in the order of 8cols, 10^6 rows)
>
> The panning is done by redrawing the plots, shifted by a small amount.
> So as to view a window of data from a second to a minute long that can
> travel the length of the logged data.
>
> My problem is that, the redrawing of the plots whilst panning is too
> slow when dealing with this much data.
> i.e.: I can see the last graphs being drawn to the screen in the
> half-second following the view change.
> I need a fluid change from one view to the next.
>
> My question is this:
> Are there ways to speed up the plotting on the MSWindows display?
> By reducing plotted point densities to *sensible* values?

Well, hard to know but it would help to know where all the time is going.
Usually people start complaining when VM thrashing is common but if you are
CPU limited you could try restricting the range of data you want to plot
rather than relying on the plot to just clip the largely irrelevant points
when you are zoomed in. It should not be too expensive to find the
limits either incrementally or with binary search on ordered time series. 
Presumably subsetting is fast using  foo[a:b,] 

One thing you may want to try for change of scale is wavelet  or
multi-resolution analysis. You can make a tree ( increasing memory usage
but even VM here may not be a big penalty if coherence is high ) and
display the resolution appropriate for the current scale. 
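
A crude sketch of that idea (mine, assuming the poster's 1 kHz
timestampVector/dataVector and window endpoints a and b in seconds): thin
the series to roughly the screen's resolution before plotting.

k   <- max(1, floor((b - a) * 1000 / 2000))  # aim for ~2000 plotted points
idx <- seq(1, length(dataVector), by = k)    # coarser copy for wide windows
plot(timestampVector[idx], dataVector[idx], type = "l", xlim = c(a, b))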




> Using something other than plot.ts() - is the lattice package faster?
> I don't need publication quality plots, they can be rougher...
>
> I have tried:
> -Using matrices instead of dataframes - (works for calculations but not
> enough for plots)
> -increasing the max usable memory (max-mem-size) - (no change)
> -increasing the size of the pointer protection stack (max-ppsize) - (no
> change)
> -deleting the unnecessary leftover matrices - (no change)
> -I can't use lines() instead of plot() because of the very different
> scales (rpm-1, flags -1to3)
>
> I am going to do some resampling of the logged data to reduce the vector
> sizes.
> (removal of *less* important data and use of window.ts())
>
> But I am currently running out of ideas...
> So if somebody could point out something, I would be grateful.
>
> Thanks,
>
> Jonathan Gabris
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] blank space escape sequence in R?

2011-04-25 Thread Mike Miller

On Mon, 25 Apr 2011, Mark Heckmann wrote:

I use a function that inserts line breaks ("\n" as escape sequence) 
according to some criterion when there are blanks in the string. e.g. 
"some text \nand some more text".


What I want now is another form of a blank, so my function will not 
insert a "\n" at that point. e.g. "some text\spaceand some more text"


Here "\space" stands for some escape sequence for a blank, which is what 
I am looking for. So what I need is something that will appear as a 
blank when printed but not in the string itself.



Is it possible to use \x20 or some similar way to invoke the hexadecimal 
ASCII form of a blank?  That works in Perl, as does \040 for the octal form.
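
A quick check in R (mine) says yes on both counts:

identical("\x20", " ")            # TRUE: hexadecimal escape for a space
identical("\040", " ")            # TRUE: the octal form works in R as well
cat("some text\x20and some more text\n")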


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using Java methods in R

2011-04-23 Thread Mike Marchywka

> From: marchy...@hotmail.com
> To: hill0...@umn.edu; r-help@r-project.org
> Date: Sat, 23 Apr 2011 15:12:30 -0400
> Subject: Re: [R] Using Java methods in R
>
> > Date: Sat, 23 Apr 2011 05:32:59 -0700
> > From: hill0...@umn.edu
> > To: r-help@r-project.org
> > Subject: Re: [R] Using Java methods in R
> >
> > No answer to my post,
> > so let's try a simpler question. Am I doing this correctly?
> > I have the RGui with R Console on the screen.
> > On rhe top pullDowns, Packages > Install Packages > USA(IA)> rJava
> > > library(rJava)
> > > .jinit()
> > > qsLin <- .jnew("C:/ad/j/CalqsLin")
> > Error in .jnew("C:/ad/j/CalqsLin") :
> > java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin
>
I haven't used rJava yet (I think I installed it on linux
for testing but no real usage), but this message
appears to come from the JVM, probably because you
specified an absolute path. You can try this from the
command line:
>
>
>
>
> $ ls ../geo/dayte*
> ../geo/dayte.class
>
> mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
> $ java ../geo/dayte
> Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
> Caused by: java.lang.ClassNotFoundException: ...geo.dayte
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> Could not find the main class: ../geo/dayte.  Program will exit.
>
>
you want to search for the classpath and set that up, then
reference the class name, not the path ( the class loader will follow
the path looking for it ).
Not sure how you do that in rJava under 'dohs but
if you search for that term it should be apparent.
And of course 'dohs and pdf aren't always friendly
for automated work, so get something like cygwin and
reduce pdf's to text etc.
>
>

It appears that by default the current directory is on the
classpath. I must have installed this on 'dohs before, and
if I copy the class file into the R working directory it
can find it:




> library("rJava")
> .jinit()
> .jnew("foo")
Error in .jnew("foo") : java.lang.NoClassDefFoundError: foo
> .jnew("dayte")
[1] "Java-Object{dayte@1de3f2d}"
>

that's unlikely to fix all your problems; you want to set the classpath,
but if you know the term and can load cygwin you should be able
to find the way to set that up.
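
In rJava terms that amounts to something like this (a sketch only; the
directory is the poster's, and I haven't tried it under 'dohs):

library(rJava)
.jinit()
.jaddClassPath("C:/ad/j")      # put the directory on the classpath
qsLin <- .jnew("CalqsLin")     # then name the class, not the path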

>
>
>
>
> >
> > So I got this error which means I don't understand very much.
> > I go to C:/ad/j and get
> > C:\ad\j>dir CalqsLin.class
> > Volume in drive C has no label.
> > Volume Serial Number is 9A35-67A2
> > Directory of C:\ad\j
> > 04/23/2011 07:11 AM 14,651 CalqsLin.class
> > 1 File(s) 14,651 bytes
> > 0 Dir(s) 104,257,716,224 bytes free
> >
> > Just to show my intentions,
> > I had next wanted to call this java function method:
> > linTimOfCalqsStgIsLev("20110405235959",-4) using:
> > > dblTim <-
> > > .jcall(qsLin,"D","linTimOfCalqsStgIsLev","20110405235959","-4")
> > but that will probably also be wrong?
> > Obviously I don't understand.
> >
> >
> > --
> > View this message in context: 
> > http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using Java methods in R

2011-04-23 Thread Mike Marchywka

> Date: Sat, 23 Apr 2011 05:32:59 -0700
> From: hill0...@umn.edu
> To: r-help@r-project.org
> Subject: Re: [R] Using Java methods in R
>
> No answer to my post,
> so let's try a simpler question. Am I doing this correctly?
> I have the RGui with R Console on the screen.
> On rhe top pullDowns, Packages > Install Packages > USA(IA)> rJava
> > library(rJava)
> > .jinit()
> > qsLin <- .jnew("C:/ad/j/CalqsLin")
> Error in .jnew("C:/ad/j/CalqsLin") :
> java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet (I think I installed it on linux
for testing but no real usage), but this message
appears to come from the JVM, probably because you
specified an absolute path. You can try this from the
command line:




$ ls ../geo/dayte*
../geo/dayte.class

mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
$ java ../geo/dayte
Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
Caused by: java.lang.ClassNotFoundException: ...geo.dayte
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: ../geo/dayte.  Program will exit.


you want to search for the classpath and set that up, then
reference the class name, not the path ( the class loader will follow
the path looking for it ).
Not sure how you do that in rJava under 'dohs but
if you search for that term it should be apparent.
And of course 'dohs and pdf aren't always friendly
for automated work, so get something like cygwin and
reduce pdf's to text etc.






>
> So I got this error which means I don't understand very much.
> I go to C:/ad/j and get
> C:\ad\j>dir CalqsLin.class
> Volume in drive C has no label.
> Volume Serial Number is 9A35-67A2
> Directory of C:\ad\j
> 04/23/2011 07:11 AM 14,651 CalqsLin.class
> 1 File(s) 14,651 bytes
> 0 Dir(s) 104,257,716,224 bytes free
>
> Just to show my intentions,
> I had next wanted to call this java function method:
> linTimOfCalqsStgIsLev("20110405235959",-4) using:
> > dblTim <-
> > .jcall(qsLin,"D","linTimOfCalqsStgIsLev","20110405235959","-4")
> but that will probably also be wrong?
> Obviously I don't understand.
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survival analysis: same subject with multiple treatments and experience multiple events

2011-04-23 Thread Mike Marchywka

> Date: Fri, 22 Apr 2011 10:00:31 -0400
> From: littleduc...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Survival analysis: same subject with multiple treatments and 
> experience multiple events
>
> Hi there,
>
> I need some help to figure out what is the proper model in survival analysis
> for my data.
> Subjects were randomized to 3 treatments in trial 1, some of them experience
> the event during the trial;
> After period of time those subjects were randomized to 3 treatments again in
> trial 2, but different from what they got in 1st trial, some of them
> experience the event during the 2nd trial (I think the carryover effect can
> be ignored since the time between two trials is long enough.)
>
> What I am interested is whether the survival functions differ among
> treatments. How should I deal with the correlation between the observation
> since the same subject was treated with two different drugs in two trials?
> Should I add "TRIAL" , "whether the event happened before", or "number of
> times the event happened before" as covariate(s)?
>
> Any input will be appreciated. Thank you.
> Qian
>

No one else replied so I would just suggest a web search using the term 
"crossover design"

http://www.google.com/#q=cran+crossover+design+survival

and refer you to any FDA panel discussions regarding any drugs that have been 
debated with
similar trial designs as part of the debate.

http://www.google.com/#sclient=psy&hl=en&site=&source=hp&q=site:fda.gov+briefing+crossover

The point of the above is to get some idea of what can happen, as no battle plan 
survives first contact with data. Usually the objective in these designs is to 
infer something about causality in some system, and you just use the statistics 
to avoid fooling yourself. Personally it seems to me that understanding of 
disease and host dynamics is improving to the point where you can do more with 
the "carryover effect" that you mention, as well as parametric putative 
prognostic factors, but you can also see opinions vary.
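
If it helps, one common formulation for the within-subject correlation is a 
Cox model with a robust cluster term; a sketch only, where 'dat' (columns 
time, event, treatment, trial, id, one row per subject per trial) is 
hypothetical:

library(survival)
fit <- coxph(Surv(time, event) ~ treatment + trial + cluster(id),
             data = dat)   # robust variance accounts for repeated subjects
summary(fit)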



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to answer the question about transitive correlation?

2011-04-22 Thread Mike Marchywka

> Date: Fri, 22 Apr 2011 11:42:59 +0800
> From: mailzhu...@gmail.com
> To: r-help@r-project.org
> Subject: [R] How to answer the question about transitive correlation?
>
> Hi, everyone. I know it may be a basic statistical question. But I can't
> find a good answer.
>
> I have a question raised by one of the reviewers.
> Factor A expression was strongly correlated with B expression (chi-square)
> in this series. Prior reports by the same authors showed that B expression
> strongly correlated with survival (Log-rank). Please provide an explanation
> why then were the results not transitive.


The only explanation that would have any value would require you post the
data and let everyone look at it and any R output would be a benefit too. 
So you compared A and B with one test, 
B vs C with another, cut off the result at some arbitrary criterion, and
then wonder why some unspecified test and criterion applied to A vs C
doesn't vote with the majority and give the same answer? Changing acceptance levels
or applying a "correction" after careful shopping can always "fix" that
logical inconsistency LOL. It's not hard to plot 3 normal curves on the same graph
and see one example of how their overlaps can relate.
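
The three-curve picture takes one line per curve (arbitrary means, for
illustration only):

x <- seq(-4, 7, length.out = 400)
plot(x, dnorm(x, 0, 1), type = "l", ylab = "density")
lines(x, dnorm(x, 1.5, 1), col = "red")   # A-B overlap is large
lines(x, dnorm(x, 4, 1), col = "blue")    # B-C overlaps less; A-C barely at all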

I'm not sure what to think but perhaps you could look at stuff like this,

http://en.wikipedia.org/wiki/Path_analysis_%28statistics%29



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survexp with weights

2011-04-21 Thread Mike Harwood
The system details follow below.  Also, I have attempted specifying
the variables in a second "ratetable" statement, but the same "missing
object" error occurs in creating a survexp object/list.

> R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          12.2
year           2011
month          02
day            25
svn rev        54585
language       R
version.string R version 2.12.2 (2011-02-25)


package ‘survival’ version 2.36-5




On Apr 20, 11:03 am, Mike Harwood  wrote:
> Hello,
>
> I probably have a syntax error in trying to generate an expected
> survival curve from a weighted cox model, but I can't see it.  I used
> the help sample code to generate a weighted model, with the addition
> of a "weights=albumin" argument (I only chose albumin because it had
> no missing values, not because of any real relevance).  Below are my
> code with the resulting error messages.  Thanks in advance!
>
> > pfit <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
>
> +     + platelet,  data=pbc
> +     )
>
> > pfit
>
> Call:
> coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
> log(protime) +
>     age + +platelet, data = pbc)
>
>                   coef exp(coef) se(coef)        z      p
> trt          -0.000624     0.999  0.17304 -0.00360 1.
> log(bili)     0.985497     2.679  0.08949 11.01262 0.
> log(protime)  2.794001    16.346  0.95289  2.93215 0.0034
> age           0.020590     1.021  0.00805  2.55666 0.0110
> platelet     -0.001699     0.998  0.00085 -2.00130 0.0450
>
> Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events= 143
>    (110 observations deleted due to missingness)
>
> > plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> > lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')
>
> > pfit.wtd <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + 
> > age +
>
> +     + platelet,  weights=albumin, data=pbc
> +     )
>
> > pfit.wtd
>
> Call:
> coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
> log(protime) +
>     age + +platelet, data = pbc, weights = albumin)
>
>                  coef exp(coef) se(coef)      z       p
> trt          -0.01354     0.987 0.094204 -0.144 8.9e-01
> log(bili)     0.99282     2.699 0.048690 20.391 0.0e+00
> log(protime)  2.54136    12.697 0.525797  4.833 1.3e-06
> age           0.01913     1.019 0.004398  4.350 1.4e-05
> platelet     -0.00165     0.998 0.000462 -3.578 3.5e-04
>
> Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events= 143
>    (110 observations deleted due to missingness)
> > plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> > lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> > lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> In addition: Warning message:
> In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
> pbc) :
>   Weights ignored
> > lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), col='purple')
>
> Error in eval(expr, envir, enclos) : object 'albumin' not found
> In addition: Warning message:
> In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
> pbc) :
>   Weights ignored
>
>
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survexp with weights

2011-04-20 Thread Mike Harwood
Hello,

I probably have a syntax error in trying to generate an expected
survival curve from a weighted cox model, but I can't see it.  I used
the help sample code to generate a weighted model, with the addition
of a "weights=albumin" argument (I only chose albumin because it had
no missing values, not because of any real relevance).  Below are my
code with the resulting error messages.  Thanks in advance!

> pfit <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  data=pbc
+ )
>
> pfit
Call:
coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc)


                  coef exp(coef) se(coef)        z      p
trt          -0.000624     0.999  0.17304 -0.00360 1.
log(bili)     0.985497     2.679  0.08949 11.01262 0.
log(protime)  2.794001    16.346  0.95289  2.93215 0.0034
age           0.020590     1.021  0.00805  2.55666 0.0110
platelet     -0.001699     0.998  0.00085 -2.00130 0.0450

Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events= 143
   (110 observations deleted due to missingness)
>
> plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')
>
> pfit.wtd <- coxph(Surv(time,status>0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  weights=albumin, data=pbc
+ )
>
> pfit.wtd
Call:
coxph(formula = Surv(time, status > 0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc, weights = albumin)


                 coef exp(coef) se(coef)      z       p
trt          -0.01354     0.987 0.094204 -0.144 8.9e-01
log(bili)     0.99282     2.699 0.048690 20.391 0.0e+00
log(protime)  2.54136    12.697 0.525797  4.833 1.3e-06
age           0.01913     1.019 0.004398  4.350 1.4e-05
platelet     -0.00165     0.998 0.000462 -3.578 3.5e-04

Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events= 143
   (110 observations deleted due to missingness)
> plot(survfit(Surv(time, status>0) ~ trt, data=pbc))
> lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
> lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), 
> col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
pbc) :
  Weights ignored
> lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), 
> col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
pbc) :
  Weights ignored
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extrapolating data points for individuals who lack them

2011-04-20 Thread Mike Marchywka





> From: de...@exeter.ac.uk
> To: r-help@r-project.org
> Date: Wed, 20 Apr 2011 12:41:29 +0100
> Subject: [R] Extrapolating data points for individuals who lack them
>
> Hi,
>
> We have an experiment where individuals responses were measured over 5 days. 
> Some responses were not obtained because we only allowed individuals to 
> respond within a limited time-frame. These individuals are given the maximum 
> response time as they did not respond, yet we feel they may have done if 
> given time (and by looking at the rest of their responses over time, the 
> non-response days stand out).
>
> We therefore want to extrapolate data points for individuals, on days when 
> they didn't respond, using a regression of days when they did.
>
> Does anyone know how we could do this quickly and easily in R?

You are probably talking about right censoring. See things like this, 
( you may have good luck just with "R" rather than "CRAN" ) 

http://www.google.com/#sclient=psy&hl=en&source=hp&q=CRAN+informative+%22right+censoring%22


If you post data maybe someone can try a few things. It isn't hard to take data
subsets, fit models, and replace data with model predictions but easier and more
interesting to illustrate with your data.


Personally I would avoid making up data, and of course extrapolation tends
to be the most error-prone way of doing that.
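
For what it's worth, the censored formulation is only a few lines (a sketch
with made-up numbers; obs=0 marks a non-response censored at the cutoff):

library(survival)
d <- data.frame(resp = c(2.1, 3.5, 5.0, 1.8, 5.0),   # response times, 5.0 = limit
                obs  = c(1,   1,   0,   1,   0))     # 0 = did not respond in time
fit <- survfit(Surv(resp, obs) ~ 1, data = d)
summary(fit)   # non-responders enter as censored rather than as made-up data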





>
> Thanks very much
> Dave

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regression and lmer

2011-04-18 Thread Mike Marchywka

Date: Mon, 18 Apr 2011 03:27:40 -0700
From: lampria...@yahoo.com
To: r-help@r-project.org
Subject: Re: [R] regression and lmer


Dear all,

I hope this is the right place to ask this question.
 
( hotmail not marking your text, sorry, I can't find the option to change that )
opinions vary but mods seem to let this by, and personally it
seems appropriate to discuss general questions about what R 
can do. 

( end my text ) 

I am reviewing a research where the analyst(s) are using a linear
regression model. The dependent variable (DV) is a continuous measure.
The independent variables (IVs) are a mixture of linear and categorical
variables.

( my text ) 

No one wants to do homework or do your job for free but open
free peer review should not be a problem philosophically. 

( / my text, afraid to use less-than for inciting hotmail ) 


The
author investigates whether performance (DV - continuous linear) is a
function of age (continuous IV1 - measured in years), previous
performance (continuous IV2), country (categorical IV3 - six countries), the 
percentage of PhD graduates in each country (continuous IV4 -
country level data - apparently only six different percentages since we
have only six countries) and population of country (continuous IV5 - country 
level data - again only six numbers here, one for each country population).

My own opinion is that the lm function cannot be used with country
level data as IVs (for example IV4 and IV5 cannot be entered into the
model because they are country level data). If IV4 and IV5 are included
in the model, it is possible that the model will not be able to be
defined because we only have six countries and it is very likely that
the levels of counties (IV3) may be confounding with IV4 and IV5. This
also calls for multicollinearity issues, right? I would like to suggest
to the analyst to use lmer using the IV3 as a random variable and  IV4
and IV5 as IV at the second level of the two-level model.


The questions are: (a) Is it true that IV4 and IV5 cannot be entered in a
one-level regression if we also have IV3?, (b) can I use an lm function to 
check for multicollinearity
between IV3, IV4 and IV5?  and (c) If we use a two-level regression
model, does lmer cope well with only six coutnries as a random effect?

( my txt)

So you have presumably a large number of test subjects per country and a small 
number
( n~6 ) of countries. You could ask a number of questions such as, " do the 
mean performances
change from country to country by more than that expected given the observed 
distributions of
performances within country?" You could also ask a question like, " if I try to 
describe performance
as a function of country attriubte what fitting parameters minimize an error 
between fit and observation?"
Apparently the author tried to write an expression like 

average_performance = a[country_index] + m1*some_attribute_of_country + m2*some_other_attribute_of_country + b

and then expected the fitting algorithm to pick a[i], b, m1, and m2 in such a 
way as to minimize the resulting error. The reported fits hopefully minimize 
the error function, but then you need to examine the second derivative in 
various directions, so you have to ask how the error varies as you change 
a[i], b, m1, and m2. ( Ignore b right now and assume it is included in a[i]. ) 
I guess if you can find a direction where the error can not change due to these 
constraints then it would seem to be impossible for the fit to come up with 
unique values. If you change each a[i] and m1 by some amounts, for example, 
can you pick those amounts to not change anything? 
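
A made-up demonstration of that aliasing (mine, not the author's data):

set.seed(1)
d <- data.frame(country = gl(6, 50))                   # six countries, 50 each
d$iv4 <- c(2.1, 3.0, 1.5, 4.2, 2.8, 3.7)[d$country]    # one IV4 value per country
d$y <- rnorm(300)
coef(lm(y ~ country + iv4, data = d))                  # iv4 comes back NA: aliased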



( /my text ) 



Thank you for your help

Jason
[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rsquared for anova

2011-04-17 Thread Mike Marchywka



( did this msg make it through the lists as rich text? hotmail
didn't seem to think it was plain text?)

Anyway, having come in in the middle of this it isn't clear
if your issues are with R or stats or both. Usually the hard
core stats people punt the stats questions to other places but
both can be addressed somewhat.
In any case, exploratory work is a good way to learn both, and I 
always like looking at new data. If you have one or 
a few dependent variables and many independent variables, 
it would probably help if you could visualize a 
surface with the response as a function of the input 
variables; then, maybe with the input of prior information or 
anecdotes, you have some idea what tests or 
analyses would make sense. 

just some thoughts "for illustration only"

df<-read.table("results_processedCP.txt",header=T)


first it helps to make sure everything went ok and do quick
checks, for example, 

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)


now personally lots of binary variables confuse me, and
I can munge them all together since I expect I can
later identify issues in the following plots. So, with
this data you can create a composite variable like this
( now, I have not checked any of this for accuracy;
typos and other problems may render the results useless ):

x=df$nh1+2*df$nh2+4*df$nh3+2*df$randsize+32*df$aweights
df2<-cbind(df,x)
str(df2)

not sure if "time" was an input or output but you could
see if there is any obvious trend or periodicity of
time with your new made up variable,

plot(df2$time,df2$x)

Apparently x is a num rather than int, it can be changed for illustration
but probably of no consequence,

xi=as.integer(x)
str(xi)

and then you can add color based on this variable, 

min(xi)
pal=rainbow(56)   # avoid the name 'c'; it masks base::c
cx=pal[xi+1]
str(cx)

and make color coded scatter plots. Now, if you 
got lucky and guessed right you may see some patterns
that you want to test, 

plot(df2$tos,df2$tws,col=cx)

in this case, I get a cool red-yellow-green line along the bottom ( very
compelling linear fit question ) and scattered magenta ( pink-red? LOL ) and blue points
everywhere, with a cluster near the origin and nothing in the top right quadrant. 
Also note a few blue lines above the red-green-yellow line, but much shorter.

And in fact, presumably you already knew this as it looks like it was designed
in, if you just plot the red and green points the fit looks perfect for linear,


good=which(df2$x<20)
plot(df2$tos[good],df2$tws[good],col=cx[good])

now if you look at the results of a fit of the "good" points vs all points,
it isn't clear that anything like this would emerge from just
looking at summaries of a linear fit:


td=df2$tos[good]
ti=df2$tws[good]
lm(td~ti)
lm(df2$tos~ df2$tws)
summary(lm(td~ti))
summary(lm(df2$tos~ df2$tws))





Now of course "tests" need to be considered ahead of time or else
it is easy to go shopping for the answer you want. Anything post hoc
needs to be very complete and you should at least try to rationalize
test results you don't happen to like ( assuming you are trying to understand
the system from which the data was measured rather than justify some
particular outcome). 




Date: Sun, 17 Apr 2011 11:34:14 +0200
From: dorien.herrem...@ua.ac.be
To: dieter.me...@menne-biomed.de
CC: r-help@r-project.org
Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days,
but I don't really get when I should use lm or aov.
 
I have attached the dataset, feel free to take a look at it.
 
So far, running it with all the combinations did not take too long and
there seem to be some effects between the parameters. However, 2x2
combinations might suffice.
 
Thanks for any help, or a pointer to some good documentation,
 
Dorien
 
 
On 16 April 2011 10:13, Dieter Menne  wrote:
 
>
> dorien wrote:
> >
> >> fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,
> > data=expdata))
> > Error: unexpected ',' in "fit <-
> > lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,"
> >
> >
>
> Peter's point is the important one: too many interactions, and even with +
> instead of * you might be running into problems.
>
> But anyway: if you don't let us access
>
>
> /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP
>
> you cannot expect a better answer which will depend on the structure of the
> data set.
>
> Dieter
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Rsquared-for-anova-tp3452399p3453719.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
 
 
 
-- 
Dorien Herremans
 
*Department of Environment, Technology and Technology Mana
