Re: [R] change value in one cell

2011-06-11 Thread jour4life
That's great! But, I guess I should have provided a better example for 
my problem. The reason being that I did not consider strings. For 
instance, in my case, I'm actually trying to add a value to a data frame 
that looks like this:

Obs X
0 NA
1 01 001
2 01 002
3 01 003

And I am actually trying to get that specific NA value and convert it 
to "00 000" so it can look like this:

Obs X
0 00 000
1 01 001
2 01 002
3 01 003

When I write the code you provided, I get this result:

x[1,"X"] <- "00 000"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c(NA, 3L, 4L, 5L, 6L, 7L,  :
  invalid factor level, NAs generated

I am wondering what I am doing wrong and how to solve this problem.

Thanks for all the help!

Carlos



On 6/10/2011 8:34 PM, jholtman [via R] wrote:
 you probably need to read the Introduction to R to understand indexing:

 > x
   Obs X Y Z
 1   1 1 0 1
 2   2 0 0 1
 3   3 1 1 1
 4   4 0 1 1
 5   5 1 1 0
 6   6 0 0 0
 > x[4, "Y"] <- 0
 > x
   Obs X Y Z
 1   1 1 0 1
 2   2 0 0 1
 3   3 1 1 1
 4   4 0 0 1
 5   5 1 1 0
 6   6 0 0 0


 On Fri, Jun 10, 2011 at 5:42 PM, jour4life [hidden email] wrote:

  Hello all,
 
  I am wondering if there is a way to change the value of one cell in 
 R. For
  instance let's say I have a hypothetical data frame that looks like 
 this:
 
  Obs X Y Z
  1   1 0 1
  2   0 0 1
  3   1 1 1
  4   0 1 1
  5   1 1 0
  6   0 0 0
 
  I would like to change the value of the 4th observation in the Y 
 column from
  1 to 0. It should look like this:
 
  Obs X Y Z
  1   1 0 1
  2   0 0 1
  3   1 1 1
  4   0 0 1
  5   1 1 0
  6   0 0 0
 
  Is it possible to change the value in one command?
 
  Thanks,
 
  Carlos
 
 
  --
  View this message in context: 
 http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  [hidden email] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 -- 
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?

 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 




--
View this message in context: 
http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3590037.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory(RAM) issues

2011-06-11 Thread Abhisek Saha
Thanks Anupam for your inputs. I believe there are two ways to
circumvent the issue: (1) making the code more efficient, or (2) increasing
the memory in some way. The reasons why I did not paste the code are (1)
it is long and uses quite a number of functions that I created, and (2)
my intention is not to make the code more efficient, even if that is
possible. Landing in a memory problem with 2 GB of RAM is natural here,
as my analysis entails 1500 simulations from huge multivariate
distributions that change after every simulation, and tomorrow I may
have to do a similar analysis with 10 million observations * 20 columns.

In view of the above I shall need more memory at some point, and my
IT friends are ready to support me with that (probably with a sandbox),
but I am not sure whether I can install a server version of R
that is capable of working with 8 GB or so of RAM. So it is more
technical help I need, and I have no clue about the plausibility of
the solution mentioned (i.e. a server version of R that can use more memory).

Regards,
Abhisek

On Sat, Jun 11, 2011 at 10:10 AM, Anupam anupa...@gmail.com wrote:

 It will be helpful on this forum to use metric measures: 12 Lakh is 1.2
 million, thus your data is 1.2 million observations x 15 variables. I do not
 know the intricacies of R. You may have to wait for someone with that
 knowledge to respond.

 Including some relevant portions of error messages and code in your query
 can also help someone to respond to your message.

 Anupam.
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Abhisek Saha
 Sent: Saturday, June 11, 2011 6:25 AM
 To: r-help@r-project.org
 Subject: [R] Memory(RAM) issues

 Dear All,
 I have been working with R (desktop version) on Vista, with the latest
 version, R 2.13.0. I have been working with a few data sets of size 12 lakh
 (1.2 million) * 15, and my code is quite computing-intensive (applying an
 MCMC Gibbs sampler to a posterior of 144 variables), so it often runs into a
 memory error such as "cannot allocate vector", with the full size (apparently
 around 1.5 GB) reached, or something to this effect. I have 2 GB of RAM. I
 checked options like memory.size, and it says that 64-bit R running on 64-bit
 Windows has a maximum memory capability of 8 TB.

 Now I don't have the background to understand the definitions and differences
 between 32- and 64-bit machines and other technical requirements like servers,
 but it would be of great help if anyone could give me a feel for it. Could any
 of you tell me whether some server version of R would resolve my issue or not
 (I am not sure what kind of server my company would allow R to be installed on
 at this point, maybe a Linux one), and if that's the case, could any of you
 guide me on how to go about installing it on a server.

 Thank you,
 Abhisek

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change value in one cell

2011-06-11 Thread Rolf Turner

On 11/06/11 18:14, jour4life wrote:

That's great! But, I guess I should have provided a better example for
my problem. The reason being that I did not consider strings. For
instance, in my case, I'm actually trying to add a value to a data frame
that looks like this:

Obs X
0 NA
1 01 001
2 01 002
3 01 003

And I am actually trying to get that specific NA value and convert it
to "00 000" so it can look like this:

Obs X
0 00 000
1 01 001
2 01 002
3 01 003

When I write the code you provided, I get this result:

x[1,"X"] <- "00 000"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c(NA, 3L, 4L, 5L, 6L, 7L,  :
  invalid factor level, NAs generated

I am wondering what I am doing wrong and how to solve this problem.

Thanks for all the help!


The error is clear enough, isn't it?  "00 000" is not one of the levels of
the X column of your data frame; this column is a factor, probably because
of the stringsAsFactors = TRUE default in options().

Since you appear to have been unaware of the factor nature of X, presumably
you don't really want it to be a factor.  If this is the case execute

    x[, "X"] <- as.character(x[, "X"])

and then your reassignment of the [1, "X"] entry of x will work.

If you do want X to be a factor you could:

(a) execute x[, "X"] <- factor(x[, "X"]) *after* doing the reassignment, or

(b) execute levels(x[, "X"]) <- c("00 000", levels(x[, "X"])) *before* doing
    the reassignment.

Learn more about how R works.  In particular learn about factors; they are
important and useful.
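A minimal sketch of the two routes above, on a toy data frame mirroring the
one in this thread (the column names are assumed from the example, and the
code is illustrative rather than tested against the original data):

x <- data.frame(Obs = 0:3, X = c(NA, "01 001", "01 002", "01 003"),
                stringsAsFactors = TRUE)   # X becomes a factor, as in the post

## Route 1: drop the factor, then assign freely
x$X <- as.character(x$X)
x[1, "X"] <- "00 000"

## Route 2 (alternative): keep X a factor, but add the new level first
## levels(x$X) <- c("00 000", levels(x$X))
## x[1, "X"] <- "00 000"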

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change value in one cell

2011-06-11 Thread Jim Holtman
your dataframe has X as a factor; convert to character vector and try again.

x$X <- as.character(x$X)

Sent from my iPad

On Jun 11, 2011, at 2:14, jour4life jour4l...@gmail.com wrote:

 That's great! But, I guess I should have provided a better example for 
 my problem. The reason being that I did not consider strings. For 
 instance, in my case, I'm actually trying to add a value to a data frame 
 that looks like this:
 
 Obs X
 0 NA
 1 01 001
 2 01 002
 3 01 003
 
 And I am actually trying to get that specific NA value and convert it 
 to "00 000" so it can look like this:
 
 Obs X
 0 00 000
 1 01 001
 2 01 002
 3 01 003
 
 When I write the code you provided, I get this result:
 
 x[1,"X"] <- "00 000"
 Warning message:
 In `[<-.factor`(`*tmp*`, iseq, value = c(NA, 3L, 4L, 5L, 6L, 7L,  :
   invalid factor level, NAs generated
 
 I am wondering what I am doing wrong and how to solve this problem.
 
 Thanks for all the help!
 
 Carlos
 
 
 
 On 6/10/2011 8:34 PM, jholtman [via R] wrote:
 you probably need to read the Introduction to R to understand indexing:
 
 > x
   Obs X Y Z
 1   1 1 0 1
 2   2 0 0 1
 3   3 1 1 1
 4   4 0 1 1
 5   5 1 1 0
 6   6 0 0 0
 > x[4, "Y"] <- 0
 > x
   Obs X Y Z
 1   1 1 0 1
 2   2 0 0 1
 3   3 1 1 1
 4   4 0 0 1
 5   5 1 1 0
 6   6 0 0 0
 
 
 On Fri, Jun 10, 2011 at 5:42 PM, jour4life [hidden email] wrote:
 
 Hello all,
 
 I am wondering if there is a way to change the value of one cell in 
 R. For
 instance let's say I have a hypothetical data frame that looks like 
 this:
 
 Obs X Y Z
 1   1 0 1
 2   0 0 1
 3   1 1 1
 4   0 1 1
 5   1 1 0
 6   0 0 0
 
 I would like to change the value of the 4th observation in the Y 
 column from
 1 to 0. It should look like this:
 
 Obs X Y Z
 1   1 0 1
 2   0 0 1
 3   1 1 1
 4   0 0 1
 5   1 1 0
 6   0 0 0
 
 Is it possible to change the value in one command?
 
 Thanks,
 
 Carlos
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 -- 
 Jim Holtman
 Data Munger Guru
 
 What is the problem that you are trying to solve?
 
 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3590037.html
 Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rcspline.plot query

2011-06-11 Thread Georgios Chalikias

Dear all,

As I am new to the R community - although eager to advance- I would 
like to pose a question to the community.


I have an SPSS file which I have imported into R (with the read.spss 
command); it consists of a scale (continuous) variable, adiponectin, and 
the corresponding categorical variable, death (0 = No, 1 = Yes). In all there 
are 60 observations (among which there are 26 events).


I am trying to do rcspline.plot(model="logisitc", nk=5, knots=NULL); 
however, the error message that I get is "fewer than 6 non missing 
observations".


I have repeated the process with a similar dataset of 300 observations 
(instead of 60) and I still get the same message.


Can anybody suggest a hint ?

Kind regards

Georgios Chalikias MD,PhD
Senior Lecturer in Cardiology
Democritus University of Thrace
Alexandroupolis, Greece

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RES: Linear multivariate regression with Robust error

2011-06-11 Thread Mike Marchywka

 Date: Fri, 10 Jun 2011 16:50:24 -0300
 From: filipe.bote...@vpar.com.br
 To: frien...@yorku.ca; bkkoc...@gmail.com
 CC: r-help@r-project.org
 Subject: [R] RES: Linear multivariate regression with Robust error



 --Forwarded Message Attachment--
 Subject: RES: [R] Linear multivariate regression with Robust error
 Date: Fri, 10 Jun 2011 16:50:24 -0300
 From: filipe.bote...@vpar.com.br
 To: frien...@yorku.ca; bkkoc...@gmail.com
 CC: r-help@r-project.org


 Hi Barjesh,

 I am not sure which data you analyze, but once I had a similar situation
 and it was a multicolinearity issue. I realized this after finding a
 huge correlation between the independent variables, then I dropped one
 of them and signs of slopes made sense.

 Beforehand, have a glance at your correlation matrix of independent
 variables to see the relationship between them.

I guess "look at the data" (LATFD) seems to be the recurring solution.
Certainly with polynomial regression, there's a tendency to think that
the fit parameter is a pure property of the system being examined. With
something like a Taylor series, where you have a slope at a specific point,
maybe you could think about it that way, but here the coefficient
is just whatever optimizes your (arbitrary) error function for the data
you have. A linear approximation to a non-line could be made at
one point, and maybe that property should remain constant, but
you have people claiming "past authors sampled some curve near x=a
and got a different slope than my work, which largely sampled f(x) around
x=b; what is wrong with my result?" It may be interesting to try to
write a Taylor series around some point and see how those coefficients
vary with data sets, for example (you still need an arbitrary way to
estimate slopes, and simply differencing two points may be a bit noisy, LOL,
but you could play with some wavelet families maybe).

If you try to describe a fruit as a linear combination of vegetables,
you may get confusing but possibly useful results, even if they don't
correspond to properties of the fruits so much as to a specific need you
have. For example, if you are compressing images of fruits and your decoder
already has a dictionary of vegetables, it may make sense to do this. This is
not much different from trying to compress non-vocal music with an ACELP
codec that attempts to fit the sounds to models of the human vocal tract.
Sometimes this may even be informative about how a given sound was produced,
even if it sounds silly.
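A minimal sketch, with made-up data rather than the poster's, of the two
checks suggested in this thread: the predictor correlation matrix mentioned
above and the added-variable plot suggested in the message quoted below (the
car package is an assumption here, chosen because it provides avPlots):

library(car)                      # avPlots(); an assumed choice of package
set.seed(1)
x1  <- rnorm(100)
x2  <- x1 + rnorm(100, sd = 0.1)  # deliberately collinear with x1
y   <- 1 + 2 * x1 + rnorm(100)
fit <- lm(y ~ x1 + x2)

cor(cbind(x1, x2))                # glance at the predictor correlation matrix
summary(fit)                      # unstable slopes under collinearity
avPlots(fit)                      # partial contribution of each predictor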

 Not sure how helpful my advice will be, but it did the trick for me back
 then ;)

 Cheers,
 Filipe Botelho

 -Mensagem original-
 De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 Em nome de Michael Friendly
 Enviada em: sexta-feira, 10 de junho de 2011 12:51
 Para: Barjesh Kochar
 Cc: r-help@r-project.org
 Assunto: Re: [R] Linear multivariate regression with Robust error

 On 6/10/2011 12:23 AM, Barjesh Kochar wrote:
  Dear all,
 
  I am doing linear regression with robust errors to know the effect of
  one (x) variable on another (y); if I execute the command I find a positive
  trend. But if I check the effect of a number of variables (x: x1, x2, x3)
  on the same (y) variable, then the positive effect shown by the x variable
  turns negative. So please help me in this situation.
 
  Barjesh Kochar
  Research scholar
 
 You don't give any data or provide any code (as the posting guide
 requests) , so I have to guess that you
 have just rediscovered Simpson's paradox -- that the coefficient of a
 variable in a marginal regression can have an opposite sign to that in
 a joint model with other predictors. I have no idea what you mean
 by 'robust error'.

 One remedy is an added-variable plot which will show you the partial
 contributions of each predictor in the joint model, as well as whether
 there are any influential observations that are driving the estimated
 coefficients.


 --
 Michael Friendly Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.
 York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
 4700 Keele Street Web: http://www.datavis.ca
 Toronto, ONT M3J 1P3 CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 This message and its attachments may contain confidential and/or
 privileged information. If you are not the addressee, please advise
 the sender immediately by replying to the e-mail and delete this
 message.

Re: [R] rcspline.plot query

2011-06-11 Thread Frank Harrell
The error message is literally true, as you did not specify any data to the
function at all.  Please read the documentation.  Also, note that
rcspline.plot is largely replaced by the R rms package, although it may have
value if there is only one predictor and that predictor is continuous.  Also
note spelling of logistic.

You will need to load the rms package when using rcspline.plot anyway.

When reading SPSS files you may find the Hmisc package's spss.get function
useful.  You've already loaded Hmisc if using rcspline.plot.
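For what it's worth, a hedged sketch of that rms route; the file name and
variable names below are assumptions taken from the original post, not tested
code:

library(Hmisc)     # spss.get
library(rms)       # lrm, rcs, Predict, datadist

d  <- spss.get("adiponectin.sav", use.value.labels = TRUE)  # hypothetical file
dd <- datadist(d); options(datadist = "dd")

## logistic model with a 5-knot restricted cubic spline in adiponectin
fit <- lrm(death ~ rcs(adiponectin, 5), data = d)
fit
plot(Predict(fit, adiponectin, fun = plogis))  # predicted probability of death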

Please follow the posting guide.
Frank

Georgios Chalikias wrote:
 
 Dear all,
 
  As I am new to the R community - although eager to advance- I would 
  like to pose a question to the community.
 
  I have an SPSS file which I have imported into R (with the read.spss
  command); it consists of a scale (continuous) variable, adiponectin, and
  the corresponding categorical variable, death (0 = No, 1 = Yes). In all there
  are 60 observations (among which there are 26 events).
 
  I am trying to do rcspline.plot(model="logisitc", nk=5, knots=NULL);
  however, the error message that I get is "fewer than 6 non missing
  observations".
 
  I have repeated the process with a similar dataset of 300 observations 
  (instead of 60) and I still get the same message.
 
  Can anybody suggest a hint ?
 
  Kind regards
 
  Georgios Chalikias MD,PhD
  Senior Lecturer in Cardiology
  Democritus University of Thrace
  Alexandroupolis, Greece
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/rcspline-plot-query-tp3590233p3590392.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Lui ##
Dear R group,

since I only have a moderately fast MacBook and I would like to get
my results faster than within 72 h per run ( :-((( ), I tried Amazon
AWS, which offers pretty fast computers via remote access. I don't
want to post any code because it's several hundred lines; I am
not looking for the optimal answer, but maybe for some suggestions from
those of you who have faced similar problems.

I installed Revolution R on Windows on the Amazon instance. I usually
go for the large one (8 cores, about 20 GHz, several GB of RAM).

- I am running a financial analysis over several periods (months) in
which various CVaR calculations are made (with the rGenoud package).
The periods depend on each other, so parallelizing that does not
work. I was quite surprised how well written all the R libraries on the
Mac seem to be, since they use both cores of my MacBook for a
large portion of the calculations (I guess the matrix multiplications
and the like). I was a little bit astonished, though, that the
performance increase on the Amazon instance (about 5 times faster than
my MacBook) was very moderate, with only about a 30% decrease in
calculation time. The CPUs were about 60% in use (obviously, the code
was not written specifically for several cores).

(1) I did try to use multiple cores for the rGenoud package (snow
package) as mentioned on the excellent website
(http://sekhon.berkeley.edu/rgenoud/multiple_cpus.html) but found
rather strange behaviour: the CPU use on the Amazon instances would
decrease to about 25% with periodic peaks. At least the first
instance/optimization run took significantly longer (several times
longer) than without explicitly including multiple cores in the genoud
function. The number of cores I used was usually smaller than the
number of cores I had available (4 of 8). So it does not seem like
I am able to improve my performance here, even though I think it is
somewhat strange...
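For reference, a hedged sketch of the kind of multi-CPU call described in (1).
The objective function here is a toy placeholder, not my CVaR model, and the
cluster handling follows the rgenoud documentation as I understand it:

library(rgenoud)
library(snow)

cl <- makeSOCKcluster(rep("localhost", 4))   # 4 of the 8 cores, as above
toy.fn <- function(par) sum((par - 1:5)^2)   # placeholder objective
clusterExport(cl, "toy.fn")                  # make the objective available

res <- genoud(toy.fn, nvars = 5,
              pop.size = 2000, max.generations = 500,  # the settings above
              cluster = cl)

stopCluster(cl)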

(2) I tried to improve the performance by parallelizing the solution
quality functions (which are subject to minimization by rGenoud): One
was basically a sorting algorithm (CVaR), the other one just a matrix
multiplication sort of thing. Parallelizing either the composition of
the solution function (which was the sum of the CVaR and matrix
multiplication) or parallelizing the sort function (splitting up the
dataset and later uniting subsets of the solution again) did not show
any improvements: the performance was much worse - even  though all 8
CPUs were 100% idle... I do think that it has to do with all the data
management between the instances...

I am a little bit puzzled now about what I could do... It seems like
there are only very limited options for me to increase the
performance. Does anybody have experience with parallel computations
with rGenoud or parallelized sorting algorithms? I think one major
problem is that the sorting happens rather quickly (only a few hundred
entries to sort), but needs to be done very frequently (population
size 2000, iterations 500), so I guess the housekeeping of the
parallel computation diminishes all the benefits.

I tried snowfall (for #2) and the snow package (#1). I also tried the
foreach library - but couldn't get it working on Windows...

Suggestions with respect to operating system, Amazon AWS, or rgenoud
are highly appreciated.

Thanks a lot!

Lui

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] To add cut-off points in surface response with lattice

2011-06-11 Thread Ivan Allaman
Good morning, gentlemen!

I'm not a fan of lattice, because of the large number of steps needed to
reach a simple goal, but I have to confess that in some cases its graphics
are way better than base graphics. For some days I have been searching,
without success, for how to add a cut-off point to a response surface graph.
It is useful for the researcher looking at the graph to be able to spot the
cut-off point. Here is a minimal reproducible example.

require(plotrix)
jet.colors <- colorRampPalette(c("blue", "green"))
x <- seq(-1.95, 1.95, length=30)
y <- seq(-1.95, 1.95, length=35)
da <- expand.grid(x=x, y=y)
da$z <- with(da, x*y^2)

require(lattice)
panel.3d.contour <-
  function(x, y, z, rot.mat, distance,
           nlevels = 20, zlim.scaled, ...)
  {
    add.line <- trellis.par.get("add.line")
    panel.3dwire(x, y, z, rot.mat, distance,
                 zlim.scaled = zlim.scaled, ...)
    clines <-
      contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
                   nlevels = nlevels)
    for (ll in clines) {
      m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[1]),
                            rot.mat, distance)
      panel.lines(m[1,], m[2,], col = add.line$col,
                  lty = add.line$lty, lwd = add.line$lwd)
    }
  }
wireframe(z ~ x + y, da, drape = TRUE,
          scales = list(arrows = FALSE), col.regions = jet.colors(100),
          panel.3d.wireframe = panel.3d.contour)
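A minimal sketch of one possible direction, not a tested answer: reuse the
panel above, but draw only the contour at an assumed cut-off level (z = 0
here) in a distinct colour so the reader can spot it.

panel.3d.cutoff <-
  function(x, y, z, rot.mat, distance, zlim.scaled, cutoff = 0, ...)
  {
    panel.3dwire(x, y, z, rot.mat, distance,
                 zlim.scaled = zlim.scaled, ...)
    ## only the contour at the chosen cut-off value
    clines <- contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
                           levels = cutoff)
    for (ll in clines) {
      m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[1]),
                            rot.mat, distance)
      panel.lines(m[1,], m[2,], col = "red", lwd = 2)
    }
  }
wireframe(z ~ x + y, da, drape = TRUE,
          scales = list(arrows = FALSE), col.regions = jet.colors(100),
          panel.3d.wireframe = panel.3d.cutoff)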


Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/To-add-cut-off-points-in-surface-response-with-lattice-tp3590414p3590414.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory(RAM) issues

2011-06-11 Thread Lui ##
Hello Abhisek,

maybe you wanna try it on just a bigger machine (I guess you are
working at a university, so I hope you do have access to one). In
case getting computing time or the like is a major issue, you may
wanna try Amazon AWS: for a few rupees (about 50-100 per hour) you can
rent pretty fast computers (20 GHz, 8 GB of RAM). You may want to try
out the Windows version (a little bit more expensive), which is easily
accessible via remote desktop. Installing the 64-bit version of Revolution
(which is free for academic purposes) might give you a good start.
Maybe it's not a viable option in the long term (pricing), but it
might help to get a clue whether the problem can be solved on a bigger
machine, just by trying it out without a bigger hassle...
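A few quick, hedged sanity checks before renting anything (memory.limit() is
Windows-only, and the matrix below only approximates the data's footprint):

sessionInfo()        # shows whether the running R is 32- or 64-bit
memory.limit()       # current memory limit in MB (Windows only)

## rough size of a 1.2 million x 15 numeric matrix, before any copies
print(object.size(matrix(0, nrow = 1.2e6, ncol = 15)), units = "MB")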

Good luck!

Lui

On Sat, Jun 11, 2011 at 7:47 AM, Abhisek Saha tad...@gmail.com wrote:
 Thanks Anupam for your inputs. I believe there are two ways to
 circumvent the issue: (1) making the code more efficient, or (2) increasing
 the memory in some way. The reasons why I did not paste the code are (1)
 it is long and uses quite a number of functions that I created, and (2)
 my intention is not to make the code more efficient, even if that is
 possible. Landing in a memory problem with 2 GB of RAM is natural here,
 as my analysis entails 1500 simulations from huge multivariate
 distributions that change after every simulation, and tomorrow I may
 have to do a similar analysis with 10 million observations * 20 columns.

 In view of the above I shall need more memory at some point, and my
 IT friends are ready to support me with that (probably with a sandbox),
 but I am not sure whether I can install a server version of R
 that is capable of working with 8 GB or so of RAM. So it is more
 technical help I need, and I have no clue about the plausibility of
 the solution mentioned (i.e. a server version of R that can use more memory).

 Regards,
 Abhisek

 On Sat, Jun 11, 2011 at 10:10 AM, Anupam anupa...@gmail.com wrote:

 It will be helpful on this forum to use metric measures: 12 Lakh is 1.2
 million, thus your data is 1.2 million observations x 15 variables. I do not
 know the intricacies of R. You may have to wait for someone with that
 knowledge to respond.

 Including some relevant portions of error messages and code in your query
 can also help someone to respond to your message.

 Anupam.
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Abhisek Saha
 Sent: Saturday, June 11, 2011 6:25 AM
 To: r-help@r-project.org
 Subject: [R] Memory(RAM) issues

 Dear All,
  I have been working with R (desktop version) on Vista, with the latest
  version, R 2.13.0. I have been working with a few data sets of size 12 lakh
  (1.2 million) * 15, and my code is quite computing-intensive (applying an
  MCMC Gibbs sampler to a posterior of 144 variables), so it often runs into a
  memory error such as "cannot allocate vector", with the full size (apparently
  around 1.5 GB) reached, or something to this effect. I have 2 GB of RAM. I
  checked options like memory.size, and it says that 64-bit R running on 64-bit
  Windows has a maximum memory capability of 8 TB.

  Now I don't have the background to understand the definitions and differences
  between 32- and 64-bit machines and other technical requirements like servers,
  but it would be of great help if anyone could give me a feel for it. Could any
  of you tell me whether some server version of R would resolve my issue or not
  (I am not sure what kind of server my company would allow R to be installed on
  at this point, maybe a Linux one), and if that's the case, could any of you
  guide me on how to go about installing it on a server.

 Thank you,
 Abhisek

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLS model - diagnostic plots

2011-06-11 Thread Craig, Jessica
I am applying a GLS model and would like to look at diagnostic plots of 
influence.
The function (plot(model)) that works for linear models does not seem to 
function for GLS models. Is there a reason for this? Or is different code 
required?

Sorry if it's a very basic question, but thanks for your help!
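Not a full answer, but a hedged sketch of what is readily available for a gls
fit in nlme, using the package's own Ovary example; lm-style influence
measures are not provided, so these are residual checks only:

library(nlme)

fit <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary,
           correlation = corAR1(form = ~ 1 | Mare))

plot(fit)                        # standardized residuals vs fitted values
qqnorm(fit, abline = c(0, 1))    # normal QQ plot of the residuals

## or assemble a plot by hand from the pieces
plot(fitted(fit), resid(fit, type = "normalized"))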




The University of Aberdeen is a charity registered in Scotland, No SC013683.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plot of tree

2011-06-11 Thread Shuiwang Ji
friends,

I need some tool to visualize the results of hierarchical clustering.
Specifically, I want to plot them as a radial plot, like a phylogenetic tree.
In addition, I want to specify the color of each leaf node. I have searched
all the phylogenetic tree plotting routines here; none of them can show the
leaf nodes with their colors.

Since I will have a lot of leaf nodes, the colors need to be specified and
passed as a parameter, instead of set manually.

Can someone point me to some software packages?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plot of hierarchical clustering results

2011-06-11 Thread Shuiwang Ji
friends,

I need some tool to visualize the results of hierarchical clustering.
Specifically, I want to plot them as a radial plot, like a phylogenetic tree.
In addition, I want to specify the color of each leaf node. I have searched
all the phylogenetic tree plotting routines here; none of them can show the
leaf nodes with their colors.

Since I will have a lot of leaf nodes, the colors need to be specified and
passed as a parameter, instead of set manually.

Can someone point me to some software packages?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Inspecting C code in an R package

2011-06-11 Thread Layman123
Hello everyone,

Trying to comprehend the code of an R package, I encountered the problem that
the interesting part of the function I'm inspecting is written in C and
called by .C("somename", ...). Now I can't inspect the C code the function
is calling, since I can't find it in the folder of the package.
Does someone know where the corresponding C code is stored, so that I could
inspect it and comprehend what is happening?

Thank you very much in advance!

Regards
Roman

--
View this message in context: 
http://r.789695.n4.nabble.com/Inspecting-C-code-in-an-R-package-tp3590596p3590596.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inspecting C code in an R package

2011-06-11 Thread Sarah Goslee
The easiest thing to do is download the source package from your local
CRAN mirror. That will contain all the R and other code.

Sarah

On Sat, Jun 11, 2011 at 10:50 AM, Layman123 romanhorn...@web.de wrote:
 Hello everyone,

 Trying to comprehend the code of an R package, I encountered the problem that
 the interesting part of the function I'm inspecting is written in C and
 called by .C("somename", ...). Now I can't inspect the C code the function
 is calling, since I can't find it in the folder of the package.
 Does someone know where the corresponding C code is stored, so that I could
 inspect it and comprehend what is happening?

 Thank you very much in advance!

 Regards
 Roman


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inspecting C code in an R package

2011-06-11 Thread Layman123
Thank you, Sarah, for your quick answer! I've just downloaded the source
package, but now I don't know in which file the C code that
.C("somename", ...) calls is stored - there is no file named
"somename". How could one figure that out?

--
View this message in context: 
http://r.789695.n4.nabble.com/Inspecting-C-code-in-an-R-package-tp3590596p3590641.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inspecting C code in an R package

2011-06-11 Thread Jeff Newmiller
Presumably in the source code of the package (something.tar.gz). Follow the 
posting guidelines if you want further assistance.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Layman123 romanhorn...@web.de wrote:

Hello everyone,

Trying to comprehend the code of an R package, I encountered the problem that
the interesting part of the function I'm inspecting is written in C and
called by .C("somename", ...). Now I can't inspect the C code the function
is calling, since I can't find it in the folder of the package.
Does someone know where the corresponding C code is stored, so that I could
inspect it and comprehend what is happening?

Thank you very much in advance!

Regards
Roman

--
View this message in context: 
http://r.789695.n4.nabble.com/Inspecting-C-code-in-an-R-package-tp3590596p3590596.html
Sent from the R help mailing list archive at Nabble.com.

_

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka





 Date: Sat, 11 Jun 2011 13:03:10 +0200
 From: lui.r.proj...@googlemail.com
 To: r-help@r-project.org
 Subject: [R] Amazon AWS, RGenoud, Parallel Computing

 Dear R group,


[...]

 I am a little bit puzzled now about what I could do... It seems like
 there are only very limited options for me to increase the
 performance. Does anybody have experience with parallel computations
 with rGenoud or parallelized sorting algorithms? I think one major
 problem is that the sorting happens rather quickly (only a few hundred
 entries to sort), but needs to be done very frequently (population
 size 2000, iterations 500), so I guess the housekeeping of the
 parallel computation diminishes all the benefits.

Is your sort part of the algorithm, or do you have to sort results after
getting them back out of order from async processes? One of
my favorite anecdotes is how I used a bash sort on a huge data
file to make a program run faster (from impractical zero percent CPU
to very fast with full CPU usage, and you complain about exactly
a lack of CPU saturation). I guess a couple of comments. First,
if you have specialized apps you need optimized, you may want
to write dedicated C++ code. However, this won't help if
you don't find the bottleneck. Lack of CPU saturation could
easily be due to waiting for stuff like disk IO or VM
swap. You really ought to find the bottleneck first; it
could be anything (except the CPU, maybe, LOL). The sort
that I used prevented VM thrashing with no change to the app
code - the app got sorted data and so VM paging became infrequent.
If you can specify the problem precisely you may be able to find
a simple solution.

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] running R commands asynchronously

2011-06-11 Thread Greg Snow
Good points. I was going to include something along the lines of "don't change
any data in the console that is involved in the asynchronous process", but
figured Richard was smart enough not to do something like that.  But your
points add to that and make it clear for other readers who don't have
the experience to know better.

-Original Message-
From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk] 
Sent: Friday, June 10, 2011 11:36 PM
To: Greg Snow
Cc: Richard M. Heiberger; r-help
Subject: Re: [R] running R commands asynchronously

On Fri, 10 Jun 2011, Greg Snow wrote:

 Tk windows run asynchronous to the rest of R, so you could write a 
 small function that lets you type the command into a Tk window and 
 runs it while you continue to work in the main R, then the window 
 could signal you when the function finishes.

That is really not advised.  R is not designed to run asynchronous R 
commands and is not protected against quite a lot of things which can 
happen if you try that (many low-level routines are not re-entrant, 
for example).  Allowing callbacks to either run simple things (as 
happens from e.g. the GUI menus) whilst a task is running, or to do 
almost all the running (as happens with a menu system built in Tk) is 
fairly safe just because only one task is likely to run at a time.

For over a decade Duncan TL has promised a facility to run tasks in 
parallel in R (as I recall the estimate at DSC 2001 was 12 months).

So the only safe way at present (and the foreseeable future) is to run 
separate processes.  Packages snow and multicore provide means to do 
that (and to wait for and collect results): in the case of multicore 
the parallel tasks work on a copy of the current session and so you do 
come close to the appearance of running aysnchronous tasks.  (By 
choosing to use Windows you exclude yourself from multicore.)


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Richard M. Heiberger
 Sent: Friday, June 10, 2011 3:30 PM
 To: r-help
 Subject: [R] running R commands asynchronously

 I am interested in running R commands asynchronously.

 My first choice is in the same R session that I am currently in.
 Here, the goal would be to run something like

 RunAsynchSameSession(myfunction(), outputname.rda)

 Once RunAsynchSameSession had started myfunction(),
 RunAsynchSameSession would complete immediately.  myfunction would
 keep going.  It is OK if execution of the myfunction() command
 prevents new input to R until it has completed.  The important feature
 is that RunAsynchSameSession must tell the progam that called it that
 it was done.

 Second choice is to start an independent process, BATCH or something
 similar, and save the resulting data objects in an .rda file.

 RunAsynchBatch(myfile.r, outputname.rda)

 The RunAsynchBatch would start a batch process and complete
 immediately after starting the batch file.  The batch file would run
 independently until it was completed.  While I know how to do this,
 for example with system(wait=FALSE), I would appreciate seeing a
 worked collection of statements, including getting outputname.rda back
 to the original R session.  I am working on Windows.

 Rich

-- 
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inspecting C code in an R package

2011-06-11 Thread Layman123
Thank you! Of course, I will read the posting guidelines. A subscriber helped
me via e-mail, telling me to use the grep command, that is, to type: grep
somename *.c. For Windows users it's: findstr somename *.c.
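For reference, a hedged R-only sketch of the same idea; the package name "foo"
and the symbol "somename" are placeholders, not a real package:

url <- "https://cran.r-project.org/src/contrib/foo_1.0.tar.gz"   # hypothetical
download.file(url, destfile = "foo_1.0.tar.gz")
untar("foo_1.0.tar.gz")

c.files <- list.files("foo/src", pattern = "\\.c$", full.names = TRUE)
hits <- lapply(c.files, function(f) grep("somename", readLines(f), value = TRUE))
names(hits) <- c.files
hits     # lines in each C source file that mention the symbol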

--
View this message in context: 
http://r.789695.n4.nabble.com/Inspecting-C-code-in-an-R-package-tp3590596p3590744.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot of tree

2011-06-11 Thread John
On Saturday, June 11, 2011 04:26:44 AM Shuiwang Ji wrote:
 friends,
 
 I need some tool to visualize the results of hierarchical clustering.
 Specifically, I want to plot it as a radial plot as a phylogenetic tree. In
 addition, I want to specify the color to each leaf node. I search all
 phylogenetic tree plotting routines here, they all cannot show the leaf
 nodes and their colors.
 
 Since i will have a lot of leaf nodes, the color needs to be specified and
 given as a parameter, instead of manually.
 
 Can someone point me to some software packages?
 
You might want to search here:

http://addictedtor.free.fr/graphiques/
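One hedged possibility, using the ape package (an assumption; it is not
mentioned in the thread): convert the hclust object to a phylo tree and pass
a vector of per-leaf colours.

library(ape)

hc   <- hclust(dist(USArrests))          # example clustering
cols <- rainbow(3)[cutree(hc, k = 3)]    # one colour per leaf, by cluster

plot(as.phylo(hc), type = "fan", tip.color = cols, cex = 0.6)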

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inspecting C code in an R package

2011-06-11 Thread Barry Rowlingson
On Sat, Jun 11, 2011 at 5:31 PM, Layman123 romanhorn...@web.de wrote:
 Thank you! Of course, I will read the posting guidelines. A subscriber helped
 me via e-mail telling me to use the grep-command, that is type in: grep
 somename *.c. For Windows users it's: findstr somename *.c.

 The problem here is that this will also find every place that function
is called in the C files, and also any time something called somename2
is mentioned.

 Most decent programmers' editors will parse code files and construct a
little database of where all your functions are defined, so you can
quickly jump to the definition of a function. In Emacs this is done
with a thing called 'etags'.

 But this is now beyond the scope of R-help...

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] running R commands asynchronously

2011-06-11 Thread Martin Morgan

On 06/10/2011 02:29 PM, Richard M. Heiberger wrote:

I am interested in running R commands asynchronously.

My first choice is in the same R session that I am currently in.
Here, the goal would be to run something like

  RunAsynchSameSession(myfunction(), outputname.rda)

Once RunAsynchSameSession had started myfunction(),
RunAsynchSameSession would complete immediately.  myfunction would
keep going.  It is OK if execution of the myfunction() command
prevents new input to R until it has completed.  The important feature
is that RunAsynchSameSession must tell the progam that called it that
it was done.


somewhere in-between, starting a new R session but keeping the 
communication within the original.


library(snow)
cl <- makeSOCKcluster("localhost")
sendCall(cl[[1]], function(n) { Sys.sleep(n); n }, list(n=5))
## do other things here...
recvResult(cl[[1]]) ## blocks until result available

I had hoped that I could open a non-blocking connection on the master 
(blocking=FALSE in the next-to-last line of newSOCKnode) and then poll 
with isIncomplete(cl[[1]]$con) but I couldn't get this to work (the 
connection, or more likely my understanding, seemed to be blocking anyway).


sendCall(cl[[1]], function(n) { Sys.sleep(n); n }, list(n=5))
while(isIncomplete(cl[[1]]$con))
    cat("tick\n")  ## not printed
recvResult(cl[[1]])

This approach might also work with the multicore package.

Martin



Second choice is to start an independent process, BATCH or something
similar, and save the resulting data objects in an .rda file.

  RunAsynchBatch(myfile.r, outputname.rda)

The RunAsynchBatch would start a batch process and complete
immediately after starting the batch file.  The batch file would run
independently until it was completed.  While I know how to do this,
for example with system(wait=FALSE), I would appreciate seeing a
worked collection of statements, including getting outputname.rda back
to the original R session.  I am working on Windows.

Rich

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is there an implementation loess with more than 4 parametric predictors or a trick to similar effect?

2011-06-11 Thread Dr. D. P. Kreil (Boku)
I have revised my earlier question and would be grateful for any
comments!

--

Subject: Is there an implementation of loess with more than 3 parametric
predictors or a trick to a similar effect?

Dear R experts,

I have a problem that is a related to the question raised in this earlier
post
https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html

My situation is different in that I have only 2 predictors (coordinates x,y)
for local regression but a number of global (parametric) offsets that I
need to consider.

Essentially, I have a spatial distortion overlaid over a number of
measurements. These measurements can be grouped by the same underlying
undistorted measurement value for each group. The groups are known, but the
values are not. We need to estimate the spatial trend, which we then want to
remove. In our application, the spatial trend is two-dimensional (x,y), and
there are about 20 groups of about 50 measurements each, in the most simple
scenario. The measurements are randomly placed. Taking the first group as
reference, there are thus 19 unknown offsets.

The below code for toy data (spatial trend in one dimension x) works for two
or three offset groups.

Unfortunately, the loess call fails for four or more offset groups with the
error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize,  :
  only 1-4 predictors are allowed

I tried overriding the restriction and got
k>d2MAX in ehg136.  Need to recompile with increased dimensions.

How easy would that be to do? I cannot find a definition of d2MAX anywhere,
and it seems this might be hardcoded -- the error is apparently triggered by
line #1359 in loessf.f
  if(k .gt. 15)   call ehg182(105)

Alternatively, does anyone know of an implementation of local regression
with global (parametric) offset groups that could be applied here?

Or is there a better way of dealing with this? I tried lme with correlation
structures but that seems to be much, much slower.

Any comments would be greatly appreciated!

Best regards,
David Kreil.



###
#
# loess with parametric offsets - toy data demo
#

x <- seq(0,9,.1);
x.N <- length(x);

o <- c(0.4,-0.8,1.2#,-0.2  # works for three but not four
       );  # these are the (unknown) offsets
o.N <- length(o);
f <- sapply(seq(o.N),
            function(n){
              ifelse((seq(x.N) <= n   *x.N/(o.N+1) &
                      seq(x.N) >  (n-1)*x.N/(o.N+1)),
                     1,0);
            });
f <- f[sample(NROW(f)),];

y <- sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs <- sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s <- paste(c('y~x',s.fs),collapse='+');
d <- data.frame(x,y,f)
names(d) <- c('x','y',s.fs);

l <- loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
           span=0.4);
yp <- predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red');  # input data
points(x,yp,pch='o',col='blue');   # fit of that

d0 <- d; d0$f1 <- d0$f2 <- d0$f3 <- 0;
yp0 <- predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+');  # estimate of that

op <- sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});

cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Lui ##
Hello Mike,

thank you very much for your response!
To the best of my knowledge, the sort algorithm implemented in R is already
backed by C++ code and not natively written in R. Writing the code
in C++ myself is not really an option either (I think rGenoud is also written
in C++). I am not sure whether there really is a bottleneck with
respect to the computer - I/O is pretty low, plenty of RAM left, etc.
It really seems to me as if parallelizing is not easily possible, or
only at such high cost that the benefits diminish through all the
coordination and handling needed...
Did anybody use rGenoud in cluster mode and experience something similar?
Are there quicksort packages available that use multiple processors efficiently
(I didn't find any... :-( )?

I am by no means an expert on parallel processing, but is it possible
that the benefits from parallelizing a process greatly diminish if a large
set of variables/functions needs to be made available and the actual
function (in this case sorting a few hundred entries) is quite short,
whereas the number of times the function is called is very high? It
was quite striking that the first run usually took several hours
(instead of half an hour) and the subsequent runs were much, much
faster...
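A rough, hedged illustration of that suspicion with toy data: dispatching many
small sorts to a snow cluster is typically slower than sorting them serially,
because the communication overhead dominates the tiny tasks.

library(snow)
cl <- makeSOCKcluster(rep("localhost", 4))

xs <- replicate(2000, runif(500), simplify = FALSE)   # many small vectors

system.time(lapply(xs, sort))          # serial
system.time(parLapply(cl, xs, sort))   # parallel; usually slower at this size

stopCluster(cl)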

There is so much happening behind the scenes that it is a little
hard for me to tell what might help - and what will not...

Help appreciated :-)
Thank you

Lui

On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka marchy...@hotmail.com wrote:




 
 Date: Sat, 11 Jun 2011 13:03:10 +0200
 From: lui.r.proj...@googlemail.com
 To: r-help@r-project.org
 Subject: [R] Amazon AWS, RGenoud, Parallel Computing

 Dear R group,


 [...]

 I am a little bit puzzled now about what I could do... It seems like
 there are only very limited options for me to increase the
 performance. Does anybody have experience with parallel computations
 with rGenoud or parallelized sorting algorithms? I think one major
 problem is that the sorting happens rather quick (only a few hundred
 entries to sort), but needs to be done very frequently (population
 size 2000, iterations 500), so I guess the problem with the
 housekeeping of the parallel computation deminishes all benefits.

 Is your sort part of the algorithm, or do you have to sort results after
 getting them back out of order from async processes? One of
 my favorite anecdotes is how I used a bash sort on a huge data
 file to make a program run faster (from impractical zero percent CPU
 to very fast with full CPU usage, and you complain about exactly
 a lack of CPU saturation). I guess a couple of comments. First,
 if you have specialized apps you need optimized, you may want
 to write dedicated C++ code. However, this won't help if
 you don't find the bottleneck. Lack of CPU saturation could
 easily be due to waiting for stuff like disk IO or VM
 swap. You really ought to find the bottleneck first; it
 could be anything (except the CPU, maybe, LOL). The sort
 that I used prevented VM thrashing with no change to the app
 code - the app got sorted data and so VM paging became infrequent.
 If you can specify the problem precisely you may be able to find
 a simple solution.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with Median test and Coxon-Mann-Whittney test

2011-06-11 Thread Ethan Brown
Hi Rob,

This list is primarily intended for questions about how to do things in R,
so you're more likely to get a helpful response elsewhere. You might want to
try some place like the Cross-Validated web site (
http://stats.stackexchange.com/) for general statistics and data analysis
questions.

Before you do that, you might want to clarify your question. It sounds like
you ran a couple tests on a very small dataset, and they came up with
different results. They're not even tests for the same thing, since the
t-test is testing whether the means are different, and the Mann-Whitney is
testing whether the medians are different. And in any case, it's hard to
imagine coming up with any kind of useful inference on this tiny sample.
When you post on Cross-Validated (or other site), people are going to want
to know the problem you are trying to solve, which isn't at all clear at
this point.
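A minimal sketch of the two comparisons being discussed, using only the first
sample and the control values quoted below (numbers copied from the post; the
one-sided alternative mirrors the poster's stated aim):

s1  <- c(181.8, 120.3, 86.1)    # sample 1 (triplicate)
ctl <- c(53.9, 39.6)            # control (duplicate)

t.test(s1, ctl, alternative = "greater")        # one-tailed t-test on means
wilcox.test(s1, ctl, alternative = "greater")   # Wilcoxon-Mann-Whitney rank test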

Best,
Ethan

On Fri, Jun 10, 2011 at 12:26 PM, Robert Johnson robjoh...@gmail.com wrote:

 Hi All,

 I have the following dataset (triplicate values of 5 independent
 measurements and duplicate values of a control):

      1      2      3      4      5      C
  181.8   58.2  288.9  273.2  290.9   53.9
  120.3  116.8  108.9  281.3    446   39.6
   86.1  148.5   52.9    126  150.3

 My aim was to find out whether the mean values of samples 1-5 are
 significantly higher than the mean value of C (control).

 At first, I calculated means and SDs and carried out one-tailed t-tests.
 Although the SD error bars are large, two of the samples have mean values
 that are significantly higher than that of C.

 Second, I carried out median and Wilcoxon-Mann-Whitney tests because of my
 concern about the small size of my data and the high variation in the
 replicates. I was surprised, however, that the median and
 Wilcoxon-Mann-Whitney tests did not reveal statistically significant results.
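
 For concreteness, a minimal R sketch of these comparisons (values transcribed
 from the table above; sample 1 against the control, one-sided):

 s1   <- c(181.8, 120.3, 86.1)  # sample 1 (triplicate)
 ctrl <- c(53.9, 39.6)          # control (duplicate)
 t.test(s1, ctrl, alternative = "greater")       # one-tailed t-test
 wilcox.test(s1, ctrl, alternative = "greater")  # Wilcoxon-Mann-Whitney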

 I would be happy if anyone could advise me on the best way to analyse this
 dataset.

 Regards,

 Rob

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect? [re-posting as plain text to pass char-set filter]

2011-06-11 Thread Dr. D. P. Kreil (Boku)
Dear R experts,

I have a problem that is a related to the question raised in this earlier post
    https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html

My situation is different in that I have only 2 predictors
(coordinates x,y) for local regression but a number of global
(parametric) offsets that I need to consider.

Essentially, I have a spatial distortion overlaid over a number of
measurements. These measurements can be grouped by the same underlying
undistorted measurement value for each group. The groups are known,
but the values are not. We need to estimate the spatial trend, which
we then want to remove. In our application, the spatial trend is
two-dimensional (x,y), and there are about 20 groups of about 50
measurements each, in the most simple scenario. The measurements are
randomly placed. Taking the first group as reference, there are thus
19 unknown offsets.

The code below, for toy data (spatial trend in one dimension, x), works
for two or three offset groups.

Unfortunately, the loess call fails for four or more offset groups
with the error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize,  :
  only 1-4 predictors are allowed

I tried overriding the restriction and got
k>d2MAX in ehg136.  Need to recompile with increased dimensions.

How easy would that be to do? I cannot find a definition of d2MAX
anywhere, and it seems this might be hardcoded -- the error is
apparently triggered by line #1359 in loessf.f
  if(k .gt. 15)   call ehg182(105)

Alternatively, does anyone know of an implementation of local
regression with global (parametric) offset groups that could be
applied here?

Or is there a better way of dealing with this? I tried lme with
correlation structures but that seems to be much, much slower.

Any comments would be greatly appreciated!

Best regards,
David Kreil.



###
#
# loess with parametric offsets - toy data demo
#

x<-seq(0,9,.1);
x.N<-length(x);

o<-c(0.4,-0.8,1.2#,-0.2  # works for three but not four
 );  # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
  function(n){
    ifelse((seq(x.N)<= n   *x.N/(o.N+1) &
    seq(x.N)> (n-1)*x.N/(o.N+1)),
    1,0);
  });
f<-f[sample(NROW(f)),];

y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);

l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
 span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red');  # input data
points(x,yp,pch='o',col='blue');   # fit of that

d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+');  # estimate of that

op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});

cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
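
One alternative worth sketching (an illustration only, reusing the toy objects
d, s.fs and x defined above; the offsets are only identified relative to the
level absorbed by the smooth) is to backfit: alternate between estimating the
offsets by least squares on the loess residuals and re-fitting the smooth on
the offset-corrected response.

o.hat<-rep(0,length(s.fs));                       # current offset estimates
for (it in 1:20) {
  # remove current offsets, then re-fit the smooth on the corrected response
  y.adj<-as.vector(d$y - as.matrix(d[,s.fs]) %*% o.hat);
  l.bf<-loess(y.adj ~ x, data=d, span=0.4);
  # re-estimate offsets from the residuals (disjoint 0/1 indicators, no intercept)
  r<-d$y - predict(l.bf, newdata=d);
  o.hat<-coef(lm(r ~ . - 1, data=cbind(r=r, d[,s.fs])));
}
cat("Backfitted offsets:",format(o.hat,digits=1),"\n");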

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazon AWS, RGenoud, Parallel Computing

2011-06-11 Thread Mike Marchywka












 Date: Sat, 11 Jun 2011 19:57:47 +0200
 Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing
 From: lui.r.proj...@googlemail.com
 To: marchy...@hotmail.com
 CC: r-help@r-project.org

 Hello Mike,

[[elided Hotmail spam]]
 To the best of my knowledge, the sort algorithm implemented in R is already
 backed by C++ code and not natively written in R. Writing the code
 in C++ is not really an option either (I think rGenoud is also written
 in C++). I am not sure whether there really is a bottleneck with
 respect to the computer: I/O is pretty low, plenty of RAM left, etc.
 It really seems to me as if parallelizing is not easily possible, or
 only at such high cost that the benefits diminish through all the
 coordination and handling needed...
 Did anybody use rGenoud in cluster mode and experience something similar?
 Are there quicksort packages that use multiple processors efficiently?
 (I didn't find any... :-( )

I'm no expert, but these don't seem to be terribly subtle problems
in most cases. Sure, if the task is not suited to parallelism and
you force it to be parallel and it spends all its time syncing
up, that can be a problem. Just making more tasks to fight over
the bottleneck (memory, CPU, locks) can easily make things worse.
I think I posted a link earlier to an IEEE blurb showing
how easily many cores can make things worse on non-contrived
benchmarks.
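
Before restructuring anything, a minimal profiling sketch along these lines
(the sort-heavy loop is only a stand-in for the real rGenoud run) can show
where the time actually goes:

Rprof("run.out")                          # start the sampling profiler
res <- replicate(500, sort(runif(2000)))  # stand-in workload
Rprof(NULL)                               # stop profiling
summaryRprof("run.out")$by.self           # functions ranked by own time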





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Factor Analysis/Inputting Correlation Matrix

2011-06-11 Thread Matt Stati
Can someone please direct me to how to run a factor analysis in R by first 
inputting a correlation matrix? Does the function factanal allow one to read 
a correlation matrix instead of data vectors? 

Thanks, 
Matt. 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Factor Analysis/Inputting Correlation Matrix

2011-06-11 Thread Joshua Wiley
Hi Matt,

Did you try reading the documentation for factanal()?  You can pull it
up by typing: help(factanal)

These give basically identical results using the raw data, the
covariance matrix, and the correlation matrix.

factanal(x = mtcars, factors = 3)
factanal(factors = 3, covmat = cov(mtcars))
factanal(factors = 3, covmat = cor(mtcars))

all of which is clearly explained if you read the details for the
covmat argument in the help.
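
A minimal sketch of the correlation-matrix route (cor(mtcars) only stands in
for a correlation matrix taken from the literature; supplying n.obs lets
factanal() also report its test of the number of factors):

R <- cor(mtcars)   # stand-in for a published correlation matrix
factanal(factors = 3, covmat = R, n.obs = nrow(mtcars))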

Cheers,

Josh

On Sat, Jun 11, 2011 at 3:06 PM, Matt Stati mattst...@yahoo.com wrote:
 Can someone please direct me to how to run a factor analysis in R by first 
 inputting a correlation matrix? Does the function factanal allow one to 
 read a correlation matrix instead of data vectors?

 Thanks,
 Matt.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to compute the P-value for a mixture of chi^2 distributions

2011-06-11 Thread Tiago Pereira
Hello,

The test I am working on has an asymptotic 0.5*chi2(1)+0.5*chi2(2)
distribution, where the numbers inside parentheses stand for the degrees of
freedom. Is it possible to compute the cumulative distribution function of
that mixture quickly in R?

Thanks in advance.

Tiago

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-compute-the-P-value-for-a-mixture-of-chi-2-distributions-tp3591276p3591276.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compute the P-value for a mixture of chi^2 distributions

2011-06-11 Thread Thomas Lumley
On Sun, Jun 12, 2011 at 12:44 PM, Tiago Pereira
tiago.pere...@mbe.bio.br wrote:

 The test I am working on has an asymptotic 0.5*chi2(1)+0.5*chi2(2)
 distribution, where the numbers inside parentheses stand for the degrees of
 freedom. Is it possible to compute the cumulative distribution function of
 that mixture quickly in R?


There appear to be pchibar() functions in both the ibdreg and ic.infer
packages that should do what you want.  Simulation is also fairly
efficient.
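
For an equal-weight mixture the tail probability is just the average of the
two chi-square tail probabilities, so a minimal sketch (q below is only a
placeholder value of the test statistic), with a simulation check, is:

q <- 5.99                                    # placeholder test statistic
p.exact <- 0.5 * pchisq(q, df = 1, lower.tail = FALSE) +
           0.5 * pchisq(q, df = 2, lower.tail = FALSE)
set.seed(1)                                  # simulation check of the same tail area
mix <- rchisq(1e6, df = sample(c(1, 2), 1e6, replace = TRUE))
c(exact = p.exact, simulated = mean(mix > q))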

-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] smoothScatter function (color density question) and adding a legend

2011-06-11 Thread Clayton K Collings
Dear R experts,

I am resending my questions below one more time
just in case someone out there could help
but missed my email.

I don't think my questions are too hard.

I am most concerned about the transformation function.
See below.

Thanks,
Clayton

Hello,

I have a few questions, regarding the smoothScatter function.


I have a scatter plot with more than 500,000 data points for
two samples.  So I want to display the density in colors
to convince people that my good correlation coefficient is not
due to an influential-point effect, and I also want to make
my scatter plot look pretty.  Anyway ...

I have been able to make the plot, but I have a couple of questions about it.
I also want to make a legend for it, which I don't know how to do.
I have only been playing around with R for a few weeks now off and on.
So, I am definitely a beginner.  


1.
I have 10 colors for my plot: white representing zero density and dark red
representing the maximum density (I presume).
According to the R documentation, the transformation argument is the
function mapping the density scale to the color scale.
Note that in my R code below, transformation = function(x) x^0.14.
I am wondering how exactly, in mathematical terms, this function relates the
density to the color.
I believe colorRampPalette ramps colors between 0 and 1.  I am not sure if
x represents the color or the density.  Since I have 10 colors, I suppose
colorRampPalette would assign values of
0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark red.
I am not sure though.
Does anyone know how this works?  I am sure it's not too complicated.


2.
In a related issue, I would also like to make a legend for this plot.  Then I
would be able to see the transformation function's effect on the
color-density relationship.  Could someone help me make a legend for my
smoothScatter plot?  I would like to place it immediately to the right of the
plot as a vertical bar, matching the vertical length of the plot, as is often
the convention.

I really like the smoothScatter function.  It is easy to use, and I believe 
it's relatively new.

Thanks in advance.

-Clayton

clayton <- c("white", "lightblue", "blue", "darkgreen", "green", "yellow",
             "orange", "darkorange", "red", "darkred")
x <- read.table("c:/users/ccolling/log_u1m1.txt", sep = "\t")
smoothScatter(x$V8 ~ x$V7,
              nbin = 1000,
              colramp = colorRampPalette(c(clayton)),
              nrpoints = Inf,
              pch = "",
              cex = .7,
              transformation = function(x) x^.14,
              col = "black",
              main = "M1 vs. M3 (r = 0.92)",
              xlab = "Normalized M1",
              ylab = "Normalized M2",
              cex.axis = 1.75,
              cex.lab = 1.25,
              cex.main = 2)
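
As far as I can tell from the source, smoothScatter() applies transformation()
to the binned density estimate and hands the result to image(), which maps that
range of values linearly onto colramp(256); the 10 colors above are just the
anchor points of that 256-step ramp, so x in the transformation is the density,
not the color.  A minimal sketch of a matching vertical color-bar legend
(simulated data as a stand-in; the axis labels invert the transformation so
they read in relative density, 0 = minimum, 1 = maximum):

pal <- colorRampPalette(c("white", "blue", "red", "darkred"))
p   <- 0.14                                  # the exponent used in the plot
z   <- seq(0, 1, length.out = 256)           # transformed (color) scale
layout(matrix(1:2, 1, 2), widths = c(5, 1))  # main plot plus a narrow legend panel
set.seed(1)
xx <- rnorm(5e4); yy <- xx + rnorm(5e4, sd = 0.4)
smoothScatter(xx, yy, colramp = pal, transformation = function(d) d^p)
par(mar = c(5, 1, 4, 3))                     # margins for the legend panel
image(1, z, t(as.matrix(z)), col = pal(256), axes = FALSE, xlab = "", ylab = "")
axis(4, at = seq(0, 1, 0.25),
     labels = signif(seq(0, 1, 0.25)^(1/p), 2), las = 1)  # back-transformed labels
mtext("relative density", side = 4, line = 2, cex = 0.8)
box()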

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compute the P-value for a mixture of chi^2 distributions

2011-06-11 Thread Tiago Pereira
thank you very much for your advice, Thomas! Extremely helpful!

Cheers!

Tiago

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-compute-the-P-value-for-a-mixture-of-chi-2-distributions-tp3591276p3591365.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do Clustered Standard Errors for Regression in R?

2011-06-11 Thread kevingoulding
Hi Bo,

See the following blog post:

http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/

where I calculate clustered standard errors in R.  The result is the same as
the cluster command in Stata.  I had to figure this out to do some
homework assignments for my econometrics class.
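
One common way to reproduce Stata's cluster option is the sandwich estimator
with the scores summed within clusters, plus Stata's small-sample correction.
The sketch below is only an illustration of that recipe (not necessarily the
blog's code); the data frame dat and the cluster variable firm_id are
hypothetical names.

library(sandwich)   # estfun(), sandwich()
library(lmtest)     # coeftest()

cluster.vcov <- function(fit, cluster) {
  cluster <- as.factor(cluster)
  M <- nlevels(cluster); N <- length(cluster); K <- length(coef(fit))
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))    # Stata-style finite-sample correction
  u <- apply(estfun(fit), 2, function(s) tapply(s, cluster, sum))  # cluster score sums
  dfc * sandwich(fit, meat. = crossprod(u) / N)
}

## hypothetical usage:
## fit <- lm(y ~ x1 + x2, data = dat)
## coeftest(fit, vcov = cluster.vcov(fit, dat$firm_id))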

Best,
Kevin



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-do-Clustered-Standard-Errors-for-Regression-in-R-tp867787p3591217.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Score Test Function

2011-06-11 Thread Tyler Rinker

Greetings, R Community,
 
I'm trying to learn logistic regression on my own and am using "An Introduction
to Logistic Regression Analysis and Reporting" (Peng, C., Lee, K., & Ingersoll,
G., 2002). This article uses a score test statistic as a measure of overall fit
for a logistic regression model.  The author calculates this statistic using
SAS.  I am looking for an [R] function that can compute this statistic and a
p-value (please don't critique the statistic itself).  As far as I can tell,
glm.scoretest() from library(statmod) is the only function for this, but it
does not give a p-value and does not appear to correspond with the author's
example.  Some chat on the discussion board a while back indicated that this is
a well-known test but one that is difficult to calculate in [R].  I'm wondering
whether there is a way to do this fairly easily in [R] at this time.
 
I am working with the data set from the study and am looking to get close to
the 9.5177 score test statistic and .0086 p-value with 2 degrees of freedom.
Below is the code for the data set and some descriptives so you can see that
the data set is the same as the one from the study.  I marked my attempt to get
a score test statistic and am curious whether this is it (minus the p-value).
 
#BEGINNING OF CODE
id<-factor(1:189)
gender<-factor(c("Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl",
"Boy","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl","Boy","Boy","Boy","Boy","Girl","Boy",
"Girl","Boy","Boy","Boy","Girl","Boy","Girl","Girl","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Girl",
"Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Boy",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl",
"Girl","Girl","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Girl","Boy","Boy","Girl","Boy","Boy","Boy",
"Boy","Girl","Boy","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Boy","Boy","Girl"))
reading.score<-c(91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation<-as.factor(c(rep("no",130),rep("yes",59)))
DF<-data.frame(id,gender,reading.score,reading.recommendation)
head(DF)
#=
#  DESCRIPTIVES
#=
table(DF[2:4])  #BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMMENDATIONS
table(DF[c(2)])  #TABLE OF GENDER
print(table(DF[c(2)])/sum(table(DF[c(4)]))*100,digits=4)  #PERCENT GENDER BREAKDOWN
table(DF[c(4)])  #TABLE RECOMMENDED FOR REMEDIAL READING
print(table(DF[c(4)])/sum(table(DF[c(4)]))*100,digits=4)  #Probability of Recommended
table(DF[c(2,4)])  #TABLE OF GIRLS AND BOYS RECOMMENDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2,4)]),1)*100,digits=4)  #Probability of Recommended
#=
#ANALYSIS
#=
(mod1<-glm(reading.recommendation~reading.score+gender,family=binomial,data=DF))
 
library(statmod)
with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2
#If I move the decimal over 2 to the right I get close to the 9.5177 from the study
(with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2)*100 #is this it?
#END OF CODE
 
I am running R 2.13.0 in a windows 7 machine.
 
Peng, C., Lee, K., & Ingersoll, G. (2002). An Introduction to Logistic
Regression Analysis and Reporting. The Journal of Educational Research, 96(1),
3-14. Heldref Publications. Retrieved from
http://www.informaworld.com/openurl?genre=article&doi=10.1080/00220670209598786&magic=crossref

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list

Re: [R] Score Test Function

2011-06-11 Thread Bill.Venables
The score test looks at the effect of adding extra columns to the model matrix. 
 The function glm.scoretest takes the fitted model object as the first argument 
and the extra column, or columns, as the second argument.  Your x2 argument has 
length only 3.  Is this really what you want?  I would have expected you need 
to specify a vector of length nrow(DF), [as in the help information for 
glm.scoretest itself].

Bill Venables.
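
A minimal sketch of that usage on the data above (an illustration only: this is
a 1-df score test for adding gender to a model that already contains
reading.score, not the 2-df overall test reported in the article):

library(statmod)
mod0 <- glm(reading.recommendation ~ reading.score, family = binomial, data = DF)
x2   <- as.numeric(DF$gender == "Girl")   # candidate column, length nrow(DF)
z    <- glm.scoretest(mod0, x2)           # glm.scoretest() returns a z statistic
c(chisq = z^2, p.value = pchisq(z^2, df = 1, lower.tail = FALSE))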
 
