Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (the fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] predicting from a local regression and plotting in lattice
Alex Karner aakarner at ucdavis.edu writes: I realize these limitations. However, I know that my actual dataset is reasonably well behaved in the range I want to predict, and I'm not using the predicted values for any further analysis, only for schematic purposes in the plot. I'm still curious whether this type of extension of a loess line is possible, notwithstanding its statistical shortcomings. I don't understand why you want to use a local method for a global job. Why not use a spline, a logistic regression (also well behaved!) or some exponential variant if it is a growth curve? Dieter
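Dieter's suggestion can be sketched in a few lines: unlike loess(), a spline fitted globally with lm() will happily return predictions outside the data range. The data and the df value below are invented for illustration, not from the thread.

```r
# A minimal sketch, assuming made-up data: fit a natural spline globally
# with lm() and predict beyond the observed x range, which loess() refuses.
library(splines)
set.seed(42)
x <- 1:50
y <- log(x) + rnorm(50, sd = 0.1)
fit <- lm(y ~ ns(x, df = 4))
pred <- predict(fit, newdata = data.frame(x = 1:60))  # x = 51:60 lie past the data
length(pred)
```

Natural splines extrapolate linearly past the boundary knots, which is usually better behaved than a polynomial tail, though still extrapolation.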
[R] Blowing up portions of a graph
Hi, I have a really large graph and would like to zoom in on portions of the graph and post them as blocks below the graph. Is there an add-on package to do this? -- Rajesh.J I skate to where the puck is going to be, not where it has been. - Wayne Gretzky
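No add-on package is strictly required in base graphics: one possible approach (a sketch, not from the thread; the data and zoom window are invented) is to re-draw the same data with a restricted xlim in a panel below the full plot.

```r
# Sketch: full plot on top, a zoomed-in block below, using layout().
set.seed(1)
x <- 1:1000
y <- cumsum(rnorm(1000))
layout(matrix(1:2, nrow = 2))
plot(x, y, type = "l", main = "Full graph")
rect(400, min(y), 500, max(y), border = "red")     # mark the blown-up region
plot(x, y, type = "l", xlim = c(400, 500), main = "Zoom: 400 <= x <= 500")
layout(1)
```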
Re: [R] png(): Linux vs Windows
On Sun, 12 Oct 2008, [EMAIL PROTECTED] wrote: On 12-Oct-08 22:09:46, jim holtman wrote: Seems to work fine with my R 2.7.2 on Windows: png(file="myplot.png", bg="transparent", units="cm", width=12, height=15, res=200) plot(1:10) rect(1, 5, 3, 7, col="white") dev.off() Did you check the version they are using? Hi Jim, Thanks! I've now learned that it is R 2.5.1 (which I see is from June 2007). Ted. This difference *is* the version of R, not 'Linux vs Windows'. The NEWS for 2.6.0 says: o jpeg(), png(), bmp() (Windows), dev2bitmap() and bitmap() have a new argument 'units' to specify the units of 'width' and 'height'. (That is something you could have checked without access to R under Windows.) On Sun, Oct 12, 2008 at 6:02 PM, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Quick question. I have the following line in an R code file which runs fine on Linux: if(PNG) png(GraphName, width=12, height=15, units="cm", res=200) I learn that, when the same code was run on a Windows machine, there was the following error: Error in png(GraphName, width=12, height=15, units="cm", res=200): unused argument(s) (units = "cm") Sorry to be a bother -- but could a Windows R user put me wise on any differences between png() on Windows and Linux? (And, sorry, I don't know what version of R, nor what version of Windows, this occurred on.) Thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 23:02:41 -- XFMail -- -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Sweave-LaTEX question
On Sun, Oct 12, 2008 at 1:39 AM, Felipe Carrillo [EMAIL PROTECTED] wrote: I am working on a publication and I have heard about LaTeX but I haven't actually tried to learn about it until today. I've found a few There are two more packages that might be of interest: RReportGenerator [1] and relax [2]. Liviu [1] http://alnitak.u-strasbg.fr/~wraff/RReportGenerator/index.php [2] http://cran.r-project.org/web/packages/relax/index.html
Re: [R] numeric derivation
On Sun, 12 Oct 2008, David Winsemius wrote: Two follow-up questions: A) I get an error message when using Harrell's describe() function on one of my variables, telling me that sum() is not meaningful for a difftime object. Why should sum() not be meaningful for a collection of interval lengths? That's not what it actually says. It says it is 'not defined' -- it could be defined but it has not been. Just add a function sum.difftime() with appropriate code (and watch out that different difftime objects can be in different units). describe(pref900) Error in Summary.difftime(c(1075, 3429, 2915, 2002, 967, 1759, 532, 589, : 'sum' not defined for difftime objects summary() is informative and throws no error, but does not report means. Even with na.rm=TRUE, sum fails: sum(pref900$deatht, na.rm=TRUE) Error in Summary.difftime(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, : 'sum' not defined for difftime objects My interest in the sum of difftime objects comes from my interest in calculating the number of person-years of observation in various categories. I have durations created by subtracting times. B) The help pages are not particularly expansive regarding the output of deltat(), but your answer suggests that it should work on non-time objects as well? Am I correct in assuming you meant that diff(x)/deltat(x) should be meaningful for any numeric x? -- David Winsemius R 2.7.1 / Mac OS 10.5.4 / Intel CPUs On Oct 12, 2008, at 10:34 AM, Gabor Grothendieck wrote: ?deltat On Sun, Oct 12, 2008 at 9:45 AM, Oliver Bandel [EMAIL PROTECTED] wrote: Quoting Gabor Grothendieck [EMAIL PROTECTED]: If you simply want successive differences use diff: x <- seq(4)^2 diff(x) tx <- ts(x) diff(tx) [...] Oh, cool, thanks. But what about diff / delta_t? Do I have to calculate it on my own, or is there already a function for computing a difference quotient? That would be nice to have because, for example, going from position vs. time to velocity vs. time and acceleration vs. time (and further derivatives) also yields time series. Being able to use the advantages of the time-series class here would be nice. Ciao, Oliver
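Prof. Ripley's "just add a function sum.difftime()" can be sketched as below. The choice of seconds as the common unit is an assumption of this sketch, not from the thread; the method converts each argument to seconds before summing, which also guards against mixed units.

```r
# Sketch: an S3 sum() method for difftime objects. Each argument is
# converted to seconds first, so difftimes in different units add safely.
sum.difftime <- function(..., na.rm = FALSE) {
  args <- list(...)
  secs <- sum(sapply(args, function(x)
    sum(as.numeric(x, units = "secs"), na.rm = na.rm)))
  as.difftime(secs, units = "secs")
}
d <- as.difftime(c(30, 90, NA), units = "mins")
total <- sum(d, na.rm = TRUE)  # dispatches to sum.difftime()
as.numeric(total, units = "mins")
```

Note that recent versions of R ship a sum() method for difftime in base, so this is mainly of historical interest for R 2.7-era sessions like the one quoted.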
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Peter Dalgaard p.dalgaard at biostat.ku.dk writes: But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) You are right. My brain was biased by some ongoing discussion. Dieter
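Peter's formulas are a one-liner in R. The counts below are invented for illustration (not Maithili's data); the Gg/Bb/Gb/Bg names follow the thread's notation.

```r
# Sketch with made-up hold-out counts (not the poster's actual figures):
Gg <- 4200  # goods correctly classified as good
Bg <- 470   # goods wrongly classified as bad
Gb <- 110   # bads wrongly classified as good
Bb <- 220   # bads correctly classified as bad
Sp <- Gg / (Gg + Bg)  # specificity: fraction of goods caught
Se <- Bb / (Gb + Bb)  # sensitivity: fraction of bads caught
round(c(Sp = Sp, Se = Se), 4)
```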
Re: [R] linear expenditure model
I have already used the aidsEst() function in the micEcon package for the estimation of elasticities. But with the LES, I plan to estimate the minimal consumption level... which is not possible with the AIDS model, is it? Arne Henningsen wrote: Hi Marie! On Friday 10 October 2008 12:40:23, Marie Vandresse wrote: I would like to estimate a linear expenditure system with the systemfit package (method: SUR). If I remember correctly, the linear expenditure system (LES) is linear in income but non-linear in the parameters. Hence, you have to estimate a system of non-linear equations. Unfortunately, the nlsystemfit() function in the systemfit package that estimates systems of non-linear equations is still under development and rather often has convergence problems. Since the systemfit() function in the systemfit package that estimates systems of linear equations is very reliable [1], I suggest that you choose a demand system that is linear in parameters (e.g. the Almost Ideal Demand System, AIDS). [1] http://www.jstatsoft.org/v23/i04 Could someone show me how to define the equations? If you use the aidsEst() function in the micEcon package [2], you don't have to specify the equations of the Almost Ideal Demand System yourself. [2] http://www.micEcon.org Best wishes, Arne -- Marie Vandresse Bureau fédéral du Plan Avenue des Arts 47-49 1000 Bruxelles Bureau 212 Tél. +32.(0)2.507.73.62 Tél. +32.(0)2.507.73.73 http://www.plan.be Think before you print !
[R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me find out whether my model is good? With regards Ms Maithili Shiva Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.
[R] Re: using predict() or fitted() from a model with offset; unsolved, reproducible code included
Thanks for your reply Mark, but no, using predict on the new data.frame does not help here. I had first thought that the problem was due to the explanatory variable (age) and the offset one (date) being very similar (highly correlated; I am trying to tease their effects apart, and hoped offset would help with this since I already know the relationship with age). But this appears not to be the case. Simply, the predicted (or fitted) values for the offset model always return predicted values based on the effect of the variable within offset(), completely ignoring the explanatory variable that it is supposed to offset in the first place: for instance, I get the same predicted values for the two very different models below. The summary table and coefficients remain perfectly valid though (and very different). lmAO <- glm(MassChange24h ~ T1 + offset(-2*AGE), family=gaussian, na.action=na.exclude) lmAO <- glm(MassChange24h ~ AGE, family=gaussian, na.action=na.exclude) Has anyone got any experience with predicting from models that include an offset term? Am I not specifying the offset term correctly in the model? Please get back to me if you have the slightest idea of what is going on, or if you know of another way than offset for my purposes. I include below reproducible code with dummy data. The models do not fit well, but they work.
Thank you Samuel Riou ## AGE <- 1:10 MassChange24h <- c(10,8,6,4,2,0,-2,-4,-6,-8) T1 <- c(10,11,12,13,14,15,16,17,18,19) ### variable for which I want the effect, taking into account the known effect of AGE T <- c("A","B","A","B","A","B","A","B","A","B") ## added for testing T <- c(1,2,3,4,5,6,5,4,3,2) ## added for testing #no offset lmA <- glm(MassChange24h ~ T1, na.action=na.exclude, family=gaussian) summary(lmA) fitted(lmA) #linear offset lmAO <- glm(MassChange24h ~ T1 + offset(-2*AGE), family=gaussian, na.action=na.exclude) ### model lmAO <- glm(MassChange24h ~ T1 + offset(AGE), family=gaussian, na.action=na.exclude) lmAO <- glm(MassChange24h ~ AGE, family=gaussian, na.action=na.exclude) ### the fitted values from the offset model are the same as from this one! summary(lmAO) ## table is fine, shows the effect of T1, taking into account the offset fitted(lmAO) ## Problem: getting the same values as for model lmA nd1 <- expand.grid(T1=c(10,11,12,13,14,15,16,17,18,19)) Pred <- predict(lmA, nd1, type="response") nd1 <- expand.grid(T1=c(10,11,12,13,14,15,16,17,18,19)) Pred <- predict(lmAO, nd1, type="response") ## I get the same values as for model lmA, and changing the T variable in the offset model, again I get the same predicted values... very strange # - Original message From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, 12 October 2008, 20:16:36 Subject: RE: [R] using predict() or fitted() from a model with offset hi: I haven't used fitted much, but when you used predict, did you send in the new dataframe? The code below says that you didn't, but I don't know if that would fix it anyway. On Sun, Oct 12, 2008 at 6:36 AM, [EMAIL PROTECTED] wrote: Dear R-users, I have come across some difficulties using the offset argument in a model. There is not much information in ?offset, ?lm, ?glm, and I have therefore resorted to posting here. Offset appears to work in the models I run because I get the expected coefficients when comparing offset and non-offset models.
BUT the fitted values obtained from fitted() are identical in both models!! Why is this? Is there an argument to add to fitted() so that it takes the offset into account? Note that I have included the offset in the formula, as seen below in the code. I have also tried to use predict, with exactly the same result: the offset is ignored. This applies to both lms and glms. Am I missing something here? Thank you Samuel Riou CODE #no offset lmA <- lm(MassChange24h ~ DATEN1, subset(Chicks1, Year==2007 & AGE < 10), na.action=na.exclude) summary(lmA) #linear offset lmAO <- lm(MassChange24h ~ DATEN1 + offset(-0.37356*AGE), subset(Chicks1, Year==2007 & AGE < 10), na.action=na.exclude) summary(lmAO) print(Chicks$DATEN1[Year==2007 & AGE < 10]) print(t(fitted(lmA))) NEW <- cbind(as.vector(t(fitted(lmA))), Chicks$DATEN1[Year==2007 & AGE < 10]) NEW <- as.data.frame(NEW) m1 <- aggregate(NEW[1], NEW[2], mean, na.rm=TRUE) plot(m1$V1 ~ m1$V2, pch=20, col="black") Pred <- predict(lmA) print(Chicks$DATEN1[Year==2007 & AGE < 10]) print(t(fitted(lmAO))) NEW <- cbind(as.vector(t(fitted(lmAO))), Chicks$DATEN1[Year==2007 & AGE < 10]) NEW <- as.data.frame(NEW) m1 <- aggregate(NEW[1], NEW[2], mean, na.rm=TRUE) points(m1$V1 ~ m1$V2, pch=20, col="red") ### but the fitted values don't seem to take the offset into account Pred <- predict(lmAO)
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
On Mon, 13 Oct 2008, Peter Dalgaard wrote: Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42,500 clients and, based on their defaulted / non-defaulted status, I have generated the probability of default. I have a hold-out sample of 5,000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), (3) the number of wrongly classified bads (Gb), and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf But if he has a hold out sample, isn't he already cross-validating?? I wonder if you're answering the right question there. Could he just be looking for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.) Strictly no, she is 'validating' (no cross- involved). Cross-validation would be a better idea for much smaller sample sizes (we don't know how many regressors are involved, so say hundreds unless there are more than ten regressors).
Re: [R] na.pass
If you want to remove the N, then you can work with the indices: x [1] NA "B" NA "B" "B" NA "N" "A" "B" "B" "A" NA "A" "N" "N" "N" "A" "B" "B" "A" # if you want the indices of the non-N, then (indx <- which(is.na(x) | x != "N")) [1] 1 2 3 4 5 6 8 9 10 11 12 13 17 18 19 20 x[indx] [1] NA "B" NA "B" "B" NA "A" "B" "B" "A" NA "A" "A" "B" "B" "A" On Mon, Oct 13, 2008 at 7:48 AM, Laura Bonnett [EMAIL PROTECTED] wrote: I have a data frame. It has lots of patient information: their age, their gender, etc. I need to keep all this information whilst selecting relevant rows. So, in the example of code I provided, I want to remove all those patients who have entry "N" in the column with.Wcode. The dimension of the data is 378, i.e. 378 patients, and currently I am replacing any NA entries in column with.Wcode with the letter "O", as this is another level of the same column. Does that make more sense? nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } How can I therefore not replace NA with level "O" but instead ignore the NAs and effectively gloss over them? Thank you, Laura On Mon, Oct 13, 2008 at 12:42 PM, jim holtman [EMAIL PROTECTED] wrote: Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs: x <- sample(c("N", "A", "B", NA), 20, replace=TRUE) x [1] "A" "A" "B" NA "N" NA NA "B" "B" "N" "N" "N" "B" "A" NA "A" "B" NA "A" NA x != "N" [1] TRUE TRUE TRUE NA FALSE NA NA TRUE TRUE FALSE FALSE FALSE TRUE TRUE NA TRUE TRUE NA [19] TRUE NA x[x != "N"] [1] "A" "A" "B" NA NA NA "B" "B" "B" "A" NA "A" "B" NA "A" NA (!is.na(x)) & (x != "N") [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE [19] TRUE FALSE x[(!is.na(x)) & (x != "N")] [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A" On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame which has columns comprised mainly of NAs.
I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code: nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work! nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ na.pass(data$with.Wcode[i]) if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } Thank you, Laura
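Jim's which(is.na(x) | x != "N") idea carries straight over to the data-frame case; the whole loop-and-recode function collapses to one logical index. The toy data frame below is invented, not Laura's 378-patient set.

```r
# Sketch: keep rows whose with.Wcode is NA or anything other than "N",
# without recoding the NAs (toy data, not the real patient records).
df <- data.frame(id = 1:6,
                 with.Wcode = c("A", NA, "N", "B", NA, "N"),
                 stringsAsFactors = FALSE)
keep <- is.na(df$with.Wcode) | df$with.Wcode != "N"
df[keep, ]   # rows 1, 2, 4, 5 survive; the NAs are untouched
```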
Re: [R] Timestamps and manipulations
Is this what you want? x.df timestamp id 1 2008-05-27 22:57:00 763830873067 2 2008-05-27 23:00:00 763830873067 3 2008-05-27 23:01:00 763830873067 4 2008-05-27 23:01:00 763830873067 5 2008-06-05 11:34:00 763830873067 6 2008-05-29 23:08:00 765253440317 7 2008-05-29 23:06:00 765253440317 8 2008-05-29 22:52:00 765253440317 9 2008-05-29 22:52:00 765253440317 10 2008-05-29 23:04:00 765253440317 11 2008-06-27 19:34:00 765253440317 12 2008-07-09 15:45:00 765329002557 13 2008-07-06 19:24:00 765329002557 14 2008-07-09 15:46:00 765329002557 15 2008-07-07 13:05:00 765329002557 16 2008-05-16 22:40:00 765329002557 17 2008-06-08 11:24:00 765329002557 18 2008-06-08 12:33:00 765329002557 x.df$time <- ifelse(x.df$timestamp > as.POSIXct("2008-07-01"), 1, 0) x.df timestamp id time 1 2008-05-27 22:57:00 763830873067 0 2 2008-05-27 23:00:00 763830873067 0 3 2008-05-27 23:01:00 763830873067 0 4 2008-05-27 23:01:00 763830873067 0 5 2008-06-05 11:34:00 763830873067 0 6 2008-05-29 23:08:00 765253440317 0 7 2008-05-29 23:06:00 765253440317 0 8 2008-05-29 22:52:00 765253440317 0 9 2008-05-29 22:52:00 765253440317 0 10 2008-05-29 23:04:00 765253440317 0 11 2008-06-27 19:34:00 765253440317 0 12 2008-07-09 15:45:00 765329002557 1 13 2008-07-06 19:24:00 765329002557 1 14 2008-07-09 15:46:00 765329002557 1 15 2008-07-07 13:05:00 765329002557 1 16 2008-05-16 22:40:00 765329002557 0 17 2008-06-08 11:24:00 765329002557 0 18 2008-06-08 12:33:00 765329002557 0 # time difference by id sapply(split(x.df$timestamp, x.df$id), function(.time){ + difftime(max(.time), min(.time), units='secs') + }) 763830873067 765253440317 765329002557 736620 2493720 4640760 On Mon, Oct 13, 2008 at 6:57 AM, Michael Pearmain [EMAIL PROTECTED] wrote: Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help?
sample data Timestamp user_id 27/05/08 22:57 763830873067 27/05/08 23:00 763830873067 27/05/08 23:01 763830873067 27/05/08 23:01 763830873067 05/06/08 11:34 763830873067 29/05/08 23:08 765253440317 29/05/08 23:06 765253440317 29/05/08 22:52 765253440317 29/05/08 22:52 765253440317 29/05/08 23:04 765253440317 27/06/08 19:34 765253440317 09/07/08 15:45 765329002557 06/07/08 19:24 765329002557 09/07/08 15:46 765329002557 07/07/08 13:05 765329002557 16/05/08 22:40 765329002557 08/06/08 11:24 765329002557 08/06/08 12:33 765329002557 My first question is: how can I create a new variable as a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice? Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M")) Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0) My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last -- an engagement time, essentially. I see there is the difftime function; is there a more elegant way of working this out than my thoughts (pseudo-code below)? sort data by user_id and Timestamp take the head of user_id as new_time_var take the tail of user_id as new_time_var2 use difftime(new_time_var, new_time_var2, units="secs") Mike
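The head/tail pseudo-code in the question can also be done per user with tapply(), without sorting. The four timestamps below are a hand-picked subset of the sample data, with the timezone fixed to UTC to keep the arithmetic exact.

```r
# Sketch: per-user gap in seconds between first and last timestamp
# (subset of the thread's sample data).
ts <- as.POSIXct(c("2008-05-27 22:57", "2008-06-05 11:34",
                   "2008-05-29 22:52", "2008-06-27 19:34"), tz = "UTC")
id <- c("763830873067", "763830873067", "765253440317", "765253440317")
span <- tapply(ts, id, function(t)
  as.numeric(difftime(max(t), min(t), units = "secs")))
span  # 736620 and 2493720 seconds, matching Jim's figures for these users
```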
Re: [R] na.pass
Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs: x <- sample(c("N", "A", "B", NA), 20, replace=TRUE) x [1] "A" "A" "B" NA "N" NA NA "B" "B" "N" "N" "N" "B" "A" NA "A" "B" NA "A" NA x != "N" [1] TRUE TRUE TRUE NA FALSE NA NA TRUE TRUE FALSE FALSE FALSE TRUE TRUE NA TRUE TRUE NA [19] TRUE NA x[x != "N"] [1] "A" "A" "B" NA NA NA "B" "B" "B" "A" NA "A" "B" NA "A" NA (!is.na(x)) & (x != "N") [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE [19] TRUE FALSE x[(!is.na(x)) & (x != "N")] [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A" On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame which has columns comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code: nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ if(is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O" } for(i in 1:378){ if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work! nep <- function(data) { dummy <- rep(0,378) for(i in 1:378){ na.pass(data$with.Wcode[i]) if(data$with.Wcode[i]=="N") dummy[i] <- i } return(data[-dummy,]) } Thank you, Laura
[R] Gower distance between an individual and a population
Hi the list, I need to compute the Gower distance between a specific individual and all the other individuals. The daisy() function from the cluster package computes all the pairwise dissimilarities of a population. For a population of N individuals, that is around N^2 distances to compute. I only need the distance between a specific individual and all the others, that is, only N distances. Is there a function that can do this? Christophe
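For purely numeric variables, the one-against-all computation is simple to hand-roll in O(N): Gower's coefficient then reduces to the mean of range-scaled absolute differences. The sketch below is an illustration, not part of the cluster package (gower_to_one is a made-up helper name), and cluster::daisy would still be needed for mixed factor/numeric data.

```r
# Sketch: Gower-style distance from row `ref` to every row of a numeric
# matrix, N comparisons instead of N^2 (hand-rolled; not cluster::daisy).
gower_to_one <- function(X, ref) {
  rng <- apply(X, 2, function(col) diff(range(col)))  # per-variable range
  rng[rng == 0] <- 1                                  # guard constant columns
  d <- abs(sweep(X, 2, X[ref, ], "-"))                # |x_ij - x_ref_j|
  rowMeans(sweep(d, 2, rng, "/"))                     # average scaled diff
}
X <- as.matrix(iris[, 1:4])
d1 <- gower_to_one(X, ref = 1)  # distance of every flower to flower 1
head(round(d1, 3))
```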
[R] SAS Data
Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible? Thank you a lot in advance ;), Stefo
Re: [R] SAS Data
Stefo Ratino wrote: Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible?

Only if you have access to SAS. The file format is proprietary and no one has bothered to decipher it.

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907
Re: [R] SAS Data
library(foreign)
Rdata <- read.ssd("Z:/MyFolder", "data1", sascmd = "C:/Program Files/SAS/SAS 9.1/sas.exe")
Rdata
    Y D1 D2 D3
1 100  1  0  0
2 101  1  0  0
3 105  1  0  0
4 200  0  1  0
5 201  0  1  0
6 205  0  1  0
7 300  0  0  1
8 301  0  0  1
9 305  0  0  1

where 'data1' is the SAS data file to be read, 'Z:/MyFolder' is the physical path where 'data1.sas7bdat' is located, and 'sascmd' is the path where SAS is installed on your system. Please confirm that your SAS is installed in the same path. Also note that you have to install the 'foreign' package to do this. BR, Shubha

Shubha Karanth | Amba Research | Ph +91 80 3980 8031 | Mob +91 94 4886 4510

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Stefo Ratino. Sent: Monday, October 13, 2008 5:20 PM. To: r-help@r-project.org. Subject: [R] SAS Data. Hello everybody, I would like to read a SAS data set, data1.sas7bdat, into R! Is this possible? Thank you a lot in advance ;), Stefo
Re: [R] Creating GUIs for R
On Sun, Oct 12, 2008 at 4:50 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 12 October 2008 at 12:53, cls59 wrote: | On a related note... does anyone know good resources for binding a C++ | program to the R library?

RCpp, at http://rcpp.r-forge.r-project.org, formerly known as RCppTemplate, is pretty mature and tested, having been around since 2004 or 2005. Introductory documentation could be better; feedback welcome.

| Basically, I would like to start with just a plain vanilla R session running | inside a Qt widget. Any suggestions?

Isn't RKWard a Qt-based GUI for R? They probably have some reusable console code in there. Deepayan once did just that in a test application. I am not sure if that was ever made public. Cheers, Dirk -- Three out of two people have difficulties with fractions.
[R] Sweave from Kile
Hello, does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems, and I don't know whether the instructions are wrong or I am doing something wrong (my knowledge of bash and the shell is too low to tell)... I recently discovered Sweave and wanted to run it from my LaTeX editor, Kile. I found and followed these instructions:

"If you want to be able to call Sweave outside of R, you will need to install a shell script (see footnote 4). To install the script, copy it to /usr/local/bin, then open the Konsole program and type sudo chmod +x /usr/local/bin/Sweave.sh to make it executable. Next, you may want to tell Kile where to find the Sweave.sh shell script. Open Kile and click Settings -> Configure Kile. Click the Tools tab on the left-hand side of the preferences window, and select Build. Click the New Tool button at the bottom of the preferences window. Name the new tool Sweave, click Next, and then Finish. In the resulting screen, type Sweave.sh in the top box, and -ld '%source' in the bottom box."

I followed these instructions but have 2 problems:

1: finished with exit status 126
SweaveOnly output:
* cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave'
* Sweave.sh -ld 'example1Leisch.Rnw'
* /bin/bash: /usr/local/bin/Sweave.sh: Permission non accordée (in English: permission denied)
Do you see where the problem is?

2: If I run Kile with sudo (sudo kile), that problem disappears but a new one comes:
SweaveOnly output:
* cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave'
* Sweave.sh -ld 'example1Leisch.Rnw'
* Run Sweave and postprocess with LaTeX directly from command line
-ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw, .nw or .tex

Are the instructions wrong? Or am I doing something wrong? Thank you very much for your help!!
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sweave from Kile
I use Sweave on a Mac from my LaTeX editor and it requires a similar shell script to typeset the document. Mine uses R itself (instead of sh) and looks like:

#!/usr/bin/Rscript
args <- commandArgs(TRUE)
fname <- strsplit(args[1], '\\.')[[1]][1]
Sweave(paste(fname, 'Rnw', sep = '.'))
system(paste('pdflatex', paste(fname, 'tex', sep = '.')))

This script is run as so:

sweave /path/to/source.Rnw

It works pretty well, although I think I used:

chmod 755 sweave

to make sure it would execute. You can also check ls -l to make sure you have ownership of the file; if not, you might need to hit it with chown. -Charlie

Matthieu Stigler-2 wrote: Hello, does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems, and I don't know whether the instructions are wrong or I am doing something wrong... I recently discovered Sweave and wanted to run it from my LaTeX editor, Kile. I found and followed these instructions: "If you want to be able to call Sweave outside of R, you will need to install a shell script (see footnote 4). To install the script, copy it to /usr/local/bin, then open the Konsole program and type sudo chmod +x /usr/local/bin/Sweave.sh to make it executable. Next, you may want to tell Kile where to find the Sweave.sh shell script. Open Kile and click Settings -> Configure Kile. Click the Tools tab on the left-hand side of the preferences window, and select Build. Click the New Tool button at the bottom of the preferences window. Name the new tool Sweave, click Next, and then Finish. In the resulting screen, type Sweave.sh in the top box, and -ld '%source' in the bottom box."
- Charlie Sharpsteen, Undergraduate Environmental Resources Engineering, Humboldt State University
Re: [R] Gower distance between a individual and a population
If you used daisy, is there a problem with converting the resulting object to a full dissimilarity matrix and extracting the relevant row/column you need for the target site?

Well, the loss of efficiency is huge. I need to compute the distances several times on databases that count 1,000 or even 10,000 subjects. 10,000^2 distances cost a lot in terms of time, whereas 10,000 do not. A solution would be to rewrite daisy and adapt it. But since I do not know Fortran, I prefer to first ask whether someone has already done it... Christophe
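A minimal sketch of the one-vs-all idea discussed in this thread, for the all-numeric case only (daisy in package cluster also handles factors and missing values; the function name and data here are invented for illustration):

```r
# Gower dissimilarity between one target row and every row of a numeric
# data frame: per-variable absolute differences scaled by the variable's
# range, then averaged. This is O(N) instead of daisy's O(N^2).
gower_to_target <- function(df, target) {
  rng    <- sapply(df, function(v) diff(range(v, na.rm = TRUE)))
  diffs  <- abs(sweep(as.matrix(df), 2, as.numeric(target)))  # |x_ij - t_j|
  scaled <- sweep(diffs, 2, rng, "/")                         # scale to [0, 1]
  rowMeans(scaled, na.rm = TRUE)                              # average over variables
}

set.seed(1)
dat <- data.frame(a = rnorm(1000), b = runif(1000))
d <- gower_to_target(dat, dat[3, ])
```

The distance of the target to itself (d[3]) is 0, and every value lies in [0, 1] because each scaled difference cannot exceed the variable's range.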
Re: [R] bivariate non-parametric smoothing
The functions `gam' in packages `mgcv' or `gam' will do this, as will the ssanova* functions in `gss' (there are numerous alternatives).

On Friday 10 October 2008 23:32, Verschuere Benjamin wrote: Hi, I was wondering if there is a function in R which performs bivariate non-parametric smoothing and which allows for the possibility of including weights in the smoothing (for each data point in my grid I have some predefined weights that I would like to include in the smoothing). Thanks, Ben

-- Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK, +44 1225 386603, www.maths.bath.ac.uk/~sw283
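As the reply notes, gam in package mgcv takes a weights argument, so a weighted bivariate smooth can look like the following sketch (the data and weights are made up):

```r
library(mgcv)

set.seed(42)
n  <- 200
x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
w  <- runif(n, 0.5, 2)   # hypothetical predefined weights, one per point

# bivariate thin-plate smooth of y over (x1, x2) with observation weights
fit  <- gam(y ~ s(x1, x2), weights = w)
pred <- predict(fit, newdata = data.frame(x1 = 0.5, x2 = 0.5))
```

s(x1, x2) fits an isotropic thin-plate spline; te(x1, x2) would be the usual choice when the two covariates are on different scales.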
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Hi Maithili, There are two good papers that illustrate how to compare classifiers using sensitivity and specificity and their extensions (e.g., likelihood ratios, Youden index, KL distance, etc.). See:

1) Biggerstaff, Brad, 2000, Comparing diagnostic tests: a simple graphic using likelihood ratios, Statistics in Medicine, 19:649-663.
2) Lee, Wen-Chung, 1999, Selecting diagnostic tests for ruling out or ruling in disease: the use of the Kullback-Leibler distance, International Journal of Epidemiology, 28:521-525.

Please let me know if you have problems finding the aforementioned papers. Kind Regards, Pedro

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Maithili Shiva. Sent: Monday, October 13, 2008 3:28 AM. To: r-help@r-project.org. Subject: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)

Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards, Ms Maithili Shiva

Subject: [R] Logistic regresion - Interpreting (SENS) and (SPEC). To: r-help@r-project.org. Date: Friday, October 10, 2008, 5:54 AM. Hi, I am working on a credit scoring model using logistic regression. I had a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods Gg, (2) the number of correctly classified bads Bb, and also (3) the number of wrongly classified goods (Gb) and (4) the number of wrongly classified bads (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Pedro.Rodriguez at sungard.com writes: There are two good papers that illustrate how to compare classifiers using sensitivity and specificity and their extensions (e.g., likelihood ratios, Youden index, KL distance, etc.). See: 1) Biggerstaff, Brad, 2000, Comparing diagnostic tests: a simple graphic using likelihood ratios, Statistics in Medicine, 19:649-663. 2) Lee, Wen-Chung, 1999, Selecting diagnostic tests for ruling out or ruling in disease: the use of the Kullback-Leibler distance, International Journal of Epidemiology, 28:521-525.

Both papers refer to medical applications, and even the most basic books on medical statistics explain the concepts in the context of incidence and prevalence of a disease. Interpreting sensitivity and specificity is much more a problem of the context than one of R and statistics: note that her application was in econometrics. Dieter
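Using the notation from this thread, the point estimates themselves are a one-liner in R; the interpretation question the poster asks is the harder part. The counts below are invented for illustration; only the formulas come from the thread:

```r
# Counts in the poster's notation: Gg and Bb are the correctly classified
# goods and bads; Gb and Bg are the misclassified ones.
Gg <- 4100; Bg <- 457; Bb <- 322; Gb <- 121

sens <- Gg / (Gg + Bg)   # SENS as defined in the thread
spec <- Bb / (Bb + Gb)   # SPEC as defined in the thread
round(c(sens = sens, spec = spec), 4)
```

Both quantities are proportions in (0, 1); as the replies stress, they depend on the (arbitrary) probability cutoff used to classify, which is why the thread recommends calibration and cross-validation instead.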
Re: [R] Timestamps and manipulations
Try this:

x$Timestamp <- as.POSIXct(strptime(as.character(x$Timestamp), "%d/%m/%y %H:%M"))
x$time <- as.numeric(x$Timestamp > as.POSIXct(strptime("07-08-2008 00:00", "%d-%m-%Y %H:%M")))
with(x, tapply(Timestamp, user_id, function(x) diff(range(x), units = "secs"), simplify = FALSE))

On Mon, Oct 13, 2008 at 7:57 AM, Michael Pearmain [EMAIL PROTECTED] wrote: Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help? Sample data:

Timestamp         user_id
27/05/08 22:57    763830873067
27/05/08 23:00    763830873067
27/05/08 23:01    763830873067
27/05/08 23:01    763830873067
05/06/08 11:34    763830873067
29/05/08 23:08    765253440317
29/05/08 23:06    765253440317
29/05/08 22:52    765253440317
29/05/08 22:52    765253440317
29/05/08 23:04    765253440317
27/06/08 19:34    765253440317
09/07/08 15:45    765329002557
06/07/08 19:24    765329002557
09/07/08 15:46    765329002557
07/07/08 13:05    765329002557
16/05/08 22:40    765329002557
08/06/08 11:24    765329002557
08/06/08 12:33    765329002557

My first question is how can I create a new variable, a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice?

Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M"))
Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0)

My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last...
and engagement time essentially; I see there is the difftime function. Is there a more elegant way of working this out than my thoughts (pseudo-code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil, 25° 25' 40" S, 49° 16' 22" O
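A self-contained version of the approach in this thread, on a small made-up subset of the posted data (note the posted timestamps are day/month/year, so %d/%m/%y is the right format):

```r
# Parse the timestamps, flag events after a cutoff date, and compute each
# user's span between first and last event, in seconds.
x <- data.frame(
  Timestamp = c("27/05/08 22:57", "27/05/08 23:00", "05/06/08 11:34",
                "29/05/08 23:08", "27/06/08 19:34"),
  user_id   = c("763830873067", "763830873067", "763830873067",
                "765253440317", "765253440317")
)
x$Timestamp <- as.POSIXct(strptime(as.character(x$Timestamp), "%d/%m/%y %H:%M"))

cutoff  <- as.POSIXct(strptime("07-08-2008 00:00", "%d-%m-%Y %H:%M"))
x$after <- as.numeric(x$Timestamp > cutoff)   # 0/1 date filter

span <- with(x, tapply(Timestamp, user_id,
             function(t) as.numeric(difftime(max(t), min(t), units = "secs"))))
```

All five example events fall before the 7 August 2008 cutoff, so x$after is all zero here; span holds one non-negative number of seconds per user.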
[R] Error in plot.gam?
[R] Error in gam plot?
[R] Simulations using differential equations
Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
[R] na.pass
Hi All, I have a data frame whose columns are comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e. removal of "N" in this code:

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work!

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    na.pass(data$with.Wcode[i])
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

Thank you, Laura
Re: [R] Simulations using differential equations
See http://cran.r-project.org/web/packages/odesolve/index.html

2008/10/13 [EMAIL PROTECTED]: Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
Re: [R] Simulations using differential equations
and http://cran.at.r-project.org/web/packages/deSolve/

2008/10/13 megha patnaik [EMAIL PROTECTED]: See http://cran.r-project.org/web/packages/odesolve/index.html
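A minimal deSolve sketch to get started with; the model (logistic growth) and its parameters are invented for illustration:

```r
library(deSolve)

# dN/dt = r * N * (1 - N/K): logistic growth toward carrying capacity K
logistic <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dN <- r * N * (1 - N / K)
    list(c(dN))
  })
}

out <- ode(y = c(N = 1), times = seq(0, 10, by = 0.1),
           func = logistic, parms = c(r = 1, K = 100))
```

`out` is a matrix with a time column and one column per state variable; by t = 10 the solution is close to the carrying capacity K = 100.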
[R] correlation structure in gls or lme/lmer with several observations per day
Hi, To simplify, suppose I have 2 observations each day for three days. I would like to define the correlation structure of these 6 observations as follows: the correlation of 2 observations on the same day is, say, alpha; the correlation for 2 observations one day apart is rho; and the correlation for 2 observations 2 days apart is rho^2. I.e. I would like to have an AR1 correlation plus a correlation for the same day. I tried with gls and lme from the nlme package, but with no success. One difficulty arises since corAR1 seems to require only one observation per day (see example below). Any idea on how to implement it, either with special correlation structures or through random effects in lme/lmer? Should I try to define a new correlation structure corMultiAR1? If so, where can I find help on how to write such a piece of code (nlme:::corAR1 is not clear to me)? Or is there a way to define a general parametrised covariance matrix in gls? Olivier

obs6 <- matrix(c(1,2,3,4,5,6, 1,1,2,2,3,3), byrow = FALSE, nc = 2)
dimnames(obs6) <- list(NULL, c("y", "time"))
obs6 <- data.frame(obs6)
obs6
  y time
1 1    1
2 2    1
3 3    2
4 4    2
5 5    3
6 6    3
gls(y ~ 1, correlation = corAR1(0.0, ~time), data = obs6)
Error in Initialize.corAR1(X[[1L]], ...) :
  Covariate must have unique values within groups for corAR1 objects

-- Olivier Renaud http://www.unige.ch/~renaud/ Methodology & Data Analysis - Psychology Dept - University of Geneva, UniMail, Office 4142 - 40, Bd du Pont d'Arve - CH-1211 Geneva 4
Re: [R] subsetting dataframe by rownames to be excluded
Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ]

Assuming everything is possible in R: would it be possible to make the below work without breaking existing code?

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1,3), ]                              # in analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Dieter
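An %in% spelling of the same subset, which reads close to the a[-c(1,3), ] analogy Dieter asks for:

```r
# Drop rows whose names appear in 'exclude'; drop = FALSE keeps the
# data.frame structure even when only one column remains.
a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")

kept <- a[!(rownames(a) %in% exclude), , drop = FALSE]
```

rownames(kept) is then "b" "d" "e" "f" "g" "h" "i" "j"; unlike the match/is.na form, %in% never produces NA, so no extra NA handling is needed.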
[R] Timestamps and manipulations
Hi All, I've a couple of questions I've been struggling with using the time features; can anyone help? Sample data:

Timestamp         user_id
27/05/08 22:57    763830873067
27/05/08 23:00    763830873067
27/05/08 23:01    763830873067
27/05/08 23:01    763830873067
05/06/08 11:34    763830873067
29/05/08 23:08    765253440317
29/05/08 23:06    765253440317
29/05/08 22:52    765253440317
29/05/08 22:52    765253440317
29/05/08 23:04    765253440317
27/06/08 19:34    765253440317
09/07/08 15:45    765329002557
06/07/08 19:24    765329002557
09/07/08 15:46    765329002557
07/07/08 13:05    765329002557
16/05/08 22:40    765329002557
08/06/08 11:24    765329002557
08/06/08 12:33    765329002557

My first question is how can I create a new variable, a filter based on a date? I've tried as.POSIXct/strptime below as well, but to no avail... can anyone give any advice?

Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp, "%m/%d/%Y %H:%M"))
Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00", "%m-%d-%Y-%H:%M"), 1, 0)

My second question refers to finding the time difference in seconds between the first time a user sees something vs. the last... and engagement time essentially; I see there is the difftime function. Is there a more elegant way of working this out than my thoughts (pseudo-code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike
[R] ldBands (Hmisc)
All, I'm getting the same error message as that discussed in a previous post (Feb 3, 2006). The reply to that post was to ensure that the ld98 program was in the system path (as also suggested in the help on ldBands). I have done this, but it does not change the result. Any advice much appreciated. David

sessionInfo()
R version 2.7.2 (2008-08-25), i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages: stats graphics grDevices utils datasets methods base
other attached packages: Hmisc_3.4-3
loaded via a namespace (and not attached): cluster_1.11.11 grid_2.7.2 lattice_0.17-13 tools_2.7.2

## run example from help on ldBands
b <- ldBands(5, pr = FALSE)
Error in (head + 1):length(w) : NA/NaN argument

## my Path variable is specified as follows on Windows XP, with ld98 sitting in the WinLD directory:
C:\Program Files\MiKTeX 2.6\miktex\bin;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\WinLD
Re: [R] na.pass
I have a data frame. It has lots of patient information: their age, their gender, etc. I need to keep all this information whilst selecting relevant rows. So, in the example code I provided, I want to remove all those patients who have entry "N" in the column with.Wcode. The dimension of the data is 378, i.e. 378 patients, and currently I am replacing any NA entries in column with.Wcode with the letter "O", as this is another level of the same column. Does that make more sense?

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

How can I therefore not replace NA with level "O" but instead ignore the NAs and effectively gloss over them? Thank you, Laura

On Mon, Oct 13, 2008 at 12:42 PM, jim holtman [EMAIL PROTECTED] wrote: Not sure exactly what you are trying to do since you did not provide commented, minimal, self-contained, reproducible code. Let me take a guess in that you also have to test for NAs:

> x <- sample(c("N", "A", "B", NA), 20, TRUE)
> x
 [1] "A" "A" "B" NA  "N" NA  NA  "B" "B" "N" "N" "N" "B" "A" NA  "A" "B" NA  "A" NA
> x != "N"
 [1]  TRUE  TRUE  TRUE    NA FALSE    NA    NA  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE    NA  TRUE  TRUE    NA
[19]  TRUE    NA
> x[x != "N"]
 [1] "A" "A" "B" NA  NA  NA  "B" "B" "B" "A" NA  "A" "B" NA  "A" NA
> (!is.na(x)) & (x != "N")
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
[19]  TRUE FALSE
> x[(!is.na(x)) & (x != "N")]
 [1] "A" "A" "B" "B" "B" "B" "A" "A" "B" "A"

On Mon, Oct 13, 2008 at 7:15 AM, Laura Bonnett [EMAIL PROTECTED] wrote: Hi All, I have a data frame whose columns are comprised mainly of NAs. I know there are functions na.pass and na.omit etc. which can be used in these situations; however, I can't get them to work in this case. I have a function which returns the data according to some rule, i.e.
removal of "N" in this code:

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    if (is.na(data$with.Wcode)[i]) data$with.Wcode[i] <- "O"
  }
  for (i in 1:378) {
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

However, I really don't want to replace the NAs with "O". I'd just like to gloss over them. I can't just delete them because the structure of the data frame needs to be maintained. Can anyone suggest how I can write in a line or two to ignore the NAs instead of replacing them? I've tried this code but it doesn't work!

nep <- function(data) {
  dummy <- rep(0, 378)
  for (i in 1:378) {
    na.pass(data$with.Wcode[i])
    if (data$with.Wcode[i] == "N") dummy[i] <- i
  }
  return(data[-dummy, ])
}

Thank you, Laura

-- Jim Holtman, Cincinnati, OH, +1 513 646 9390. What is the problem that you are trying to solve?
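The whole nep() loop in this thread can be replaced by one vectorized subset that keeps the NAs without recoding them; a sketch using the poster's column name, with made-up data:

```r
# Keep every row whose with.Wcode is NA or is not "N": no replacement of
# NA by "O" is needed, and the data frame structure is preserved.
nep <- function(data) {
  keep <- is.na(data$with.Wcode) | data$with.Wcode != "N"
  data[keep, , drop = FALSE]
}

d <- data.frame(with.Wcode = c("N", "A", NA, "B", "N"),
                age = c(41, 52, 38, 66, 29),
                stringsAsFactors = FALSE)
out <- nep(d)
```

Rows 2, 3 and 4 survive; row 3's NA is kept as-is rather than recoded, and all the other patient columns come along untouched.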
Re: [R] Blowing up portions of a graph
There is the zoomplot function in the TeachingDemos package that allows you to zoom in/out of the current plot, but it is a bit of a kludge. The better option is probably to just set the xlim and ylim arguments in a new plot command. You can use the locator function as one way to find the coordinates to pass to xlim and ylim. For adding the zoomed areas, you can use the layout function to set up the device with one big area on top and multiple smaller areas below to place the zooms in, or you can use the subplot function from the TeachingDemos package to add the zooms to uninteresting/empty areas of the current plot. Hope this helps,

-- Gregory (Greg) L. Snow Ph.D., Statistical Data Center, Intermountain Healthcare, [EMAIL PROTECTED], 801.408.8111

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of rajesh j. Sent: Monday, October 13, 2008 12:35 AM. To: r-help@r-project.org. Subject: [R] Blowing up portions of a graph

Hi, I have a really large graph and would like to zoom in on portions of it and post them as blocks below the graph. Is there an add-on package to do this? -- Rajesh.J "I skate to where the puck is going to be, not where it has been." - Wayne Gretzky
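A sketch of the layout() approach described above: one big plot on top and two zoomed panels below, each simply re-plotting the same data with a narrower xlim (the data are made up):

```r
set.seed(1)
x <- seq(0, 10, length.out = 500)
y <- sin(x) + rnorm(500, sd = 0.1)

# panel 1 spans the whole top row; panels 2 and 3 share the bottom row
layout(matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE))
plot(x, y, type = "l", main = "full graph")
plot(x, y, type = "l", xlim = c(2, 3), main = "zoom: x in [2, 3]")
plot(x, y, type = "l", xlim = c(7, 8), main = "zoom: x in [7, 8]")
layout(1)   # reset to a single-panel device
```

With locator(2) on the full plot you could pick the two corners of a region interactively and feed them to xlim/ylim instead of hard-coding the ranges.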
Re: [R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)
Dieter Menne wrote: Maithili Shiva maithili_shiva at yahoo.com writes: I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). The simple and wrong answer is to use these data directly to compute sensitivity (fraction of hits). This measure is useless, but I encounter it often in medical publications. Exactly. Using classification accuracy, sensitivity and specificity means that you are not using the model's predicted probabilities in a reasonable or powerful way. Credit scoring models need to demonstrate absolute calibration accuracy. Frank You can get a more reasonable answer by using cross-validation. Check, for example, Frank Harrell's http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf Dieter -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
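A hedged sketch of the kind of validation Frank and Dieter have in mind, using Frank Harrell's Design package (now called rms): fit the model with lrm() and get bootstrap optimism-corrected indexes (Dxy, calibration slope, Brier score) via validate(), plus a smooth calibration curve via calibrate(). The data here are simulated and the variable names invented.

```r
## Sketch, assuming the rms (formerly Design) package is installed;
## 'x1', 'x2' and 'default' are made-up stand-ins for real predictors.
library(rms)
set.seed(2)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$default <- rbinom(200, 1, plogis(-1 + d$x1))

fit <- lrm(default ~ x1 + x2, data = d, x = TRUE, y = TRUE)
validate(fit, B = 100)          # optimism-corrected Dxy, slope, Brier, ...
plot(calibrate(fit, B = 100))   # bootstrap calibration curve
```

This uses the predicted probabilities themselves rather than an arbitrary classification cut-off.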
Re: [R] Simulations using differential equations
and see page 8 of: http://www.r-project.org/doc/Rnews/Rnews_2003-3.pdf R as a Simulation Platform in Ecological Modelling Thomas Petzoldt Conclusions The examples described so far, plus the experience with R as data analysis environment for measurement and simulation data, allows to conclude that R is not only a great tool for statistical analysis and data processing but also a general-purpose simulation environment, both in research and especially in teaching. -- David Winsemius Heritage Labs

On Oct 13, 2008, at 11:37 AM, megha patnaik wrote: and http://cran.at.r-project.org/web/packages/deSolve/ 2008/10/13 megha patnaik [EMAIL PROTECTED] See http://cran.r-project.org/web/packages/odesolve/index.html 2008/10/13 [EMAIL PROTECTED] Dear R-users, I am trying to perform some simulations from a model defined by ordinary differential equations. I would be grateful if someone could indicate some functions/packages/examples I could look at. Thank you in advance. Sebastien
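To make the pointers above concrete, here is a minimal sketch with deSolve (the successor of odesolve): logistic growth dN/dt = r * N * (1 - N / K), solved with ode(). The model and parameter values are arbitrary illustrations.

```r
## Minimal ODE simulation sketch, assuming the deSolve package is installed.
library(deSolve)

## Derivative function: must return a list whose first element is dN/dt.
logistic <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dN <- r * N * (1 - N / K)
    list(dN)
  })
}

out <- ode(y = c(N = 1), times = seq(0, 50, by = 0.5),
           func = logistic, parms = c(r = 0.3, K = 100))
tail(out)  # N approaches the carrying capacity K = 100
```

The returned matrix has one column per state variable plus time, so plot(out) gives the trajectory directly.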
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards, Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank

Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] Using a discrete function in nls()
I am trying to fit a discrete function to my dataset using nls().

fit <- nls(T2 ~ form(SOA, t1weight, t2weight, d1weight),
           start = list(t1weight = 1, t2weight = 1, d1weight = 1),
           data = data1, trace = TRUE)

The problem is that my function (form) includes a discrete part, and in that function I used the variable SOA to define the discrete function (see below).

form <- function(SOA, t1weight, t2weight, d1weight) {
  decay_functionT1_1 <- 0
  decay_functionT1_2 <- rep(t1weight, ttime)
  decay_functionT1_3 <- t1weight * exp(-x / q)
  decay_functionT1_3[decay_functionT1_3 < threshold] <- 0
  T1 <- c(decay_functionT1_1, decay_functionT1_2, decay_functionT1_3)
  decay_functionT2_1 <- rep(0, SOA)
  decay_functionT2_2 <- rep(1, ttime)
  decay_functionT2_3 <- decay_t2(x1)
  decay_functionT2_3[decay_functionT2_3 < threshold] <- 0
  T2 <- c(decay_functionT2_1, decay_functionT2_2, decay_functionT2_3)

When I call nls() with my function I get an error message: Error in rep(0, SOA) : invalid 'times' argument That is probably due to the way nls() calls my function with the variable SOA. Can you help me to fix that?
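The diagnosis is that nls() evaluates form() once with SOA as the whole column, so rep(0, SOA) receives a vector as its 'times' argument. A generic fix, sketched below with a made-up stand-in for the real model, is to write the function for a single SOA value and then vectorise the wrapper with Vectorize():

```r
## Hedged sketch: form1() is an invented placeholder for the real
## per-observation model; only the Vectorize() idea is the point here.
form1 <- function(SOA, t1weight, t2weight, d1weight) {
  ## ... per-observation computation returning one fitted value ...
  t1weight * exp(-SOA / 10) + t2weight  # placeholder body
}
form <- Vectorize(form1, "SOA")  # now form() maps over a vector of SOA values

## nls() can then call the vectorised version, e.g.:
## fit <- nls(T2 ~ form(SOA, t1weight, t2weight, d1weight),
##            start = list(t1weight = 1, t2weight = 1, d1weight = 1),
##            data = data1, trace = TRUE)
```

Equivalently, sapply(SOA, ...) inside form() over the SOA values achieves the same thing.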
Re: [R] linear expenditure model
On Monday 13 October 2008 09:14:23, Marie Vandresse wrote: I have already used the aidsEst() function in the micEcon package for the estimation of elasticities. But with the LES, I plan to estimate the minimal consumption level... which is not possible with the AIDS model, is it? You are right; it is not possible to estimate the minimum consumption level with the Almost Ideal Demand System. Unfortunately, I don't see an easy way to estimate the LES in R using system estimation techniques. BTW: I am not convinced that the estimated parameters of the LES, which in theory are the minimum consumption levels, are good estimates of the actual minimum consumption levels in real life. Arne Arne Henningsen wrote: Hi Marie! On Friday 10 October 2008 12:40:23, Marie Vandresse wrote: I would like to estimate a linear expenditure system with the systemfit package (method: SUR). If I remember correctly, the linear expenditure system (LES) is linear in income but non-linear in the parameters. Hence, you have to estimate a system of non-linear equations. Unfortunately, the nlsystemfit() function in the systemfit package that estimates systems of non-linear equations is still under development and rather often has convergence problems. Since the systemfit() function in the systemfit package that estimates systems of linear equations is very reliable [1], I suggest that you choose a demand system that is linear in parameters (e.g. the Almost Ideal Demand System, AIDS) [1] http://www.jstatsoft.org/v23/i04 Could someone show me how to define the equations? If you use the aidsEst() function in the micEcon package [2], you don't have to specify the equations of the Almost Ideal Demand System yourself. 
[2] http://www.micEcon.org Best wishes, Arne -- Arne Henningsen http://www.arne-henningsen.name __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gower distance between a individual and a population
On Mon, 2008-10-13 at 16:28 +0200, [EMAIL PROTECTED] wrote: If you used daisy, is there a problem with converting the resulting object to a full dissimilarity matrix and extracting the relevant row/column you need for the target site? Well, the loss of efficiency is huge. I need to compute the distance several times on databases that contain 1000 or even 10 000 subjects. 10 000^2 costs a lot in terms of time, whereas 10 000 does not. A solution would be to re-write daisy and adapt it. But since I do not know Fortran, I prefer first to ask if someone has already done it... Sorry, I didn't intend to suggest that what was there was good enough for your purposes. I appreciate the loss of efficiency, and simply wondered if it would work for your purposes, given that the solution in analogue::distance is coded in R. I am progressing, slowly, to convert analogue::distance to C, which should run faster, but other areas of the package have taken priority over that just now. analogue::distance may work for you in your case. I'd be interested in finding out (off-list) how you get on with it if you do use it. All the best, G Christophe This message was sent via IMP, courtesy of Université Paris 10 Nanterre
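For the special case where all variables are numeric (interval-scaled), Gower's coefficient reduces to the mean of the range-normalised absolute differences, which can be computed for one target against n subjects in O(n) instead of building the full O(n^2) daisy() matrix. A sketch under that assumption (mixed variable types would need the extra Gower terms that daisy handles):

```r
## One-individual-vs-population Gower distance, numeric variables only.
gower_to_target <- function(X, target) {
  X <- as.matrix(X)
  rng <- apply(X, 2, function(v) diff(range(v, na.rm = TRUE)))  # column ranges
  d <- abs(sweep(X, 2, as.numeric(unlist(target))))  # |x_ij - target_j|
  d <- sweep(d, 2, rng, "/")                         # normalise by ranges
  rowMeans(d, na.rm = TRUE)                          # average over variables
}

X <- data.frame(a = c(0, 5, 10), b = c(1, 2, 3))
gower_to_target(X, X[1, ])  # 0 for the target itself, 0.5 and 1 for the others
```

Note the ranges here are taken from X alone, so for results comparable to daisy() the target individual should be included in X (as it is above).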
Re: [R] subsetting dataframe by rownames to be excluded
On Mon, 13 Oct 2008, Dieter Menne wrote: Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ] Assuming everything is possible in R: would it be possible to make the below work without breaking existing code? It would be possible, but not I think desirable. c(exclude) is fine (works now, does nothing useful except strip attributes). But a unary minus on a character vector will give an error: that's not necessarily the end, as `[` is a SPECIALSXP and so is passed unevaluated arguments. However, its first step is method dispatch, and that evaluates all the arguments, so a substantial internal rewrite would be needed. It would be fairly easy to make subset(a, subset = -exclude) work, and select = -col_name already works. I think though that messing with `[` would be too dangerous, and would also lead to expectations that all its methods should accept this notation (and hence many would need to be re-written, including [.data.frame as used here). And then people would expect this to work on the RHS, so `[<-` would need to be re-written.

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1, 3), ]                             # In analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Dieter

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
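A perhaps more memorable spelling of the same subset, sketched with setdiff() on the row names (it also avoids the is.na(match(...)) bookkeeping):

```r
## Keep every row whose name is NOT in 'exclude'.
a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")

a[setdiff(rownames(a), exclude), , drop = FALSE]
## same rows as a[is.na(match(row.names(a), exclude)), , drop = FALSE]
```

setdiff() preserves the original row order, and drop = FALSE keeps the result a data frame even with a single column.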
Re: [R] Avoid overlap of labels in a scatterplot
Felix Andrews wrote: thigmophobe.labels in the plotrix package tries to avoid label crashes, and Thank you, but I chose: There is also pointLabel() in the maptools package. This works fine. Thank you very much. Nicola
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Jumping into a thread can be like jumping into a den of lions, but here goes... Sensitivity and specificity are not designed to determine the quality of a fit (i.e. whether your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large proportion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit; none that I know of are completely satisfactory. The pseudo R squared is one such measure. For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have the disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank

Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi, I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods (Gg), (2) the number of correctly classified bads (Bb), and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures.

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
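The cut-off trade-off John describes can be made concrete with a few lines of base R: compute SENS and SPEC over a grid of thresholds and trace the ROC curve by hand (scores are simulated here; no extra packages assumed).

```r
## Sensitivity/specificity across cut-offs, and an empirical ROC curve.
set.seed(42)
y <- rbinom(500, 1, 0.3)                            # 1 = "bad" / diseased
score <- y * rnorm(500, 1) + (1 - y) * rnorm(500)   # higher = more suspect

cuts <- seq(min(score), max(score), length.out = 100)
sens <- sapply(cuts, function(k) mean(score[y == 1] >= k))  # true-positive rate
spec <- sapply(cuts, function(k) mean(score[y == 0] < k))   # true-negative rate

plot(1 - spec, sens, type = "l", xlab = "1 - Specificity",
     ylab = "Sensitivity", main = "Empirical ROC curve")
abline(0, 1, lty = 2)  # the no-information diagonal
```

Each point on the curve is one possible classification rule; reporting a single SENS/SPEC pair fixes one arbitrary point on it.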
Re: [R] how to evaluate a cubic Bezier curve (B-spline?) given the four control points
You could look at the xspline function. It approximates b-splines or Bezier curves given control points and shape parameters. It can either plot the spline or return a bunch of points on the curve for comparison to other values. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of Zack Weinberg Sent: Friday, October 10, 2008 4:11 PM To: r-help@r-project.org Subject: [R] how to evaluate a cubic Bezier curve (B-spline?) given the four control points

I'm trying to use R to determine the quality of a cubic Bezier curve approximation of an elliptical arc. I know the four control points and I want to compute (x,y) coordinates of many points on the curve. I can't find anything in either the base distribution or CRAN that does this; all the spline-related packages seem to be about *fitting* piecewise Bezier curves to a data set. Presumably, internally they have the capability I need, but it doesn't seem to be exposed in a straightforward way. Help?
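For completeness, a cubic Bezier is also easy to evaluate directly from its four control points via the Bernstein form B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3, for t in [0,1]. A small sketch, using the standard quarter-circle approximation (k = 0.5523) as the test arc:

```r
## Evaluate a cubic Bezier at a vector of parameter values t.
bezier <- function(P, t) {
  # P: 4 x 2 matrix of control points; one (x, y) row per t is returned
  b <- cbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)
  b %*% P
}

## Quarter circle of radius 1 centred at (1, 0), from (0, 0) to (1, 1)
P <- rbind(c(0, 0), c(0, 0.5523), c(0.4477, 1), c(1, 1))
xy <- bezier(P, seq(0, 1, length.out = 101))
max(abs(sqrt((xy[, 1] - 1)^2 + xy[, 2]^2) - 1))  # radial error, on the order of 1e-4
```

This is exactly the "many (x,y) coordinates on the curve" computation, so the approximation quality can be checked against the true arc pointwise.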
[R] split data, but ensure each level of the factor is represented
Hello, I'll use part of the iris dataset for an example of what I want to do.

data(iris)
iris <- iris[1:10, 1:4]
iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1

Now if I want to split this data using the vector

a <- c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
a
[1] 3 3 3 2 3 1 2 3 2 3

then the function split works fine:

split(iris, a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6          5.4         3.9          1.7         0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

My problem is when the vector lacks one of the values from 1:n. For example, if the vector is

a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
a
[1] 3 3 3 2 3 2 2 3 2 3

then split will return a list without a $`1`. I would like the $`1` to be a vector of 0's with the same length as the number of columns in the dataset. In other words, I want to write a function that returns

mysplit(iris, a)
$`1`
[1] 0 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

Thank you for your time, Jay
Re: [R] split data, but ensure each level of the factor is represented
Try this:

a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
split(iris, a)
lapply(split(iris, a), dim)

On Mon, Oct 13, 2008 at 2:06 PM, Jay [EMAIL PROTECTED] wrote: [original message quoted in full; snipped]

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] Creating GUIs for R
On 10/13/08, Michael Lawrence [EMAIL PROTECTED] wrote: On Sun, Oct 12, 2008 at 4:50 PM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On 12 October 2008 at 12:53, cls59 wrote: | On a related note... does anyone know good resources for binding a C++ | program to the R library? RCpp, at http://rcpp.r-forge.r-project.org, formerly known as RCppTemplate, is pretty mature and well tested, having been around since 2004 or 2005. Introductory documentation could be better; feedback welcome. | Basically, I would like to start with just a plain vanilla R session running | inside a Qt widget. Any suggestions? Isn't RKWard a Qt-based GUI for R? They probably have some reusable console code in there. Yes. It seems somewhat integrated with KDE, so not easily ported. Deepayan once did just that in a test application. I am not sure if that was ever made public. There's a webpage at http://dsarkar.fhcrc.org/R/R-Qt.html See the last section. It's not very active, but should be an adequate proof of concept. This takes the approach of embedding R and creating a GUI using the GUI callbacks described in R-exts; this works in Linux and Mac, but not in Windows, because these callbacks are not supported by R on Windows. -Deepayan
Re: [R] split data, but ensure each level of the factor is represented
Thanks so much. On Oct 13, 1:14 pm, Henrique Dallazuanna [EMAIL PROTECTED] wrote:

Try this:
a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
split(iris, a)
lapply(split(iris, a), dim)

[rest of quoted message snipped]
Re: [R] split data, but ensure each level of the factor is represented
Try this:

split(iris, factor(a, levels = 1:3))

On Mon, Oct 13, 2008 at 1:06 PM, Jay [EMAIL PROTECTED] wrote: [original message quoted in full; snipped]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
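Putting the factor() trick from the replies into the exact mysplit() the original post asked for, one sketch is: split on a factor with the full level set, then replace the empty pieces by a zero vector as long as the number of columns.

```r
## mysplit(): like split(), but absent levels become a vector of zeros.
mysplit <- function(data, a, n = max(a)) {
  out <- split(data, factor(a, levels = seq_len(n)))
  empty <- vapply(out, nrow, integer(1)) == 0
  out[empty] <- list(rep(0, ncol(data)))
  out
}

a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
mysplit(iris[1:10, 1:4], a)  # $`1` is c(0, 0, 0, 0); $`2` and $`3` as before
```

The default n = max(a) assumes the levels are 1:n; pass n explicitly if higher levels can also be absent.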
[R] gamm() and predict()
Dear All, I have a query relating to the use of the 'predict' and 'gamm' functions. I am dealing with large (approx. 5000) sets of presence/absence data, which I am trying to model as a function of different environmental covariates. Ideally my models should include individual and colony as random factors. I have been trying to fit binomial models using the gamm function to achieve this. For the sake of simplicity I have adapted some of the example code from ?gamm to illustrate some problems I have been having predicting values using this approach.

### Begin example ###

library(mgcv)

## Generate some example data
set.seed(0)
n <- 400
sig <- 2
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
e <- rnorm(n, 0, sig)
y <- f + e

## Change the response to binary
y <- round(y / max(y))

## Add a factor to the linear predictor, to be modelled as random
fac <- rep(1:4, n / 4)
f <- f + fac * 3
fac <- as.factor(fac)

## Fit an additive model
mod <- gamm(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = binomial, random = list(fac = ~1))

## Generate some new example data
new.dat <- data.frame(x0 = runif(n, 0, 1), x1 = runif(n, 0, 1),
                      x2 = runif(n, 0, 1), x3 = runif(n, 0, 1), fac = fac)

## Predict response values using the original data and the gam part of the model
predict(mod$gam, type = "response")

## Predict response values using the new data and the gam part of the model
predict(mod$gam, type = "response", new.dat)

## Predict response values using the original data and the glmm (lme) part of the model
predict(mod$lme, level = 0, type = "response")

## Predict response values using the new data and the glmm (lme) part of the model
predict(mod$lme, level = 0, type = "response", new.dat)
## This produces the error message
## 'Error in eval(expr, envir, enclos) : object 'fixed' not found'

### End example ###

My questions are as follows:

1. I presume predict(mod$gam) produces population-level predictions. Is this correct?
2. Is it possible to extract standard errors using predict(mod$gam), or is there a more suitable approach to estimating confidence in predictions made with gamms?
3. It seems that predict(mod$lme) results in predictions at the level of the random factors. Furthermore, these appear to be on the scale of the linear predictor regardless of how level is specified (see ?glmmPQL). Is this correct?
4. The code predict(mod$lme, new.dat) produces an error message, seemingly indicating that the fixed effects are missing from my new data frame (see example). Am I doing something wrong here?
5. Is it possible to produce both population-level and random-factor-level predictions using new data with gamm objects?

I have read all the relevant help files, including those associated with glmmPQL, and also Simon Wood's book, and I am still a bit confused, so any help would be gratefully received. I am using R 2.6.1 with Windows XP Pro, mgcv version 1.3-29. Thanks, Ewan Wakefield British Antarctic Survey High Cross Madingley Road Cambridge UK
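Regarding question 2: predict.gam() accepts se.fit = TRUE. A self-contained sketch with a small binomial gam (the same idea applies to the $gam component of a gamm fit): form the interval on the linear-predictor scale and back-transform with plogis().

```r
## Approximate pointwise intervals for a binomial smooth via se.fit = TRUE.
library(mgcv)
set.seed(1)
d <- data.frame(x = runif(300))
d$y <- rbinom(300, 1, plogis(2 * sin(pi * d$x)))

m  <- gam(y ~ s(x), family = binomial, data = d)
nd <- data.frame(x = seq(0, 1, length.out = 50))
pr <- predict(m, nd, type = "link", se.fit = TRUE)

ci <- data.frame(fit   = plogis(pr$fit),                 # predicted probability
                 lower = plogis(pr$fit - 2 * pr$se.fit), # approx. 95% band
                 upper = plogis(pr$fit + 2 * pr$se.fit))
head(ci)
```

Back-transforming the +/- 2 SE band keeps the interval inside (0, 1), which an interval built on the response scale would not guarantee.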
[R] graphs in R
How can graphs in R be done with leveling of points? Please help.
Re: [R] Problem installing tseries under FC7 x86_64
Dear Michal, I had the same problem as you in installing the quadprog package. Did you resolve it? Can you help me? The error is:

* Installing *source* package 'quadprog' ...
** libs
gfortran -fpic -g -O2 -c aind.f -o aind.o
In order to use gfortran please type either:
source /usr/local/free/gfortran.csh
. /usr/local/free/gfortran.sh
make: *** [aind.o] Error 1
ERROR: compilation failed for package 'quadprog'
** Removing '/usr/people/russo/R/i686-pc-linux-gnu-library/2.5/quadprog'
Warning message:
installation of package 'quadprog_1.4-11.tar.gz' had non-zero exit status in: install.packages("quadprog_1.4-11.tar.gz", repos = NULL, type = "/usr/local/free/gfortran.csh")

Thank you in advance! Simone
[R] Running R at a specific time - alternative to Sys.sleep() ?
Dear R-Help, Is it possible to set R up to run a particular script at specific times of the day? Trivial example: if the time is now 8:59:55am and I wish to run a function at 9am, I do the following:

my.function <- function(x) {
  p1 <- proc.time()
  Sys.sleep(x)
  print('Hello R-Help!')
  proc.time() - p1
}
my.function(5)
[1] "Hello R-Help!"
   user  system elapsed
      0       0       5

What I would rather do is just put in the time at which I wish R to execute the function. Hope that made sense, and thanks for any help in advance! Tony Breyal

### Windows Vista
sessionInfo()
R version 2.7.2 (2008-08-25) i386-pc-mingw32
locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] RCurl_0.9-4
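[Editor's note] One way to stay inside R is to compute the number of seconds until the target clock time and sleep for exactly that long. run_at below is a hypothetical helper written for this sketch, not an existing function.

```r
## Hypothetical helper: wait until a given clock time, then call a function.
run_at <- function(target, task) {
  secs <- as.numeric(difftime(target, Sys.time(), units = "secs"))
  if (secs > 0) Sys.sleep(secs)   # sleep only if the target is in the future
  task()
}
## e.g. run_at(as.POSIXct("2008-10-14 09:00:00"),
##             function() print("Hello R-Help!"))
```

For unattended, recurring jobs an external scheduler (as suggested in the reply below this message) is more robust, since this approach requires an R session to stay open.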
[R] Subset based on items in a list.
R-help: I have a variable (ID_list) containing about 1800 unique numbers, and a 143066x29 data frame. One of the columns (ID) in my data frame contains a list of ids, many of which appear more than once. I'd like to find the subset of my data frame for which ID matches one of the numbers in ID_list. I'm pretty sure I could write a function to do this--something like:

dataSubset <- function(df, id_list) {
  tmp <- data.frame()
  for (i in id_list) {
    for (j in 1:dim(df)[1]) {
      if (i == df$ID[j]) {
        tmp <- rbind(tmp, df[j, ])  # accumulate matching rows
      }
    }
  }
  tmp
}

but this seems inefficient. As I understand it, the subset function won't really solve my problem, but it seems like there must be something out there that will that I must be forgetting. Does anyone know of a way to solve this problem in an efficient way? Thanks! Kyle H. Ambert Graduate Student, Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University [EMAIL PROTECTED]
[R] [R-pkgs] New package: StatMatch 0.4
Dear useRs, I am pleased to announce the availability of the new package 'StatMatch' (version 0.4) http://cran.at.r-project.org/web/packages/StatMatch/index.html 'StatMatch' contains some functions to perform Statistical Matching. Statistical Matching methods aim at integrating two samples, referring to the same target population, that share a certain number of common variables but have no overlap of units. Note that some functions in 'StatMatch' can also be used to impute missing values in a data set. Best Regards, Marcello D'Orazio -- Marcello D'Orazio ISTAT (Italian National Statistical Institute) Via Cesare Balbo, 16 (1° piano, stanza 153) 00184 ROMA ITALY Tel.: +39 06 4673 2772 Fax: +39 06 4673 2955 Legal Disclaimer: Any views expressed by the sender of this message are not necessarily those of the Italian National Statistical Institute. ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
[R] optim and nlm error to estimate a matrix
Dear R users, I'm trying to estimate a matrix of regression parameters. I need to do it numerically, so I used optim and nlm. I got the initial parameter estimates from least squares, and input them into those functions. But when I run the optim function, it stops in 30 seconds and shows 'convergence = 1'. And if I use the nlm function, then it runs for a while, and finally stops with code = 4. Both of these error codes mean the iteration limit was exceeded. Since the maxit for optim is 500 for Nelder-Mead by default, I increased the maxit to 1000, but it still gives me the same error code. Can anyone tell me how I can fix the problem? I defined the objective function in the following way:

## Objective function to be minimized
obj <- function(beta.v) {
  obj1 <- rep(0, n)
  beta.m <- matrix(beta.v, p, sdf)
  for (i in 1:n) {
    yi <- Y[i, ]; xi <- X[i, ]
    obj1[i] <- rho((1/sigma) * sqrt((yi - xi %*% beta.m) %*% solve(t(H) %*% H) %*% t(yi - xi %*% beta.m)))
  }
  sum(obj1)
}

I tried to find a minimizer with the calls below:

result1 <- optim(c(ini.beta), obj)
result2 <- nlm(obj, c(ini.beta))

where ini.beta holds the initial parameters obtained from least squares estimation. The weirdest thing to me is that result1 gives exactly the same values as the initial values, while result2 gives values a little different from the initial values, and a smaller value of obj, which means nlm moved the initial value a little toward the true minimizer. At first, I thought I had supplied such good initial values that the algorithm didn't need to move much, but even if I just put in a matrix of 1's, it still stops with the same error codes. *** This is my first time posting a question. I apologize if I didn't explain enough. I would be very happy to hear anyone's suggestions. Thanks for your time!! Ji Young
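[Editor's note] The iteration limit for optim is raised through the control list, and nlm has a separate iterlim argument (default 100). A sketch assuming the poster's obj and ini.beta objects are defined as above:

```r
## Sketch: raise the iteration limits and inspect convergence diagnostics,
## assuming obj and ini.beta are defined as in the message above.
fit <- optim(c(ini.beta), obj, method = "Nelder-Mead",
             control = list(maxit = 5000, reltol = 1e-10))
fit$convergence   # 0 = converged; 1 = maxit was reached

fit2 <- nlm(obj, c(ini.beta), iterlim = 1000)
fit2$code         # 1 or 2 indicate a probable solution; 4 = iteration limit
```

If the limits are already generous and the codes persist, the objective may be flat or ill-conditioned near the start, which is worth checking before raising maxit further.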
[R] error in plots of gam (package:gam)
(Sorry for the duplicate posting, first posting did not contain text in the body of the message.) Greetings, I am attempting to plot the model fits of a generalized additive model using the gam package (gam version 1.0, R v. 2.6.2). The gam object is created without any apparent problem, but when I try to plot (plot(gam_object)), I repeatedly receive the following error: Error in dim(data) <- dim : attempt to set an attribute on NULL. Diagnostics are pasted below. Cheers, Chris Taylor

traceback()
14: array(np, 1)
13: predict.gam(object, type = "lpmatrix", ...)
12: model.matrix.gam(object)
11: model.matrix(object)
10: predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type == "link", "response", type), terms = terms, na.action = na.action)
9: predict.glm(object, type = "terms", terms = terms, se.fit = TRUE)
8: NextMethod("predict")
7: switch(type, response = {
       out <- predict.gam(object, type = "link", se.fit = TRUE, ...)
       famob <- family(object)
       out$se.fit <- drop(out$se.fit * abs(famob$mu.eta(out$fit)))
       out$fit <- fitted(object)
       out
   }, link = {
       out <- NextMethod("predict")
       out$fit <- object$additive.predictors
       TS <- out$residual.scale^2
       TT <- ncol(object$var)
       out$se.fit <- sqrt(out$se.fit^2 + TS * object$var %*% rep(1, TT))
       out
   }, terms = {
       out <- NextMethod("predict")
       TT <- dimnames(s <- object$smooth)[[2]]
       out$fit[, TT] <- out$fit[, TT] + s
       TS <- out$residual.scale^2
       out$se.fit[, TT] <- sqrt(out$se.fit[, TT]^2 + TS * object$var)
       out
   })
6: predict.gam(object, type = "terms", terms = terms, se.fit = TRUE)
5: predict(object, type = "terms", terms = terms, se.fit = TRUE)
4: preplot.gam(x, terms = terms)
3: plot.gam(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")
2: plot(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")
1: plot(Cr4Rot1_100m.gam, se = T, residuals = T, main = "Trend analysis on ABC, 100m resolution: Cr4Rot1")

sessionInfo()
R version 2.6.2 (2008-02-08) i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] gstat_0.9-47 rgdal_0.5-25 sp_0.9-25 gam_1.0 akima_0.5-1 loaded via a namespace (and not attached): [1] grid_2.6.2 lattice_0.17-10 mgcv_1.4-1 tools_2.6.2 -- J. Christopher Taylor, Ph.D. Applied Ecology and Restoration Research National Ocean Service / NOAA National Centers for Coastal Ocean Science Center for Coastal Fisheries and Habitat Research 101 Pivers Island Road, Beaufort, North Carolina 28516-9722 Ph: (252) 838 0833 Fx: (252) 728 8784 Website: http://www.ccfhr.noaa.gov/
Re: [R] Running R at a specific time - alternative to Sys.sleep() ?
Use the task scheduler in Windows and have a batch file executed. On Mon, Oct 13, 2008 at 11:44 AM, Tony Breyal [EMAIL PROTECTED] wrote the message quoted above. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] graphs in R
Do you have an example? I am not sure what you mean. On Mon, Oct 13, 2008 at 9:48 AM, guria [EMAIL PROTECTED] wrote the message quoted above. -- Stephen Sefick Research Scientist Southeastern Natural Sciences Academy Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
Re: [R] Subset based on items in a list.
I don't know if I understand (a small example with R commands would help), but, assuming your data frame is called 'df':

subset(df, ID %in% ID_list)

Question: is ID_list a list or a vector, and are they really numbers or factors? Kyle. wrote the message quoted above.
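[Editor's note] A tiny self-contained demonstration of the %in% approach; the data frame and ID values below are made up stand-ins for the poster's 143066x29 data frame and ID_list.

```r
## Made-up stand-ins for the poster's objects
df <- data.frame(ID  = c(101, 102, 102, 103, 104),
                 val = c(1.2, 3.4, 3.5, 5.6, 7.8))
ID_list <- c(102, 104)

subset(df, ID %in% ID_list)   # rows whose ID appears in ID_list
df[df$ID %in% ID_list, ]      # equivalent plain-indexing form
```

Both forms are vectorized, so they avoid the double loop in the original function entirely.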
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. if your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large proportion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit; none that I know of are completely satisfactory. The pseudo R squared is one such measure. For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way John John this has been much debated but I fail to see how backwards probabilities are that helpful in judging the usefulness of a test. Why not condition on what we know (the test result and other baseline variables) and quit conditioning on what we are trying to find out (disease status)? The data collected in most studies (other than case-control) allow one to use logistic modeling with the correct time order. Furthermore, sensitivity and specificity are not constants but vary with subjects' characteristics. So they are not even useful as simplifying concepts. Frank John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr Dieter Menne, I sincerely thank you for helping me out with my problem. The thing is that I have already calculated SENS = Gg / (Gg + Bg) = 89.97% and SPEC = Bb / (Bb + Gb) = 74.38%. Now I have values of SENS and SPEC, which are absolute in nature. My question was how do I interpret these absolute values. How do these values help me to find out whether my model is good? With regards Ms Maithili Shiva I can't understand why you are interested in probabilities that are in backwards time order. Frank Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC) To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM Hi I am working on a credit scoring model using logistic regression. I have a main sample of 42500 clients and, based on their status as regards defaulted / non-defaulted, I have generated the probability of default. I have a hold-out sample of 5000 clients. I have calculated (1) the number of correctly classified goods Gg, (2) the number of correctly classified bads Bb, and also (3) the number of wrongly classified bads (Gb) and (4) the number of wrongly classified goods (Bg). My problem is how to interpret these results? What I have arrived at are the absolute figures. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
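[Editor's note] For reference, the quoted SENS/SPEC formulas translate directly into R. The counts below are invented purely for illustration; they are not the poster's actual hold-out figures.

```r
## Invented hold-out counts, for illustration only
Gg <- 3599   # goods correctly classified as good
Bg <-  401   # goods wrongly classified as bad
Bb <-  744   # bads correctly classified as bad
Gb <-  256   # bads wrongly classified as good

sens <- Gg / (Gg + Bg)   # fraction of goods correctly identified
spec <- Bb / (Bb + Gb)   # fraction of bads correctly identified
c(SENS = sens, SPEC = spec)
```

As the thread discusses, these are conditional on true status; predicted probabilities (or predictive values at a given prevalence) condition on what is actually observed.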
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease among those subjects with a disease, and the one with greater specificity will be better at identifying subjects who do not have a disease among those who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking they don't change all that much, certainly not as much as positive and negative predictive values. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
Re: [R] subsetting dataframe by rownames to be excluded
On Oct 13, 2008, at 5:36 AM, Dieter Menne wrote: Prof Brian Ripley ripley at stats.ox.ac.uk writes: Yes: DF[is.na(match(row.names(DF), exclude_me)), ] Assuming everything is possible in R: would it be possible to make the below work without breaking existing code?

a <- data.frame(x = 1:10)
rownames(a) <- letters[1:10]
exclude <- c("a", "c")
a[is.na(match(row.names(a), exclude)), ]  # not really that easy to remember
a[-c(1, 3), ]                             # In analogy
a[-c(exclude), ]                          # invalid argument to unary operator

Given the negative to your question, I wonder if you would find, as I hope works for me, that it will be easier to remember this (equivalent) form?

a[!row.names(a) %in% exclude, ]
[1] 2 4 5 6 7 8 9 10

... equivalent because, per the help page, %in% is defined by function(x, table) match(x, table, nomatch = 0) > 0 and the nomatch argument converts the NA's properly from a logical perspective. The help page defines a %w/o% function in just such a manner. -- David Winsemius Heritage Labs
[R] ggplot faceting like lattice | variable
I would like to be able to do the xyplot below in ggplot. I read in the archive that Hadley was working on this for the next release, and I cannot find the documentation (Aug. 23rd).

library(lattice)
library(ggplot2)
library(reshape)  # melt.data.frame is in the reshape package
River.Mile <- c(215, 202, 198, 190, 185, 179, 148, 119, 61)
Cu <- rnorm(9)
Fe <- rnorm(9)
Mg <- rnorm(9)
Ti <- rnorm(9)
Ir <- rnorm(9)
r <- data.frame(River.Mile, Cu, Fe, Mg, Ti, Ir)
z <- melt.data.frame(r, id.var = "River.Mile")
# this is what ggplot does
qplot(River.Mile, value, facets = (variable ~ .), data = z)
# this is what I would like to do
xyplot(value ~ River.Mile | variable, data = z)

Thanks -- Stephen Sefick Research Scientist Southeastern Natural Sciences Academy Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
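[Editor's note] A sketch of the ggplot2 analogue of the lattice call, assuming the z data frame built above; facet_wrap arranges one panel per level of the conditioning variable, much like lattice's | operator (facet specification syntax has varied across ggplot2 versions).

```r
library(ggplot2)
## One panel per variable, wrapped into a grid, assuming z as built above
ggplot(z, aes(x = River.Mile, y = value)) +
  geom_point() +
  facet_wrap(~ variable)
```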
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease among those subjects with a disease, and the one with greater specificity will be better at identifying subjects who do not have a disease among those who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking they don't change all that much, certainly not as much as positive and negative predictive values. They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
[R] MiKTEX and texi2dvi
Liviu: Thanks for the links, I'll check them out. On a different note, have you used MiKTeX at all? I have downloaded it but I don't know how to make it work. Sweave and Stangle seem to work fine, but when I use texi2dvi it crashes.

library(tools)
Sweave("C:/Program Files/R/R-2.7.2/bin/foo.Rnw")
Writing to file foo.tex
Processing code chunks ...
1 : echo term verbatim (label=two)
2 : echo term verbatim (label=reg)
3 : echo term verbatim (label=fig1plot)
4 : term verbatim eps pdf (label=fig1)
5 : term verbatim eps pdf (label=fig2)
6 : term hide (label=foo)
7 : term hide (label=foo2)
8 : echo term verbatim (label=blurfle)
9 : term tex (label=tab1)
You can now run LaTeX on 'foo.tex'
Stangle("C:/Program Files/R/R-2.7.2/bin/foo.Rnw")
Writing to file foo.R
texi2dvi("foo.tex", pdf = TRUE)
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11:
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing
C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra
Error in texi2dvi("foo.tex", pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed

Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says that it is missing. Felipe D.
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA --- On Sun, 10/12/08, Liviu Andronic [EMAIL PROTECTED] wrote: From: Liviu Andronic [EMAIL PROTECTED] Subject: Re: [R] Sweave-LaTEX question To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Date: Sunday, October 12, 2008, 11:47 PM On Sun, Oct 12, 2008 at 1:39 AM, Felipe Carrillo [EMAIL PROTECTED] wrote: I am working on a publication and I have heard about LaTEX but I haven't actually tried to learn about it until today. I've found a few There are two more packages that might be of interest: RReportGenerator [1] and relax [2]. Liviu [1] http://alnitak.u-strasbg.fr/~wraff/RReportGenerator/index.php [2] http://cran.r-project.org/web/packages/relax/index.html
Re: [R] Overdispersion in the lmer models
Dear Eva; I shouldn't have sent my unhelpful reply to the entire list, since it is now glaringly obvious that I did not carefully read your original question. You are outside my experience, since I have not used lme4, but I wonder if questions about over-dispersion shouldn't be handled by examining grouped residuals? According to the documentation, mer models have a resid method, although the help page it links to appears to be under construction. -- David Winsemius Heritage Laboratories On Oct 13, 2008, at 4:46 AM, Fucikova, Eva wrote: Dear David, Thank you for such a fast answer. Unfortunately, your suggestion does not work for lmer for some reason. I can probably try to run the model without the random effect to find out the overdispersion in the glm. Anyway, thank you very much. Yours sincerely, Eva -Original Message- From: David Winsemius [mailto:[EMAIL PROTECTED] Sent: maandag 13 oktober 2008 3:42 To: Fucikova, Eva Cc: r-help@r-project.org Subject: Re: [R] Overdispersion in the lmer models Have you considered using glm() with family = quasipoisson or family = quasibinomial? I know from experience that the quasipoisson choice reports an index of dispersion. ?family -- David Winsemius On Oct 12, 2008, at 4:55 AM, Fucikova, Eva wrote: Dear All, I am working with linear mixed-effects models using the lme4 package in R. I created a model using the lmer function including some main effects, a three-way interaction and a random effect. Because I work with binomial and Poisson distributions, I want to know whether there is overdispersion in my data or not. Does anybody know how I can retrieve this information from R? Thank you in advance, Eva Fucikova
[R] Subset based on items in a list.
R-help: I have a variable (ID_list) containing about 1800 unique numbers, and a 143066x29 data frame. One of the columns (ID) in my data frame contains a list of ids, many of which appear more than once. I'd like to find the subset of my data frame for which ID matches one of the numbers in ID_list. I'm pretty sure I could write a function to do this--something like: dataSubset <- function(df, id_list){ tmp <- data.frame() for(i in id_list){ for(j in 1:dim(df)[1]){ if(i==df$ID[j]){ tmp <- data.frame(df[j,]) } } } tmp } but this seems inefficient. As I understand it, the subset function won't really solve my problem, but it seems like there must be something out there that will that I must be forgetting. Does anyone know of a way to solve this problem in an efficient way? Thanks! Kyle H. Ambert Graduate Student, Department of Medical Informatics Clinical Epidemiology Oregon Health Science University [EMAIL PROTECTED]
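The double loop above can be replaced by a single vectorized membership test with `%in%`; a minimal sketch with a toy data frame (the column name ID follows the post, the data are made up):

```r
# Toy stand-ins for the real 143066 x 29 data frame and the ~1800-element ID_list
df <- data.frame(ID = c(101, 202, 303, 202, 404), x = 1:5)
ID_list <- c(202, 404)

# Keep every row whose ID appears somewhere in ID_list
hits <- df[df$ID %in% ID_list, ]

# subset() works the same way, despite the poster's doubts
hits2 <- subset(df, ID %in% ID_list)
```

`%in%` scales to hundreds of thousands of rows because it hash-matches against ID_list once, instead of comparing every row against every id in a nested loop.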
Re: [R] Sweave from Kile
Hi Matthieu, Does anybody have experience with Sweave run from Kile? I'm trying to make it run but have problems and don't know if the instructions are wrong or I am doing something wrong (my knowledge of bash and the shell is too limited to tell)... ... It would help if you stated that you use my Sweave.sh, i.e. the one from http://cran.r-project.org/contrib/extra/scripts/Sweave.sh. I will assume you do. I will start with the second problem. 2: If I run Kile with sudo (sudo kile), the problem disappears but a new one comes. SweaveOnly output: * cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave' * Sweave.sh -ld '\example1Leisch.Rnw' * Run Sweave and postprocess with LaTeX directly from command line -ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw., .nw or .tex Are the instructions wrong? Or am I doing something wrong? Is there a single - or a double -, i.e. --? If I issue the following $ Sweave.sh --ld test.Rnw Run Sweave and postprocess with LaTeX directly from command line --ld is not a supported file type! It should be one of: .lyx, .Rnw, .Snw., .nw or .tex I get the same error. 1: finished with exit status 126 SweaveOnly output: * cd '/media/Partition_Commune/Mes documents/Ordi/LaTex/Sweave' * Sweave.sh -ld '\example1Leisch.Rnw' * /bin/bash: /usr/local/bin/Sweave.sh: Permission non accordée (in English: permission not granted) It seems that chmod did not behave as you expected. First check the file permissions with ls -l /usr/local/bin/Sweave.sh On my computer I get -rwxr-xr-x 1 root root 30K 2008-04-30 11:17 /usr/local/bin/Sweave.sh* Note that x is there three times, i.e. anyone can run this script: the user, the group and others. Try with sudo chmod a+x /usr/local/bin/Sweave.sh and check the file permissions again. gg
[R] Add notes to sink output
Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
Re: [R] Add notes to sink output
On 14/10/2008, at 9:02 AM, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? ?cat
Re: [R] Add notes to sink output
On 13-Oct-08 20:02:20, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael Anything on the lines of: sink("test.txt") cat("This text will describe the test\n") cat("\n") summary(x) sink() E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Oct-08 Time: 21:12:54 -- XFMail --
Re: [R] Add notes to sink output
Dear Michael, You can use cat() as follows: sink("test.txt") cat('Here goes your text','\n','\n') # each \n writes a newline summary(x) sink() See ?cat for more information. HTH, Jorge On Mon, Oct 13, 2008 at 4:02 PM, Michael Just [EMAIL PROTECTED] wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
Re: [R] Add notes to sink output
Thanks for the swift response. cat it is. Cheers, Michael On Mon, Oct 13, 2008 at 3:14 PM, Jorge Ivan Velez [EMAIL PROTECTED] wrote: Dear Michael, You can use cat() as follows: sink("test.txt") cat('Here goes your text','\n','\n') # each \n writes a newline summary(x) sink() See ?cat for more information. HTH, Jorge On Mon, Oct 13, 2008 at 4:02 PM, Michael Just [EMAIL PROTECTED] wrote: Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
[R] Stepwise lrm()
Hello, I have a data set of 1 + 49 variables. One of them is binary, the others are continuous. I would like to be able to fit the model with all 49 variables and then run stepwise model selection. I'd appreciate some code snippets...
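For the mechanics of what was asked, base R's step() applied to a glm() fit performs AIC-based stepwise selection (note that stepwise selection is widely criticized, as later replies in this digest point out); a sketch with simulated data, where the variable names and sizes are made up, not from the post:

```r
set.seed(42)
n <- 200
# Simulated stand-in: binary response y plus continuous predictors x1..x5
# (the real problem has 49 predictors; the recipe is identical)
dat <- data.frame(y = rbinom(n, 1, 0.5),
                  matrix(rnorm(n * 5), n,
                         dimnames = list(NULL, paste0("x", 1:5))))

full <- glm(y ~ ., data = dat, family = binomial)   # all predictors
sel  <- step(full, direction = "both", trace = 0)   # AIC-driven add/drop steps
formula(sel)                                        # the retained model
```

`direction = "both"` allows variables to re-enter after being dropped; `trace = 0` suppresses the per-step printout.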
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
- Original Message - From: Frank E Harrell Jr [EMAIL PROTECTED] To: John Sorkin [EMAIL PROTECTED] Cc: r-help@r-project.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, October 13, 2008 2:09 PM Subject: Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC) John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease in a pool who have a disease, and the more specific test will be better at identifying subjects who do not have a disease in a pool of people who do not have a disease. It is true that positive predictive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model. That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. Of course, this quantity is known as a likelihood ratio and is a function of sensitivity and specificity. For 2 x 2 data one often speaks of the positive likelihood ratio and negative likelihood ratio, but for a multi-row contingency table one can define likelihood ratios for a series of cut-off points. This has become a popular approach in evidence-based medicine when diagnostic tests have continuous rather than binary outputs. 
You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking, they don't change all that much, certainly not as much as positive and negative predictive values. They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts. Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank John among those subjects with a disease and the one with greater specificity will be better at identifying John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 2:35 PM John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. if your model is good), but rather are characteristics of a test. A test that has high sensitivity will properly identify a large portion of people with a disease (or a characteristic) of interest. A test with high specificity will properly identify a large proportion of people without a disease (or characteristic) of interest. Sensitivity and specificity inform the end user about the quality of a test. Other metrics have been designed to determine the quality of the fit, none that I know of are completely satisfactory. The pseudo R squared is one such measure. 
For a given diagnostic test (or classification scheme), different cut-off points for identifying subjects who have disease can be examined to see how they influence sensitivity and 1-specificity using ROC curves. I await the flames that will surely come my way John John this has been much debated but I fail to see how backwards probabilities are that helpful in judging the usefulness of a test. Why not condition on what we know (the test result and other baseline variables) and quit conditioning on what we are trying to find out (disease status)? The data collected in most studies (other than case-control) allow one to use logistic modeling with the correct time order. Furthermore, sensitivity and specificity are not constants but vary with subjects' characteristics. So they are not even useful as simplifying concepts. Frank John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 12:27 PM Maithili Shiva wrote: Dear Mr Peter Dalgaard and Mr
[R] LM intercept
What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? Thank you kindly, Michael
Re: [R] LM intercept
On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means. cheers, Rolf Turner
Re: [R] LM intercept
Great, Thanks, Michael On Mon, Oct 13, 2008 at 3:56 PM, Rolf Turner [EMAIL PROTECTED] wrote: On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()? x.noint <- lm(weight ~ group - 1) # omitting intercept x <- lm(weight ~ group) This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means. cheers, Rolf Turner
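As Rolf says, the `- 1` removes the intercept; with a factor predictor such as group this amounts to a reparameterization: the coefficients become one mean per group instead of a baseline plus differences, while the fitted values are unchanged. A quick sketch using the built-in PlantGrowth data, which has exactly the weight ~ group structure discussed above:

```r
data(PlantGrowth)  # columns: weight (numeric), group (factor: ctrl/trt1/trt2)

with_int <- lm(weight ~ group,     data = PlantGrowth)  # intercept = ctrl mean
no_int   <- lm(weight ~ group - 1, data = PlantGrowth)  # one coefficient per group

coef(with_int)  # (Intercept), grouptrt1, grouptrt2: differences from ctrl
coef(no_int)    # groupctrl, grouptrt1, grouptrt2: the raw group means

# Identical fitted values; only the coefficient meanings (and R^2) change
all.equal(unname(fitted(with_int)), unname(fitted(no_int)))
```

With a purely numeric predictor, by contrast, `- 1` really does force the fitted line through the origin.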
[R] Using an image background with graphics
I would like to use a map or aerial photo as a background for plotting solid lines and text, and semi-transparent color contours, in base and lattice graphics. Plot coordinates need to be consistent with the georeferenced background. For example, a color contour plot would have a gray-toned aerial photograph as a background for overprinted semi-transparent color contours of some spatially dependent variable. Can anyone point me in the right direction on how to do this? Thanks, Scott Waichler Pacific Northwest National Laboratory [EMAIL PROTECTED]
[R] Variable shortlisting for the logistic regression
Hi R helpers, One rather statistical question: what would be the best strategy to shortlist thousands of continuous variables automatically using R as preparation for logistic regression modeling? Thanks
Re: [R] Using an image background with graphics
On Monday 13 October 2008, Waichler, Scott R wrote: I would like to use a map or aerial photo as a background for plotting solid lines and text, and semi-transparent color contours, in base and lattice graphics. Plot coordinates need to be consistent with the georeferenced background. For example, a color contour plot would have a gray-toned aerial photograph as a background for overprinted semi-transparent color contours of some spatially dependent variable. Can anyone point me in the right direction on how to do this? Thanks, Scott Waichler Pacific Northwest National Laboratory [EMAIL PROTECTED] See spplot() and associated examples of how to use 'sp' class objects. Here is one worked example with sp objects: http://casoilresource.lawr.ucdavis.edu/drupal/node/442 -- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
[R] rotating points on a plot
Anybody know how to rotate shapes generated with the 'points' function? I'm trying to place points around a radial diagram such that the y-axes of individual shapes are oriented with the radii of the circle rather than the y-axis of the larger plot area. Perhaps something analogous to the 'srt' and 'crt' graphical parameters for text that appear under 'par'? Thanks, Rich
Re: [R] Add notes to sink output
An alternative is to use txtStart (and other functions mentioned in the same help page) from the TeachingDemos package. This does the sinking, but can also include the commands as well as allow you to insert comments. The etxt variants allow you to postprocess the whole transcript into a postscript (then pdf) file including selected graphics. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of Michael Just Sent: Monday, October 13, 2008 2:02 PM To: r-help Subject: [R] Add notes to sink output Hello, How can I add notes (i.e. text) to a sink output? sink("test.txt") # This text will describe the test summary(x) sink() How can I add that text above to the sink output? Thanks, Michael
[R] stl outlier help request
Currently I find that if I call stl() repeatedly I can use the weights array that is part of the stl output to detect outliers. I also find that if I repeatedly call stl() (replacing the outliers after each call) the remainder portion of the stl output gets reduced. I am calling it like: for(.index in 1:4) { st <- stl(mt, s.window=frequency(mt), robust=TRUE) outliers <- which(st$weights < 1e-8) if(length(outliers) > 0) { # Replace the outliers with the season + trend mt[outliers] <- st$time.series[,"seasonal"][outliers] + st$time.series[,"trend"][outliers] } } My question is, is there a better way? One improvement would be to use the square of the remainder as a stopping criterion rather than a hard-coded loop. Not being familiar with the arguments to stl (inner, outer, etc.) and their bearing on the weights, I don't know if there is a better way by simply specifying these arguments. So far increasing these arguments above the default values does not seem to reduce the remainder or the weights array. I realize that I could look at the source but before I do I would like to request some comments from those who have used this function probably more than I. Thank you. Kevin
Re: [R] Stepwise lrm()
You should note that the author of the lrm function (at least the one in the Design package, I don't know of others) is also one of the most vocal opponents of stepwise regression methods. Using stepwise with lrm() is kind of like borrowing someone's "down with violence" sign to hit them over the head with. You might want to look at the lasso2 package or get a copy of Frank's book for much better strategies. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of useR Sent: Monday, October 13, 2008 2:35 PM To: r-help@r-project.org Subject: [R] Stepwise lrm() Hello, I have a data set of 1 + 49 variables. One of them is binary, the others are continuous. I would like to be able to fit the model with all 49 variables and then run stepwise model selection. I'd appreciate some code snippets...
[R] MiKTEX-texi2dvi
Sorry, I forgot to include a reproducible example in my last e-mail, but here it is. Since the file is too large to be included here, the path to the foo.Rnw example is: www.stat.umn.edu/~charlie/Sweave/foo.Rnw and it is supposed to produce a pdf like this one: http://www.stat.umn.edu/~charlie/Sweave/foo.pdf I have downloaded MiKTeX but I don't know how to make it work. Sweave and Stangle seem to work fine but when I use texi2dvi it crashes. library(tools) Sweave("C:/Program Files/R/R-2.7.2/bin/foo.Rnw") Writing to file foo.tex Processing code chunks ... 1 : echo term verbatim (label=two) 2 : echo term verbatim (label=reg) 3 : echo term verbatim (label=fig1plot) 4 : term verbatim eps pdf (label=fig1) 5 : term verbatim eps pdf (label=fig2) 6 : term hide (label=foo) 7 : term hide (label=foo2) 8 : echo term verbatim (label=blurfle) 9 : term tex (label=tab1) You can now run LaTeX on 'foo.tex' Stangle("C:/Program Files/R/R-2.7.2/bin/foo.Rnw") Writing to file foo.R texi2dvi("foo.tex", pdf=TRUE) C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra Error in texi2dvi("foo.tex", pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says it is missing. Thanks Felipe D. 
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA
Re: [R] Variable shortlisting for the logistic regression
Hi Marko, this may be helpful: http://www.ingentaconnect.com/content/bpl/rssb/2008/0070/0001/art5;jsessionid=an2la3spa0n5h.alexandra?format=print Happy modeling! Stephan useR schrieb: Hi R helpers, One rather statistical question: what would be the best strategy to shortlist thousands of continuous variables automatically using R as preparation for logistic regression modeling? Thanks
Re: [R] MiKTEX-texi2dvi
One thing to try is to download Sweave.bat from http://batchfiles.googlecode.com and place it in the same directory as the Rnw file (or anywhere on your path) and then from Windows console: Sweave foo.Rnw If MiKTeX is in a standard location Sweave.bat will find it and it will locate R itself from the registry. On Mon, Oct 13, 2008 at 5:42 PM, Felipe Carrillo [EMAIL PROTECTED] wrote: Sorry, I forgot to include a reproducible example on my last e-mail but here it is: Since the file is large to be included here: The path to the foo.Rnw examples is: www.stat.umn.edu/~charlie/Sweave/foo.Rnw and is suppossed to produce a pdf like this one: http://www.stat.umn.edu/~charlie/Sweave/foo.pdf I have downloaded MiKTEX but I don't know how to make it work. Sweave and Stangle seem to work fine but when I use texi2dvi it crashes. library(tools) Sweave(C:/Program Files/R/R-2.7.2/bin/foo.Rnw) Writing to file foo.tex Processing code chunks ... 1 : echo term verbatim (label=two) 2 : echo term verbatim (label=reg) 3 : echo term verbatim (label=fig1plot) 4 : term verbatim eps pdf (label=fig1) 5 : term verbatim eps pdf (label=fig2) 6 : term hide (label=foo) 7 : term hide (label=foo2) 8 : echo term verbatim (label=blurfle) 9 : term tex (label=tab1) You can now run LaTeX on 'foo.tex' Stangle(C:/Program Files/R/R-2.7.2/bin/foo.Rnw) Writing to file foo.R texi2dvi(foo.tex,pdf=TRUE) C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Missing C:/Program Files/R/R-2.7.2/bin/foo.tex:11: Extra Error in texi2dvi(foo.tex, 
pdf = TRUE) : running 'texi2dvi' on 'foo.tex' failed Any ideas why texi2dvi is crashing? foo.tex exists in the same directory as foo.Rnw but it says it is missing. Thanks Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA
Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
Of course Prof Baer is correct: the positive predictive value (PPV) and the negative predictive value (NPV) serve the function of providing conditional post-test probabilities. PPV: post-test probability of disease given a positive test. NPV: post-test probability of no disease given a negative test. Further, PPV is a function of sensitivity (for a given specificity in a population with a given disease prevalence): the higher the sensitivity, almost always the greater the PPV (it can be unchanged, but I don't believe it can be lower), and NPV is a function of specificity (for a given sensitivity in a population with a given disease prevalence): the higher the specificity, almost always the greater the NPV (it can be unchanged, but I don't believe it can be lower). Thus, using Prof Harrell's suggestion to use the test that moves a pre-test probability a great deal in one or both directions, the test to choose is the one with the largest sensitivity and/or specificity, and thus sensitivity and specificity are, I believe, good summary measures of the quality of a clinical test. Finally, I think Prof Harrell's observation that sensitivity and specificity change quite a bit, and mathematically must change if the disease is not all-or-nothing, while true, is a degenerate case of little practical importance. John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Robert W. Baer, Ph.D. 
[EMAIL PROTECTED] 10/13/2008 4:41 PM - Original Message - From: Frank E Harrell Jr [EMAIL PROTECTED] To: John Sorkin [EMAIL PROTECTED] Cc: r-help@r-project.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, October 13, 2008 2:09 PM Subject: Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)

John Sorkin wrote: Frank, Perhaps I was not clear in my previous Email message. Sensitivity and specificity do tell us about the quality of a test in that, given two tests, the one with higher sensitivity will be better at identifying subjects who have a disease in a pool of people who have a disease, and the one with higher specificity will be better at identifying subjects who do not have a disease in a pool of people who do not have a disease. It is true that the positive and negative predictive values are of greater utility to a clinician, but as you know these two measures are functions of sensitivity, specificity and disease prevalence. All other things being equal, given two tests one would select the one with greater sensitivity and specificity, so in a sense they do measure the quality of a clinical test - but not, as I tried to explain, the quality of a statistical model.

That is not very relevant John. It is a function of all those things because those quantities are all deficient. I would select the test that can move the pre-test probability a great deal in one or both directions. Of course, this quantity is known as a likelihood ratio and is a function of sensitivity and specificity. For 2 x 2 data one often speaks of the positive likelihood ratio and the negative likelihood ratio, but for a multi-row contingency table one can define likelihood ratios for a series of cut-off points. This has become a popular approach in evidence-based medicine when diagnostic tests have continuous rather than binary outputs.
You are of course correct that sensitivity and specificity are not truly inherent characteristics of a test, as their values may change from population to population, but practically speaking, they don't change all that much, certainly not as much as positive and negative predictive values.

They change quite a bit, and mathematically must change if the disease is not all-or-nothing. I guess we will disagree about the utility of sensitivity and specificity as simplifying concepts.

Thank you as always for your clear thoughts and stimulating comments. And thanks for yours John. Frank

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank E Harrell Jr [EMAIL PROTECTED] 10/13/2008 2:35 PM John Sorkin wrote: Jumping into a thread can be like jumping into a den of lions, but here goes . . . Sensitivity and specificity are not designed to determine the quality of a fit (i.e. whether your model is good), but rather are characteristics of a test. A test that has high sensitivity will be better at identifying disease among those subjects with a disease, and the one with greater specificity will be better at identifying
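[Frank's likelihood-ratio suggestion is easy to sketch in R. The function below is a generic illustration with made-up sensitivity, specificity and pre-test probability values, not numbers from this thread:]

```r
# Convert sensitivity and specificity into a likelihood ratio, then move a
# pre-test probability to a post-test probability via Bayes on the odds scale.
post_test_prob <- function(sens, spec, pretest) {
  lr_pos    <- sens / (1 - spec)         # positive likelihood ratio
  pre_odds  <- pretest / (1 - pretest)   # pre-test odds
  post_odds <- pre_odds * lr_pos         # post-test odds after a positive test
  post_odds / (1 + post_odds)            # back to a probability
}

# Example: sens = 0.9, spec = 0.8 gives LR+ = 4.5, which moves a 10%
# pre-test probability to 1/3 after a positive result.
post_test_prob(sens = 0.9, spec = 0.8, pretest = 0.1)  # 0.3333...
```

[The size of that move, rather than sensitivity or specificity in isolation, is what Frank proposes as the measure of a test's usefulness.]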
Re: [R] LM intercept
Michael Just wrote: Great, Thanks, Michael

On Mon, Oct 13, 2008 at 3:56 PM, Rolf Turner [EMAIL PROTECTED] wrote: On 14/10/2008, at 9:42 AM, Michael Just wrote: What is the difference when including or not including the intercept when using lm()?

x.noint <- lm(weight ~ group - 1)  # omitting intercept
x <- lm(weight ~ group)

This has nothing to do with forcing the intercept to 0, correct? On the contrary. This is *exactly* what it means.

But if group is a factor, this removes the intercept _and_ uses the full set of indicator variables to represent the factor, so you end up with the same model, just parametrized differently.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
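[Peter's point can be checked directly. The sketch below uses the built-in PlantGrowth data set, which happens to have weight and group columns; the original poster's data are not shown in the thread:]

```r
# With a factor predictor, dropping the intercept only reparametrizes the
# model: one coefficient per group mean, instead of a baseline mean plus contrasts.
data(PlantGrowth)
fit       <- lm(weight ~ group,     data = PlantGrowth)  # intercept + contrasts
fit_noint <- lm(weight ~ group - 1, data = PlantGrowth)  # one coefficient per group

# Same fitted values, hence the same model
all.equal(unname(fitted(fit)), unname(fitted(fit_noint)))  # TRUE

# The no-intercept coefficients are exactly the group means
coef(fit_noint)
tapply(PlantGrowth$weight, PlantGrowth$group, mean)
```

[With a *numeric* predictor, by contrast, `- 1` really does force the fitted line through the origin and changes the model.]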
Re: [R] Variable shortlisting for the logistic regression
useR wrote: Hi R helpers, One rather statistical question: what would be the best strategy to automatically shortlist thousands of continuous variables using R, as preparation for logistic regression modelling? Thanks

The easiest approach is to use a random number generator. Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)
John Sorkin wrote: Of course Prof Baer is correct: the positive predictive value (PPV) and the negative predictive value (NPV) serve the function of providing conditional post-test probabilities. PPV: post-test probability of disease given a positive test. NPV: post-test probability of no disease given a negative test. Further, PPV is a function of sensitivity (for a given specificity in a population with a given disease prevalence): the higher the sensitivity, the greater the PPV in almost all cases (it can be unchanged, but I don't believe it can be lower). Likewise, NPV is a function of specificity (for a given sensitivity in a population with a given disease prevalence): the higher the specificity, the greater the NPV in almost all cases (it can be unchanged, but I don't believe it can be lower).

The PPV and NPV can be anything between 0 and 1 regardless of sensitivity and specificity. Just apply the test to populations with a prevalence of 0 or 1. The former gives you a PPV of 0 and an NPV of 1, since none of the positives will be true positives and all of the negatives will be true negatives. And vice versa.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
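[Peter's prevalence argument is easy to verify numerically. The sketch below uses arbitrary sensitivity and specificity values of 0.99 chosen for illustration:]

```r
# PPV and NPV from sensitivity, specificity and prevalence, via Bayes' theorem.
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

# Even an excellent test (sens = spec = 0.99) collapses at extreme prevalence:
ppv(0.99, 0.99, prev = 0)  # 0 : every positive is a false positive
npv(0.99, 0.99, prev = 0)  # 1 : every negative is a true negative
ppv(0.99, 0.99, prev = 1)  # 1
npv(0.99, 0.99, prev = 1)  # 0
```

[So PPV and NPV are properties of the test *in a particular population*, which is exactly why they span the whole 0-1 range regardless of sensitivity and specificity.]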
Re: [R] Variable shortlisting for the logistic regression
On Mon, 13 Oct 2008, Frank E Harrell Jr wrote: useR wrote: Hi R helpers, One rather statistical question: what would be the best strategy to automatically shortlist thousands of continuous variables using R, as preparation for logistic regression modelling? Thanks The easiest approach is to use a random number generator. Frank

Got a laugh from me Frank! Can I nominate it for a fortune? David

_ David Scott Department of Statistics, Tamaki Campus The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000 Email: [EMAIL PROTECTED] Graduate Officer, Department of Statistics Director of Consulting, Department of Statistics
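[Frank's joke has a serious point: screening thousands of candidate predictors one at a time will "find" variables by chance alone. A minimal simulation with made-up dimensions and pure-noise predictors:]

```r
# 1000 pure-noise predictors screened one at a time against a random binary
# outcome: about 5% pass p < 0.05 despite containing no signal whatsoever.
set.seed(1)
n <- 200; p <- 1000
x <- matrix(rnorm(n * p), n, p)   # noise predictors, unrelated to y
y <- rbinom(n, 1, 0.5)            # random outcome

# p-value of each univariate logistic regression slope
pvals <- apply(x, 2, function(col)
  summary(glm(y ~ col, family = binomial))$coefficients[2, 4])

sum(pvals < 0.05)  # roughly 50 "significant" predictors, all spurious
```

[Any shortlist produced this way is indistinguishable from one produced by a random number generator, which is the joke.]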