[R] questions about the output of gee and its summary

2011-05-08 Thread Carrie Li
Dear R-helpers,

I am using the package gee to run a marginal model.

Here is the output.
In my simulated data, both x and z are time-varying, so I include their
interaction terms with time indicator (i.e. tind=0, if time 1, and 1 if time
2)
The data is simulated, so the true parameter of z at both time 1 and time 2
is 5, which is very close to the model output:
for time 1, z = 5.0757760, and for time 2, z is 5.0757760 - 0.6379866 = ~5

 model <- gee(y ~ x + z + x*tind + z*tind, family = gaussian(link = "identity"),
   id = sid, corstr = "exchangeable")
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept)           x           z        tind      x:tind      z:tind
  2.9342186   1.5002601   5.0757760   2.0846327   0.1869748  -0.6379866



However, when I use the summary command, the coefficients change.
Am I missing something here?

 summary(model)

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)

Model:
 Link:  Identity
 Variance to Mean Relation: Gaussian
 Correlation Structure: Exchangeable

Call:
gee(formula = y ~ x + z + x * tind + z * tind, id = sid,
    family = gaussian(link = "identity"), corstr = "exchangeable")

Summary of Residuals:
       Min         1Q     Median         3Q        Max
-5.9273676 -2.0072725 -0.7169515  2.3709969  8.2377283


Coefficients:
              Estimate  Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept) 4.1450504 0.331866699  12.490106 0.26416  15.661403
x   1.5155102 0.008479614 178.723972 0.006854627 221.093020
z   0.6463947 0.48094   5.815617 0.100379444   6.439513
tind1.5986872 0.163851622   9.756920 0.175947744   9.086148
x:tind  0.1434216 0.005288708  27.118450 0.005924767  24.207123
z:tind  4.2951055 0.168647198  25.467992 0.166776520  25.753658

Estimated Scale Parameter:  6.800334
Number of Iterations:  10



Any help would be very much appreciated!

Thank you,
Carrie--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help!! warning:only the first element has been used

2011-05-08 Thread pwldk
Thanks a lot to you guys!
Both work well!





Re: [R] Confidence intervals and polynomial fits

2011-05-08 Thread peter dalgaard

On May 7, 2011, at 16:15 , Ben Haller wrote:

 On May 6, 2011, at 4:27 PM, David Winsemius wrote:
 
 On May 6, 2011, at 4:16 PM, Ben Haller wrote:
 
 
 As for correlated coefficients: x, x^2, x^3 etc. would obviously be highly 
 correlated, for values close to zero.
 
 Not just for x close to zero:
 
 cor( (10:20)^2, (10:20)^3 )
 [1] 0.9961938
 cor( (100:200)^2, (100:200)^3 )
 [1] 0.9966219
 
  Wow, that's very interesting.  Quite unexpected, for me.  Food for thought.  
 Thanks!
 

Notice that because of the high correlations between the x^k, their parameter 
estimates will be correlated too. In practice, this means that the c.i. for the 
quartic term contains values for which you can compensate with the other 
coefficients and still have an acceptable fit to data. (Nothing strange about 
that; already in simple linear regression, you allow the intercept to change 
while varying the slope.)
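Peter's point about correlated parameter estimates can be seen directly from the estimated covariance matrix of a polynomial fit. A minimal sketch with simulated (illustrative) data:

```r
# Fit a cubic polynomial and look at the correlations among the
# coefficient estimates via the model's covariance matrix.
set.seed(1)
x <- 1:20
y <- 2 + 0.5*x - 0.1*x^2 + rnorm(20)
fit <- lm(y ~ x + I(x^2) + I(x^3))
round(cov2cor(vcov(fit)), 2)  # off-diagonal entries are typically near +/-1
```

The large off-diagonal correlations are exactly why a wide confidence interval on one power term can be compensated by the others.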


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] venn diagramm

2011-05-08 Thread khush ........
Dear Patrick,

Thanks for your reply. I know it's tricky to use such graphs, as they are
difficult to interpret, but I found this interesting link from R,
http://www.oga-lab.net/RGM2/func.php?rd_id=gplots:venn, which mentions a Venn
diagram for 5 subsets.

 ## Example using a list of item names belonging to the
 ## specified group.
 ##

 ## construct some fake gene names..
 oneName <- function() paste(sample(LETTERS, 5, replace=TRUE), collapse="")
 geneNames <- replicate(1000, oneName())

 ##
 GroupA <- sample(geneNames, 400, replace=FALSE)
 GroupB <- sample(geneNames, 750, replace=FALSE)
 GroupC <- sample(geneNames, 250, replace=FALSE)
 GroupD <- sample(geneNames, 300, replace=FALSE)
 input  <- list(GroupA, GroupB, GroupC, GroupD)

 input

venn(input)

But I am not sure how to supply files, as in this example I think the
input is given at the terminal; this is hard for me to understand.

But I will try it with different sets, and I hope I will get something
out of it. If you also extract something good, please let me know.
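To read the groups from files rather than typing them at the terminal, one could build the input list with readLines. This is an untested sketch; the file names are hypothetical and assume one gene name per line:

```r
library(gplots)

# Hypothetical input files, one identifier per line
files <- c("groupA.txt", "groupB.txt", "groupC.txt",
           "groupD.txt", "groupE.txt")
input <- lapply(files, readLines)   # each element: character vector of names
names(input) <- c("GroupA", "GroupB", "GroupC", "GroupD", "GroupE")

venn(input)  # gplots::venn accepts a named list of item vectors
```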

I also found an online tool called Venny that draws up to 4 subsets.

Thanks
khush

On Sat, May 7, 2011 at 10:28 PM, Breheny, Patrick
patrick.breh...@uky.eduwrote:

 I've never actually used it with 5 subsets, but the 'venn' function in the
 gplots package claims to be able to make a Venn diagram with up to 5
 subsets.

 --Patrick

 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On
 Behalf Of khush  [bioinfo.kh...@gmail.com]
 Sent: Saturday, May 07, 2011 9:54 AM
 To: r-help@r-project.org
 Subject: [R] venn diagramm

 Dear all,

 I have a set of five datasets with string, for which I need to draw the
 venn
 diagram. there are tools available to draw venn diagram online, but limited
 to three sets. I can also generate venn for three from limma package, but
 do
 not know how to proceed with five subsets.

 Help me in drawing the same any script .

 Thank you
 Khush






[R] %in% operator - NOT IN

2011-05-08 Thread Dan Abner
Hello everyone,

I am attempting to use the %in% operator with the ! to produce a NOT IN type
of operation. Why does this not work? Suggestions?

 data2[data1$char1 %in% c("string1","string2"), 1] <- min(data1$x1)
 data2[data1$char1 ! %in% c("string1","string2"), 1] <- max(data1$x1) + 1000

Error: unexpected '!' in "data2[data1$char1 !"


Thanks!

Dan





Re: [R] %in% operator - NOT IN

2011-05-08 Thread Berwin A Turlach
G'day Dan,

On Sun, 8 May 2011 05:06:27 -0400
Dan Abner dan.abne...@gmail.com wrote:

 Hello everyone,
 
 I am attempting to use the %in% operator with the ! to produce a NOT
 IN type of operation. Why does this not work? Suggestions?
 
  data2[data1$char1 %in% c("string1","string2"), 1] <- min(data1$x1)
  data2[data1$char1 ! %in%
  c("string1","string2"), 1] <- max(data1$x1) + 1000
 
 Error: unexpected '!' in "data2[data1$char1 !"

Try (untested)

R> data2[!(data1$char1 %in% c("string1","string2")), 1] <- max(data1$x1) + 1000

HTH.

Cheers,

Berwin



Re: [R] %in% operator - NOT IN

2011-05-08 Thread baptiste auguie
Hi,

On 8 May 2011 21:18, Berwin A Turlach berwin.turl...@gmail.com wrote:
 G'day Dan,

 On Sun, 8 May 2011 05:06:27 -0400
 Dan Abner dan.abne...@gmail.com wrote:

 Hello everyone,

 I am attempting to use the %in% operator with the ! to produce a NOT
 IN type of operation. Why does this not work? Suggestions?

Alternatively,

example(`%in%`)

or

`%ni%` = Negate(`%in%`)
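For completeness, a tiny self-contained demonstration of the Negate approach:

```r
# Define a "not in" operator by negating %in%
`%ni%` <- Negate(`%in%`)

c("a", "b", "c") %ni% c("b", "d")
# [1]  TRUE FALSE  TRUE
```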

HTH,

baptiste


   data2[data1$char1 %in% c("string1","string2"), 1] <- min(data1$x1)
   data2[data1$char1 ! %in%
   c("string1","string2"), 1] <- max(data1$x1) + 1000
 
  Error: unexpected '!' in "data2[data1$char1 !"
 
  Try (untested)
 
  R> data2[!(data1$char1 %in% c("string1","string2")), 1] <- max(data1$x1) + 1000

 HTH.

 Cheers,

        Berwin



[R] question about val.surv in R

2011-05-08 Thread zhu yao
Dear R users:

I tried to use val.surv to give an internal validation of survival
prediction model.

I used the sample sources.

# Generate failure times from an exponential distribution
set.seed(123)  # so can reproduce results
n <- 1000
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
units(t) <- 'Year'
label(t) <- 'Time to Event'
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)

# I use cph instead of psm in the example
f <- cph(S ~ age + sex, y=TRUE)
w <- val.surv(f)

I got an error:
Error in survreg.distributions[[fit$dist]] :
  attempt to select less than one element

Could someone explain this for me?



*Yao Zhu*
*Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China*



Re: [R] %in% operator - NOT IN

2011-05-08 Thread Ted Harding
On 08-May-11 09:18:55, Berwin A Turlach wrote:
 G'day Dan,
 
 On Sun, 8 May 2011 05:06:27 -0400
 Dan Abner dan.abne...@gmail.com wrote:
 
 Hello everyone,
 I am attempting to use the %in% operator with the ! to produce
 a NOT IN type of operation. Why does this not work? Suggestions?
 
   data2[data1$char1 %in% c("string1","string2"), 1] <- min(data1$x1)
   data2[data1$char1 ! %in%
   c("string1","string2"), 1] <- max(data1$x1) + 1000
  
  Error: unexpected '!' in "data2[data1$char1 !"
  
  Try (untested)
  
  R> data2[!(data1$char1 %in%
  c("string1","string2")), 1] <- max(data1$x1) + 1000
 
 HTH.
 Cheers,
   Berwin

Berwin's suggestion should work -- it is the general way to
negate the result of an %in%.

As to "Why does this not work?", the point to note is that
%in% is a binary operator. If you enter
  ?"%in%"
you will be taken to the help page for match, where it is
pointed out that:

  '%in%' is a more intuitive interface as a binary operator,
  which returns a logical vector indicating if there is a
  match or not for its left operand.

Specifically, therefore, the syntax of %in% requires

  X %in% Y

where X and Y are objects to which the functional definition
of %in% applies (see the same help page):

  '%in%' is currently defined as
  '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'

In your expression (effectively X ! %in% Y) the item which
immediately precedes %in% is the !, and this is not a
valid item!

Based on the above functional definition, you could define
your own binary operator %!in% as

"%!in%" <- function(x, table) match(x, table, nomatch = 0) == 0

or similar -- I have not tested this so cannot guarantee it!
However, it is the way to proceed if you want a NOT IN.
Then the usage could be:

data2[data1$char1 %!in% c("string1","string2"), 1] <- max(data1$x1) + 1000
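A quick self-contained check of an operator defined this way (backtick and quoted forms are equivalent):

```r
# "Not in" operator built on match(), as described above
`%!in%` <- function(x, table) match(x, table, nomatch = 0) == 0

c(1, 2, 3, 4) %!in% c(2, 4)
# [1]  TRUE FALSE  TRUE FALSE
```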

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-May-11   Time: 10:35:05
-- XFMail --



Re: [R] split character vector by multiple keywords simultaneously

2011-05-08 Thread sunny

Andrew Robinson-6 wrote:
 
 A hack would be to use gsub() to prepend e.g. XXX to the keywords that
 you want, perform a strsplit() to break the lines into component
 strings, and then substr() to extract the pieces that you want from
 those strings.
 
 Cheers
 
 Andrew
 

Thanks, that got me started. I am sure there are much easier ways of doing
this, but in case someone comes looking, here's my solution:

keywordlist <- c("Company name:", "General manager:", "Manager:")

# Attach XXX to the beginning of each keyword:
for (i in 1:length(keywordlist)) {
temp <- gsub(keywordlist[i], paste("XXX", keywordlist[i], sep=""), temp)
}

# Split each row into a list:
temp <- strsplit(temp, "XXX")
# Eliminate empty elements:
temp <- lapply(temp, function(x) x[which(x != '')])

# Since each keyword happens to include a colon at the end, split each list
# element generated above into exactly two parts, pre-colon for the keyword
# and post-colon for the value. Since values may contain colons themselves,
# avoid spurious matches by using n=2 in the str_split_fixed function from
# the stringr package:
library(stringr)
temp <- lapply(temp, function(x) str_split_fixed(x, ':', n=2))

# Convert each list element into a data frame. The transpose makes sure that
# the first row of each data frame is the set of keywords. Each data frame has
# 2 rows - one with the keywords and the second with the values:
temp <- lapply(temp, function(x) replace(as.data.frame(t(x)), , t(x)))

# Copy the first row of each data frame to the name of the corresponding
# column:
for (i in 1:length(temp)) {
names(temp[[i]]) <- as.character(temp[[i]][1,])
}

# Now join all the data frames in the list by column names. This way it
# doesn't matter if some keywords are absent in some cases:
library(plyr)  # for rbind.fill
final_data <- do.call(rbind.fill, temp)

# We now have one large data frame with the odd-numbered rows containing the
# keywords and the even-numbered rows containing the values. Since we already
# have the keywords in the names, we can eliminate the odd-numbered rows:
final_data <- final_data[seq(2, dim(final_data)[1], 2), ]

-S.



Re: [R] write.table vs. read.table and the argument fill

2011-05-08 Thread David Winsemius


On May 7, 2011, at 10:38 AM, Carl Witthoft wrote:

Just wondering how come read.table lets you specify fill=TRUE for  
ragged arrays, but so far as I can tell, no equivalent for  
write.table?




I imagine the answer is something along the lines of read.table  
creates a rectangular structure and write.table outputs a rectangular  
structure, so they are congruent. If I wanted to accomplish writing a  
ragged list I would consider using lapply , paste,  and writeLines.
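That lapply/paste/writeLines approach might look like the following sketch (the file name is illustrative):

```r
# Write a ragged list one element per line, fields separated by spaces
rag <- list(a = 1:7, b = 1:9)
lines <- vapply(rag, paste, character(1), collapse = " ")
writeLines(lines, "ragged.txt")
```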



Not a big deal, since I'm perfectly comfortable w/ write and scan  
and the other file I/O goodies.  A foolish inconsistency...  and  
all that.


--
David Winsemius, MD
West Hartford, CT



Re: [R] %in% operator - NOT IN

2011-05-08 Thread Duncan Mackay


At 19:06 08/05/2011, you wrote:

Hello everyone,

I am attempting to use the %in% operator with the ! to produce a NOT IN type
of operation. Why does this not work? Suggestions?

 data2[data1$char1 %in% c("string1","string2"), 1] <- min(data1$x1)
 data2[data1$char1 ! %in% c("string1","string2"), 1] <- max(data1$x1) + 1000

Error: unexpected '!' in "data2[data1$char1 !"


Thanks!

Dan





Hi Dan

See the last of the examples for ?match

"%w/o%" <- function(x, y) x[!x %in% y]   # --  x without y
(1:10) %w/o% c(3,7,12)

I think it was Peter Dalgaard who pointed this out some years ago

HTH

Regards


Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au



Re: [R] write.table vs. read.table and the argument fill

2011-05-08 Thread Carl Witthoft

Well, it could be a list variable.
foo <- 1:7
bar <- 1:9
rab <- list(foo, bar)

I suppose I could do something like

oof <- rbind(foo, bar)
write.table(oof)  # ignore the warnings

and then ignore or delete the redundant items in the output file.



On 5/8/11 1:51 AM, Joshua Wiley wrote:

Hi Carl,

What would the equivalent argument for write.table do?  Or perhaps
to rephrase my question what type of R object do you have in mind to
write that is a ragged array?

Josh

On Sat, May 7, 2011 at 7:38 AM, Carl Witthoftc...@witthoft.com  wrote:

Just wondering how come read.table lets you specify fill=TRUE for ragged
arrays, but so far as I can tell, no equivalent for write.table?

Not a big deal, since I'm perfectly comfortable w/ write and scan and the
other file I/O goodies.  A foolish inconsistency...  and all that.





Re: [R] how to calculate the mean of a group in a table

2011-05-08 Thread John Kane
I see you already have three solutions but, just for the heck of it, here's 
another. I am trying to get familiar with the reshape2 package and your 
question was a good exercise for me. 


With your data set named xx:

library(reshape2)

yy <- melt(xx, id=c("period", "treatment", "session",
 "type", "stage"), measured=c("wage_accepted"))

dcast(yy, period + stage ~ variable, mean)
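For readers without reshape2, a base-R one-liner gives the same group means (assuming, as above, the data frame is named xx):

```r
# Mean wage_accepted for each period/stage combination
aggregate(wage_accepted ~ period + stage, data = xx, FUN = mean)
```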

As for your request for tutorials on loops and if-else, I cannot make any
better suggestions than the ones you already have. However, in R, if you are
thinking of using these it usually means you are not thinking in R. There
are usually faster and better ways of achieving the desired results in R.
Sometimes a loop or if-else is appropriate, but usually in many fewer
instances than in most other languages.

By the way, "type" is a reserved(?) name in R. Not a problem here, but it's
safer not to use reserved words in R.


--- On Sat, 5/7/11, tornanddesperate matthiasn...@gmx.ch wrote:

 From: tornanddesperate matthiasn...@gmx.ch
 Subject: [R] how to calculate the mean of a group in a table
 To: r-help@r-project.org
 Received: Saturday, May 7, 2011, 2:24 PM
 Hi, it's me again.
 
 I don't mean to get on your nerves, but the use of R proves to be a
 bit more complicated than envisaged.
 
 I would like to calculate the mean of a group of values, here
 wage_accepted. The group is determined by the stage and period, so in
 the end there should be a column with the values of the wages in
 period 1 stage 1, period 1 stage 2, period 2 stage 1, ...
 Unfortunately, I haven't much of a clue on how to program this. Could
 you tell me roughly how the function should look in the end
 (if-else, loops included or not)?
 
    treatment session period stage wage_accepted type
  1         1       1      1     1            25  low
  2         1       1      1     1            19  low
  3         1       1      1     1            15  low
  4         1       1      1     2            32 high
  5         1       1      1     2            13  low
  6         1       1      1     2            14  low
  7         1       1      2     1            17  low
  8         1       1      2     1            12  low
 
 I would also be very grateful if someone could point out a good
 internet tutorial resource for R functions such as if-else and loops
 and how to use them in conjunction with tables. It should contain
 simple examples if possible.
 
 As ever indebted to you,
 
 Matthias
 
 
 


Re: [R] question about val.surv in R

2011-05-08 Thread Frank Harrell
Please specify the package(s) you are using.  In this case it should be rms.

val.surv is mainly for an out-of-sample validation, as it does not penalize
for overfitting.  calibrate.cph is probably what you should be using.

To use val.surv in the fashion you are trying to use it, specify y=TRUE,
surv=TRUE to cph and specify the time point you are validating with the u
argument to val.surv (e.g., u=1 for 1 year survival probability validation).
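Frank's suggestion could be sketched as follows (untested; assumes the rms package and the simulated S, age, and sex from the original post):

```r
library(rms)

# Keep the design matrix, response, and survival estimates in the fit,
# then validate the 1-year survival probability with u=1:
f <- cph(S ~ age + sex, x=TRUE, y=TRUE, surv=TRUE)
w <- val.surv(f, u=1)
plot(w)
```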

Frank

yz wrote:
 
 Dear R users:
 
 I tried to use val.surv to give an internal validation of survival
 prediction model.
 
 I used the sample sources.
 
 # Generate failure times from an exponential distribution
 set.seed(123)  # so can reproduce results
 n <- 1000
 age <- 50 + 12*rnorm(n)
 sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
 cens <- 15*runif(n)
 h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
 t <- -log(runif(n))/h
 units(t) <- 'Year'
 label(t) <- 'Time to Event'
 ev <- ifelse(t <= cens, 1, 0)
 t <- pmin(t, cens)
 S <- Surv(t, ev)
 
 # I use cph instead of psm in the example
 f <- cph(S ~ age + sex, y=TRUE)
 w <- val.surv(f)
 
 I got an error:
 Error in survreg.distributions[[fit$dist]] :
   attempt to select less than one element
 
 Could some one explain for me?
 
 *Yao Zhu*
 *Department of Urology
 Fudan University Shanghai Cancer Center
 Shanghai, China*
 
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University


Re: [R] plotting confidence bands from predict.nls

2011-05-08 Thread Uwe Ligges
Much quicker than asking for help on the list is to read the help file 
(which you have been asked to do in the posting guide you hopefully read).


?predict.nls tells us:

interval 	A character string indicating if prediction intervals or a 
confidence interval on the mean responses are to be calculated. At 
present this argument is ignored.


Best,
Uwe Ligges




On 07.05.2011 06:17, Penny Bilton wrote:

I am trying to find a confidence band for a fitted non-linear curve. I
see that the predict.nls function has an interval argument, but a
previous post indicates that this argument has not been implemented. Is
this still true? I have tried various ways to extract the interval
information from the model object without success. My code is:

Model.predict <- predict(My.nls.model, se.fit=TRUE, interval =
"confidence", level = 0.95)

where My.nls.model is an nls object. I was able to extract the
predictions okay.


Thank you for your help.
Penny.



Re: [R] plotting confidence bands from predict.nls

2011-05-08 Thread Gabor Grothendieck
On Sat, May 7, 2011 at 12:17 AM, Penny Bilton pennybil...@xnet.co.nz wrote:
 I am trying to find a confidence band for a fitted non-linear curve. I see
 that the predict.nls function has an interval argument, but a previous post
 indicates that this argument has not been implemented.  Is this still true?
 I have tried various ways to extract the interval information from the model
 object without success. My code is:

 Model.predict <- predict(My.nls.model, se.fit=TRUE, interval =
 "confidence", level = 0.95)

 where My.nls.model is an nls object. I was able to extract the predictions
 okay.


You can get these intervals using nls2. The as.lm function has an
nls method which returns the lm model tangent to an nls model, and you
can use predict.lm on that.

 library(nls2)
 fm <- nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
 predict(as.lm(fm), interval = "confidence")
        fit       lwr      upr
1  7.887451  3.701701 12.07320
2 12.524979  8.219483 16.83047
3 15.251674 11.813306 18.69004
4 16.854870 13.668094 20.04164
5 17.797489 14.026668 21.56831
6 18.677578 13.393630 23.96153
 predict(as.lm(fm), interval = "prediction")
        fit        lwr      upr
1  7.887451 -0.3349547 16.10986
2 12.524979  4.2409738 20.80898
3 15.251674  7.3833942 23.11995
4 16.854870  9.0932340 24.61651
5 17.797489  9.7783530 25.81663
6 18.677578  9.8453897 27.50977
Warning message:
In predict.lm(as.lm(fm), interval = "prediction") :
  Predictions on current data refer to _future_ responses


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] plotting confidence bands from predict.nls

2011-05-08 Thread Laurent Rhelp

Le 07/05/2011 06:17, Penny Bilton a écrit :
I am trying to find a confidence band for a fitted non-linear curve. I 
see that the predict.nls function has an interval argument, but a 
previous post indicates that this argument has not been implemented.  
Is this still true? I have tried various ways to extract the interval 
information from the model object without success. My code is:


Model.predict <- predict(My.nls.model, se.fit=TRUE, interval =
"confidence", level = 0.95)


where My.nls.model is an nls object, I was able to extract the 
predictions okay.



Thank you for your help.
Penny.


You can use the bootstrap methodology, assessing the predicted values
across the N replications.




[R] Rearranging variables in table in non-alphabetical (manually specified) order

2011-05-08 Thread André Júng
Dear all,

I'm trying to rearrange variables in a table in a custom order for use
with levelplot. So far I could only find examples showing how to sort
alphabetically. Here is a short example:

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")

b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")
food <- matrix(c(a,b), 7)
as.data.frame(food) -> tmp
as.data.frame(table(tmp)) -> X
levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst.
But I need: Schnitzel, Pommes, Wurst. 

How can I define the order? I'm happy about every suggestion. 

Thanks in advance! 
Andre 
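[One possible approach (an untested sketch, rebuilding Andre's example): set the factor levels explicitly before tabulating, so table() and levelplot keep the supplied order.]

```r
library(lattice)

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")

# Give the food column an explicit level order instead of the
# alphabetical default:
tmp <- data.frame(V1 = a,
                  V2 = factor(b, levels = c("Schnitzel","Pommes","Wurst")))
X <- as.data.frame(table(tmp))
levelplot(Freq ~ V1 * V2, data = X, xlab = "people", ylab = "food")
```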


 


[R] Syntax for iter.max in rms

2011-05-08 Thread Mike Harwood
Hello,

I would like to increase the number of iterations for running a
Buckley-James regression model in the rms package, but something is
apparently syntactically wrong. The following code is initially
exactly as it appears in the help page (which runs successfully); then
a failure-to-converge message (resulting from specifying an 'identity'
link argument); then the error message from adding both the iter.max
control and 'identity' link arguments; and finally, the error message
when testing just an iter.max argument.

This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04.

Thank you in advance, and all insights and criticisms are appreciated.

Mike

library(rms)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
   x=TRUE, y=TRUE)

No convergence in 50 steps
Failure in bj.fit
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
   iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link =
"identity",  :
  unused argument(s) (iter.max = 200)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200,
   x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max =
200,  :
  unused argument(s) (iter.max = 200)



Re: [R] Confidence intervals and polynomial fits

2011-05-08 Thread Mike Marchywka




 From: pda...@gmail.com
 Date: Sun, 8 May 2011 09:33:23 +0200
 To: rh...@sticksoftware.com
 CC: r-help@r-project.org
 Subject: Re: [R] Confidence intervals and polynomial fits


 On May 7, 2011, at 16:15 , Ben Haller wrote:

  On May 6, 2011, at 4:27 PM, David Winsemius wrote:
 
  On May 6, 2011, at 4:16 PM, Ben Haller wrote:
 
 
  As for correlated coefficients: x, x^2, x^3 etc. would obviously be 
  highly correlated, for values close to zero.
 
  Not just for x close to zero:
 
  cor( (10:20)^2, (10:20)^3 )
  [1] 0.9961938
  cor( (100:200)^2, (100:200)^3 )
  [1] 0.9966219
 
  Wow, that's very interesting. Quite unexpected, for me. Food for thought. 
  Thanks!
 

 Notice that because of the high correlations between the x^k, their parameter 
 estimates will be correlated too. In practice, this means that the c.i. for 
 the quartic term contains values for which you can compensate with the other 
 coefficients and still have an acceptable fit to data. (Nothing strange about 
 that; already in simple linear regression, you allow the intercept to change 
 while varying the slope.)

I was trying to compose a longer message, but at least for even/odd functions
it isn't hard to find a set of values for which cor is zero, or to find a set
of points that makes sines of different frequencies have non-zero
correlation. This highlights the fact that the computer isn't magic: it
needs data to make basis functions different from each other.
For background, you probably want to look up Taylor series and orthogonal
bases.
I would also suggest using R to add noise to your input and see what that
does to your predictions, or for that matter take simple known data and add
noise, although in principle you can do this analytically. You can always
project a signal onto some subspace and get an estimate of how good your
estimate is, but that is different from asking how well you can reconstruct
your signal from a bunch of projections.
If you want to know "what can I infer about the slope of my thing at x=a",
that is a specific question about one coefficient, but at this point
statisticians can elaborate on various issues with the other things you
ignore. Also, I think you said something about coefficients being correlated
at x=0, but you can change your origin, expanding (x-a)^n as a finite series
in x^m, to see what happens there.
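To make the even/odd and orthogonal-basis points concrete, here is a small
base-R sketch: for x symmetric about its mean, an even and an odd power are
exactly uncorrelated, and poly() constructs an orthogonal polynomial basis
whose columns are uncorrelated by construction.

```r
x <- 10:20
cor(x^2, x^3)            # raw powers: nearly collinear

xc <- x - mean(x)        # center so x runs symmetrically through zero
cor(xc^2, xc^3)          # even vs odd power: correlation is exactly 0

P <- poly(x, degree = 3) # orthogonal polynomial basis
round(cor(P), 10)        # off-diagonal correlations vanish
```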

Also, if you are using Hotmail, don't assume that a dot product is not HTML,
LOL, since Hotmail knows you must mean HTML when you use a less-than sign,
even in a text email...




 --
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk Priv: pda...@gmail.com



Re: [R] Rearranging variables in table in non-alphabetical (manually specified) order

2011-05-08 Thread Duncan Murdoch

On 11-05-08 11:10 AM, André Júng wrote:

Dear all,

I'm trying to rearrange variables in a table in a custom order for use with
levelplot. So far I could only find examples showing how to sort
alphabetically. Here is a short example:

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")
food <- matrix(c(a,b),7)
as.data.frame(food) -> tmp
as.data.frame(table(tmp)) -> X
levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst.
But I need: Schnitzel, Pommes, Wurst.

How can I define the order? I'm happy about every suggestion.



The general answer to that question is to make b into a factor with the 
levels in a specified order, e.g. by


b <- factor(b, levels=c("Schnitzel", "Pommes", "Wurst"))

However, this doesn't survive the other strange things you are doing. 
If you simplify the rest, it should be fine.  For example:


food <- data.frame(a,b)
X <- as.data.frame(table(food))
levelplot(Freq ~ a * b, data = X)

Duncan Murdoch



Re: [R] Rearranging variables in table in non-alphabetical (manually specified) order

2011-05-08 Thread Uwe Ligges



On 08.05.2011 17:10, André Júng wrote:

Dear all,

I'm trying to rearrange variables in a table in a custom order for use with
levelplot. So far I could only find examples showing how to sort
alphabetically. Here is a short example:

a <- c("Anna","Anna","Michael","Klaus","Klaus","Anna","Fritz")
b <- c("Schnitzel","Pommes","Pommes","Schnitzel","Wurst","Schnitzel","Schnitzel")



Hmmm, why don't you ask your supervisor about your homework? I wonder 
who your supervisor is.


Anyway, some hint:

Replace b by a factor with levels in the desired order.



food <- matrix(c(a,b),7)
as.data.frame(food) -> tmp


Please replace the two lines above by something like

food <- data.frame(names=a, food=b)

because 1. your matrix will destroy the factor again and is not required
at all, and 2. never ever use ->, which makes your code almost unreadable.



Uwe Ligges




as.data.frame(table(tmp)) -> X
levelplot(X$Freq ~ X$V1 * X$V2, xlab="people", ylab="food")

Food is now ordered: Pommes, Schnitzel, Wurst.
But I need: Schnitzel, Pommes, Wurst.

How can I define the order? I'm happy about every suggestion.

Thanks in advance!
Andre






Re: [R] Syntax for iter.max in rms

2011-05-08 Thread Ravi Varadhan
The option `iter.max' should be an element of the control list.  If you read
the help file carefully, you would have noticed this.  So, try this:

f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
        control=list(iter.max=200), x=TRUE, y=TRUE)

The identity link is challenging to fit computationally.  Try setting
`trace=TRUE' to watch the convergence of the parameters:

f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
        control=list(iter.max=200, trace=TRUE), x=TRUE, y=TRUE)

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Mike Harwood [harwood...@gmail.com]
Sent: Sunday, May 08, 2011 9:27 AM
To: r-help@r-project.org
Subject: [R] Syntax for iter.max in rms

Hello,

I would like to increase the number of iterations for running a
Buckley-James regression model in the rms package, but something is
apparently syntactically wrong.  The following code is initially exactly
as it appears in the help page (which runs successfully); then comes a
failure-to-converge message (resulting from specifying an 'identity'
link argument), then the error message from adding both the iter.max
control and 'identity' link arguments, and finally the error message
when testing just an iter.max argument.

This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04.

Thank you in advance, and all insights and criticisms are appreciated.

Mike

library(rms)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
         x=TRUE, y=TRUE)

No convergence in 50 steps
Failure in bj.fit
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',
         iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link = "identity",  :
  unused argument(s) (iter.max = 200)
 f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE,
         y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max = 200,  :
  unused argument(s) (iter.max = 200)



Re: [R] Weighted box or violin plots in Lattice?

2011-05-08 Thread Deepayan Sarkar
On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote:
 Is it possible to create weighted boxplots or violin plots in lattice?

 It seems that you can specify weights for panel.histogram() and
 panel.densityplot(), but not for panel.bwplot or panel.violin().

Not for panel.histogram() either.

It's not immediately obvious how you would get weighted boxplots. For
weighted violin plots, it should not be too difficult to write your
own panel function generalizing panel.violin(); just pass in 'weights'
and 'subscripts', and get the per-panel weights inside the panel
function as weights[subscripts] (as in panel.densityplot).

The only difference is that densityplot() implements non-standard
evaluation for 'weights', which bwplot() does not, so you have to
specify 'weights' explicitly.
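For what it's worth, the weighted-density ingredient is already in base R:
density() accepts a 'weights' argument (which should sum to 1). A custom
panel function in the spirit described above would compute this per group
from weights[subscripts] and draw the mirrored polygon itself; the sketch
below (with made-up data) shows only the density step.

```r
set.seed(1)
x <- rnorm(100)
w <- runif(100)
w <- w / sum(w)                # density() expects weights summing to 1
d <- density(x, weights = w)   # weighted kernel density estimate
# A hypothetical panel.wviolin() would mirror d$y around the category
# axis with panel.polygon(), as panel.violin() does for unweighted data.
```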

-Deepayan



Re: [R] Cumsum in Lattice Panel Function

2011-05-08 Thread Deepayan Sarkar
On Fri, May 6, 2011 at 9:24 PM, Elliot Joel Bernstein
elliot.bernst...@fdopartners.com wrote:
 I'm trying to create an xyplot with a groups argument where the y-variable
 is the cumsum of the values stored in the input data frame. I almost have
 it, but I can't get it to automatically adjust the y-axis scale. How do I
 get the y-axis to automatically scale as it would have if the cumsum values
 had been stored in the data frame?

 Here is the code I have so far:

 require(lattice)



 dates <- seq(as.Date("2011-01-01"), as.Date("2011-04-30"), "days")
 g <- 1:3

 dat <- data.frame(date = rep(dates, length(g)),
                   group = rep(g, each = length(dates)),
                   value = rnorm(length(dates)*length(g)) + 0.05)


 xyplot(value ~ date, data = dat, group = group, type = 'l', grid = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, ...) { panel.xyplot(x, cumsum(y), ...)
 })


 I want the result to look the same as if I had done

 dat$cumvalue <- with(dat, unsplit(lapply(split(value, group), cumsum),
                                   group))
 xyplot(cumvalue ~ date, data = dat, group = group, type = 'l', grid = TRUE)

You need something along the lines of

xyplot(value ~ date, data = dat, group = group, type = 'l', grid = TRUE,
   panel = panel.superpose,
   panel.groups = function(x, y, ...) {
   panel.xyplot(x, cumsum(y), ...)
   },
   prepanel = function(x, y, groups, ...) {
    yy <- unlist(tapply(y, groups, cumsum))
   list(ylim = range(yy, finite = TRUE))
   })

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to alter circle size

2011-05-08 Thread Deepayan Sarkar
On Fri, May 6, 2011 at 10:21 PM, Dat Mai dat.d@gmail.com wrote:
 Hello all,

 I'm trying  to create a heatmap using 2 matrices I have: z and v. Both
 matrices represent different correlations for the same independent
 variables. The problem I have is that I wish to have the values from matrix
 z to be represented by color intensity while having the values from matrix v
 to be represented by circle size. I currently have the following in front of
 me and am unsure of what to add or change in order to achieve that goal.

 panel.corrgram.2 <-
 function(x, y, z, subscripts, at = pretty(z), scale = 0.8, ...)
 {
   require(grid, quietly = TRUE)
   x <- as.numeric(x)[subscripts]
   y <- as.numeric(y)[subscripts]
   z <- as.numeric(z)[subscripts]
   zcol <- level.colors(z, at = at, ...)
   for (i in seq(along = z))
   {
     lims <- range(0, z[i])
     tval <- 2 * base::pi * seq(from = lims[1], to = lims[2], by = 0.01)
     grid.polygon(x = x[i] + .5 * scale * c(0, sin(tval)),
       y = y[i] + .5 * scale * c(0, cos(tval)),
       default.units = "native",
       gp = gpar(fill = zcol[i]))
     grid.circle(x = x[i], y = y[i], r = .5 * scale,
       default.units = "native")
   }
 }

Probably this modification

panel.corrgram.2 <-
function(x, y, z, v, subscripts, at = pretty(v), scale = 0.8, ...)
{
    require(grid, quietly = TRUE)
    x <- as.numeric(x)[subscripts]
    y <- as.numeric(y)[subscripts]
    z <- as.numeric(z)[subscripts]
    zcol <- level.colors(v, at = at, ...)

Rest unchanged


followed by

levelplot(signif(z,3), v = v, panel = panel.corrgram.2, ...)

will do it.

-Deepayan



 k <- levelplot(signif(z,3), scales = list(x = list(rot = 45)),
 col.regions=col.sch, panel = panel.corrgram.2,
 label = num.circ, xlab="", ylab="", main=paste(output, "receptor response"))
 print(k)
 print(k)

 --
 Best,
 Dat Mai
 PhD Rotation Student
 Albert Einstein College of Medicine



[R] package update

2011-05-08 Thread eric
I tried to update my packages using update.packages() 

I got the following message:

The downloaded packages are in
‘/tmp/RtmpyDYdTX/downloaded_packages’
Warning in install.packages(update[instlib == l, "Package"], l, contriburl =
contriburl,  :
  'lib = "/usr/lib/R/library"' is not writable
Error in install.packages(update[instlib == l, "Package"], l, contriburl =
contriburl,  :
  unable to install packages

How do I fix this ?
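Two common fixes, sketched here with the paths taken from the error message
above: either rerun R with write access to the system library (e.g. start it
with sudo for a one-off system-wide update), or keep a personal library that
update.packages() can write into:

```r
# Create the default per-user library if it does not exist yet
dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE, showWarnings = FALSE)

# Install updated versions into the user library instead of /usr/lib/R/library
# (commented out here because it needs network access):
# update.packages(instlib = Sys.getenv("R_LIBS_USER"), ask = FALSE)
```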



[R] new to loops

2011-05-08 Thread SevannaD
I have never made a loop on my own to do anything in R, but I am hoping
someone can help me build one for the following issue:

I need to fit a univariate logistic regression for each of my variables
(about 62 of them), then gather up each of their coefficients (not
the intercepts), each of their 95% confidence intervals, and each of their
odds ratios, and place them in a matrix to showcase them for my thesis.

Currently I am writing them all out one by one with cbind, which has
taken me the better part of a day so far, and I know there has to be a
way to make a loop that can do this whole process; I just haven't been
able to figure it out yet.
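One sketch of the kind of loop being asked for, using made-up data and Wald
confidence intervals (confint.default); the data frame, outcome name, and
predictor names below are hypothetical stand-ins for the real 62 variables:

```r
set.seed(42)
dat <- data.frame(y  = rbinom(60, 1, 0.5),
                  x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
preds <- setdiff(names(dat), "y")

# One univariate logistic fit per predictor; collect slope, 95% CI, and OR
res <- t(sapply(preds, function(v) {
    fit <- glm(reformulate(v, response = "y"), data = dat, family = binomial)
    ci  <- confint.default(fit)[v, ]          # Wald 95% CI for the slope
    c(coef(fit)[[v]], ci[1], ci[2], exp(coef(fit)[[v]]))
}))
colnames(res) <- c("coef", "lower", "upper", "OR")
res
```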

Thanks in advance. 



[R] help with mysql and R: partitioning by quintile

2011-05-08 Thread gj
Hi,

I have a mysql table with fields userid,track,frequency e.g
u1,1,10
u1,2,100
u1,3,110
u1,4,200
u1,5,120
u1,6,130
.
u2,1,23
.
.
where frequency is the number of times a music track is played by a
userid

I need to turn my 'frequency' table into a rating table (it's for a
recommender system). So, for each user, I need to categorise the frequency
of tracks played by quintile, so that each track gets one of 5 ratings
(1-5), allocated as follows: top quintile (100-80%) = rating 5, next
quintile (80-60%) = rating 4, ..., bottom quintile (20-0%) = rating 1.

Hence, I need to create a table with fields userid,track,rating:
u1,1,1
u1,2, 3
...
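Once the table is pulled into a data frame (e.g. via the DBI/RMySQL
packages), one base-R sketch of the per-user quintile recoding, using
made-up frequencies; note that quantile() breaks can collide if a user has
many tied frequencies, in which case the breaks need deduplication:

```r
freq <- data.frame(userid    = rep(c("u1", "u2"), each = 5),
                   track     = rep(1:5, 2),
                   frequency = c(10, 100, 110, 200, 120, 23, 45, 5, 80, 60))

# rating 1..5 = which quintile of that user's own frequencies the track is in
freq$rating <- ave(freq$frequency, freq$userid, FUN = function(f)
    cut(f, breaks = quantile(f, probs = seq(0, 1, by = 0.2)),
        include.lowest = TRUE, labels = FALSE))
```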

Can anybody help me to do this with R?

Regards
Gawesh



[R] Another quantmod question

2011-05-08 Thread Russ Abbott
I'm having trouble with the names of columns.

quantmod deals with stock quotes.  I've created an array of the first 5
closing prices from Jan 2007. (Is there a problem that the name is the same
as the variable name? There shouldn't be.)

 close
             close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11


When I try to create a more complex array by adding columns, the names get
fouled up.  Here's a simple example.

 cbind(changed.close = close+1, zero = 0, close)
             close zero close.1
2007-01-03 1417.60    0 1416.60
2007-01-04 1419.34    0 1418.34
2007-01-05 1410.71    0 1409.71
2007-01-08 1413.84    0 1412.84
2007-01-09 1413.11    0 1412.11


The first column should be called "changed.close", but it's called "close".
The second column has the right name. The third column should be called
"close" but it's called "close.1". Why is that? Am I missing something?

If I change the order of the columns and let close have its original name,
there is still a problem.

 cbind(close, zero = 0, changed.close = close+1)
             close zero close.1
2007-01-03 1416.60    0 1417.60
2007-01-04 1418.34    0 1419.34
2007-01-05 1409.71    0 1410.71
2007-01-08 1412.84    0 1413.84
2007-01-09 1412.11    0 1413.11


Now the names on the first two columns are ok, but the third column is still
wrong. Again, why is that?  Apparently it's not letting me assign a name to
a column that comes from something that already has a name.  Is that the way
it should be?

I don't get that same problem on a simpler example.

 IX <- cbind(I=0, X=(1:3))
 IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3

 cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3


Is this a peculiarity to xts objects?

Thanks.

-- Russ
P.S. Once again I feel frustrated because it's taken me far more time than
it deserves to track down and characterize this problem. I can fix it by
using the names function. But I shouldn't have to do that.



Re: [R] Another quantmod question

2011-05-08 Thread David Winsemius


On May 8, 2011, at 3:07 PM, Russ Abbott wrote:


I'm having trouble with the names of columns.

quantmod deals with stock quotes.  I've created an array of the first 5
closing prices from Jan 2007. (Is there a problem that the name is the same
as the variable name? There shouldn't be.)

 close
             close
2007-01-03 1416.60
2007-01-04 1418.34
2007-01-05 1409.71
2007-01-08 1412.84
2007-01-09 1412.11

When I try to create a more complex array by adding columns, the names get
fouled up.  Here's a simple example.

 cbind(changed.close = close+1, zero = 0, close)


I suspect that you are actually using xts objects that you are incorrectly
calling 'array's. If something is puzzling about the behavior of an R
object, the first thing to do is see what you are really dealing with,
so ... str(object)


If you load the xts package and type ?cbind.xts, you get a help page for
merge.xts.


(In base R I do not know of a way to assign columns the way you propose
within a `merge` call.)


Here is the code for cbind.xts:

 cbind.xts
function (..., all = TRUE, fill = NA, suffixes = NULL)
{
    merge.xts(..., all = all, fill = fill, suffixes = suffixes)
}
<environment: namespace:xts>





             close zero close.1
2007-01-03 1417.60    0 1416.60
2007-01-04 1419.34    0 1418.34
2007-01-05 1410.71    0 1409.71
2007-01-08 1413.84    0 1412.84
2007-01-09 1413.11    0 1412.11


The first column should be called changed.close, but it's called close.
The second column has the right name. The third column should be called
close but it's called close.1. Why is that? Am I missing something?


If I change the order of the columns and let close have its original name,
there is still a problem.


cbind(close, zero = 0, changed.close = close+1)


             close zero close.1
2007-01-03 1416.60    0 1417.60
2007-01-04 1418.34    0 1419.34
2007-01-05 1409.71    0 1410.71
2007-01-08 1412.84    0 1413.84
2007-01-09 1412.11    0 1413.11


Now the names on the first two columns are ok, but the third column is still
wrong. Again, why is that?  Apparently it's not letting me assign a name to
a column that comes from something that already has a name.  Is that the way
it should be?

I don't get that same problem on a simpler example.


 IX <- cbind(I=0, X=(1:3))
 IX
     I X
[1,] 0 1
[2,] 0 2
[3,] 0 3

 cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
     Y Z W
[1,] 1 0 1
[2,] 1 0 2
[3,] 1 0 3


Is this a peculiarity to xts objects?

Thanks.

-- Russ
P.S. Once again I feel frustrated because it's taken me far more time than
it deserves to track down and characterize this problem. I can fix it by
using the names function. But I shouldn't have to do that.



David Winsemius, MD
West Hartford, CT



Re: [R] Another quantmod question

2011-05-08 Thread Joshua Wiley
Hi Russ,

On Sun, May 8, 2011 at 12:07 PM, Russ Abbott russ.abb...@gmail.com wrote:
 I'm having troubles with the names of columns.

 quantmod deal with stock quotes.  I've created an array of the first 5
 closing prices from Jan 2007. (Is there a problem that the name is the same
 as the variable name? There shouldn't be.)

 close
              close
 2007-01-03 1416.60
 2007-01-04 1418.34
 2007-01-05 1409.71
 2007-01-08 1412.84
 2007-01-09 1412.11

It would be appreciated in the future if you provided the object via
dput() or some such that is easy to paste in.



 When I try to create a more complex array by adding columns, the names get
 fouled up.  Here's a simple example.

 cbind(changed.close = close+1, zero = 0, close)
              close zero close.1
 2007-01-03 1417.60    0 1416.60
 2007-01-04 1419.34    0 1418.34
 2007-01-05 1410.71    0 1409.71
 2007-01-08 1413.84    0 1412.84
 2007-01-09 1413.11    0 1412.11


 The first column should be called changed.close, but it's called close.
 The second column has the right name. The third column should be called
 close but it's called close.1. Why is that? Am I missing something?

Yes.

mat <- matrix(1:10, dimnames = list(NULL, "A"))
cbind(X = 11:20, Y = mat + 1)
cbind(X = 11:20, Y = mat[, "A"] + 1)

In particular note that:

class(mat)
class(mat[, "A"])

It is silly to expect to be able to pass a single name for an entire
matrix, while it makes complete sense that that would work for a
vector.  The problem with naming the column the same thing as the
object containing it, is our limited human minds get ramfeezled (R
does just fine, as you can see).


 If I change the order of the columns and let close have its original name,
 there is still a problem.

 cbind(close, zero = 0, changed.close = close+1)
              close zero close.1
 2007-01-03 1416.60    0 1417.60
 2007-01-04 1418.34    0 1419.34
 2007-01-05 1409.71    0 1410.71
 2007-01-08 1412.84    0 1413.84
 2007-01-09 1412.11    0 1413.11


 Now the names on the first two columns are ok, but the third column is still
 wrong. Again, why is that?  Apparently it's not letting me assign a name to
 a column that comes from something that already has a name.  Is that the way
 it should be?

 I don't get that same problem on a simpler example.

 IX <- cbind(I=0, X=(1:3))
   IX
      I X
 [1,] 0 1
 [2,] 0 2
 [3,] 0 3

 cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
      Y Z W
 [1,] 1 0 1
 [2,] 1 0 2
 [3,] 1 0 3


 Is this a peculiarity to xts objects?

Nope, but do check the type of object you are working with---there are
often different methods for different objects, so I would not assume
that all things would work the same.

Cheers,

Josh


 Thanks.

 -- Russ
 P.S. Once again I feel frustrated because it's taken me far more time than
 it deserves to track down and characterize this problem. I can fix it by
 using the names function. But I shouldn't have to do that.


Please post using plain text.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Another quantmod question

2011-05-08 Thread Russ Abbott
I understand Josh's example:

mat <- matrix(1:10, dimnames = list(NULL, "A"))
cbind(X = 11:20, Y = mat + 1)
cbind(X = 11:20, Y = mat[, "A"] + 1)


In the line cbind(X = 11:20, Y = mat + 1), it would be nice if an error or
warning message were issued, to the effect that the "Y =" part is ignored
or not applicable, or something like that.

I still don't understand why cbind.xts doesn't do what I expect.  If I try
what Josh suggests on my xts example, the result is still confusing.

 cbind.xts(close, A = close[, "close"])
             close close.1
2007-01-03 1416.60 1416.60
2007-01-04 1418.34 1418.34
2007-01-05 1409.71 1409.71
2007-01-08 1412.84 1412.84
2007-01-09 1412.11 1412.11


If this helps:

 dput(close)
structure(c(1416.6, 1418.34, 1409.71, 1412.84, 1412.11), .indexCLASS = "Date",
    .indexTZ = structure("", .Names = "TZ"), src = "yahoo",
    updated = structure(1304875186.625, class = c("POSIXct",
    "POSIXt")), class = c("xts", "zoo"), index = structure(c(1167811200,
    1167897600, 1167984000, 1168243200, 1168329600), tzone = structure("",
    .Names = "TZ"), tclass = "Date"), .Dim = c(5L,
    1L), .Dimnames = list(NULL, "close"))
*
I was also unable to convert Josh's matrix example to xts.

 as.xts(mat, order.by = rownames())
Error in dimnames(x) : 'x' is missing

The order.by parameter is required, but I don't understand the error
message and haven't been able to figure out how to fix it. If this helps:

 dput(mat)
structure(1:10, .Dim = c(10L, 1L), .Dimnames = list(NULL, "A"))

This suggests that mat already has defined dimnames. So why does as.xts
give an error message about dimnames?  I was unable to figure out the
problem by looking at any of the help files. In particular the help file
for as.xts didn't seem to offer any clarifications--at least none that
I could understand.

-- Russ



On Sun, May 8, 2011 at 12:26 PM, Joshua Wiley jwiley.ps...@gmail.comwrote:

 class(mat)
 class(mat[, A])




Re: [R] questions about the output of gee and its summary

2011-05-08 Thread Thomas Lumley
On Sun, May 8, 2011 at 6:36 PM, Carrie Li carrieands...@gmail.com wrote:
 Dear R-helpers,

 I am using the package gee to run a marginal model.

 Here is the output.
 In my simulated data, both x and z are time-varying, so I include their
 interaction terms with time indicator (i.e. tind=0, if time 1, and 1 if time
 2)
 The data is simulated, so the true parameter of z both at time 1 and time 2
 is 5, which is very close from the model output
 for time 1, z = 5.0757760, and for time 2, z is 5.0757760-0.6379866 = ~5

 model=gee(y~x+z+x*tind+z*tind, family=gaussian(link = identity), id=sid,
           corstr="exchangeable")
 Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
 running glm to get initial regression estimate
 (Intercept)           x           z        tind      x:tind      z:tind
   2.9342186   1.5002601   5.0757760   2.0846327   0.1869748  -0.6379866



 However, when I use the summary command, the coefficients are changed.
 Am I missing anything here ??

Yes.

The printout above gives the starting values obtained by running
glm(), not the results of gee().  If you use summary() on the model
object, or just print the model object, you will get the results of
gee().

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] Weighted box or violin plots in Lattice?

2011-05-08 Thread Thomas Lumley
On Mon, May 9, 2011 at 6:35 AM, Deepayan Sarkar
deepayan.sar...@gmail.com wrote:
 On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote:
 Is it possible to create weighted boxplots or violin plots in lattice?

 It seems that you can specify weights for panel.histogram() and
 panel.densityplot(), but not for panel.bwplot or panel.violin().

 Not for panel.histogram() either.

 It's not immediately obvious how you would get weighted boxplots.

The way the survey world does it is to get weighted quantiles and work
from them, as in survey:::svyboxplot().  The only tricky decision is
what to do with outliers, since I haven't been able to think of any
good way of indicating weights on them.
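To illustrate the approach (a sketch of the idea, not survey's actual code):
compute the five boxplot statistics from the weighted empirical CDF, then
hand them to bxp(), which draws a box from precomputed numbers. The
wquantile helper and data below are hypothetical.

```r
# Weighted quantile via the weighted empirical CDF
wquantile <- function(x, w, probs = c(0.05, 0.25, 0.5, 0.75, 0.95)) {
    o  <- order(x)
    cw <- cumsum(w[o]) / sum(w)
    sapply(probs, function(p) x[o][which(cw >= p)[1]])
}

set.seed(7)
x <- rexp(200)
w <- runif(200)          # sampling weights
s <- wquantile(x, w)     # whiskers, hinges, median

# bxp() draws a boxplot from precomputed statistics
bxp(list(stats = cbind(s), n = sum(w), names = "x"))
```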

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



[R] Error in AnnotationDbi package - makeProbePackage

2011-05-08 Thread Helga Garcia
Dear all,

We have developed our own Affymetrix chip (Custom Express Array, PM-only,
with two species).
I want to analyse the data with the limma package, but for that I need to
build my own CDF package and probe package, and build the filters to
analyse one species or the other.
I'm using makeProbePackage, available in AnnotationDbi (for R-2.13.0),
but I got the following error message:

Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument

I already tried an old version of R (2.9.1) with the matchprobes package
(which is no longer updated for R-2.13.0), but the same error occurs.

Does anyone know how I could overcome this problem?
Thank you very much for your attention.
Best Regards,
Helga



-- 
Helga Garcia

PhD student
Applied and Environmental Mycology Lab/Molecular Thermodynamics Lab

ITQB I - UNL
Portugal



[R] Plotting on a scale up to 6 decimal places

2011-05-08 Thread TZ Rizvi
Dear All,

I am trying to plot some elements of the lists below, which contain 10,000
rows; for instance, lfdr_true on the x-axis vs hi, and I also include
estimates as points on the plot.

However, as you can notice, the real difference comes in the later decimal
places and not in the first 2 to 3 decimal places. Can someone suggest how
to expand the scale to 6 decimal places?

 head(all_simulated_samples[[1]])

 gene lower    hi  estimate covered lfdr_true
    1     0 0.962 0.9913568    TRUE 0.9564683
    2     0 0.999 0.9965426    TRUE 0.9184360
    3     0 0.997 0.9941567    TRUE 0.9783095
    4     0 0.990 0.9949874    TRUE 0.9342528
    5     0 0.971 0.9943081    TRUE 0.9053046
    6     0 0.921 0.9854756    TRUE 0.9927538

Following is the format of code I am using after extracting some values from
list:
x1 <- min(lfdr.true)
x2 <- max(lfdr.true)
plot(lfdr.true, original.estimate, pch = 19, col = "darkred", xlim = c(x1, x2))
points(upper.limit, pch = 24, col = "darkgreen")

However, it doesn't help in differentiating the points I am plotting as they
are so close upto first two decimal places.

I would really appreciate any suggestions or comments.
Thank you.
Best Regards,
Tasneem

-- 
*Dr. Tasneem Zaihra*
*PhD(Statistics)*
*Assistant Professor*
*UNB-SJ**, Saint John, NB, CA*
*Tel:506-648-5624 (office)*



[R] ARMA

2011-05-08 Thread Bojana Milosevic
Hello,

Could somebody tell me what the difference is between these 3 calls:
arma(x,order=c(1,0)), arima(x,order=c(1,0,0)) and ar(x,order=1)?
I expected the same residuals from these three models, but unexpectedly,
for the first two, R required an initial value of something (what?)...
Thanks in advance!



Re: [R] help with mysql and R: partitioning by quintile

2011-05-08 Thread jim holtman
try this:

# create some data
x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = ''),
                track = rep(1:20, 20),
                freq = floor(runif(400, 10, 200)),
                stringsAsFactors = FALSE)
# get the quantiles for each track
tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1))
# create a matrix with the rownames as the tracks to use in the findInterval
tqm <- do.call(rbind, tq)
# now put the ratings
require(data.table)
x.dt <- data.table(x)
x.new <- x.dt[,
              list(userid = userid,
                   freq = freq,
                   # use track as index into quantile matrix
                   rating = findInterval(freq,
                                         tqm[as.character(track[1L]), ],
                                         rightmost.closed = TRUE) + 1L),
              by = track]

 head(x.new)
 track userid freq rating
[1,] 1 u1   10  1
[2,] 1 u2   15  1
[3,] 1 u3  126  4
[4,] 1 u4  117  3
[5,] 1 u5   76  2
[6,] 1 u6  103  3



On Sun, May 8, 2011 at 2:48 PM, gj gaw...@gmail.com wrote:
 Hi,

 I have a mysql table with fields userid,track,frequency e.g
 u1,1,10
 u1,2,100
 u1,3,110
 u1,4,200
 u1,5,120
 u1,6,130
 .
 u2,1,23
 .
 .
 where frequency is the number of times a music track is played by a
 userid

 I need to turn my 'frequency' table into a rating table (it's for a
 recommender system). So, for each user, I need to categorise the frequency
 of tracks played by quintile so that each particular track can have 5
 ratings (1-5), with the ratings allocated as follows: inter-quintile range
 100-80% = rating 5,   inter-quintile range 80-60% = rating 4,
 ..., inter-quintile range 20-0% = rating 1)

 Hence, I need to create a table with fields userid,track,rating:
 u1,1,1
 u1,2, 3
 ...

 Can anybody help me to do this with R?

 Regards
 Gawesh





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Another quantmod question

2011-05-08 Thread Jeff Ryan
Hi Russ,

Colnames don't get rewritten if they already exist. The reason is due to 
performance and how cbind is written at the R level. 

It isn't perfect per se, but given the complexity and variety of dispatch that can 
take place for cbind in R (as it isn't a generic), it is quite challenging to get 
it to behave as one may hope.  After years of trying I'd say it is nearly 
impossible to do what you want without causing horrible memory issues on non-trivial 
objects; there are production systems **using** xts on objects 
with billions of rows.  Your simple case, which has a simple workaround, would 
cost everyone in the other 99.999% of cases a recurring price that 
isn't tolerable. 

If this is frustrating to you, you should stop using the class. 

Jeff

Jeffrey Ryan|Founder|jeffrey.r...@lemnica.com

www.lemnica.com

On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote:

 I'm having troubles with the names of columns.
 
 quantmod deals with stock quotes.  I've created an array of the first 5 
 closing prices from Jan 2007. (Is there a problem that the name is the same 
 as the variable name? There shouldn't be.)
 
 
  close
 
              close
 2007-01-03 1416.60
 2007-01-04 1418.34
 2007-01-05 1409.71
 2007-01-08 1412.84
 2007-01-09 1412.11
 
 When I try to create a more complex array by adding columns, the names get 
 fouled up.  Here's a simple example.
 
 
 
  cbind(changed.close = close+1, zero = 0, close)
 
              close zero close.1
 2007-01-03 1417.60    0 1416.60
 2007-01-04 1419.34    0 1418.34
 2007-01-05 1410.71    0 1409.71
 2007-01-08 1413.84    0 1412.84
 2007-01-09 1413.11    0 1412.11
 
 The first column should be called changed.close, but it's called close. 
 The second column has the right name. The third column should be called 
 close but it's called close.1. Why is that? Am I missing something?
 
 If I change the order of the columns and let close have its original name, 
 there is still a problem.
 
 
  cbind(close, zero = 0, changed.close = close+1)
 
              close zero close.1
 2007-01-03 1416.60    0 1417.60
 2007-01-04 1418.34    0 1419.34
 2007-01-05 1409.71    0 1410.71
 2007-01-08 1412.84    0 1413.84
 2007-01-09 1412.11    0 1413.11
 
 Now the names on the first two columns are ok, but the third column is still 
 wrong. Again, why is that?  Apparently it's not letting me assign a name to a 
 column that comes from something that already has a name.  Is that the way it 
 should be?
 
 I don't get that same problem on a simpler example.
 
 
 
  IX <- cbind(I = 0, X = 1:3)
  IX
      I X
 [1,] 0 1
 [2,] 0 2
 [3,] 0 3
  cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
      Y Z W
 [1,] 1 0 1
 [2,] 1 0 2
 [3,] 1 0 3
 
 Is this a peculiarity of xts objects?
 
 Thanks.
  
 -- Russ 
 
 P.S. Once again I feel frustrated because it's taken me far more time than it 
 deserves to track down and characterize this problem. I can fix it by using 
 the names function. But I shouldn't have to do that.  
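The workaround with the names function mentioned in the P.S. looks like the sketch below. A plain named matrix stands in for the xts series (the two-step pattern, cbind first and rename afterwards, is the same either way; with a real xts object the identical `colnames<-` assignment applies):

```r
# Stand-in for the xts series shown above (dates as row names)
close <- matrix(c(1416.60, 1418.34, 1409.71, 1412.84, 1412.11), ncol = 1,
                dimnames = list(c("2007-01-03", "2007-01-04", "2007-01-05",
                                  "2007-01-08", "2007-01-09"), "close"))

# Build the combined object first, then overwrite the column names;
# this sidesteps cbind()'s refusal to rename already-named columns.
m <- cbind(close + 1, zero = 0, close)
colnames(m) <- c("changed.close", "zero", "close")
```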



Re: [R] Plotting on a scale up to 6 decimal places

2011-05-08 Thread Steven Kennedy
The plot works fine for me with the example data you have given. Maybe
it is something in your par settings. Try putting a ylim in your plot
the same as you have done for the xlim:

y1 <- min(original.estimate)
y2 <- max(original.estimate)
plot(lfdr.true, original.estimate, pch = 19, col = "darkred",
     xlim = c(x1, x2), ylim = c(y1, y2))



Re: [R] ARMA

2011-05-08 Thread Pete Brecknock

boki2b wrote:
 
 Hello, could somebody tell me what the difference is between these 3
 calls: arma(x,order=c(1,0)), arima(x,order=c(1,0,0)) and ar(x,order=1)?
 I expected the same residuals from these three models, but unexpectedly,
 for the first two, R required an initial value of something (what?)...
 Thanks in advance!
 

Firstly, all of the functions you mention can be explored in more detail by
typing ?xxx, where xxx is ar, arima or arma. Note, you will need the tseries
package to run the arma function. (require(tseries))

Secondly, there is some debate around interpretation of some of the output
from these functions. Check out one viewpoint at
http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm (ISSUE 1: When is the
intercept the mean?) 

Finally, I have tried below to put the functions on a similar footing to
allow you to compare their output.

# create data
set.seed(12345)
d = rnorm(30)

library(tseries) # need for arma function

# 1. arma
# fits using conditional least squares, intercept is the intercept
arma(d,order=c(1,0)) 
       ar1  intercept  
  -0.10566    0.07046 

# 2. arima
# intercept is the mean, intercept derived as mean*(1-AR1)
arima(d,order=c(1,0,0),method="CSS")  
 ar1  intercept
  -0.1057 0.0638

So, intercept = 0.0638 * (1 + 0.1057) = 0.0705 ... same as arma above

# 3. ar
# intercept is the intercept
ar(d, aic=FALSE, order.max=1, method="ols", intercept=TRUE, demean=FALSE) 
  1  
 -0.1057  
 Intercept: -0.07054 (0.1731)

There is a nice document on CRAN that you may find useful. 
 
http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

HTH

Pete


--
View this message in context: 
http://r.789695.n4.nabble.com/ARMA-tp3507846p3507963.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] help with mysql and R: partitioning by quintile

2011-05-08 Thread Phil Spector

One way to get the ratings would be to use the ave() function:

rating = ave(x$freq,x$track,
 FUN=function(x)cut(x,quantile(x,(0:5)/5),include.lowest=TRUE))
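A quick run of this idea on toy data (the data frame x below is hypothetical, built to mirror the userid/track/freq table; adding labels = FALSE makes cut() return the bin number 1-5 directly instead of a factor):

```r
# Toy frequency table: 3 users x 10 tracks, distinct play counts
set.seed(7)
x <- data.frame(userid = rep(c("u1", "u2", "u3"), each = 10),
                track  = rep(1:10, 3),
                freq   = sample(10:200, 30))

# cut() at each track's own quintile breaks assigns the rating per track
x$rating <- ave(x$freq, x$track,
                FUN = function(f) cut(f, quantile(f, (0:5)/5),
                                      include.lowest = TRUE, labels = FALSE))
head(x)
```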

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Sun, 8 May 2011, gj wrote:


Hi,

I have a mysql table with fields userid,track,frequency e.g
u1,1,10
u1,2,100
u1,3,110
u1,4,200
u1,5,120
u1,6,130
.
u2,1,23
.
.
where frequency is the number of times a music track is played by a
userid

I need to turn my 'frequency' table into a rating table (it's for a
recommender system). So, for each user, I need to categorise the frequency
of tracks played by quintile so that each particular track can have 5
ratings (1-5), with the ratings allocated as follows: inter-quintile range
100-80% = rating 5,   inter-quintile range 80-60% = rating 4,
..., inter-quintile range 20-0% = rating 1)

Hence, I need to create a table with fields userid,track,rating:
u1,1,1
u1,2, 3
...

Can anybody help me to do this with R?

Regards
Gawesh






Re: [R] Error in AnnotationDbi package - makeProbePackage

2011-05-08 Thread Martin Morgan

On 05/08/2011 02:15 PM, Helga Garcia wrote:

Dear all,

We have developed our own Affymetrix chip (Custom Express Array, PM-only
with two species).
I want to analyse the data with the limma package, but for that I need to
build my own CDF package and probe package, and build the filters to
analyse one species or the other.
I'm using makeProbePackage, available in AnnotationDbi (for R-2.13.0), but
I got the following error message:

Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument

I already tried an old version, R-2.9.1, with the matchprobes package
(which is no longer updated for R-2.13.0), but the same error occurs.

Does anyone know how I could overcome this problem?


Hi Helga -- Please ask this question on the Bioconductor mailing list, 
where you will get expert advice.


http://bioconductor.org/help/mailing-list/

Be sure that your packages are up-to-date

http://bioconductor.org/install/

and that you include the output of sessionInfo() in your post.

Martin


Thank you very much for your attention.
Best Regards,
Helga






--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



[R] Conditional values of a vector XXXX

2011-05-08 Thread Dan Abner
Hello everyone,

Below is a metadata summary (itself a data frame) of raw data held in a
data frame. I want to add 2 columns to the data frame below that will
contain the min and max for integer and numeric class columns and NA for
factors and character vectors. Can anyone suggest the most succinct way
of going about this?


  Variable   ClassMode

1   id integer numeric
2   x1 numeric numeric
3   x2 numeric numeric
4   x3 numeric numeric
5y numeric numeric
6   gender  factor numeric
7char1  factor numeric
I have the following, but it does not work:

x.contents.3 <- data.frame(id=1:ncol(x), Min=NA, Max=NA, row.names="id")
x.contents.3[!(x.contents.2$Class %in% c("factor","character")), 1] <- sapply(x, min)
x.contents.3[!(x.contents.2$Class %in% c("factor","character")), 2] <- sapply(x, max)

===

I receive the following:

 x.contents.3 <- data.frame(id=1:ncol(x), Min=NA, Max=NA, row.names="id")
 x.contents.3[!(x.contents.2$Class %in% c("factor","character")), 1] <- sapply(x, min)
Error in Summary.factor(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,  :
  min not meaningful for factors
 x.contents.3[!(x.contents.2$Class %in% c("factor","character")), 2] <- sapply(x, max)
Error in Summary.factor(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,  :
  max not meaningful for factors

Thanks!

Dan
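One succinct approach is to test each column's class directly with is.numeric() (which is TRUE for both integer and numeric columns) rather than matching strings in the metadata table. A sketch on a hypothetical stand-in for the raw data frame:

```r
# Toy stand-in for the raw data: integer, numeric and factor columns
x <- data.frame(id = 1:5, x1 = rnorm(5),
                gender = factor(c("m", "f", "m", "f", "m")))

num <- sapply(x, is.numeric)            # TRUE for integer/numeric, FALSE for factor
meta <- data.frame(Variable = names(x), Min = NA_real_, Max = NA_real_)
meta$Min[num] <- sapply(x[num], min, na.rm = TRUE)
meta$Max[num] <- sapply(x[num], max, na.rm = TRUE)
meta                                    # factor rows keep NA in Min and Max
```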



Re: [R] Another quantmod question

2011-05-08 Thread Russ Abbott
Hi Jeff,

The xts class has some very nice features, and you have done a valuable
service in developing it.

My primary frustration is how difficult it seems to be to find out what went
wrong when my code doesn't work.  I've been writing quite sophisticated code
for a fairly long time. It's not that I'm new to software development.

The column-name rule is a good example.  I'm willing to live with the rule
that column names are not changed for efficiency's sake.  What's difficult for
me is that I never saw that rule documented anywhere.  Of course, I'm not an R
expert; I've been using the language for only a couple of months.  But still,
I would have expected to run into a rule like that.

Worse, since the rule is in conflict with the explicit intent of cbind (one
can name columns when using cbind; in fact the examples illustrate how to do
it), it would really be nice if cbind would issue a warning when one attempts
to rename a column in violation of that rule.  Instead, cbind is silent,
giving no hint about what went wrong.

It's those sorts of things that have caused me much frustration. And it's
these sorts of things that seem pervasive in R.  One never knows what one is
dealing with. Did something not work because there is a special case rule
that I haven't heard of? Did it not work because a special convenience was
programmed into a function in a way that conflicted with normal use?  Since
these sorts of things seem to come up so often, I find myself feeling that
there is no good way to track down problems, which leads to a sense of
helplessness and confusion. That's not what one wants in a programming
language.

-- Russ


On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan jeff.a.r...@gmail.com wrote:

 Hi Russ,

 Colnames don't get rewritten if they already exist. The reason is due to
 performance and how cbind is written at the R level.

  It isn't perfect per se, but given the complexity and variety of dispatch
  that can take place for cbind in R (as it isn't a generic), it is quite
  challenging to get it to behave as one may hope.  After years of trying I'd
  say it is nearly impossible to do what you want without causing horrible
  memory issues on non-trivial objects; there are production systems
  **using** xts on objects with billions of rows.  Your simple case, which
  has a simple workaround, would cost everyone in the other 99.999% of cases
  a recurring price that isn't tolerable.

  If this is frustrating to you, you should stop using the class.

 Jeff

 Jeffrey Ryan|Founder| jeffrey.r...@lemnica.com
 jeffrey.r...@lemnica.com

 www.lemnica.com

 On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote:

 I'm having troubles with the names of columns.

  quantmod deals with stock quotes.  I've created an array of the first 5
  closing prices from Jan 2007. (Is there a problem that the name is the same
  as the variable name? There shouldn't be.)

  close

  close

 2007-01-03 1416.60

 2007-01-04 1418.34

 2007-01-05 1409.71

 2007-01-08 1412.84

 2007-01-09 1412.11


 When I try to create a more complex array by adding columns, the names get
 fouled up.  Here's a simple example.

  cbind(changed.close = close+1, zero = 0, close)

  close zero close.1

 2007-01-03 1417.600 1416.60

 2007-01-04 1419.340 1418.34

 2007-01-05 1410.710 1409.71

 2007-01-08 1413.840 1412.84

 2007-01-09 1413.110 1412.11


 The first column should be called changed.close, but it's called close.
 The second column has the right name. The third column should be called
 close but it's called close.1. Why is that? Am I missing something?

 If I change the order of the columns and let close have its original name,
 there is still a problem.

  cbind(close, zero = 0, changed.close = close+1)

  close zero close.1

 2007-01-03 1416.600 1417.60

 2007-01-04 1418.340 1419.34

 2007-01-05 1409.710 1410.71

 2007-01-08 1412.840 1413.84

 2007-01-09 1412.110 1413.11


 Now the names on the first two columns are ok, but the third column is
 still wrong. Again, why is that?  Apparently it's not letting me assign a
 name to a column that comes from something that already has a name.  Is that
 the way it should be?

 I don't get that same problem on a simpler example.


  IX <- cbind(I = 0, X = 1:3)

   IX

  I X

 [1,] 0 1

 [2,] 0 2

 [3,] 0 3

  cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])

  Y Z W

 [1,] 1 0 1

 [2,] 1 0 2

 [3,] 1 0 3


 Is this a peculiarity to xts objects?

 Thanks.

 *-- Russ *
 *
 *
 P.S. Once again I feel frustrated because it's taken me far more time than
 it deserves to track down and characterize this problem. I can fix it by
 using the names function. But I shouldn't have to do that.




Re: [R] new to loops

2011-05-08 Thread Scott Chamberlain
Not knowing what format your data is in or what model you are using...

df # your data frame, with column 1 the response and columns 2:10 the
# predictor variables you are running regressions for
datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA) # a
# table to put your results in
for (i in 2:10) {
fit <- glm(df[, 1] ~ df[, i], family = binomial, etc...)
datout[i - 1, ] <- fit[e.g., 1:4] # determine what values in your model
# output are what you need
}
datout # a table with all your output for each variable
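Putting the idea together as a self-contained sketch on simulated data. The helper choices here are my assumptions, not part of the original post: reformulate() builds each one-predictor formula, and confint.default() gives Wald (not profile-likelihood) 95% intervals:

```r
set.seed(42)
# Simulated data: binary response y, three candidate predictors
df <- data.frame(y = rbinom(60, 1, 0.5),
                 a = rnorm(60), b = rnorm(60), c = rnorm(60))

# One univariate logistic regression per predictor; collect slope, CI, OR
res <- t(sapply(names(df)[-1], function(v) {
  fit <- glm(reformulate(v, response = "y"), data = df, family = binomial)
  ci  <- confint.default(fit)[2, ]      # Wald 95% CI for the slope
  c(coeff      = unname(coef(fit)[2]),
    conf_low   = ci[[1]],
    conf_high  = ci[[2]],
    odds_ratio = exp(unname(coef(fit)[2])))
}))
res   # one row per predictor: slope, 95% CI, odds ratio
```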

On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote:

I have never made a loop on my own to do anything in R. But I am hoping
 someone can help me build one for the following issue:
 
 I need to make a univariate logistic regression for each of my variables
 (about 62 of them), then I need to gather up each of their coefficients (not
 the intercepts), each of their 95% confidence intervals, and each of their
 odds ratios, and place them in a matrix to showcase them for my thesis. 
 
 Currently, I am writing them all out one by one with the cbind method, which
 has taken me the better part of a day so far. I know there has to be a way
 to make a loop that can do this whole process; I just haven't been able to
 figure it out yet.
 
 Thanks in advance. 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/new-to-loops-tp3507366p3507366.html
 Sent from the R help mailing list archive at Nabble.com.
 
 





[R] Pretty printing numbers

2011-05-08 Thread Worik R
Friends

I am trying to format a number to a string, so that 2189.745 goes to
"2,189.74" and 309283.929 goes to "309,283.93".

I have tried to use formatC(X, big.mark=",", drop0trailing=FALSE, format="f"),
but it does not get the number of decimals correct.  Specifying digits does
not work, as that counts significant digits.  I could use a switch statement,
switching on the size of the number to be formatted, to set the digits
parameter, but that is complex and leaves me insecure that there may be other
problems I have not uncovered.

Is there a simple way of using library functions to insert big.marks and get
the number of trailing digits correct?  Perhaps some way of using sprintf?

cheers
Worik



Re: [R] Pretty printing numbers

2011-05-08 Thread Peter Langfelder
On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:
 Friends

 I am trying to format a number to a string so 2189.745 goes to 2,189.35
 and 309283.929 goes to 309,283.93

 I have tried to use formatC(X, big.mark=",", drop0trailing=FALSE, format="f")
 but it does not get the number of decimals correct.  Specifying digits does
 not work as that is significant digits.  I could use a switch statement
 switching on the size of the number to be formated to set the digits
 parameter but that is complex and leaves me insecure that thre may be other
 problems I have not uncovered.

 Is there a simple way of using library functions to insert big.marks and get
 the number of trailing digits correct?  Perhaps some way of using sprintf?


Try this inserting a round() step before the final printing:

x = 309283.929

formatC(round(x, 2), big.mark=",", drop0trailing=TRUE, format="f")

[1] "309,283.93"


HTH,

Peter


 cheers
 Worik





-- 
Sent from my Linux computer. Way better than iPad :)



Re: [R] Pretty printing numbers

2011-05-08 Thread Worik R
 formatC(round(2189.745, 2), big.mark=",", format="f")
[1] 2,189.7400

Unfortunately this does not work

Worik

On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:
  Friends
 
  I am trying to format a number to a string so 2189.745 goes to 2,189.35
  and 309283.929 goes to 309,283.93
 
  I have tried to use formatC(X, big.mark=,,drop0trailing=FALSE,
 format=f)
  but it does not get the number of decimals correct.  Specifying digits
 does
  not work as that is significant digits.  I could use a switch statement
  switching on the size of the number to be formated to set the digits
  parameter but that is complex and leaves me insecure that thre may be
 other
  problems I have not uncovered.
 
  Is there a simple way of using library functions to insert big.marks and
 get
  the number of trailing digits correct?  Perhaps some way of using
 sprintf?
 

 Try this inserting a round() step before the final printing:

 x = 309283.929

  formatC(round(x, 2), big.mark=",", drop0trailing=TRUE, format="f")

 [1] 309,283.93


 HTH,

 Peter


  cheers
  Worik
 
 



 --
 Sent from my Linux computer. Way better than iPad :)




Re: [R] Pretty printing numbers

2011-05-08 Thread David Winsemius


On May 8, 2011, at 8:02 PM, Worik R wrote:


formatC(round(2189.745, 2), big.mark=",", format="f")

[1] 2,189.7400

Unfortunately this does not work


Because you did not follow his example.
--
David.


Worik

On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder 
peter.langfel...@gmail.com wrote:


On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:

Friends

I am trying to format a number to a string so 2189.745 goes to  
2,189.35

and 309283.929 goes to 309,283.93

I have tried to use formatC(X, big.mark=,,drop0trailing=FALSE,

format=f)
but it does not get the number of decimals correct.  Specifying  
digits

does
not work as that is significant digits.  I could use a switch  
statement

switching on the size of the number to be formated to set the digits
parameter but that is complex and leaves me insecure that thre may  
be

other

problems I have not uncovered.

Is there a simple way of using library functions to insert  
big.marks and

get

the number of trailing digits correct?  Perhaps some way of using

sprintf?




Try this inserting a round() step before the final printing:

x = 309283.929

formatC(round(x, 2), big.mark=",", drop0trailing=TRUE, format="f")

[1] 309,283.93


HTH,

Peter



cheers
Worik






--
Sent from my Linux computer. Way better than iPad :)





David Winsemius, MD
West Hartford, CT



Re: [R] Pretty printing numbers

2011-05-08 Thread Worik R
On Mon, May 9, 2011 at 12:06 PM, David Winsemius dwinsem...@comcast.netwrote:


 On May 8, 2011, at 8:02 PM, Worik R wrote:

  formatC(round(2189.745, 2), big.mark=",", format="f")

 [1] 2,189.7400

 Unfortunately this does not work


 Because you did not follow his example.


That is true.  I did not notice the drop0trailing argument.

That is perfect, thank you.

All the pieces were there; I should have been able to make it work without
having to ask, so thanks for your help.

Worik

 --
 David.


 Worik

 On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder 
 peter.langfel...@gmail.com wrote:

  On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:

 Friends

 I am trying to format a number to a string so 2189.745 goes to
 2,189.35
 and 309283.929 goes to 309,283.93

 I have tried to use formatC(X, big.mark=,,drop0trailing=FALSE,

 format=f)

 but it does not get the number of decimals correct.  Specifying digits

 does

 not work as that is significant digits.  I could use a switch statement
 switching on the size of the number to be formated to set the digits
 parameter but that is complex and leaves me insecure that thre may be

 other

 problems I have not uncovered.

 Is there a simple way of using library functions to insert big.marks and

 get

 the number of trailing digits correct?  Perhaps some way of using

 sprintf?



 Try this inserting a round() step before the final printing:

 x = 309283.929

  formatC(round(x, 2), big.mark=",", drop0trailing=TRUE, format="f")

 [1] 309,283.93


 HTH,

 Peter


  cheers
 Worik





 --
 Sent from my Linux computer. Way better than iPad :)




 David Winsemius, MD
 West Hartford, CT





Re: [R] Pretty printing numbers

2011-05-08 Thread Henrique Dallazuanna
formatC(round(2189.745, 2), big.mark=",", format="f", drop0trailing = TRUE)

On Sun, May 8, 2011 at 9:02 PM, Worik R wor...@gmail.com wrote:
 formatC(round(2189.745, 2), big.mark=",", format="f")
 [1] "2,189.7400"

 Unfortunately this does not work

 Worik

 On Mon, May 9, 2011 at 11:45 AM, Peter Langfelder 
 peter.langfel...@gmail.com wrote:

 On Sun, May 8, 2011 at 4:41 PM, Worik R wor...@gmail.com wrote:
  Friends
 
  I am trying to format a number to a string so 2189.745 goes to 2,189.35
  and 309283.929 goes to 309,283.93
 
  I have tried to use formatC(X, big.mark=",", drop0trailing=FALSE, format="f")
  but it does not get the number of decimals correct.  Specifying digits
 does
  not work as that is significant digits.  I could use a switch statement
  switching on the size of the number to be formatted to set the digits
  parameter, but that is complex and leaves me insecure that there may be
 other
  problems I have not uncovered.
 
  Is there a simple way of using library functions to insert big.marks and
 get
  the number of trailing digits correct?  Perhaps some way of using
 sprintf?
 

 Try inserting a round() step before the final printing:

 x = 309283.929

  formatC(round(x, 2), big.mark=",", drop0trailing=TRUE, format="f")

  [1] "309,283.93"


 HTH,

 Peter


  cheers
  Worik
 
 



 --
 Sent from my Linux computer. Way better than iPad :)






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



[R] Hosmer-Lemeshow 'goodness of fit'

2011-05-08 Thread viostorm
I'm trying to do a Hosmer-Lemeshow 'goodness of fit' test on my logistic
regression model.

I found some code here:

http://sas-and-r.blogspot.com/2010/09/example-87-hosmer-and-lemeshow-goodness.html

The R code above is a little complicated for me, but I'm having trouble
with my answer:

Hosmer-Lemeshow: p=0.6163585
le Cessie and Houwelingen test (Design library): p=0.2843620

The above link indicated they should be approximately equal, which in my case
they are not. Any suggestions, or is there a package function people would
recommend in R for use with a logistic regression model?

Thanks in advance,

-Rob Schutt


Robert Schutt, MD, MCS 
Resident - Department of Internal Medicine
University of Virginia, Charlottesville, Virginia 


--



# Compute the Hosmer-Lemeshow 'goodness-of-fit' test

cd.full_model = glm(formula = Collaterals ~ CHF + Age + CABG
                    + relevel(as.factor(num.obst.vessels), "one")
                    + Current.smoker + DM + HTN + ace.inhibitor
                    + MI, family = binomial(link = logit))

hosmerlem = function(y, yhat, g = 10) {
  cutyhat = cut(yhat,
                breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
                include.lowest = TRUE)
  obs    = xtabs(cbind(1 - y, y) ~ cutyhat)
  expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
  chisq  = sum((obs - expect)^2 / expect)
  P = 1 - pchisq(chisq, g - 2)
  return(list(chisq = chisq, p.value = P))
}

hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))

# Compute the le Cessie and Houwelingen test 

f <- lrm(Collaterals ~ CHF + Age + CABG
         + relevel(as.factor(num.obst.vessels), "one")
         + Current.smoker + DM + HTN + ace.inhibitor
         + MI, x = TRUE, y = TRUE)

library(Design)  
resid(f, 'gof')

Output:

 hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
$chisq
[1] 6.275889

$p.value
[1] 0.6163585

 resid(f, 'gof')
Sum of squared errors Expected value|H0SD 
  118.5308396   118.3771115 0.1435944 
Z P 
1.0705717 0.2843620 
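For readers following along in another language: the Hosmer-Lemeshow statistic is just a chi-square over observed versus expected event counts within groups of predicted probability. A minimal, hedged sketch in Python (contiguous binning of sorted fitted values and g - 2 degrees of freedom, mirroring the hosmerlem() function above; the toy data are made up):

```python
def hosmer_lemeshow_chisq(y, yhat, g=10):
    """Chi-square part of the Hosmer-Lemeshow test: bin cases into g
    groups by predicted probability, then sum (obs - exp)^2 / exp over
    events and non-events in each group."""
    pairs = sorted(zip(yhat, y))          # order cases by fitted probability
    n = len(pairs)
    chisq = 0.0
    for k in range(g):
        grp = pairs[k * n // g:(k + 1) * n // g]
        obs_events = sum(yy for _, yy in grp)
        exp_events = sum(p for p, _ in grp)
        obs_non = len(grp) - obs_events
        exp_non = len(grp) - exp_events
        chisq += (obs_events - exp_events) ** 2 / exp_events
        chisq += (obs_non - exp_non) ** 2 / exp_non
    return chisq  # refer to a chi-square distribution with g - 2 df

# Toy two-bin example (made-up data, not the thread's model):
print(hosmer_lemeshow_chisq([0, 0, 1, 1], [0.2, 0.3, 0.7, 0.8], g=2))
```

The p-value then comes from the upper tail of the chi-square distribution with g - 2 degrees of freedom, exactly as in the R function's pchisq() call.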


--
View this message in context: 
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-tp3508127p3508127.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] recommendation on B for validate.lrm () ?

2011-05-08 Thread viostorm

Thanks so much for the reply it was exceptionally helpful!  A couple of
questions:

1. I was under the impression that k-fold with B=10 would train on 9/10,
validate on 1/10, and repeat 10 times for each different 1/10th.  Is this
how the procedure works in R?

2. Is the reason you recommend repeating k-fold 100 times because the
partitioning is random, ie not 1st 10th, 2nd 10, et cetera so you might
obtain slightly different results?





--
View this message in context: 
http://r.789695.n4.nabble.com/recommendation-on-B-for-validate-lrm-tp3486200p3508143.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] nls problem with R

2011-05-08 Thread sterlesser
I am sorry, Andrew, I don't understand you.
Please forgive my poor English.

--
View this message in context: 
http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3508131.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] nls problem with R

2011-05-08 Thread sterlesser
Thanks, Mike.
Your suggestion was really helpful. I followed your instructions and it
worked out.
Also, can you use this package?
http://cran.r-project.org/web/packages/minpack.lm/index.html
It uses the Levenberg-Marquardt algorithm.
Can this package handle four parameters?
Thanks again

--
View this message in context: 
http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3508126.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] new to loops

2011-05-08 Thread SevannaD
So in my first try before I got your message, this is what I did:

orconf <- list()
ccoef <- list()
or <- list()
coef <- list()
out <- list()
for (i in 1:49){
  out[[i]] <- glm(y ~ var[[i]], family = binomial(link = logit))
  coef[[i]] <- out[[i]]$coef[2]
  or[[i]] <- exp(out[[i]]$coef[2])
  bond <- matrix(out[[i]]$coef[2], exp(out[[i]]$coef[2]),
                 confint(out[[i]]$coef[2]), exp(confint(out[[i]]$coef[2])))
}
But it did not work, due to confint(out[[i]]$coef[2]) and the exp() one. It said:
Error in object$coefficients : $ operator is invalid for atomic vectors.

Would I need to identify conf_low and conf_high as two separate things?

Thanks

On Sun, May 8, 2011 at 4:31 PM, Scott Chamberlain-3 [via R] 
ml-node+3508106-1763482049-235...@n4.nabble.com wrote:

 Not knowing what format your data is in or what model you are using...

 df  # your data frame, with columns the variables you are running
 # regressions for
 datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA)
 # a table to put your results in
 for (i in 1:length(names(df)[2:10])) {
   fit <- glm(data[,1] ~ data[,i], data = df, etc...)
   datout[i,] <- fit[e.g., 1:4]  # determine what values in your model
   # output are what you need
 }
 datout  # a table with all your output for each variable
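Scott's collect-into-a-table pattern is language-neutral. As an illustration only, here is a Python sketch with a univariate least-squares slope standing in for the logistic coefficient (the variable names and the slope() helper are mine, not from the thread):

```python
def slope(x, y):
    """Univariate least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def univariate_table(y, predictors):
    """Fit each predictor one at a time and collect one result row per
    variable, mirroring the datout data-frame idea above."""
    return {name: {"coef": slope(x, y)} for name, x in predictors.items()}

out = univariate_table([1, 2, 3, 4],
                       {"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8]})
print(out)  # {'x1': {'coef': 1.0}, 'x2': {'coef': 0.5}}
```

The point is only the bookkeeping: loop over variables, fit once per variable, append one row of results.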

 On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote:

 I have never made a loop on my own to do anything in R. But I am hoping

  someone can help me build one for the following issue:
 
  I need to make a univariate logistic regression for each of my variables
  (about 62 of them). Then I need to gather up each of their coefficients (not
  the intercepts), each of their 95% confidence intervals, and each of their
  odds ratios, and place them in a matrix to showcase them for my thesis.
 
  Currently, I am writing them all out one by one with the cbind method, which
  has taken me the better part of a day so far. I know there has to be a way
  to make a loop that can do this whole process; I just haven't been able to
  figure it out yet.
 
  Thanks in advance.
 
  --
  View this message in context:
 http://r.789695.n4.nabble.com/new-to-loops-tp3507366p3507366.html
  Sent from the R help mailing list archive at Nabble.com.
 
 







--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-a-vector-loop-problem-tp3507366p3508141.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Hosmer-Lemeshow 'goodness of fit'

2011-05-08 Thread Frank Harrell
Please read the documentation carefully, and replace the Design package with
the newer rms package.
The older Hosmer-Lemeshow test requires binning and has lower power.  It
also does not penalize for overfitting.  The newer goodness of fit test in
rms/Design should not agree with Hosmer-Lemeshow.

Frank

viostorm wrote:
 
 I'm trying to do a Hosmer-Lemeshow 'goodness of fit' test on my logistic
 regression model.
 
 I found some code here:
 
 http://sas-and-r.blogspot.com/2010/09/example-87-hosmer-and-lemeshow-goodness.html
 
 The R code above is a little complicated for me, but I'm having trouble
 with my answer:
 
 Hosmer-Lemeshow: p=0.6163585
 le Cessie and Houwelingen test (Design library): p=0.2843620
 
 The above link indicated they should be approximately equal, which in my
 case they are not. Any suggestions, or is there a package function people
 would recommend in R for use with a logistic regression model?
 
 Thanks in advance,
 
 -Rob Schutt
 
 
 Robert Schutt, MD, MCS 
 Resident - Department of Internal Medicine
 University of Virginia, Charlottesville, Virginia 
 
 
 --
 
 
 
 # Compute the Hosmer-Lemeshow 'goodness-of-fit' test
 
 cd.full_model = glm(formula = Collaterals ~ CHF + Age + CABG
   + relevel(as.factor(num.obst.vessels), "one")
   + Current.smoker + DM + HTN + ace.inhibitor
   + MI, family = binomial(link = logit))
 
 hosmerlem = function(y, yhat, g=10) {
   cutyhat = cut(yhat,
  breaks = quantile(yhat, probs=seq(0,
1, 1/g)), include.lowest=TRUE)
   obs = xtabs(cbind(1 - y, y) ~ cutyhat)
   expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
   chisq = sum((obs - expect)^2/expect)
   P = 1 - pchisq(chisq, g - 2)
   return(list(chisq=chisq,p.value=P))
 }
 
 hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
 
 # Compute the le Cessie and Houwelingen test 
 
 f <- lrm(Collaterals ~ CHF + Age + CABG
   + relevel(as.factor(num.obst.vessels), "one")
   + Current.smoker + DM + HTN + ace.inhibitor
   + MI, x = TRUE, y = TRUE)
 
 library(Design)  
 resid(f, 'gof')
 
 Output:
 
 hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
 $chisq
 [1] 6.275889
 
 $p.value
 [1] 0.6163585
 
 resid(f, 'gof')
 Sum of squared errors Expected value|H0SD 
   118.5308396   118.3771115 0.1435944 
 Z P 
 1.0705717 0.2843620
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-tp3508127p3508185.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] recommendation on B for validate.lrm () ?

2011-05-08 Thread Frank Harrell
Yes that's how it works, but a single run does not provide sufficient
precision unless your sample size is enormous.  When you partition into
tenths again the partitions will be different so yes there is some
randomness.  Averaging over 100 times averages out the randomness.  Or just
use the bootstrap with B=300 (depending on sample size).
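Frank's point, that a single k-fold partition is noisy and repeating it averages out the randomness, can be sketched without any model fitting. An illustrative Python version of the index bookkeeping only (the function names are mine, not from any R package):

```python
import random

def kfold_indices(n, k, rng):
    """One random partition of indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_kfold(n, k=10, repeats=100, seed=0):
    """Yield (train, test) splits; a performance metric averaged over all
    of them smooths out the randomness of any single partition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        for test in kfold_indices(n, k, rng):
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            yield train, test
```

Each repeat shuffles before splitting, so the folds differ between repeats, which is exactly the source of randomness being averaged away.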

Frank


viostorm wrote:
 
 Thanks so much for the reply it was exceptionally helpful!  A couple of
 questions:
 
 1. I was under the impression that k-fold with B=10 would train on 9/10,
 validate on 1/10, and repeat 10 times for each different 1/10th.  Is this
 how the procedure works in R?
 
 2. Is the reason you recommend repeating k-fold 100 times because the
 partitioning is random, ie not 1st 10th, 2nd 10, et cetera so you might
 obtain slightly different results?
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/recommendation-on-B-for-validate-lrm-tp3486200p3508187.html
Sent from the R help mailing list archive at Nabble.com.



[R] discriminant analysis

2011-05-08 Thread Sylvia Rocha
I am a student of ecology from Brazil and I need a tutorial on discriminant 
analysis, can someone help me?

sylvia



Re: [R] discriminant analysis

2011-05-08 Thread David Cross
There is a good tutorial here:

http://www.statsoft.com/textbook/discriminant-function-analysis/

I doubt your question is appropriate for this list ... good luck! 

David Cross
d.cr...@tcu.edu
www.davidcross.us




On May 8, 2011, at 7:28 PM, Sylvia Rocha wrote:

 I am a student of ecology from Brazil and I need a tutorial on discriminant 
 analysis, can someone help me?
 
 sylvia
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] Another quantmod question

2011-05-08 Thread Joshua Ulrich
Russ,

On May 8, 2011 6:29 PM, Russ Abbott russ.abb...@gmail.com wrote:

 Hi Jeff,

 The xts class has some very nice features, and you have done a valuable
 service in developing it.

 My primary frustration is how difficult it seems to be to find out what went
 wrong when my code doesn't work.  I've been writing quite sophisticated code
 for a fairly long time. It's not that I'm new to software development.

 The column name rule is a good example.  I'm willing to live with the rule
 that column names are not changed for efficiency sake.  What's difficult for
 me is that I never saw that rule anywhere before.  Of course, I'm not an R
 expert. I've been using it for only a couple of months. But still, I would
 have expected to run into a rule like that.

 Worse, since the rule is in conflict with the explicit intent of cbind--one
 can name columns when using cbind; in fact the examples illustrate how to do
 it--it would really be nice if cbind would issue a warning when one attempts
 to rename a column in violation of that rule.  Instead, cbind is silent,
 giving no hint about what went wrong.

Naming columns is not the explicit intent of cbind.  The explicit
intent is to combine objects by columns.  Please don't overstate the
case.

While the examples for the generic show naming columns, neither
?cbind.zoo nor ?cbind.xts has such examples.  That's a hint.

 It's those sorts of things that have caused me much frustration. And it's
 these sorts of things that seem pervasive in R.  One never knows what one is
 dealing with. Did something not work because there is a special case rule
 that I haven't heard of? Did it not work because a special convenience was
 programmed into a function in a way that conflicted with normal use?  Since
 these sorts of things seem to come up so often, I find myself feeling that
 there is no good way to track down problems, which leads to a sense of
 helplessness and confusion. That's not what one wants in a programming
 language.

If that's not what one wants, one can always write their own
programming language.

Seriously, it seems like you want to rant more than understand what's
going on.  You have the R and xts help pages and the source code.  The
Note section of help(cbind) tells you that the method dispatch is
different.  It even tells you what R source file to look at to see how
dispatching is done.  Compare the relevant source files from
base::cbind and xts::cbind.xts, look at the R Language Definition
manual to see how method dispatch is normally done.

But you've been writing quite sophisticated code for a fairly long
time, so I'm not telling you anything you don't know... you just don't
think you should have to do the legwork.

 -- Russ



--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com



 On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan jeff.a.r...@gmail.com wrote:

  Hi Russ,
 
  Colnames don't get rewritten if they already exist. The reason is due to
  performance and how cbind is written at the R level.
 
  It isn't perfect per se, but the complexity and variety of dispatch that
  can take place for cbind in R, as it isn't a generic, is quite challenging
  to get to behave as one may hope.  After years of trying I'd say it is
  nearly impossible to do what you want without causing horrible memory issues
  on non-trivial objects; there are production systems using xts on
  objects with billions of rows.  Your simple case, which has a simple
  workaround, would cost everyone in the other 99.999% of cases a
  recurring cost that isn't tolerable.
 
  If this is frustrating to you you should stop using the class.
 
  Jeff
 
  Jeffrey Ryan    |    Founder    |     jeffrey.r...@lemnica.com
 
  www.lemnica.com
 
  On May 8, 2011, at 2:07 PM, Russ Abbott russ.abb...@gmail.com wrote:
 
  I'm having troubles with the names of columns.
 
  quantmod deal with stock quotes.  I've created an array of the first 5
  closing prices from Jan 2007. (Is there a problem that the name is the same
  as the variable name? There shouldn't be.)
 
   close
 
               close
 
  2007-01-03 1416.60
 
  2007-01-04 1418.34
 
  2007-01-05 1409.71
 
  2007-01-08 1412.84
 
  2007-01-09 1412.11
 
 
  When I try to create a more complex array by adding columns, the names get
  fouled up.  Here's a simple example.
 
   cbind(changed.close = close+1, zero = 0, close)
 
               close zero close.1
 
  2007-01-03 1417.60    0 1416.60
 
  2007-01-04 1419.34    0 1418.34
 
  2007-01-05 1410.71    0 1409.71
 
  2007-01-08 1413.84    0 1412.84
 
  2007-01-09 1413.11    0 1412.11
 
 
  The first column should be called changed.close, but it's called close.
  The second column has the right name. The third column should be called
  close but it's called close.1. Why is that? Am I missing something?
 
  If I change the order of the columns and let close have its original name,
  there is still a problem.
 
   cbind(close, zero = 0, changed.close = 

[R] reading a .mdf file in R

2011-05-08 Thread Mauricio Romero
Hi,

 

I have a very large .mdf database (36 GB) that I want to import to R. 

I'm using a computer with 64 GB of RAM, running on windows server 2008 R2
with SQL.

 

Could anyone guide me through the process. I have absolutely no idea what to
do.

 

Thanks,

 

 

Mauricio Romero 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Another quantmod question

2011-05-08 Thread Joshua Wiley
On Sun, May 8, 2011 at 1:17 PM, Russ Abbott russ.abb...@gmail.com wrote:
 I understand Josh's example:

 mat <- matrix(1:10, dimnames = list(NULL, "A"))
 cbind(X = 11:20, Y = mat + 1)
 cbind(X = 11:20, Y = mat[, "A"] + 1)

 In the line, cbind(X = 11:20, Y = mat + 1), it would be nice if an error or
 warning message were issued to the effect that the Y =  part is ignored or
 not applicable or something to that effect.
 I still don't understand why cbind.xts doesn't do what I expect.  If I  try
 what Josh suggests on my xts example, the result is still confusing.

 cbind.xts(close, A = close[, "close"])

  close close.1

 2007-01-03 1416.60 1416.60

 2007-01-04 1418.34 1418.34

 2007-01-05 1409.71 1409.71

 2007-01-08 1412.84 1412.84

 2007-01-09 1412.11 1412.11


 If this helps:

 dput(close)

 structure(c(1416.6, 1418.34, 1409.71, 1412.84, 1412.11), .indexCLASS =
 "Date", .indexTZ = structure("", .Names = "TZ"), src = "yahoo", updated =
 structure(1304875186.625, class = c("POSIXct",
 "POSIXt")), class = c("xts", "zoo"), index = structure(c(1167811200,
 1167897600, 1167984000, 1168243200, 1168329600), tzone = structure("",
 .Names = "TZ"), tclass = "Date"), .Dim = c(5L,
 1L), .Dimnames = list(NULL, "close"))

It does.  Thank you.  It sounds like you want to keep the xts class,
so this may not be useful to you, but if you are willing to drop it at
the point you are cbind()ing things together:

cbind(changed.close = as.matrix(close + 1)[, "close"], zero = 0, close)

should do the trick.

Cheers,

Josh


 I was also unable to convert Josh's matrix example to xts.

 as.xts(mat, order.by = rownames())

 Error in dimnames(x) : 'x' is missing

 The order.by parameter is required, but I don't understand the error message
 and haven't been able to figure out how to fix it. If this helps:

 dput(mat)
 structure(1:10, .Dim = c(10L, 1L), .Dimnames = list(NULL, "A"))

 This suggests that mat already has defined dimnames. So why does as.xts give
 an error message about dimnames?  I was unable to figure out the problem by
 looking at any of the help files. In particular the help file
 for as.xts.methods didn't seem to offer any clarifications--at least none
 that I could understand.

 -- Russ


 On Sun, May 8, 2011 at 12:26 PM, Joshua Wiley jwiley.ps...@gmail.com
 wrote:

 class(mat)
 class(mat[, "A"])




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Another quantmod question

2011-05-08 Thread Jeff Ryan
Hi Russ,

We're of course getting into some incredibly fine-level detail on how all of
this works.  I'll try and explain issues as I recall them over the
development of xts and cbind.xts

xts started as an extension of zoo.  zoo is an extension of 'ts' (greatly
simplified comparison of course, but stay with me)

Achim and Gabor have put tremendous effort into the design of zoo - with a
primary focus on keeping it consistent with base R behavior.  That is, try
not to introduce unnecessary changes to the interface an R user is
accustomed to.  The logic being that this makes for a more consistent
interface as well as a easier learning curve and hence greater/faster
adoption rate.

'xts' extends this, though with a bit more flexibility in terms of
consistency.  Why? Simply put - some things about R annoyed me coming from a
time-series background.  Number one was the fact that lag() is backwards.
 Backwards from expectation, nearly all literature, and all standard
definitions.  So xts breaks with lag(, n=1) behavior.  This is obviously
confusing to some - but was the gamble I was willing to take - consistency
(with R) be damned! ;-)

So, now back to cbind.  cbind and merge in zoo-land (and xts by extension)
are the same. This isn't the case for other classes that use these - but
that is 'allowable' and 'expected' under a class dispatch system.  The docs
for ?cbind state:

For ‘cbind’ (‘rbind’) the column (row) names are taken from the
 ‘colnames’ (‘rownames’) of the arguments if these are matrix-like.
 Otherwise from the names of the arguments or where those are not
 supplied and ‘deparse.level > 0’, by deparsing the expressions
 given, for ‘deparse.level = 1’ only if that gives a sensible name
 (a ‘symbol’, see ‘is.symbol’).

Based on that, I'd argue that xts does it right. Of course I'll also point
out that this is incorrect thinking as well - since this is a description
for the generic - and not for xts.  But again in a highly configurable
object/class system, where you start to make a distinction of right and
wrong is itself up for debate.

At the other end of the argument spectrum is _why not_.  That is, why can't
cbind.xts handle the names to replace the colnames of objects passed in.
 Here is where I'll point out that I am really just going by memory.

Three major items are involved in cbind.  One is that dispatch is quite
unlike nearly every other dispatch in R.  This is a fact - nothing to do
with xts.

*  cbind isn't a generic (it's an .Internal call)
*  it uses ...
*  cbind can be called in numerous ways (I'll list only the common ones -
but with R you can do even crazier things)

   do.call(cbind,
   do.call(cbind.xts,
   cbind,
   cbind.xts,
   merge,
   merge.xts,
   do.call(merge,
   do.call(merge.xts

The rules of dispatch on cbind are really at a level that R-help has no
business discussing.  The second part is where things actually get tricky
though.  They all behave differently with respect to how args are handled -
when eval'd, etc.

I'm sure you have read how R strains itself on 'big data'.  This is true and
false.  Improper use (or just naive use) can cause object copies in places
you really don't want.  Much of xts at this point is implemented in custom C
code. The gain here is that you can make it eas(ier) to avoid copies until
you need them by writing in C.  Obvious, but needs to be said.

To figure out what the columns have - and if names are attached to the
objects in the pairlist (the ... in this context) - you have to be very
careful.  Touch anything in the wrong place or wrong time and you lose a
figurative arm and leg to memory copies.  So, in 99.999% of cases - where
you aren't naming (which would be an extra feature above and beyond c(olumn)
binding [the reason for cbind] - you run a very real risk of getting nailed
for copies you don't want.  On 10MM obs that is almost manageable. On 100's
of millions or billions - it is kill -9 time.

To compound the issue - recall all of those different dispatch methods.  Yep
- they all behave just a bit differently.  How?  Honestly - I don't know or
care.  I simply know you can't easily make the behavior consistent amongst
those calls.  I have tried. And tried.

End of day, and a very long R-help email, xts is different than base R.  It
is even different than its 'parent' zoo behavior.  But in exchange for this
difference (and bit of learning/adjustment) you get a class that is faster
than anything else.

Period.

 x <- .xts(1:1e7, 1:1e7)  # our time series object
 m <- coredata(x)  # a matrix

 str(x)
An ‘xts’ object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
  Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
  Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
  xts Attributes:
 NULL

 str(m)
 int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...

 system.time(x[,1])  # get the first column
   user  system elapsed
  0.017   0.000   0.017
 system.time(m[,1])  # ditto
   user  system elapsed
  

Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

2011-05-08 Thread Alex Olssen
Thank you all for your input.

Unfortunately my problem is not yet resolved.  Before I respond to
individual comments I make a clarification:

In Stata, using the same likelihood function as above, I can reproduce
EXACTLY (to 3 decimal places or more, which is essentially exact considering
I am using different software) the results from model 8 of the paper.

I take this as an indication that I am using the same likelihood
function as the authors, and that it does indeed work.
The reason I am trying to estimate the model in R is because while
Stata reproduces model 8 perfectly it has convergence
difficulties for some of the other models.

Peter Dalgaard,

Better starting values would help. In this case, almost too good
values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.

Thanks for the suggestion.  Using these starting values produces the
exact estimate that Dave Fournier emailed me.
If these are the exact solution then why did the author publish
different answers which are completely reproducible in
Stata and Tsp?

Ravi,

Thanks for introducing optimx to me; I am new to R.  I completely
agree that you can get higher log-likelihood values
than those obtained with optim and the starting values suggested
by Peter.  In fact, in Stata, when I reproduce
the results of model 8 to more than 3 dp I get a log-likelihood of 54.039139.

Furthermore if I estimate model 8 without symmetry imposed on the
system I reproduce the Likelihood Ratio reported
in the paper to 3 decimal places as well, suggesting that the
log-likelihoods I am reporting differ from those in the paper
only due to a constant.

Thanks for your comments,

I am still highly interested in knowing why the results of the
optimisation in R are so different from those in Stata.

I might try making my convergence requirements more stringent.

Kind regards,

Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weighted box or violin plots in Lattice?

2011-05-08 Thread Deepayan Sarkar
On Mon, May 9, 2011 at 2:20 AM, Thomas Lumley tlum...@uw.edu wrote:
 On Mon, May 9, 2011 at 6:35 AM, Deepayan Sarkar
 deepayan.sar...@gmail.com wrote:
 On Sat, May 7, 2011 at 1:55 AM, Raphael Mazor rapha...@sccwrp.org wrote:
 Is it possible to create weighted boxplots or violin plots in lattice?

 It seems that you can specify weights for panel.histogram() and
 panel.densityplot(), but not for panel.bwplot or panel.violin().

 Not for panel.histogram() either.

 It's not immediately obvious how you would get weighted boxplots.

 The way the survey world does it is to get weighted quantiles and work
 from them, as in survey:::svyboxplot().  The only tricky decision is
 what to do with outliers, since I haven't been able to think of any
 good way of indicating weights on them.

So I guess with bwplot() one would have to write a version of
boxplot.stats() that uses weighted quantiles, and then write a
modified panel function.

A crude approximation (if the weights don't vary too much) may be to
scale and round the weights to integers, and then repeat each data
point; e.g., newx <- rep(x, w), etc.
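A runnable sketch of this approximation (data and weights invented for illustration; the lattice package is assumed available):

```r
## Crude weighted bwplot: scale weights to integers, then repeat each
## observation that many times before plotting.
library(lattice)

set.seed(42)
x <- rnorm(200)                       # invented data
g <- gl(2, 100, labels = c("A", "B")) # two groups
w <- runif(200, 0.5, 3)               # invented positive weights

k    <- round(w / min(w))             # scale and round weights to integers
newx <- rep(x, k)                     # repeat each data point k times
newg <- rep(g, k)

bwplot(newg ~ newx)                   # ordinary bwplot on the expanded data
```

As noted above, this only works well when the weights don't vary too much; very unequal weights blow up the expanded data and distort the outlier display.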

-Deepayan



Re: [R] Another quantmod question

2011-05-08 Thread Russ Abbott
Jeff,

Clearly you (and others) have put a lot of work into xts -- and I'm the
beneficiary. So I'll stop complaining.

Thanks for the class (both code and explanation).

*-- Russ *



On Sun, May 8, 2011 at 8:23 PM, Jeff Ryan jeff.a.r...@gmail.com wrote:

 Hi Russ,

 We're of course getting into some incredibly fine-level detail on how all
 of this works.  I'll try and explain issues as I recall them over the
 development of xts and cbind.xts

 xts started as an extension of zoo.  zoo is an extension of 'ts' (greatly
 simplified comparison of course, but stay with me)

 Achim and Gabor have put tremendous effort into the design of zoo - with a
 primary focus on keeping it consistent with base R behavior.  That is, try
 not to introduce unnecessary changes to the interface an R user is
 accustomed to.  The logic being that this makes for a more consistent
 interface as well as an easier learning curve and hence greater/faster
 adoption rate.

 'xts' extends this, though with a bit more flexibility in terms of
 consistency.  Why? Simply put - some things about R annoyed me coming from a
 time-series background.  Number one was the fact that lag() is backwards.
  Backwards from expectation, nearly all literature, and all standard
 definitions.  So xts breaks with lag(, n=1) behavior.  This is obviously
 confusing to some - but was the gamble I was willing to take - consistency
 (with R) be damned! ;-)
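A small illustration of the lag() difference described here (assuming the xts package is installed):

```r
## In xts, lag(x, 1) returns the *previous* observation at each time
## stamp (padded with NA), unlike stats::lag()/lag.zoo(), which shift
## the other way.
library(xts)

x <- xts(1:5, order.by = as.Date("2011-05-01") + 0:4)
lag(x, 1)  # first value NA, then 1, 2, 3, 4
```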

 So, now back to cbind.  cbind and merge in zoo-land (and xts by extension)
 are the same. This isn't the case for other classes that use these - but
 that is 'allowable' and 'expected' under a class dispatch system.  The docs
 for ?cbind state:

 For ‘cbind’ (‘rbind’) the column (row) names are taken from the
  ‘colnames’ (‘rownames’) of the arguments if these are matrix-like.
  Otherwise from the names of the arguments or where those are not
  supplied and ‘deparse.level > 0’, by deparsing the expressions
  given, for ‘deparse.level = 1’ only if that gives a sensible name
  (a ‘symbol’, see ‘is.symbol’).

 Based on that, I'd argue that xts does it right. Of course I'll also
 point out that this is incorrect thinking as well - since this is a
 description for the generic - and not for xts.  But again in a highly
 configurable object/class system, where you start to make a distinction of
 right and wrong is itself up for debate.

 At the other end of the argument spectrum is _why not_.  That is, why can't
 cbind.xts handle the names to replace the colnames of objects passed in.
  Here is where I'll point out that I am really just going by memory.

 Three major items are involved in cbind.  One is that dispatch is quite
 unlike nearly every other dispatch in R.  This is a fact - nothing to do
 with xts.

 *  cbind isn't a generic (it's an .Internal call)
 *  it uses ...
 *  cbind can be called in numerous ways (I'll list only the common ones -
 but with R you can do even crazier things)

do.call(cbind, ...)
do.call(cbind.xts, ...)
cbind(...)
cbind.xts(...)
merge(...)
merge.xts(...)
do.call(merge, ...)
do.call(merge.xts, ...)
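For illustration, a short sketch (assuming xts is installed) of the point above that cbind and merge are the same in xts-land:

```r
## cbind() and merge() on xts objects both end up in the same merge
## code, so the results line up column for column.
library(xts)

x <- .xts(1:3, 1:3)
y <- .xts(4:6, 1:3)

a <- cbind(x, y)
b <- merge(x, y)
all(coredata(a) == coredata(b))  # same data either way
```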

 The rules of dispatch on cbind are really at a level that R-help has no
 business discussing.  The second part is where things actually get tricky
 though.  They all behave differently with respect to how args are handled -
 when eval'd, etc.

 I'm sure you have read how R strains itself on 'big data'.  This is true
 and false.  Improper use (or just naive use) can cause object copies in
 places you really don't want.  Much of xts at this point is implemented in
 custom C code. The gain here is that you can make it eas(ier) to avoid
 copies until you need them by writing in C.  Obvious, but needs to be said.

 To figure out what the columns have - and if names are attached to the
 objects in the pairlist (the ... in this context) - you have to be very
 careful.  Touch anything in the wrong place or wrong time and you lose a
 figurative arm and leg to memory copies.  So, in 99.% of cases - where
 you aren't naming (which would be an extra feature above and beyond c(olumn)
 binding [the reason for cbind]) - you run a very real risk of getting nailed
 for copies you don't want.  On 10MM obs that is almost manageable. On 100's
 of millions or billions - it is kill -9 time.

 To compound the issue - recall all of those different dispatch methods.
  Yep - they all behave just a bit differently.  How?  Honestly - I don't
 know or care.  I simply know you can't easily make the behavior consistent
 amongst those calls.  I have tried. And tried.

 End of day, and a very long R-help email, xts is different than base R.  It
 is even different than its 'parent' zoo behavior.  But in exchange for this
 difference (and bit of learning/adjustment) you get a class that is faster
 than anything else.

 Period.

  x <- .xts(1:1e7, 1:1e7)  # our time series object
  m <- coredata(x)  # a matrix

  str(x)
 An ‘xts’ object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
   Data: int 

Re: [R] Hosmer-Lemeshow 'goodness of fit'

2011-05-08 Thread viostorm
Again, thanks so much for your help.

Is there a "for dummies" version for interpreting the le Cessie-van
Houwelingen test?  I read the 1991 Biometrics paper but honestly got lost in
the math.

Is it interpreted the same way the Hosmer-Lemeshow test is?  I.e., does a
non-significant result mean the model fits well?

Put another way, what would my p-value of p=0.284362 tell you about my
model?

I would like to describe the goodness of fit of my model in a paper but I'm
worried the average medical reader would not know how to interpret the
result of this test whereas there are lots of references on interpreting
Hosmer-Lemeshow.

(my cross validated c-statistic was 0.69 and r^2 was 0.15)
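One hedged sketch (assuming the model was fit with the rms package, which the thread does not state): residuals.lrm with type="gof" computes the le Cessie-van Houwelingen unweighted sum-of-squares test, and, as with Hosmer-Lemeshow, a large p-value gives no evidence of lack of fit:

```r
## Sketch with simulated data; swap in your own lrm() fit.
library(rms)

set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(-0.5 + d$x))

fit <- lrm(y ~ x, data = d, x = TRUE, y = TRUE)  # x=TRUE, y=TRUE required
g <- resid(fit, type = "gof")  # le Cessie-van Houwelingen test
g                              # statistic and its p-value
```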

Thanks again for all your help! (also with my k-fold cross-validation
question!)

-Rob

 
Robert Schutt, MD, MCS 
Resident - Department of Internal Medicine 
University of Virginia, Charlottesville, Virginia


--
View this message in context: 
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-tp3508127p3508219.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] new to loops

2011-05-08 Thread sunny
Maybe this is what you're looking for:

# x is your set of explanatory variables (10 of them):
x <- array(rnorm(10000), dim = c(1000, 10))

# y is your dependent variable:
y <- rbinom(1000, 1, 0.3)

# run the regression on each column of x:
reg <- apply(x, 2, function(z) glm(y ~ z, family = binomial(link = "logit")))

# the previous output is a list. Now collect whatever you need from each
# list element (computing the confidence interval only once per model):
bond <- lapply(reg, function(z) {
  ci <- confint(z)[2, ]
  c(z$coeff[2], exp(z$coeff[2]), ci, exp(ci))
})

# collect everything together as a matrix:
bond <- do.call(rbind, bond)
bond

-S.


SevannaD wrote:
 
 So in my first try before I got your message, this is what I did:
 
 orconf <- list()
  ccoef <- list()
  or <- list()
  coef <- list()
  out <- list()
  for (i in 1:49) {
 out[[i]] <- glm(y ~ var[[i]], family = binomial(link = logit))
 coef[[i]] <- out[[i]]$coef[2]
  or[[i]] <- exp(out[[i]]$coef[2])
  bond <- matrix(out[[i]]$coef[2],
 exp(out[[i]]$coef[2]), confint(out[[i]]$coef[2]), exp(confint(out[[i]]$coef[2])))
 }
 But it did not work, due to confint(out[[i]]$coef[2]) and the exp one; it said:
 Error in object$coefficients : $ operator is invalid for atomic vectors.
 
 Would I need to identify conf_low and conf_high as two separate things?
 
 Thanks
 
 On Sun, May 8, 2011 at 4:31 PM, Scott Chamberlain-3 [via R] 
 ml-node+3508106-1763482049-235...@n4.nabble.com wrote:
 
 Not knowing what format your data is in or what model you are using...

 df # is your data frame with columns the variables you are running
 # regressions for
 datout <- data.frame(coeff = NA, conf_low = NA, conf_high = NA, odd = NA)
 # a table to put your results in
 for (i in 1:length(names(df)[2:10])) {
 fit <- glm(data[,1] ~ data[,i], data = df, etc...)
 datout[i,] <- fit[e.g., 1:4] # determine what values in your model output
 # are what you need
 }
 datout # a table with all your output for each variable

 On Sunday, May 8, 2011 at 11:58 AM, SevannaD wrote:

 I have never made a loop on my own to do anything in R. But I am hoping

  someone can help me build one for the following issue:
 
  I need to make a univariate logistic regression for each of my
 variables
  (about 62 of them), then I need to gather up each of their coefficients
 (not
  the intercepts), each of their 95% confidence intervals, and each of
 thier
  odds ratios and place them in a matrix to showcase them for my thesis.
 
  Currently, I am writing them all out one by one with the cbind method,
  which has taken me the better part of a day so far, and I know there has
  to be a way to make a loop that does this whole process; I just haven't
  been able to figure it out yet.
 
  Thanks in advance.
 
  --
  View this message in context:
 http://r.789695.n4.nabble.com/new-to-loops-tp3507366p3507366.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  [hidden email] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 


--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-a-vector-loop-problem-tp3507366p3508483.html
Sent from the R help mailing list archive at Nabble.com.
