Re: [R] "na.strings" and the like; suspending interpretation of "NA"

2009-08-04 Thread Peter Dalgaard

Jan Theodore Galkowski wrote:

Can someone point me to the proper place in the documentation or on the
Wiki where I can learn how to get R to stop interpreting the string "NA"
as something special?  I have a table in a database which contains
(among other things) country codes and continent codes.  The standard
set of two-letter codes includes "NA" to denote "North America". I
learned of the "na.strings" parameter for RODBC's "sqlQuery", being able
to shut down this interpretation when data is read in.

However, in the program which uses this data, I (must) have some other
instance where the "NA" gets spontaneously interpreted as "not
available", shows up in vectors and lists as "<NA>", and breaks
function. I temporarily solved the problem by redefining all instances of
"NA" in the database as "NAC".  It still would be good to know a
general solution.  I've seen something mentioned in conjunction with
"options", but I'm not sure what that is about.


The general paradigm is that this shouldn't happen... Back in the old 
days, R had no such thing as character NA, and users had to sort out the 
North America, noradrenaline, Neil Armstrong, etc., issues for 
themselves. Nowadays we do have calculus that preserves "NA" as distinct 
from <NA>; so if one is converted to the other, it could signify a bug.


It could also be due to particularly silly code on your behalf, but in 
either case we need to see the effect narrowed down to a reproducible 
stretch of code.
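
The distinction between the string "NA" and a missing value can be checked directly. What follows is only a minimal sketch, not a diagnosis of the original problem: the country table is made-up data, and read.csv() stands in for the RODBC call, which is not reproduced here.

```r
# A character vector can hold the string "NA" and a missing value side by side
codes <- c("NA", NA, "EU")
is.na(codes)          # only the second element is missing
print(codes)          # the string prints with quotes, the missing value without

# Made-up example data: continent code "NA" stands for North America
csv <- "country,continent\nUS,NA\nFR,EU"

# Default behaviour: read.csv() treats the field NA as a missing value
d1 <- read.csv(text = csv, stringsAsFactors = FALSE)

# With na.strings = "" only empty fields count as missing,
# so "NA" survives as an ordinary string
d2 <- read.csv(text = csv, stringsAsFactors = FALSE, na.strings = "")
```

If a value is still converted after it has been read in correctly, the conversion happens somewhere later in the program, which is why a reproducible stretch of code is needed.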


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two-factor linear models with missing cells

2009-08-04 Thread Peter Dalgaard

Murray Jorgensen wrote:

Hi Peter,

there is no problem if the missing cell is not in the first row or 
column: the corresponding interaction parameter is omitted. In my case 
the data in the (1,4) cell is missing. What results is clear to me now: 
the (3,4) interaction parameter is dropped so that "(Intercept) + Biv" 
now refers to the mean of the (3,4) cell rather than that of the 
(1,4) cell making the (3,4) cell a sort of 'honorary' member of the 
first row. This could have been done to the (2,4) cell but I guess the 
rule is to drop the cell with the highest sum of row and column number.


Ouch. I missed the point completely there...

The generic rule is that singularities (those not expected from the 
model formula) are detected by going through the model matrix 
left-to-right. In this case, it happens when you get to the (3,4) 
interaction term because the sum of the (2,4) and (3,4) design columns 
is equal to the main effect of (*,4) except in the positions where you 
have no data. So the net effect is that that term is set to zero.


However, it's not true that "(Intercept) + Biv" refers to the (3,4) cell 
mean. You are missing the Athree term (3.755+3.330-1.635 = 5.450). The 
prediction for (1,4) (3.755 + 0 - 1.635 = 2.120) is not equal to any 
cell mean. Rather, it comes about by completing the 2x2 table consisting 
of the 1st and 3rd row and the 1st and 4th column, assuming no interaction:


      1     4
1 3.755     ?
3 7.085 5.450

3.755 + (5.450 - 7.085) = 2.120
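
The same arithmetic can be checked on simulated data. This is a sketch under the thread's own setup; the seed is an arbitrary choice, so the numbers will differ from those quoted in the output below.

```r
set.seed(1)  # arbitrary seed, so the numbers differ from the thread
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                      # empty the (one, iv) cell

fit <- lm(y ~ A * B)
cf <- coef(fit)                     # Athree:Biv is reported as NA

# Observed cell means; the (one, iv) entry is NA
m <- tapply(y, list(A, B), mean)

# Implied value for the empty cell: "(Intercept) + Biv"
implied <- unname(cf["(Intercept)"] + cf["Biv"])

# The same number from completing the 2x2 table without interaction
completed <- m["one", "i"] + (m["three", "iv"] - m["three", "i"])
```

Comparing `implied` and `completed` with all.equal() should confirm that the two agree up to rounding error.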




Murray Jorgensen

Peter Dalgaard wrote:

Murray Jorgensen wrote:

I am wondering how to interpret the parameter estimates that lm()
reports in this sort of situation:

y = round(rnorm(n=24,mean=5,sd=2),2)
A = gl(3,2,24,labels=c("one","two","three"))
B = gl(4,6,24,labels=c("i","ii","iii","iv"))
# Make both observations for A=1, B=4 missing
y[19] = NA
y[20] = NA
data.frame(y,A,B)
nonadd = lm(y ~ A * B)



summary(nonadd)


Call:
lm(formula = y ~ A * B)

Residuals:
       Min         1Q     Median         3Q        Max
-3.555e+00 -7.675e-01 -6.939e-17  7.675e-01  3.555e+00

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.755      1.667   2.252   0.0457 *
Atwo           1.655      2.358   0.702   0.4974
Athree         3.330      2.358   1.412   0.1856
Bii            1.435      2.358   0.609   0.5552
Biii           2.055      2.358   0.871   0.4021
Biv           -1.635      2.358  -0.693   0.5025
Atwo:Bii      -1.145      3.335  -0.343   0.7378
Athree:Bii    -4.535      3.335  -1.360   0.2011
Atwo:Biii     -3.230      3.335  -0.969   0.3536
Athree:Biii   -2.105      3.335  -0.631   0.5408
Atwo:Biv       1.655      3.335   0.496   0.6295
Athree:Biv        NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.358 on 11 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.2797, Adjusted R-squared: -0.3752
F-statistic: 0.4271 on 10 and 11 DF, p-value: 0.9044


fitted(nonadd)

    1     2     3     4     5     6     7     8     9    10    11    12
3.755 3.755 5.410 5.410 7.085 7.085 5.190 5.190 5.700 5.700 3.985 3.985
   13    14    15    16    17    18    21    22    23    24
5.810 5.810 4.235 4.235 7.035 7.035 5.430 5.430 5.450 5.450

t(model.matrix(nonadd)%*%coef(nonadd))

      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 21 22 23 24
[1,] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I guess that the parameter estimates reported are linear combinations of
the cell means, but which linear combinations and how does lm() decide
what parameters to report?

Cheers, Murray



What's the problem? The parameters are defined as usual for the 
two-way layout:


The intercept is the fitted value in the top left corner
The A coefficients are the fitted values in the first column minus the 
intercept.

The B coefficients vice versa.
The interaction coefficients are the fitted values minus the sum of 
the intercept and the corresponding A and B coefficients.


One interaction coefficient is set missing because you have no data, 
but except for that, the fitted values equal the cell means.









[R] Rank of matrix

2009-08-04 Thread Alex Roy
Dear all,
 Rank of a matrix depends on which factors? Only on rows or
columns?  or both ? If there is a collinearity in the variables ( columns )
does it affect the rank?



> X<-matrix((rnorm(10000)),50)
> dim(X)
[1]  50 200
> qr(X)$rank
[1] 50
> X[,2]<-X[,30]
> qr(X)$rank
[1] 50
> X[10,]<-X[7,]
> qr(X)$rank
[1] 49

Thanks

Alex



Re: [R] lme funcion in R

2009-08-04 Thread ONKELINX, Thierry
Dear Harry,

Your model seems rather complex. Do you have enough data to support it?
Did you check for multicollinearity between the variables?
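
One way such a check could look is sketched below. The data are entirely hypothetical: five mutually exclusive land-use dummies plus one continuous covariate, chosen only to show what exact collinearity looks like. Whether the dummies in the real data behave this way is an assumption that would need checking.

```r
# Hypothetical data: five mutually exclusive land-use categories
set.seed(42)
use <- factor(sample(c("MH", "APT", "ResOth", "NonRes", "Vacant"),
                     200, replace = TRUE))
dummies <- model.matrix(~ use - 1)        # one 0/1 column per level
access <- rnorm(200, mean = 45, sd = 5)   # stand-in continuous covariate

# Design matrix with an intercept, all five dummies and the covariate
X <- cbind(Intercept = 1, dummies, access = access)

# A rank below the number of columns means exact collinearity:
# here the five dummies add up to the intercept column
rank_X <- qr(X)$rank
ncol_X <- ncol(X)

# Pairwise correlations among the predictors (intercept excluded)
round(cor(X[, -1]), 2)
```

When `rank_X` is smaller than `ncol_X`, at least one term is redundant and should be dropped before fitting.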

HTH,

Thierry
 




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Original message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On behalf of Hongwei Dong
Sent: Monday 3 August 2009 19:45
To: r-help@r-project.org
Subject: Re: [R] lme funcion in R

Thanks for the replies above. Here are my script and data structure:
library(nlme)
tlevel <- lme(fixed = LN_unitlandval ~ MH_D + APT_D + ResOth_D + NonRes_D +
                  Vacant_D + access_emp1 + pct_vacant + transit_D + park_dum,
              data = lusdrdata,
              random = ~ MH_D + APT_D + ResOth_D + NonRes_D + Vacant_D | TAZ)

str:

$ TAZ         : int 100 100 100 100 100 100 100 100 100 100 ...
$ MH_D        : num 0 0 0 0 0 0 0 0 0 0 ...
$ APT_D       : num 0 0 0 0 0 0 0 0 0 0 ...
$ ResOth_D    : num 0 0 0 0 0 0 0 0 0 0 ...
$ NonRes_D    : num 0 0 0 0 0 0 0 0 0 1 ...
$ Vacant_D    : num 1 1 1 0 0 1 1 1 1 0 ...
$ access_emp1 : num 45.8 45.8 45.8 45.8 45.8 ...
$ pct_vacant  : num 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 ...
$ transit_D   : num 0 0 0 0 0 0 0 0 0 0 ...
$ park_dum    : num 0 0 0 0 0 0 0 0 0 0 ...


Thanks.

Harry



On Mon, Aug 3, 2009 at 10:36 AM, Jason Morgan wrote:

> On 2009.08.03 10:15:46, Hongwei Dong wrote:
> > Hi, R users,
> >   I'm using the "lme" function in R to estimate a 2 level mixed
> > effects model, in which the size of the subject groups are
> > different. It turned out that it takes forever for R to converge.
> > I also tried the same thing in SPSS and SPSS can give the results
> > out within 20 minutes. Anyone can give me some advice on the lme
> > function in R, especially why R does not converge? Thanks.
> >
> > Harry
>
> Hello Harry,
>
> As Chuck mentions, providing some more information on the model and 
> the data you are using would be helpful. Also, be sure to compare the 
> optimization methods used in SPSS to that used in R. You can change 
> the optimization method in R if the default seems to be causing 
> issues. See help(lmeControl) for numerous setting options.
>
> ~Jason
>
> --
> Jason W. Morgan
> Graduate Student
> Department of Political Science
> *The Ohio State University*
> 154 North Oval Mall
> Columbus, Ohio 43210
>
>


The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.



Re: [R] How to catch an error using try

2009-08-04 Thread ONKELINX, Thierry
 
Put your function call in try() and then test the class of the result:

gene.seq <- try(getSequence(id=gene.map[,"ensembl_transcript_id"],
    type="ensembl_transcript_id", seqType="3utr", mart=hmart))
# inherits() is safer than class(x) == "...", which breaks in if()
# when an object has more than one class
if(inherits(gene.seq, "try-error")){
  # code to run when an error occurs
} else {
  # code to run when no error occurs
}
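
The same pattern in a form that runs without a database connection is sketched below. The function `risky()` is a hypothetical stand-in for the getSequence() call; tryCatch() is shown as an alternative from base R that was not part of the original advice.

```r
# Hypothetical stand-in for a call that sometimes raises an error
risky <- function(fail) {
  if (fail) stop("database exception") else "ACGT"
}

# try(): the error is captured and returned as a "try-error" object
res <- try(risky(fail = TRUE), silent = TRUE)
failed <- inherits(res, "try-error")

# tryCatch(): handle the error where it happens and return a fallback
res2 <- tryCatch(risky(fail = TRUE),
                 error = function(e) NA_character_)

# When no error occurs, try() simply returns the value of the call
ok <- try(risky(fail = FALSE), silent = TRUE)
```

With try() the test comes after the call; with tryCatch() the handler replaces the failed result directly, which can be more convenient inside a loop.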

HTH,

Thierry



-Original message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On behalf of mau...@alice.it
Sent: Monday 3 August 2009 17:54
To: r-h...@stat.math.ethz.ch
Subject: [R] How to catch an error using try

Sometimes the following function call causes a database exception:

>  gene.seq <- getSequence (id=gene.map[,"ensembl_transcript_id"], 
> type="ensembl_transcript_id",
+  seqType="3utr", mart=hmart)

I understand the above function must be called by try to capture the
eventual error.
What is not clear to me is how to realize that an error has occurred.
The on-line documentation mentions an invisible object of class
"try-error".
How shall I test whether such an object has been created or not ?
I assume it is created whenever an error does occur ?

Thank you for your help.
Maura 




Re: [R] two-factor linear models with missing cells

2009-08-04 Thread Murray Jorgensen

Hi Peter,

there is no problem if the missing cell is not in the first row or 
column: the corresponding interaction parameter is omitted. In my case 
the data in the (1,4) cell is missing. What results is clear to me now: 
the (3,4) interaction parameter is dropped so that "(Intercept) + Biv" 
now refers to the mean of the (3,4) cell rather than that of the 
(1,4) cell making the (3,4) cell a sort of 'honorary' member of the 
first row. This could have been done to the (2,4) cell but I guess the 
rule is to drop the cell with the highest sum of row and column number.


Murray Jorgensen

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nzFax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441   Mobile 021 0200 8350



Re: [R] Merge partially duplicated rows

2009-08-04 Thread Rnewbie

Thanks very much :handshake:


David Winsemius wrote:
> 
> 
> You might want to look at the ave function. It will calculate a  
> function within IDs and you can assign that as another row in the  
> data frame before you exclude the duplicates.
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> 
> 
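
A small sketch of the ave() suggestion follows. The data frame, its column names and the choice of mean() as the summary are made up for illustration; the original poster's data are not shown in this thread.

```r
# Made-up data with repeated IDs
df <- data.frame(id = c("a", "a", "b", "c", "c", "c"),
                 value = c(1, 3, 5, 2, 4, 6),
                 stringsAsFactors = FALSE)

# ave() applies FUN within each ID and returns a vector as long as the input,
# so the result can be stored alongside the original rows
df$id_mean <- ave(df$value, df$id, FUN = mean)

# Keep the first row of each ID; the per-ID summary travels with it
dedup <- df[!duplicated(df$id), ]
```

Because ave() keeps the original length, no merge step is needed before the duplicates are dropped.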



Re: [R] package to convert coordiantes to timezone

2009-08-04 Thread Thomas Steiner
Thank you, this helps.
Great,
Thomas


2009/8/3 Michael Denslow :
> Hi Thomas,
>
> On Sun, Aug 2, 2009 at 11:02 AM, Thomas Steiner wrote:
>> Is there a R-package with a function that returns me the timezone, if
>> I hand over longitude and latitude?
>> I know online services like
>> http://ws.geonames.org/timezone?lat=-38.01&lng=147 and
>> http://www.earthtools.org/webservices.htm#timezone and wonder if this
>> exists for R too.
>> Thanks for helping,
>> thomas
>>
>
> There is a geonames package.
>
>> library(geonames)
> Loading required package: rjson
>> GNtimezone(-38.01,147)
>              time countryName rawOffset dstOffset countryCode
> gmtOffset lng          timezoneId    lat
> 1 2009-08-03 10:23   Australia        10        10          AU
> 11 147 Australia/Melbourne -38.01
> Warning message:
> In readLines(u) :
>  incomplete final line found on
> 'http://ws.geonames.org/timezoneJSON?lat=-38.01&lng=147'
>
> Not sure why I get this error but I think this is what you are after,
>
> Michael
>
>>
>
>
>
> --
> Michael Denslow
>
> Graduate Student
> I.W. Carpenter Jr. Herbarium [BOON]
> Department of Biology
> Appalachian State University
> Boone, North Carolina U.S.A.
>
> -- AND --
>
> Communications Manager
> Southeast Regional Network of Expertise and Collections
> sernec.org
>



Re: [R] Rank of matrix

2009-08-04 Thread Martin Maechler
> "AR" == Alex Roy 
> on Tue, 4 Aug 2009 09:56:42 +0200 writes:

AR> Dear all, Rank of a matrix depends on which factors?
AR> Only on rows or coumns?  or both ? If there is a
AR> collinearlity in the variables ( columns ) does it
AR> effects the rank?


This has nothing to do with R,
even though you provide R code here.

Please learn about this, maybe from Wikipedia:

 http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29

>> X<-matrix((rnorm(10000)),50)
>> dim(X)
AR> [1] 50 200
>> qr(X)$rank
AR> [1] 50
>> X[,2]<-X[,30]
>> qr(X)$rank
AR> [1] 50
>> X[10,]<-X[7,]
>> qr(X)$rank
AR> [1] 49

Note that the rank of a matrix is well defined in theoretical
linear algebra, but in practice is quite a bit more complicated.

For this reason, and to bring this back to a topic more fit to R-help :

The package 'Matrix' (typically part of R, since R 2.9.0),
has a function 
rankMatrix()

whose options {and implementation; just type 'rankMatrix' !}
show you a bit about the difficulties involved.
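
As a rough illustration of the question itself, here is a sketch that repeats the experiment with base R's qr() and, if the Matrix package is installed, with rankMatrix(). The seed is an arbitrary choice. The rank can never exceed the smaller of the two dimensions, so a duplicated column in a 50 x 200 matrix need not lower it, while a duplicated row does.

```r
set.seed(123)                           # arbitrary seed
X <- matrix(rnorm(10000), nrow = 50)    # 50 rows, 200 columns
r0 <- qr(X)$rank                        # full row rank

X[, 2] <- X[, 30]                       # duplicate a column
r1 <- qr(X)$rank                        # 199 distinct columns still span 50 dimensions

X[10, ] <- X[7, ]                       # duplicate a row
r2 <- qr(X)$rank                        # only 49 linearly independent rows remain

# rankMatrix() estimates the rank from singular values with an explicit tolerance
if (requireNamespace("Matrix", quietly = TRUE)) {
  r_svd <- as.integer(Matrix::rankMatrix(X))
}
```

Row rank and column rank are always equal, so collinearity among columns lowers the rank only once fewer independent columns remain than there are rows.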

Regards,
Martin Maechler, ETH Zurich



Re: [R] fitting a truncated power law

2009-08-04 Thread glen_b



John Sanders-2 wrote:
> 
> How can I fit a truncated power law to a vector? I can't find a function
> to do that. If the function provides an AIC, even better.
> 

Okay, "power law" I understand - f(x) = k.x^a, or on the log-scale log(f(x))
= log(k) + a log(x) (linear)
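
A minimal sketch of the plain power-law fit on the log scale, using simulated measurements: the values of `k` and `a`, the noise level and the seed are all made up for illustration.

```r
set.seed(1)
k <- 3                                   # made-up "true" scale
a <- -1.5                                # made-up "true" exponent
x <- seq(1, 100, length.out = 200)
y <- k * x^a * exp(rnorm(200, sd = 0.05))   # multiplicative noise

# On the log scale the power law is a straight line
fit <- lm(log(y) ~ log(x))
a_hat <- unname(coef(fit)[2])            # slope estimates the exponent
k_hat <- unname(exp(coef(fit)[1]))       # intercept estimates log(k)

AIC(fit)                                 # an AIC is available from the fit
```

This treats the vector as measurements with multiplicative error; fitting a distribution to counts is a different problem, as discussed below.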

I was unfamiliar with the term "truncated power law", but after looking on
the internet I see that the term implies what appears to be replacing the
linear fit with a linear spline fit to log(y) in terms of log(x)  - but the
usual application seems to be to fit probability distribution to count data;
in this case you fit essentially a two-part Pareto distribution (or Zipf if
the variable is discrete) - again the log-fitted-density is like a linear
spline in the logs.

Is the vector of data you have counts to which you wish to fit a
distribution, or is it a set of measurements?

If I understand the problem correctly, I think it could probably be done
using linear splines with GLMs, which can be done in a couple of packages.
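
One way the linear-spline idea might look with base R's glm() is sketched here. Everything in it is an assumption made for illustration: the counts are simulated, the Poisson family is chosen for convenience, and the knot is treated as known, which it would not be in practice.

```r
set.seed(2)
x <- 1:200
knot <- 20                                # assumed known for this sketch

# Hinge term: zero up to the knot, then log(x) - log(knot)
hinge <- pmax(log(x) - log(knot), 0)

# Simulated counts whose log-mean is piecewise linear in log(x)
mu <- exp(8 - 0.5 * log(x) - 1.5 * hinge)
counts <- rpois(length(x), mu)

# Poisson GLM with a linear spline in log(x)
fit <- glm(counts ~ log(x) + hinge, family = poisson)
slopes <- unname(coef(fit)[2:3])          # slope before the knot, change in slope after it

AIC(fit)                                  # AIC for comparing candidate knots or models
```

With an unknown knot, one could refit over a grid of candidate knots and compare the AIC values.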



[R] parameter asterisks

2009-08-04 Thread alexander russell
Hello,
Is there a clearcut answer as to why R prints 'NA' sometimes instead of
standard errors?


mle2(minuslogl = nlikfun4, start = list(a = 1, c = 1, d = 0.2,
    b = 0.1, b1 = 0.1), method = "Nelder-Mead")

Coefficients:
      Estimate  Std. Error z value     Pr(z)
a   3.83845751  0.47320236  8.1117 4.993e-16 ***
c   0.95545367          NA      NA        NA
d  -0.22509015          NA      NA        NA
b   0.04260199  0.00743892  5.7269 1.023e-08 ***
b1  0.00212538  0.00031189  6.8145 9.459e-12 ***
---

regards,

s



[R] Strange error with ROCR

2009-08-04 Thread Noah Silverman
Hello,

I've come across a strange error...


Here is what happens:

model <- svm(traindata,trainlabels, type="C-classification", 
kernel="radial", cost=10,  class.weights=c("win"=3,"lose"=1), 
scale=FALSE, probability = TRUE)
predictions <- predict(model, traindata)
pred <- prediction(predictions, trainlabels)


This returns an error:
Error in prediction(predictions, trainlabels) :
   Format of predictions is invalid.

Yet my predictions object is just a matrix of predicted labels.  Nothing 
fancy.  (In fact, my steps follow the exact example on the ROCR homepage.)

A search through google for "Format of predictions is invalid" returns 
zero results.

Can anyone suggest how I might fix this problem?

Thank You,







[R] asc class object - how to get positions (coordinates) for a given raster ID?

2009-08-04 Thread Paulo Eduardo Cardoso
In a raster asc object, I'd like to take the positions (x and y coordinates)
for a given "pixel" ID. Any idea about how to do this?
___
Paulo E. Cardoso



Re: [R] Strange error with ROCR

2009-08-04 Thread Christian Schulz

Hi,

you need the score values; have a look at ?predict.svm and at the ROCR 
example.


library(e1071)  # svm()
library(ROCR)   # prediction()

traindata <- as.data.frame(matrix(runif(1000), ncol = 10))
trainlabels <- as.factor(sample(c("win", "lose"), nrow(traindata),
                                replace = TRUE, prob = c(0.5, 0.5)))

model <- svm(traindata, trainlabels, type = "C-classification",
             kernel = "radial", cost = 10,
             class.weights = c("win" = 3, "lose" = 1),
             scale = FALSE, probability = TRUE)

# Ask for class probabilities rather than predicted labels
predicted <- predict(model, traindata, decision.values = TRUE,
                     probability = TRUE)

# Use the probability of "win" as the score passed to ROCR
probs <- attr(predicted, "probabilities")[, "win"]
pred <- prediction(probs, trainlabels)

HTH Christian




[R] fitted.values less than observed values

2009-08-04 Thread Federico Calboli

Hi All,

I have some data where the dependent variable is a score, low (1:3) or  
high (8:9), and the independent variables are 21 genotypic markers.  
I'm fitting a logistic regression on the whole dataset after  
transforming the score to 0/1 and normal linear regression on the high  
and low subsets.


In all cases I have a number of cases of data 'duplications', i.e.  
different individuals with the same score and the same genotype at the  
21 markers.


When I do:

mod$fitted.values I get a number of fitted values corresponding to the  
number of unique lines in the dataset. Is there a way to have the  
fitted values match the observations, even though some are duplicated  
and so have the same fitted value? I could do it by hand but it's  
laborious and I'd venture there is a better way.


Best,

Federico


--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



[R] matrix

2009-08-04 Thread Ismail, Riyad
Hi

I have a dataset that consists of two columns

AB    0.102
AC   -0.002
BA   -0.102
BC    0.270
CA    0.002
CB   -0.270

I wish to create a matrix so that I can eventually plot the data.

        A       B       C
A       1   0.102  -0.002
B  -0.102       1    0.27
C   0.002   -0.27       1

Any help or guidance would be greatly appreciated

 

 

 

Dr Riyad Ismail 

GIS / Remote Sensing Analyst


Sappi Forests (Reg No 1976/02426/07)

Tel  +27 (0)33 347 6650

Fax +27 (0)33 347 6790

E-mail: riyad.ism...@sappi.com 

 




Re: [R] asc class object - how to get positions (coordinates) for a given raster ID?

2009-08-04 Thread Clément Calenge
The function getXYcoords in the package adehabitat might help. You may 
also consider asc2spixdf, which returns an object of class 
SpatialPixelsDataFrame (where the coordinates are stored as well as the 
values of the pixel).

Hope this helps,

Clément Calenge.



Paulo Eduardo Cardoso wrote:

In a raster asc object, I'd like to take the positions (x and y coordinates)
for a given "pixel" ID. Any idea about how to do this?
___
Paulo E. Cardoso



  



--
Clément CALENGE
Office national de la chasse et de la faune sauvage
Saint Benoist - 78610 Auffargis
tel. (33) 01.30.46.54.14



Re: [R] matrix

2009-08-04 Thread ONKELINX, Thierry
Split the first column into two columns: one with the first letter and
one with the last letter. Then use cast() from the reshape package to
create the matrix you want.
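
A minimal base-R sketch of the same idea, for anyone who prefers not to load reshape. It uses matrix indexing instead of cast(), which is a substitution on my part. The data frame name pairs and its column names are made up for illustration; the values and the diagonal of 1 are taken from the question.

```r
# Data as posted in the question (column names are hypothetical)
pairs <- data.frame(
  pair  = c("AB", "AC", "BA", "BC", "CA", "CB"),
  value = c(0.102, -0.002, -0.102, 0.270, 0.002, -0.270),
  stringsAsFactors = FALSE
)

# Split the two-letter code into a row label and a column label
from <- substr(pairs$pair, 1, 1)
to   <- substr(pairs$pair, 2, 2)
labs <- sort(unique(c(from, to)))

# Start from an identity matrix (1 on the diagonal, as in the desired output)
m <- diag(1, length(labs))
dimnames(m) <- list(labs, labs)

# Fill each (row, column) cell using a two-column index matrix
m[cbind(match(from, labs), match(to, labs))] <- pairs$value
print(m)
```

The resulting matrix can be passed to image() or a similar plotting function.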

HTH,

Thierry
 




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Original message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On behalf of Ismail, Riyad
Sent: Tuesday 4 August 2009 13:36
To: r-help@r-project.org
Subject: [R] matrix

Hi

I have dataset that consists of two columns

AB    0.102
AC   -0.002
BA   -0.102
BC    0.270
CA    0.002
CB   -0.270

I wish to create a matrix so that I can eventually plot the data.

         A       B       C
A        1   0.102  -0.002
B   -0.102       1    0.27
C    0.002   -0.27       1

Any help or guidance would be greatly appreciated 







Dr Riyad Ismail 

GIS / Remote Sensing Analyst


Sappi Forests (Reg No 1976/02426/07)

Tel  +27 (0)33 347 6650

Fax +27 (0)33 347 6790

E-mail: riyad.ism...@sappi.com 





The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.



Re: [R] matrix

2009-08-04 Thread Henrique Dallazuanna
You can try something like this:
merge(transform(x, V1 = gsub("^[A-Z]", "", x$V1),
   V3 = gsub("[A-Z]$", "", x$V1)),
  data.frame(V1 = LETTERS[1:3],
 V3 = LETTERS[1:3],
 V2 = 1),
  by = c("V1", "V3"), all = TRUE)

On Tue, Aug 4, 2009 at 8:35 AM, Ismail, Riyad wrote:

> Hi
>
> I have dataset that consists of two columns
>
> AB    0.102
> AC   -0.002
> BA   -0.102
> BC    0.270
> CA    0.002
> CB   -0.270
>
> I wish to create a matrix so that I can eventually plot the data.
>
>          A       B       C
> A        1   0.102  -0.002
> B   -0.102       1    0.27
> C    0.002   -0.27       1
>
> Any help or guidance would be greatly appreciated
>
>
>
>
>
>
>
> Dr Riyad Ismail
>
> GIS / Remote Sensing Analyst
>
>
> Sappi Forests (Reg No 1976/02426/07)
>
> Tel  +27 (0)33 347 6650
>
> Fax +27 (0)33 347 6790
>
> E-mail: riyad.ism...@sappi.com
>
>
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



[R] Manova post hoc

2009-08-04 Thread Michelangelo La Spina

Hello, mailing list!

I'm using a MANOVA in R to study the responses of four dependent variables. I 
would like to do a post hoc analysis, but I don't know which method is best or 
how to run it in R.

I'm currently using pairwise t tests, but I'm not sure that is correct. I would 
like to use TukeyHSD, but it doesn't work with manova.

How can I solve this problem?

Thank you very much.



[R] FW: matrix

2009-08-04 Thread Ismail, Riyad



My apologies, to elaborate 

I carried out a correlation analysis and I want to plot the data (similar to 
the graphs available in the corrplot package). However, the results are as 
follows (I have many more combinations):

AB    0.102
AC   -0.002
BA   -0.102
BC    0.270
CA    0.002
CB   -0.270

I now want to create a matrix that uses the names from the first column as a 
"reference", so in the matrix the first row name will be "A" and the first 
column name will be "B" with a value of 0.102. I will be using the script 
available on http://www.phaget4.org/R/image_matrix.html to plot the data



[R] error message "memory not mapped"

2009-08-04 Thread Anne Skoeries
Hi there,

I'm automatically generating buttons depending on the amount of rows  
my dataframe consists of. The buttons are supposed to call a barplot- 
function. Generating the buttons and displaying the barplot isn't a  
problem, but once I press one of the buttons, I'm getting an error- 
message.

My dataframe contains only integers. The number of rows usually  
doesn't exceed 6 rows.

So, if anyone knows what I'm doing wrong, please let me know. I don't  
think that I really have to generate each button separately.


this is how I generate my buttons:

base <- tktoplevel()
for (i in 1:nrow(dataframe))
  tkgrid(but <- tkbutton(base, text = paste("Barplot:", i),
    command = function() {
      barplot(height = data.matrix(dataframe[i, ]),
              names.arg = names(dataframe))
    }))
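
One thing worth checking in a loop like this, independent of the crash: every callback created in the loop looks up the same variable i when it is eventually called, so all buttons may end up plotting the last row. This is a general property of R closures and is not offered as an explanation of the segfault. The sketch below shows the effect without tcltk; the handler lists are hypothetical stand-ins for the button commands.

```r
# Callbacks created in a loop all see the same 'i' (its final value)
handlers_shared <- list()
for (i in 1:3) handlers_shared[[i]] <- function() i
shared <- sapply(handlers_shared, function(f) f())

# Giving each callback its own environment freezes the value per iteration
handlers_own <- list()
for (i in 1:3) handlers_own[[i]] <- local({
  row <- i            # copy the current value into a fresh environment
  function() row
})
own <- sapply(handlers_own, function(f) f())

print(shared)  # every handler reports the last value of i
print(own)     # each handler reports its own row number
```

In the button code, the same local({ ... }) wrapper around the command function would let each button remember its own row.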


this is the error-code I get:

*** caught segfault ***
address 0xc023, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
error in background error handler:
out of stack space (infinite loop?)
 while executing
"::tcl::Bgerror {out of stack space (infinite loop?)} {-code 1 -level  
0 -errorcode NONE -errorinfo {out of stack space (infinite loop?)
 while execu..."

And here is my sessionInfo:
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1

locale:
de_DE.UTF-8/de_DE.UTF-8/C/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets   
methods   base

Thanks for the help,
--
Anne Skoeries






Re: [R] fitting a truncated power law

2009-08-04 Thread John Sanders


Dear Glen_b,

The function I'm trying to fit has the form:

P(k) ~ k^(-y) exp(-k/kx)

And deals with count data. I'm a newbie, so any more specific suggestion would 
be greatly appreciated.
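
One possible way to fit this form to count data is direct maximum likelihood with optim(), which also yields an AIC. The sketch below is only an illustration under stated assumptions: the support is cut off at a hypothetical kmax, the data are simulated with made-up parameter values, and the names dtpl, nll and k_obs are invented here. Only the symbols y and kx come from the formula above.

```r
set.seed(42)

kmax    <- 1000        # assumed upper end of the support (hypothetical)
support <- 1:kmax

# Normalised probability mass P(k) proportional to k^(-y) * exp(-k / kx)
dtpl <- function(k, y, kx) {
  w <- support^(-y) * exp(-support / kx)
  (k^(-y) * exp(-k / kx)) / sum(w)
}

# Simulated observations (parameter values chosen only for the demo)
true_p <- dtpl(support, y = 1.5, kx = 20)
k_obs  <- sample(support, 2000, replace = TRUE, prob = true_p)

# Negative log-likelihood; kx is optimised on the log scale to keep it positive
nll <- function(par) {
  y    <- par[1]
  kx   <- exp(par[2])
  logw <- -y * log(support) - support / kx
  logZ <- log(sum(exp(logw)))
  -sum(-y * log(k_obs) - k_obs / kx - logZ)
}

fit <- optim(c(1, log(10)), nll)          # Nelder-Mead by default
est <- c(y = fit$par[1], kx = exp(fit$par[2]))
aic <- 2 * fit$value + 2 * length(fit$par)
print(est)
print(aic)
```

With real data, k_obs would be the observed vector and kmax would have to be chosen large enough to cover it.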

john



John Sanders-2 wrote:
>
> How can I fit a truncated power law to a vector? I can't find a function
> to do that. If the function provides an AIC, even better.
>

Okay, "power law" I understand - f(x) = k.x^a, or on the log-scale log(f(x))
= log(k) + a log(x) (linear)

I was unfamiliar with the term "truncated power law", but after looking on
the internet I see that the term implies what appears to be replacing the
linear fit with a linear spline fit to log(y) in terms of log(x)  - but the
usual application seems to be to fit a probability distribution to count data;
in this case you fit essentially a two-part Pareto distribution (or Zipf if
the variable is discrete) - again the log-fitted-density is like a linear
spline in the logs.

Is the vector of data you have counts to which you wish to fit a
distribution, or is it a set of measurements?

If I understand the problem correctly, I think it could probably be done
using linear splines with GLMs, which can be done in a couple of packages.






Re: [R] How to get w and b in SVR? (package e1071)

2009-08-04 Thread marlene marchena
Hi Steve,

First of all, thanks for your answer.

I did all that you suggested and now I have some questions.


1) When I use svm.m1$coefs I get 180 values (the number of SVs), each either
10 or -10. I think this is related to C=10, because when I change to C=100 I
get 180 values of 100 and -100. I can't understand what that means in
the SVR model. Do you know what it means?

2) For svm.m1$y.scale I get two values, $`scaled:center` and $`scaled:scale`.
The first one is the original value and the second one is the scale value,
so my b will be scaled:center. Is that correct?

3) What is the best way to supply the data to the SVM: in its original form,
unscaled, with scale=TRUE, or normalized using
(x-min)/(max-min), as for a NN, with scale=FALSE?
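
For orientation, here is a sketch of how the pieces of such a fit are usually combined, as far as I understand the libsvm convention that e1071 follows: the prediction on the scaled response is the kernel-weighted sum of coefs minus rho, so the offset b would correspond to -rho rather than to the y.scale centre, and with a radial kernel there is no explicit w in input space. Coefficients sitting exactly at +cost or -cost would be support vectors at the box bound. The code is base R with toy numbers; SV, coefs and gamma here are hypothetical, and only rho, the centre and the scale are copied from the str() output below.

```r
# Radial basis kernel between two vectors
rbf <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

# Decision function on the scaled response: sum_i coefs_i * K(SV_i, x) - rho
svr_predict_scaled <- function(x, SV, coefs, rho, gamma) {
  k <- apply(SV, 1, rbf, v = x, gamma = gamma)
  sum(coefs * k) - rho
}

# Toy support vectors and coefficients (hypothetical, for illustration only)
SV    <- rbind(c(0, 0), c(1, 1))
coefs <- c(10, -10)
gamma <- 0.5
rho   <- 0.139            # value shown for $rho in the posted str() output

f_scaled <- svr_predict_scaled(c(0, 0), SV, coefs, rho, gamma)

# Back-transform with the response centre and scale from $y.scale
y_center <- 0.227
y_scale  <- 0.182
y_hat    <- f_scaled * y_scale + y_center
print(c(f_scaled = f_scaled, y_hat = y_hat))
```

Comparing such a hand computation with predict() on a few rows would confirm or refute the assumed convention for a given fit.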

Thanks again,

Marlene.


Below is my model:


> svm.m1  (I chose the parameters of the model using tune.svm())

Call:
svm(formula = QQ ~ ., data = train, cost = 10, gamma = 1e-06)


Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  radial
       cost:  10
      gamma:  1e-06
    epsilon:  0.1


Number of Support Vectors:  180



> str(svm.m1)
List of 30
 $ call     : language svm(formula = QQ ~ ., data = train, cost = 10, gamma = 1e-06)
 $ type     : num 3
 $ kernel   : num 2
 $ cost     : num 10
 $ degree   : num 3
 $ gamma    : num 1e-06
 $ coef0    : num 0
 $ nu       : num 0.5
 $ epsilon  : num 0.1
 $ sparse   : logi FALSE
 $ scaled   : logi [1:7] TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ x.scale  :List of 2
  ..$ scaled:center: Named num [1:7] 0.23 0.23 0.23 0.234 0.238 ...
  .. ..- attr(*, "names")= chr [1:7] "diff1" "diff2" "diff3" "diff4" ...
  ..$ scaled:scale : Named num [1:7] 0.183 0.182 0.182 0.187 0.194 ...
  .. ..- attr(*, "names")= chr [1:7] "diff1" "diff2" "diff3" "diff4" ...
 $ y.scale  :List of 2
  ..$ scaled:center: num 0.227
  ..$ scaled:scale : num 0.182
 $ nclasses : int 2
 $ levels   : num [1:120] -0.98398 0.00101 -0.63924 0.18159 1.72474 ...
 $ tot.nSV  : int 180
 $ nSV      : int [1:2] 0 0
 $ labels   : int [1:2] 0 0
 $ SV       : num [1:180, 1:7] 1.4296 -0.9948 -0.012 -0.6508 0.1682 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:180] "1" "2" "3" "4" ...
  .. ..$ : chr [1:7] "diff1" "diff2" "diff3" "diff4" ...
 $ index    : int [1:180] 1 2 3 4 5 6 7 8 9 10 ...
 $ rho      : num 0.139
 $ compprob : logi FALSE
 $ probA    : NULL
 $ probB    : NULL
 $ sigma    : NULL
 $ coefs    : num [1:180, 1] -10 10 -10 10 10 -10 -10 10 10 10 ...
 $ na.action: NULL
 $ fitted   : Named num [1:196] 0.202 0.203 0.202 0.202 0.202 ...
  ..- attr(*, "names")= chr [1:196] "1" "2" "3" "4" ...
 $ residuals: Named num [1:196] -1.1864 -0.2018 -0.8417 -0.0206 1.5224 ...
  ..- attr(*, "names")= chr [1:196] "1" "2" "3" "4" ...
 $ terms    :Classes 'terms', 'formula' length 3 QQ ~ diff1 + diff2 + diff3 + diff4 + diff5 + diff6 + diff7
  .. ..- attr(*, "variables")= language list(QQ, diff1, diff2, diff3, diff4, diff5, diff6, diff7)
  .. ..- attr(*, "factors")= int [1:8, 1:7] 0 1 0 0 0 0 0 0 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:8] "QQ" "diff1" "diff2" "diff3" ...
  .. .. .. ..$ : chr [1:7] "diff1" "diff2" "diff3" "diff4" ...
  .. ..- attr(*, "term.labels")= chr [1:7] "diff1" "diff2" "diff3" "diff4" ...
  .. ..- attr(*, "order")= int [1:7] 1 1 1 1 1 1 1
  .. ..- attr(*, "intercept")= num 0
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=
  .. ..- attr(*, "predvars")= language list(QQ, diff1, diff2, diff3, diff4, diff5, diff6, diff7)
  .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:8] "QQ" "diff1" "diff2" "diff3" ...
 - attr(*, "class")= chr [1:2] "svm.formula" "svm"



> svm.m1$coefs (some examples, I have 180 SV)
       [,1]
  [1,]  -10
  [2,]   10
  [3,]  -10
  [4,]   10
  [5,]   10
  [6,]  -10
  [7,]  -10
  [8,]   10
  [9,]   10
 [10,]   10



> svm.m1$SV
          diff1        diff2        diff3        diff4        diff5        diff6        diff7
1   1.429563809 -0.539452422 -0.376635036  3.227053246  3.765529896  0.072436459 -0.898142838
2  -0.994802571  1.429804286 -0.540826469 -0.386515451  3.087548981  3.780174219  0.087268726
3  -0.011951338 -0.998945652  1.429470704 -0.546407873 -0.394807530  3.100422296  3.821459922
4  -0.650804641 -0.014317301 -1.000562474  1.372301171 -0.548894103 -0.391030760  3.136858202
5   0.168238056 -0.654325731 -0.015413890 -0.994106648  1.300144754 -0.545519835 -0.379505172
6   1.708038328  0.166197899 -0.655760471 -0.034752129 -0.980336501  1.308349045 -0.535096474
7  -0.585281225  1.708782321  0.165196685 -0.658332568 -0.055817075 -0.978089239  1.331999124
8  -0.683566350 -0.588683840  1.708596139  0.141129535 -0.656754703 -0.051154801 -0.970752111
9  -0.028332192 -0.687146676 -0.590083898  1.644118288  0.113678155 -0.653662187 -0.037204315
10  0.659663673 -0.030727774 -0.688598758 -0.5943

[R] matrix

2009-08-04 Thread Ismail, Riyad



My apologies, to elaborate 

I carried out a correlation analysis and I want to plot the data (similar to 
the graphs available in the corrplot package). However, the results are as 
follows (I have many more combinations):

AB    0.102
AC   -0.002
BA   -0.102
BC    0.270
CA    0.002
CB   -0.270

I now want to create a matrix that uses the names from the first column as a 
"reference", so in the matrix the first row name will be "A" and the first 
column name will be "B" with a value of 0.102. I will be using the script 
available on http://www.phaget4.org/R/image_matrix.html to plot the data



Re: [R] parameter asterisks

2009-08-04 Thread Ben Bolker



alexander russell-2 wrote:
> 
> Hello,
> Is there a clearcut answer as to why R prints 'NA' sometimes instead of
> standard errors?
> 
> 
> mle2(minuslogl = nlikfun4, start = list(a = 1, c = 1, d = 0.2,
>     b = 0.1, b1 = 0.1), method = "Nelder-Mead")
>
> Coefficients:
>       Estimate  Std. Error z value     Pr(z)
> a   3.83845751  0.47320236  8.1117 4.993e-16 ***
> c   0.95545367          NA      NA        NA
> d  -0.22509015          NA      NA        NA
> b   0.04260199  0.00743892  5.7269 1.023e-08 ***
> b1  0.00212538  0.00031189  6.8145 9.459e-12 ***
> 
> 

  This is in the bbmle package so I'll bite ...
The standard errors in the summary() for mle2
are based on inverting the Hessian (the second-derivative
matrix computed at the maximum likelihood estimate).
This sometimes runs into numerical problems, because
these derivatives have to be computed by finite differences.
My best guess (in the absence of a repeatable example,
hint, hint) is that you might  be able to get around the problem
by setting "parscale" (i.e. something like
control=list(parscale=c(a=4,c=1,d=0.2,b=0.05,b1=0.002)) --
you just need to get the order of magnitude right) to let
R know that the expected values of the parameters vary
by an order of magnitude.
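
To make the mechanics concrete, here is a small base-R sketch of the same idea using optim() directly rather than mle2 (I am assuming mle2 hands its control list on to optim). The model, data and parameter names are invented for the demo: a Poisson regression whose two coefficients differ by about three orders of magnitude. The standard errors come from inverting the finite-difference Hessian, as described above.

```r
set.seed(1)

# Simulated data: intercept near 1, slope near 0.002 (hypothetical values)
z <- runif(300, 0, 1000)
y <- rpois(300, exp(1 + 0.002 * z))

# Negative log-likelihood of a Poisson model with log link
nll <- function(p) -sum(dpois(y, lambda = exp(p[1] + p[2] * z), log = TRUE))

# parscale tells optim the typical size of each parameter, so that
# steps and finite-difference increments are sensible for both
fit <- optim(c(a = 0.5, b1 = 0.001), nll,
             method  = "Nelder-Mead",
             hessian = TRUE,
             control = list(parscale = c(1, 0.001), maxit = 2000))

# Standard errors: square roots of the diagonal of the inverse Hessian
se <- sqrt(diag(solve(fit$hessian)))
print(rbind(estimate = fit$par, std.error = se))
```

If the Hessian is not positive definite at the reported optimum, the square root produces NaN, which is the kind of failure that shows up as NA in a summary table.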

  Ben Bolker

-- 
View this message in context: 
http://www.nabble.com/parameter-asterisks-tp24804823p24807838.html
Sent from the R help mailing list archive at Nabble.com.



[R] Problems with lqs()

2009-08-04 Thread Kathrin Schmidt
Dear List-Members,

I have a problem with the function lqs() from package MASS. In some cases it 
produces different results for the same settings and needs a random seed to be 
set; in other cases it does not. 
I really cannot understand why this happens, nor what exactly the random seed 
is needed for. Is it a starting point for iterations? Or do different results 
occur because the estimation doesn't converge?


I tried the data "phones" from package MASS. You can find this example in the 
MASS book on page 160.

> lqs(calls~year, data=phones, method="lms")
Call:
lqs.formula(formula = calls ~ year, data = phones, method = "lms")

Coefficients:
(Intercept)         year  
    -55.947        1.155  

Scale estimates 0.9377 0.9095 

> 
> lqs(calls~year, data=phones, method="S")
Call:
lqs.formula(formula = calls ~ year, data = phones, method = "S")

Coefficients:
(Intercept) year  
  -52.5  1.1  

Scale estimates 2.129 

You can run it over and over again and get the same coefficients. In contrast, 
if you use other data, like cats from MASS or simulated data, you get different 
output every time you run the code unless you set a random seed.

> lqs(Hwt~Bwt, data=cats, method="S") 
Call:
lqs.formula(formula = Hwt ~ Bwt, data = cats, method = "S")

Coefficients:
(Intercept)  Bwt  
 0.2625   3.6250  

Scale estimates 1.474 

> lqs(Hwt~Bwt, data=cats, method="S") 
Call:
lqs.formula(formula = Hwt ~ Bwt, data = cats, method = "S")

Coefficients:
(Intercept)  Bwt  
 0.4714   3.5714  

Scale estimates 1.474 



Example with simulated data:

> b0<--1
> b1<-6
> b2<-0.8
> b3<--0.5
> 
> x1<-runif(200,-3,3)
> x2<-runif(200,20,40)
> x3<-rbinom(200, 1, 0.7)
> e<-rnorm(200,0,1)
> y<-b0+b1*x1+b2*x2+b3*x3+e
> lqs(y~x1, method="lms")
Call:
lqs.formula(formula = y ~ x1, method = "lms")

Coefficients:
(Intercept)           x1  
     22.239        4.964  

Scale estimates 5.379 4.891 

> 
> lqs(y~x1, method="S")
Call:
lqs.formula(formula = y ~ x1, method = "S")

Coefficients:
(Intercept)           x1  
     23.193        5.743  

Scale estimates 5.642 

> lqs(y~x1, method="lms")
Call:
lqs.formula(formula = y ~ x1, method = "lms")

Coefficients:
(Intercept)           x1  
     21.176        5.255  

Scale estimates 5.383 5.023 

> 
> lqs(y~x1, method="S")
Call:
lqs.formula(formula = y ~ x1, method = "S")

Coefficients:
(Intercept)   x1  
  22.55 5.46  

Scale estimates 5.642 


Thanks for your help,
I appreciate it!

K. Schmidt
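
A note on the likely mechanism, hedged because it rests on my reading of the lqs help page: lqs() searches over small subsets of the observations, and with the default nsamp = "best" it enumerates all subsets only when there are at most 5000 of them; otherwise it draws subsets at random, which is where the seed comes in. The subset size used in the counts below (2, the number of coefficients) is an assumption for illustration. The sketch counts the subsets for both data sets and shows that fixing the seed makes the cats fit repeatable.

```r
library(MASS)

# Number of candidate subsets of size 2 (assumed subset size)
n_subsets_phones <- choose(length(phones$year), 2)  # phones is a list, not a data frame
n_subsets_cats   <- choose(nrow(cats), 2)
print(c(phones = n_subsets_phones, cats = n_subsets_cats))

# Same seed before each call gives the same random subsets, hence the same fit
set.seed(2009)
fit_a <- lqs(Hwt ~ Bwt, data = cats, method = "S")
set.seed(2009)
fit_b <- lqs(Hwt ~ Bwt, data = cats, method = "S")
same_seed_same_fit <- identical(coef(fit_a), coef(fit_b))
print(same_seed_same_fit)
```

The help page also describes nsamp = "exact", which attempts full enumeration regardless of the count, at a correspondingly higher cost.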





Re: [R] Comparison of Output from "dwtest" and "durbin.watson"

2009-08-04 Thread Tom La Bone

Allow me to reword this question. I have performed two fits to the same set
of data below: a weighted fit and an unweighted fit. I performed the
Durbin-Watson tests on each fit using "dwtest" and "durbin.watson". For a
given fit (weighted or unweighted), should both dwtest and durbin.watson be
giving me the same DW statistic and p-value? Should I get the same DW
statistic and p-value for the weighted and unweighted fits as I do using
dwtest?


> library(lmtest)
Loading required package: zoo

Attaching package: 'zoo'


The following object(s) are masked from package:base :

 as.Date.numeric 

> library(car)
> X <- c(4.8509E-1,8.2667E-2,6.4010E-2,5.1188E-2,3.4492E-2,2.1660E-2,
+ 3.2242E-3,1.8285E-3)
> Y <- c(2720,1150,1010,790,482,358,78,35)
> W <- 1/Y^2
> 
> fit <- lm(Y ~ X - 1)
> dwtest(fit,alternative="two.sided")

Durbin-Watson test

data:  fit 
DW = 0.7599, p-value = 0.05935
alternative hypothesis: true autocorelation is not 0 

> durbin.watson(fit,alternative="two.sided")
 lag Autocorrelation D-W Statistic p-value
   1   0.5897666 0.7599161   0.368
 Alternative hypothesis: rho != 0
> 
> fit <- lm(Y ~ X - 1,weights=W)
> dwtest(fit,alternative="two.sided")

Durbin-Watson test

data:  fit 
DW = 0.7599, p-value = 0.05935
alternative hypothesis: true autocorelation is not 0 

> durbin.watson(fit,alternative="two.sided")
 lag Autocorrelation D-W Statistic p-value
   1     -0.07663672      1.209076    0.77
 Alternative hypothesis: rho != 0
> 

-- 
View this message in context: 
http://www.nabble.com/Comparison-of-Output-from-%22dwtest%22-and-%22durbin.watson%22-tp24783494p24808540.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Comparison of Output from "dwtest" and "durbin.watson"

2009-08-04 Thread Ronggui Huang
I think the statistics are the same, but the p-values are not exactly
the same because the two functions compute the p-value differently.
According to the help pages, car uses bootstrapping and lmtest uses the
"pan" algorithm.
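
As a cross-check, the statistic itself is easy to compute by hand from the residuals: the sum of squared successive differences divided by the sum of squares. The base-R sketch below does this for the unweighted fit using the data from the original post. The weighted variant at the end uses weighted.residuals(), which is only my guess at what a weight-aware computation might look like; I have not verified which residuals either package uses for weighted fits.

```r
# Data from the original post
X <- c(4.8509E-1, 8.2667E-2, 6.4010E-2, 5.1188E-2, 3.4492E-2, 2.1660E-2,
       3.2242E-3, 1.8285E-3)
Y <- c(2720, 1150, 1010, 790, 482, 358, 78, 35)
W <- 1 / Y^2

# Durbin-Watson statistic from a residual vector
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Unweighted fit: ordinary residuals
fit <- lm(Y ~ X - 1)
dw  <- dw_stat(residuals(fit))

# Weighted fit: statistic on the weighted residuals (an assumption, see above)
fit_w <- lm(Y ~ X - 1, weights = W)
dw_w  <- dw_stat(weighted.residuals(fit_w))

print(c(unweighted = dw, weighted = dw_w))
```

The unweighted value can be compared with the DW = 0.7599 that both functions report above.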


2009/8/4 Tom La Bone :
>
> Allow me to reword this question. I have performed two fits to the same set
> of data below: a weighted fit and an unweighted fit. I performed the
> Durbin-Watson tests on each fit using "dwtest" and "durbin.watson". For a
> given fit (weighted or unweighted), should both dwtest and durbin.watson be
> giving me the same DW statistic and p-value? Should I get the same DW
> statistic and p-value for the weighted and unweighted fits as I do using
> dwtest?
>
>
>> library(lmtest)
> Loading required package: zoo
>
> Attaching package: 'zoo'
>
>
>        The following object(s) are masked from package:base :
>
>         as.Date.numeric
>
>> library(car)
>> X <- c(4.8509E-1,8.2667E-2,6.4010E-2,5.1188E-2,3.4492E-2,2.1660E-2,
> +         3.2242E-3,1.8285E-3)
>> Y <- c(2720,1150,1010,790,482,358,78,35)
>> W <- 1/Y^2
>>
>> fit <- lm(Y ~ X - 1)
>> dwtest(fit,alternative="two.sided")
>
>        Durbin-Watson test
>
> data:  fit
> DW = 0.7599, p-value = 0.05935
> alternative hypothesis: true autocorelation is not 0
>
>> durbin.watson(fit,alternative="two.sided")
>  lag Autocorrelation D-W Statistic p-value
>   1       0.5897666     0.7599161   0.368
>  Alternative hypothesis: rho != 0
>>
>> fit <- lm(Y ~ X - 1,weights=W)
>> dwtest(fit,alternative="two.sided")
>
>        Durbin-Watson test
>
> data:  fit
> DW = 0.7599, p-value = 0.05935
> alternative hypothesis: true autocorelation is not 0
>
>> durbin.watson(fit,alternative="two.sided")
>  lag Autocorrelation D-W Statistic p-value
>   1     -0.07663672      1.209076    0.77
>  Alternative hypothesis: rho != 0
>>
>
>



-- 
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html



Re: [R] Order statistic in wtd.quantile

2009-08-04 Thread Frank E Harrell Jr

Michael Becher wrote:

Dear Professor Harrell,

please allow me a brief question about the wtd.quantile() function in 
your marvelous R package Hmisc that I was not able to answer using the 
documentation or the web. For type="quantile", what is the exact 
statistic that is estimated? The documentation says that the same 
interpolated order statistic as in the quantile() function is used. As 
far as I can tell, the quantile() function has six different interpolated 
order statistics (types 4-9). The default for quantile() is type 7. Is 
the same used in wtd.quantile? Or is it type 8, which is recommended by 
Hyndman and Fan (1996), since it is median unbiased and distribution 
free? Or something else?


Thanks a lot for your attention.

Respectfully,

Michael


---
Michael Becher
Princeton University
Department of Politics
mbec...@princeton.edu


Here is an empirical way to answer the question, for the case where the 
weights are all equal to 1.  Methods can be said to be identical when 
the maximum difference is < 1e-10.  You can see that the default for 
wtd.quantile equals the default for quantile (type 7).  -Frank


require(Hmisc)

set.seed(1)
u <- eval(formals(wtd.quantile)$type)
v <- as.character(1:9)
r <- matrix(0, nrow=length(u), ncol=9, dimnames=list(u,v))

for(n in c(8, 13, 22, 29)) {
  x <- rnorm(n)
  for(i in 1:5) {
    probs <- sort(runif(9))
    for(wtype in u) {
      wq <- wtd.quantile(x, type=wtype, weights=rep(1,length(x)),
                         probs=probs)

      for(qtype in 1:9) {
        rq <- quantile(x, type=qtype, probs=probs)
        r[wtype, qtype] <- max(r[wtype,qtype], max(abs(wq-rq)))
      }
    }
  }
}

r
                    1         2         3            4         5            6
quantile    0.5729927 0.5729927 0.7547948 5.022317e-01 0.3725233 6.416295e-01
(i-1)/(n-1) 0.5729927 0.5729927 0.7547948 5.022317e-01 0.3725233 6.416295e-01
i/(n+1)     0.7088892 0.7088892 0.8569561 7.195455e-01 0.3208148 4.440892e-16
i/n         0.7774653 0.7774653 0.4093181 4.440892e-16 0.4284780 7.195455e-01

                       7         8         9
quantile    0.000000e+00 0.4717554 0.4422707
(i-1)/(n-1) 8.881784e-16 0.4717554 0.4422707
i/(n+1)     6.416295e-01 0.2138765 0.2406111
i/n         5.022317e-01 0.5464169 0.5169322
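
Editorial sketch, not part of the original reply: the zero in column 7 of the "quantile" row says that the default wtd.quantile method agrees with quantile() type 7. For reference, type 7 interpolates at position h = (n - 1) p + 1 of the sorted sample. The helper name type7_quantile below is made up for illustration and only needs base R.

```r
# Hand-rolled version of quantile(x, type = 7), the default in base R.
type7_quantile <- function(x, probs) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * probs + 1            # fractional position in the sorted sample
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])  # linear interpolation between neighbours
}

set.seed(1)
x <- rnorm(13)
probs <- c(0.1, 0.25, 0.5, 0.9)
by_hand  <- type7_quantile(x, probs)
built_in <- unname(quantile(x, probs = probs, type = 7))
max(abs(by_hand - built_in))          # should be at rounding-error level
```

The same comparison with type = 8 in the quantile() call would show a visible difference, which is what the non-zero entries in column 8 of the table reflect.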




--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Nonlinear mixed binary regression model

2009-08-04 Thread Caio Azevedo
Dear Daniel,

Thanks a lot for your answer. I will take a look at that package.

Best regards,

Caio

On Tue, Jul 28, 2009 at 10:38 PM, Daniel Malter  wrote:

> Caio, check the lme4 library. The lmer function allows for fixed and random
> effects.
>
> Daniel
>
>
> -
> cuncta stricte discussurus
> -
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Caio Azevedo
> Sent: Tuesday, July 28, 2009 6:43 PM
> To: r-help@r-project.org
> Subject: [R] Nonlinear mixed binary regression model
>
> Hi all,
>
> Is there any package in R which fits binary regression models with a probit
> (or logit) link function related to a nonlinear predictor with both fixed
> and random effects? Something like that:
>
> Y ~ Bernoulli(p)
>
>
> probit(p) = h(X*Beta + Z*b), where ``h'' is a nonlinear function, X and Z
> are known design matrices, Beta are fixed effects and b are random effects.
>
>
> Thanks all in advance,
>
>
> Best regards,
>
> Caio
>



Re: [R] RServe - How to use 'createReference' method?

2009-08-04 Thread joaodaniel

Hello,

Actually you are right. I was looking in the wrong direction. But I still
have some doubts about how to use those methods.

I want to open a file on the client machine, for example a tab-delimited
txt file, and create a data frame in R from its contents.

I have to create an RFileInputStream object, using the openFile method, so that it
stores the file information. Then I must use the read() method of the
RFileInputStream to get the file information. And finally, I should use the
assign() method to relate the data to an R object. Is that right?


Romain Francois-2 wrote:
> 
> Hi,
> 
> The Rserve implementation of REngine does not support references (yet?). 
> Anyway, I don't think references are what you need here.
> 
> You probably want the methods createFile and openFile that create 
> RFileInputStream and RFileOutputStream which you can use to transfer 
> files through the R server wire.
> http://www.rforge.net/org/docs/org/rosuda/REngine/Rserve/RConnection.html
> 
> Romain
> 
> On 08/03/2009 04:57 PM, joaodaniel wrote:
>>
>> I need to input a txt, or xls, file from a client to R, using RServe.
>>
>> From what I've been reading, the best way to do this is using the
>> 'createReference' method from the REngine package.
>> But I couldn't find any documents exemplifying its use. Do I have to upload
>> a file from Java? And then? How do I refer the file to this method?
>>
>> Best Regards,
>>
>> J. Daniel
> 
> 
> -- 
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://tr.im/vfxe : R GUI page on the R wiki
> |- http://tr.im/tlNb : RGG#155, 156 and 157
> `- http://tr.im/rw0p : useR! slides
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/RServe---How-to-use-%27createReference%27-method--tp24798632p24809232.html
Sent from the R help mailing list archive at Nabble.com.



[R] Building package with vignette

2009-08-04 Thread Richard Chandler
Hello,

I have a package that builds fine using R CMD build pkg --no-vignette,
but I get the following error when running R CMD build pkg:

** building package indices ...
Error in setwd(OutVignetteDir) : cannot change working directory
ERROR: installing package indices failed

I don't know why it can't change directories since I am running as
administrator in Windows Vista. I also don't know what directory it is
trying to change to. I am able to build the vignette pdf using
texi2dvi Overview.Rnw --pdf. My path is specified as suggested in
Appendix E of the Installation and Admin manual, I have MikTex
installed, and I have read the section on Writing vignettes in Writing
R extensions. Can anyone tell me what I am missing?

Thanks,
Richard


platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          9.1
year           2009
month          06
day            26
svn rev        48839
language       R
version.string R version 2.9.1 (2009-06-26)



Richard Chandler



[R] Efficient coding

2009-08-04 Thread Daniele Amberti
I have a piece of code like the one at the bottom. Unfortunately, since it 
involves time series from a database, it is not easy to give the mailing list a 
working script.
It becomes very slow after a few hundred iterations over the variable sp (I must 
process several thousand).
Rprof() indicates that the problem is the use of gc(). Can someone tell me 
what I should take care of so that gc() is not called so often?

Thanks in advance


-
toProcess <- unique(tsr$cSpillingPointID)
first <- 1
for(sp in toProcess)
{
  tsrSUB <- tsr[tsr$cSpillingPointID == sp,]
  dmnd <- tsrSUB[is.na(tsrSUB$cDeviceClass),]
  if(nrow(dmnd) != 1)
    warning(paste("cSpillingPointID", sp,
                  "has more than one demand time Series. Used first."))

  tsrValCount <- getTimeSeries(chn,
                               tsrSUB[!is.na(tsrSUB$cDeviceClass),]$cTimeSeriesID,
                               nDaysBack = 366*nyears)
  tsrValDmnd  <- getTimeSeries(chn, dmnd$cTimeSeriesID[1],
                               nDaysBack = 366*nyears)

  sequence <- timeSequence(from = format(start(tsrValDmnd), '%Y-%m-%d'),
                           to = format(end(tsrValDmnd), '%Y-%m-%d'),
                           by = 'day')
  if(format(start(tsrValDmnd), '%H') != '00')
    tsrValDmnd <- window(tsrValDmnd, sequence[3], end(tsrValDmnd))
  if(format(end(tsrValDmnd), '%H') != '23')
    tsrValDmnd <- window(tsrValDmnd, start(tsrValDmnd),
                         sequence[length(sequence)-2])

  sequence <- sequence - 3600
  tsrValDmnd <- aggregate(tsrValDmnd, by = sequence, sum)
  sequence <- seriesPositions(tsrValDmnd)
  newPositions(tsrValDmnd) <- sequence - 23*3600
  #head(tsrValDmnd2)
  #sum(head(tsrValDmnd, 24))
  tsrValSub <- cbind(tsrValDmnd, tsrValCount)
  tsrValSub <- na.omit(tsrValSub)
  head(tsrValSub)

  if(nrow(tsrValSub) > 1)
  {
    dif <- na.omit(tsrValSub[,-1] - lag(tsrValSub[,-1],1))
    head(dif)

    costant <- is.costant(dif)
    if(any(costant == FALSE))
    {
      a <- (dif != 0) * 1:nrow(dif)
      a <- seriesPositions(dif[abs(a[a!=0]),])
      from <- min(a) - ndays*24*60*60
      to <- max(a) + ndays*24*60*60
      tsrValSub <- window(tsrValSub, from, to)
    } else
    {
      if(nrow(tsrValSub) > ndays) tsrValSub <- sample(tsrValSub, ndays)
    }

    if(nrow(tsrValSub) > 1)
    {
      row.names(tsrSUB) <- tsrSUB$cTimeSeriesID
      if(any(is.na(tsrSUB$cDeviceClass)))
        tsrSUB[is.na(tsrSUB$cDeviceClass),]$cDeviceClass <- 'DMND'

      tsrval...@units <- tsrsub[tsrval...@units,]$cDeviceClass

      if(first == 1)
        dat <- data.frame(cSpillingPointID = sp,
                          year = format(seriesPositions(tsrValSub), '%Y'),
                          month = format(seriesPositions(tsrValSub), '%m'),
                          day = format(seriesPositions(tsrValSub), '%d'),
                          tsrValSub,
                          stringsAsFactors = F)
      else
        dat <- merge(dat,
                     data.frame(cSpillingPointID = sp,
                                year = format(seriesPositions(tsrValSub), '%Y'),
                                month = format(seriesPositions(tsrValSub), '%m'),
                                day = format(seriesPositions(tsrValSub), '%d'),
                                tsrValSub,
                                stringsAsFactors = F),
                     all.x = T, all.y = T)

      cat(paste('Added', nrow(tsrValSub), 'rows.\n'))
      cat(paste('...', round(first/length(toProcess)*100, 2), '%...\n'))
    }
  }
  else cat(paste('...', round(first/length(toProcess)*100, 2), '%...\n'))

  if(any(first %in% seq(100, length(toProcess), 250) )) gc()
  first <- first + 1
}
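
Editorial sketch, not part of the original post: one possible contributor to the slowdown (an assumption, since the script cannot be run here) is that dat is rebuilt by merge() on every iteration, so the accumulated data frame is copied again and again and the garbage collector has more and more to do. A common alternative is to keep each piece in a preallocated list and combine once after the loop. make_piece() below is a hypothetical stand-in for the per-iteration data.frame(...), and the sketch assumes every piece has the same columns, which merge(..., all.x = T, all.y = T) does not require.

```r
# Toy stand-in for the data frame built in each iteration (hypothetical).
make_piece <- function(sp) {
  data.frame(cSpillingPointID = sp, day = 1:3, value = sp * (1:3),
             stringsAsFactors = FALSE)
}

toProcess <- 1:200
pieces <- vector("list", length(toProcess))   # preallocate the container
for (i in seq_along(toProcess)) {
  pieces[[i]] <- make_piece(toProcess[i])     # store the piece, do not merge yet
}
dat <- do.call(rbind, pieces)                 # combine once, after the loop
dim(dat)
```

If the pieces can have different columns, the same pattern still applies, but the final combining step has to align the columns first.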


ORS Srl

Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
Tel. +39 0173 620211
Fax. +39 0173 620299 / +39 0173 433111
Web Site www.ors.it




Re: [R] Comparison of Output from "dwtest" and "durbin.watson"

2009-08-04 Thread Tom La Bone

My concern is that the two tests give different DW statistics for the
weighted fit and very different p-values for the same DW statistic for the
unweighted fit. Is there a "right" answer here?





-- 
View this message in context: 
http://www.nabble.com/Comparison-of-Output-from-%22dwtest%22-and-%22durbin.watson%22-tp24783494p24809734.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help with reshaping data.frame

2009-08-04 Thread davidr
Thanks to Hadley (shortest and sweetest), Eric and Gabor.
I was _so_ close.
Now I think I've learned some things about the reshape function and package!
(Also transform and interaction.)
Thanks to you all,
-- David


-Original Message-
From: hadley wickham [mailto:h.wick...@gmail.com] 
Sent: Monday, August 03, 2009 4:57 PM
To: David Reiner 
Cc: R-help@r-project.org
Subject: Re: [R] Help with reshaping data.frame

On Mon, Aug 3, 2009 at 5:23 PM,  wrote:
> I'm having trouble reshaping a data.frame from long to wide.
> (I think that's the right terminology; feel free to educate me.)
> I've looked at the reshape function and package and plyr package,
> but I can't quite figure out how to do this after a dozen variations.
>
> I have a data.frame with more levels than this, but similar to:
>
>> tst
>   K1 K2 K3   V1 V2  V3
> 1  10  D  a 0.08 99 105
> 2  20  D  a 0.00 79 522
> 3  30  D  a 0.31 70 989
> 5  20  E  a 0.08 73 251
> 6  30  E  a   NA 38 323
> 7  10  D  b   NA 76 775
> 8  20  D  b 0.26 84 372
> 9  30  D  b 0.24 51 680
> 10 10  E  b 0.11 85 532
> 12 30  E  b 0.07 20 364
>> dput(tst)
> structure(list(K1 = c(10, 20, 30, 20, 30, 10, 20, 30, 10, 30),
>K2 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label =
> c("D",
>"E"), class = "factor"), K3 = structure(c(1L, 1L, 1L, 1L,
>1L, 2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
>V1 = c(0.08, 0, 0.31, 0.08, NA, NA, 0.26, 0.24, 0.11, 0.07
>), V2 = c(99, 79, 70, 73, 38, 76, 84, 51, 85, 20), V3 = c(105,
>522, 989, 251, 323, 775, 372, 680, 532, 364)), .Names = c("K1",
> "K2", "K3", "V1", "V2", "V3"), row.names = c(1L, 2L, 3L, 5L,
> 6L, 7L, 8L, 9L, 10L, 12L), class = "data.frame")
>> str(tst)
> 'data.frame':   10 obs. of  6 variables:
>  $ K1: num  10 20 30 20 30 10 20 30 10 30
>  $ K2: Factor w/ 2 levels "D","E": 1 1 1 2 2 1 1 1 2 2
>  $ K3: Factor w/ 2 levels "a","b": 1 1 1 1 1 2 2 2 2 2
>  $ V1: num  0.08 0 0.31 0.08 NA NA 0.26 0.24 0.11 0.07
>  $ V2: num  99 79 70 73 38 76 84 51 85 20
>  $ V3: num  105 522 989 251 323 775 372 680 532 364
>
> I want a data.frame with columns K1, D.a.V1, D.a.V2, D.a.V3, D.b.V1,
> D.b.V2, ..., E.b.V3,
> with NA's where there is missing data.
>
> Any direction would be appreciated. sessionInfo() below.

library(reshape)
tst_m <- melt(tst, id = c("K1","K2", "K3"))
cast(tst_m, K1 ~ K2 + K3 + variable)
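
Editorial sketch, not part of the original reply: the thread also mentions the base reshape() function together with interaction(). A minimal base-R version, assuming the tst data shown in the question, could look like this. Note that the resulting column names come out as V1.D.a rather than the requested D.a.V1, and that grp is a helper column introduced here for illustration.

```r
# The example data from the question, rebuilt with data.frame().
tst <- data.frame(
  K1 = c(10, 20, 30, 20, 30, 10, 20, 30, 10, 30),
  K2 = c("D", "D", "D", "E", "E", "D", "D", "D", "E", "E"),
  K3 = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  V1 = c(0.08, 0, 0.31, 0.08, NA, NA, 0.26, 0.24, 0.11, 0.07),
  V2 = c(99, 79, 70, 73, 38, 76, 84, 51, 85, 20),
  V3 = c(105, 522, 989, 251, 323, 775, 372, 680, 532, 364)
)

# Combine the two key columns into one grouping label, e.g. "D.a".
tst$grp <- paste(tst$K2, tst$K3, sep = ".")

# One row per K1, one block of V1..V3 columns per group; gaps become NA.
wide <- reshape(tst[, c("K1", "grp", "V1", "V2", "V3")],
                idvar = "K1", timevar = "grp", direction = "wide")
names(wide)
```

Only the key, the grouping label and the value columns are passed to reshape(), because any other column left in would be treated as a value column as well.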

Hadley

-- 
http://had.co.nz/




[R] read.csv from a remote machine

2009-08-04 Thread Olga Lyashevska

Dear all,

I am trying to import data with read.csv and my file is on remote  
machine.
I believe that I need to open a connection, not sure about syntax  
though.


I would appreciate any suggestions,
Thanks!

Olga



Re: [R] read.csv from a remote machine

2009-08-04 Thread Steve Lianoglou

Hi,

On Aug 4, 2009, at 10:37 AM, Olga Lyashevska wrote:


Dear all,

I am trying to import data with read.csv and my file is on remote  
machine.
I believe that I need to open a connection, not sure about syntax  
though.


I would appreciate any suggestions,


Look at the different ways you can open file connections in the 
?connections help page. If the file is accessible via http or ftp, I 
guess you're in luck; otherwise I'd imagine you have to copy the 
file to your local system and then open it.


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] One critical question in R

2009-08-04 Thread Hyo Karen Lee
Hi,
I have one critical question in using R.
I am currently working on some research which involves huge amounts
of data(it is about 15GB).
I am trying to use R in this research rather than using SAS or STATA.
(The company where I am working right now, is trying to switch SAS/STATA to
R)

As far as I know, the memory limit in R is 4GB;
However, I believe that there are ways to handle the large dataset.
Most of my works in R would be something like cleaning the data or running a
simple regression(OLS/Logit) though.

The whole company relies on me when it comes to R.
Please teach me how to deal with large data in R.
If you can, please give me a response very soon.
Thank you very much.

Regards,
Hyo



[R] Writing a NetCDF file in R

2009-08-04 Thread Steve Murray

Dear all,

I am attempting to convert 10 NetCDF files into a single NetCDF file, due to 
the data input requirements of a model I hope to use. I am using the ncdf 
package, version 1.6. The data are global-scale water values, on a monthly 
basis for 10 years (ie. 120 months of data in total; at present the data are 
separated by year, with 12 months of data in each file - mrunoff_1986 through 
to mrunoff_1995). For each month there are 720 longitude x 360 latitude values, 
each with a corresponding runoff value (although some of these may be NAs).

Here is my code so far:


# READ IN NetCDF FILES FROM DISK

library(ncdf)

year <- 1986:1995
file_list <- paste("mrunoff-",year,".nc", sep="")
file_list

start <- 1986

for (i in file_list) {
assign(paste("netcdf_",start,"_temp", sep=""),open.ncdf(i))
start = start+1
  }

# Start converting 10 files into 1

latitude <- get.var.ncdf(netcdf_1986_temp, "Lat")
longitude <- get.var.ncdf(netcdf_1986_temp, "Lon")
month <- get.var.ncdf(netcdf_1986_temp, "Mon")

year <- 1986:1995

mrunoff_1986 <- get.var.ncdf(netcdf_1986_temp, "mrunoff")
mrunoff_1987 <- get.var.ncdf(netcdf_1987_temp, "mrunoff")
mrunoff_1988 <- get.var.ncdf(netcdf_1988_temp, "mrunoff")
mrunoff_1989 <- get.var.ncdf(netcdf_1989_temp, "mrunoff")
mrunoff_1990 <- get.var.ncdf(netcdf_1990_temp, "mrunoff")
mrunoff_1991 <- get.var.ncdf(netcdf_1991_temp, "mrunoff")
mrunoff_1992 <- get.var.ncdf(netcdf_1992_temp, "mrunoff")
mrunoff_1993 <- get.var.ncdf(netcdf_1993_temp, "mrunoff")
mrunoff_1994 <- get.var.ncdf(netcdf_1994_temp, "mrunoff")
mrunoff_1995 <- get.var.ncdf(netcdf_1995_temp, "mrunoff")


# Define variable dimensions

dimx <- dim.def.ncdf("Lon", "deg E", as.double(longitude))
dimy <- dim.def.ncdf("Lat", "deg N", as.double(latitude))
month <- dim.def.ncdf("Mon", "Months: Jan 86=1, Dec 95=120)", 1:120)
year <- dim.def.ncdf("Year", "year", year)

# Assign data: extract mrunoff from each of the 10 files and put into one place

mrunoff_data <- dim.def.ncdf("mrunoff", "mm/month", c(mrunoff_1986, 
mrunoff_1987, mrunoff_1988, mrunoff_1989, mrunoff_1990, mrunoff_1991, 
mrunoff_1992, mrunoff_1993, mrunoff_1994, mrunoff_1995))

# Define runoff variable

mrunoff_dims <- var.def.ncdf("mrunoff_out", "mm/month", list(dimx, dimy, 
month), -.0, "Global Monthly Runoff for 1986-1995", "double")

# Create file

mrunoff_file <- create.ncdf("mrunoff.nc", mrunoff_dims)

# Put mrunoff data into the file

put.var.ncdf(mrunoff_file, mrunoff_dims, mrunoff_data)

# Write to disk

# close.ncdf(mrunoff_file)




However, when I run the code, I get the following error message:

> put.var.ncdf(mrunoff_file, mrunoff_dims, mrunoff_data)
Error in put.var.ncdf(mrunoff_file, mrunoff_dims, mrunoff_data) : 
  put.var.ncdf: error: you asked to write 31104000 values, but the passed data 
array only has 8 entries!


I can understand where the 31104000 comes from ((720*360)*12)*10, but am 
confused as to why only 8 values are being passed to put.var.ncdf. I therefore 
tried doing a couple of tests to shed some light on this:

> length(dimx); length(dimy); length(mrunoff_dims); length(mrunoff_data)
[1] 8
[1] 8
[1] 9
[1] 8
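
Editorial sketch, not part of the original post: a length of 8 for mrunoff_data matches the length of the dimension objects dimx and dimy, which suggests (as a reading of the output above, not a confirmed diagnosis) that mrunoff_data holds a dimension definition rather than the runoff values themselves. Independently of the ncdf package, the ten yearly arrays can be stacked along the month dimension in base R. The sizes below are deliberately tiny stand-ins for 720 x 360 x 12.

```r
# Small stand-ins for the yearly lon x lat x month arrays (toy sizes).
nlon <- 4; nlat <- 3; nmon <- 12
years <- 1986:1988
yearly <- lapply(years, function(y) array(y, dim = c(nlon, nlat, nmon)))

# Arrays are stored with the last index varying slowest, so concatenating the
# yearly arrays and re-dimensioning stacks them along the month axis.
combined <- array(unlist(yearly), dim = c(nlon, nlat, nmon * length(years)))
dim(combined)
length(combined)    # the number of values a writer would expect to receive
```

With the real sizes, length(combined) would be 720 * 360 * 120, the 31104000 values mentioned in the error message.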


> str(dimx); str(dimy)
List of 8
 $ name : chr "Lon"
 $ units: chr "deg E"
 $ vals : num [1:720] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 
...
 $ len  : int 720
 $ id   : num -1
 $ unlim: logi FALSE
 $ dimvarid : num -1
 $ create_dimvar: logi TRUE
 - attr(*, "class")= chr "dim.ncdf"

List of 8
 $ name : chr "Lat"
 $ units: chr "deg N"
 $ vals : num [1:360] -89.8 -89.2 -88.8 -88.2 -87.8 ...
 $ len  : int 360
 $ id   : num -1
 $ unlim: logi FALSE
 $ dimvarid : num -1
 $ create_dimvar: logi TRUE
 - attr(*, "class")= chr "dim.ncdf"


> str(mrunoff_dims); str(mrunoff_data)
List of 9
 $ name: chr "mrunoff_out"
 $ units   : chr "mm/month"
 $ missval : num -
 $ longname: chr "Global Monthly Runoff for 1986-1995"
 $ id  : num -1
 $ prec: chr "double"
 $ dim :List of 3
  ..$ :List of 8
  .. ..$ name : chr "Lon"
  .. ..$ units: chr "deg E"
  .. ..$ vals : num [1:720] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 
4.25 4.75 ...
  .. ..$ len  : int 720
  .. ..$ id   : num -1
  .. ..$ unlim: logi FALSE
  .. ..$ dimvarid : num -1
  .. ..$ create_dimvar: logi TRUE
  .. ..- attr(*, "class")= chr "dim.ncdf"
  ..$ :List of 8
  .. ..$ name : chr "Lat"
  .. ..$ units: chr "deg N"
  .. ..$ vals : num [1:360] -89.8 -89.2 -88.8 -88.2 -87.8 ...
  .. ..$ len  : int 360
  .. ..$ id   : num -1
  .. ..$ unlim: logi FALSE
  .. ..$ dimvarid : num -1
  .. ..$ create_dimvar: logi TRUE
  .. ..- attr(*, "class")= chr "dim.ncdf"
  ..$ :List of 8
  .. ..$ name : chr "Mon"
  .. ..$ units: chr "Months since Jan 1986 (Jan 86 =1)"
  .. ..$ vals : int [1:120] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ l

Re: [R] One critical question in R

2009-08-04 Thread Steve Lianoglou

Hi,

On Aug 4, 2009, at 11:20 AM, Hyo Karen Lee wrote:


> Hi,
> I have one critical question in using R.
> I am currently working on some research which involves huge amounts
> of data(it is about 15GB).
> I am trying to use R in this research rather than using SAS or STATA.
> (The company where I am working right now, is trying to switch SAS/STATA to
> R)
>
> As far as I know, the memory limit in R is 4GB;

While that might be true on windows(?), I'm pretty/quite (positively,  
even) sure that's not true on 64bit linux/osx.



> However, I believe that there are ways to handle the large dataset.
> Most of my works in R would be something like cleaning the data or running a
> simple regression(OLS/Logit) though.


One place to look would be the bigmemory package:

http://cran.r-project.org/web/packages/bigmemory/

As well as the other packages listed in the High Performance Computing  
view on CRAN:


http://cran.r-project.org/web/views/HighPerformanceComputing.html

Specifically the "Large memory and out-of-memory data" section.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Comparison of Output from "dwtest" and "durbin.watson"

2009-08-04 Thread Achim Zeileis

On Tue, 4 Aug 2009, Tom La Bone wrote:


My concern is that the two tests give different DW statistics for the
weighted fit and very different p-values for the same DW statistic for the
unweighted fit. Is there a "right" answer here?


dwtest() is not handling WLS at the moment. I'll have a look whether there 
is an easy way to fix that, but I guess the best I could do at the moment 
is to throw an error. (The statistic is easy to compute, but the 
exact/approximate p-values are less straightforward.)
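
Editorial sketch, not part of the original reply: the statistic itself is the sum of squared successive differences of the residuals divided by the residual sum of squares. The toy data and the helper dw_stat below are made up for illustration. Which residuals a given package feeds into this formula for a weighted fit is not stated in the thread, so both variants are shown.

```r
set.seed(42)
X <- 1:30
Y <- 2 * X + rnorm(30)
W <- 1 / X
fit <- lm(Y ~ X - 1, weights = W)

# Durbin-Watson statistic for a vector of residuals e.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

dw_raw      <- dw_stat(residuals(fit))            # ordinary residuals
dw_weighted <- dw_stat(sqrt(W) * residuals(fit))  # residuals scaled by sqrt(weight)
c(raw = dw_raw, weighted = dw_weighted)
```

For an unweighted fit the two numbers coincide, so a disagreement between two implementations on a weighted fit may simply reflect this choice.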

Z








Re: [R] One critical question in R

2009-08-04 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Hyo Karen Lee
> Sent: Tuesday, August 04, 2009 8:21 AM
> To: r-help@r-project.org
> Subject: [R] One critical question in R
> 
> Hi,
> I have one critical question in using R.
> I am currently working on some research which involves huge amounts
> of data(it is about 15GB).
> I am trying to use R in this research rather than using SAS or STATA.
> (The company where I am working right now, is trying to switch SAS/STATA to
> R)
> 
> As far as I know, the memory limit in R is 4GB;

The memory limit depends on your hardware and OS, which you haven't told us 
about.  With Linux and a 64-bit computer the limit is MUCH higher.  With 32-bit MS 
Windows you likely won't get even 3GB. 

> However, I believe that there are ways to handle the large dataset.

You can use a database program like MySQL, for example.  If you have files that 
are on the order of 15GB in size, I don't think you are going to have much 
success cleaning the data using R (well, I know I wouldn't, but maybe one of the 
experts here can help you out).  You may be able to use the biglm package for 
analyses, or read in just the data you need for your regressions.  If you need more 
help, you will need to tell us more about what your data is like, with more 
specifics about what your analyses will look like.  
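
Editorial sketch, not part of the original reply: "read in just the data you need" can be done in base R with the colClasses argument of read.csv(), where "NULL" skips a column entirely. The file and its columns below are made up for illustration.

```r
# Write a small example file with one column the regression does not need.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(y = c(1.1, 1.9, 3.2, 3.9),
                     x = c(1, 2, 3, 4),
                     note = c("a", "b", "c", "d")),
          tmp, row.names = FALSE)

# "NULL" in colClasses drops the third column while reading.
needed <- read.csv(tmp, colClasses = c("numeric", "numeric", "NULL"))
names(needed)

fit <- lm(y ~ x, data = needed)
coef(fit)
```

Declaring the column classes up front also spares read.csv() from guessing types, which tends to help on large files.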

> Most of my works in R would be something like cleaning the data or running a
> simple regression(OLS/Logit) though.
> 
> The whole company relies on me when it comes to R.
> Please teach me how to deal with large data in R.
> If you can, please give me a response very soon.
> Thank you very much.
> 
> Regards,
> Hyo
> 

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204
 



Re: [R] survdiff for left-truncated data?

2009-08-04 Thread Terry Therneau
> Does anyone know if there is a function like survdiff which can also handle
> left-truncated and right-censored data? When I use it on left-truncated and
> right-censored data I get an error message saying Right censored data only.

 coxph(Surv(time1, time2, status) ~ factor(group), data=mydata)
 
 The 'score' test from a Cox model is identical to the logrank test. 

 (Well, almost identical - if there are two deaths on the same day the LR 
calculation uses an n-1 at one point where the Cox uses an n.  Neither is 
right/wrong, just the choice of the authors of the two different papers.  The 
difference is never of any consequence, usually several digits out in the test 
statistic: just enough to force addendums like this one.)
 
  To recreate the observed - expected columns:
  
  fit0 <- coxph(Surv(time1, time2, status) ~ factor(group), data=mydata,
                iter=0, na.action=na.exclude)
  o.minus.e <- tapply(resid(fit0), mydata$group, sum)
  obs       <- tapply(mydata$status, mydata$group, sum)
  cbind(observed=obs, expected=obs - o.minus.e, "o-e"=o.minus.e)
 
 
   Terry Therneau



Re: [R] One critical question in R

2009-08-04 Thread Barry Rowlingson
On Tue, Aug 4, 2009 at 4:20 PM, Hyo Karen Lee wrote:

> I am currently working on some research which involves huge amounts
> of data(it is about 15GB).

 One point nobody has seemed to make yet is that the above statement
is meaningless...

 Do you have a CSV file that is 15GB big? The important number is the
product of the numbers of rows and columns, not the file size. It
takes 21 bytes to store "1.2345678901234567890" in a CSV file, but
only 8 to store it in R. There's a reduction in size of nearly a
factor of three.

 Or do you have an XLS file that is 15GB big? In which case, who knows
how much bloat Microsoft have stuffed in there. Again, the important
number is the product of the numbers of rows and columns.

 The fundamental thing is the number of numbers (and factors), not the
file size.
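
Editorial sketch, not part of the original reply: the arithmetic can be checked directly in R. A double takes 8 bytes in memory regardless of how many characters it needs in a text file. approx_gb is a hypothetical helper that only covers all-numeric tables and ignores factors, strings and working copies.

```r
n <- 1e6
x <- numeric(n)
bytes_per_value <- as.numeric(object.size(x)) / n   # close to 8 for doubles
bytes_per_value

nchar("1.2345678901234567890")   # characters the same number needs as text

# Rough in-memory size in GB of an all-numeric table (hypothetical helper).
approx_gb <- function(nrows, ncols) nrows * ncols * 8 / 2^30

approx_gb(5e7, 20)               # e.g. 50 million rows by 20 columns
```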

Barry



Re: [R] read.csv from a remote machine

2009-08-04 Thread Barry Rowlingson
On Tue, Aug 4, 2009 at 3:37 PM, Olga Lyashevska wrote:
> Dear all,
>
> I am trying to import data with read.csv and my file is on remote machine.
> I believe that I need to open a connection, not sure about syntax though.

 If it's on an HTTP server then you don't need to faff with
connections, just give the URL to read.csv:

 > data = read.csv("http://foo.example.com/file.csv")

Probably works with ftp: too. How remote is it?
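
Editorial sketch, not part of the original reply: the two routes mentioned in this thread, passing a URL string straight to read.csv() and opening a connection with url(), can be tried without a network by using a file:// URL as a stand-in for the remote address. The temporary file and its contents are made up, and the path handling assumes a Unix-like system.

```r
# Create a small CSV and address it through a URL instead of a plain path.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), tmp, row.names = FALSE)
remote <- paste0("file://", normalizePath(tmp))  # stand-in for "http://..."

dat1 <- read.csv(remote)     # URL given directly as the file argument

con <- url(remote)           # explicit connection, as described in ?connections
dat2 <- read.csv(con)        # read.csv opens and closes the connection itself

identical(dat1, dat2)
```

If the file is only reachable by other means (ssh, a network share), copying it locally first, as suggested earlier in the thread, remains the simple option.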

Barry



Re: [R] RServe - How to use 'createReference' method?

2009-08-04 Thread Romain Francois

Hi,

So the file you want to read is on the client machine, and you want to 
transfer it to the server and read it into R ?


What I guess you need is:
- open a FileInputStream; many tutorials on the web will tell you how 
this works
- open a RFileOutputStream
- call the read(byte[] b) method of the FileInputStream as many times as 
necessary, filling a byte[] each time
- send each of these byte[] to the RFileOutputStream
- close the RFileOutputStream when there is nothing more to send
- your file is then on the server side, and you can read it into R using 
whatever R command is suitable: read.csv, read.delim, ...


Does that help ?

Romain


On 08/04/2009 04:01 PM, joaodaniel wrote:


Hello,

Actually you are right. I was looking in the wrong direction. But I still
have some doubts about how to use those methods.

I want to open a file on the client machine, for example a tab-delimited
txt file, and create a data frame from its contents in R.

I need to create an RFileInputStream object, using the openFile method, so it
stores the file information. Then I must use the read() method from the
RFileInputStream to get the file information. And finally, I should use the
assign() method to relate the data with an R object. Is that right?


Romain Francois-2 wrote:

Hi,

The Rserve implementation of REngine does not support references (yet?).
Anyway, I don't think references are what you need here.

You probably want the methods createFile and openFile that create
RFileInputStream and RFileOutputStream which you can use to transfer
files through the R server wire.
http://www.rforge.net/org/docs/org/rosuda/REngine/Rserve/RConnection.html

Romain

On 08/03/2009 04:57 PM, joaodaniel wrote:

I need to input a txt, or xls, file from a client to R, using Rserve.

From what I've been reading, the best way to do this is to use the
'createReference' method from the REngine package.
But I couldn't find any documents exemplifying its use. Do I have to upload a
file from Java? And then? How do I refer the file to this method?

Best Regards,

J. Daniel


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/vfxe : R GUI page on the R wiki
|- http://tr.im/tlNb : RGG#155, 156 and 157
`- http://tr.im/rw0p : useR! slides



Re: [R] fitted.values less than observed values

2009-08-04 Thread Federico Calboli

Actually, I tried doing

data2 = unique(data)
mod = lm(y ~ x1 + ... + xn, data2)
fitted(mod)

and I still get less fitted values than observations.

Federico


On 4 Aug 2009, at 12:18, Federico Calboli wrote:


Hi All,

I have some data where the dependent variable is a score, low (1:3) or
high (8:9), and the independent variables are 21 genotypic markers.
I'm fitting a logistic regression on the whole dataset after
transforming the score to 0/1 and normal linear regression on the high
and low subsets.

In all cases I have a number of cases of data 'duplications', i.e.
different individuals with the same score and the same genotype at the
21 markers.

When I do:

mod$fitted.values I get a number of fitted values corresponding to the
number of unique lines in the dataset. Is there a way to have the
fitted values match the observations, even though some are duplicated
and so have the same fitted value? I could do it by hand but it's
laborious and I'd venture there is a better way.

Best,

Federico


--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



[R] error in Elastic net

2009-08-04 Thread ram basnet
Dear R users,
 
I am a new user of elastic net, and I am trying to use the elasticnet package.
I have marker data with 359 markers and 168 samples, and the response is
metabolites. I am trying to fit a regression of a metabolite on the markers,
but I am getting the following error:

> en<-enet(marker,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)
Error in one %*% x : requires numeric matrix/vector arguments

I then tried to convert marker to numeric with the following command, and
got another error:

> as.numeric(marker)
Error: (list) object cannot be coerced to type 'double'
> is.numeric(marker)
[1] FALSE

Alternatively, I converted marker with the data.frame command, and the
columns now seem to be numeric:

> is.factor(datafram01[,1])
[1] FALSE
> is.numeric(datafram01[,1])
[1] TRUE

But when I ran elastic net again, I got the same error:

> en<-enet(datafram01,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)
Error in one %*% x : requires numeric matrix/vector arguments

Does someone have an idea how to overcome this problem?

It would be a great help for me.
 
Thanks in advance.
 
Sincerely,
Ram Kumar Basnet
Wageningen University,
The Netherlands
 


  



[R] [R-pkgs] Deducer 0.1 : An intuitive cross-platform data analysis GUI

2009-08-04 Thread Ian Fellows
Deducer 0.1 has been released to CRAN

Deducer is designed to be a free, easy to use, alternative to proprietary
software such as SPSS, JMP, and Minitab. It has a menu system to do common
data manipulation and data analysis tasks, and an excel-like spreadsheet in
which to view and edit data frames. The goal of the project is twofold.

1. Provide an intuitive interface so that non-technical users 
   can learn and perform analyses without programming getting 
   in their way. 
2. Increase the efficiency of expert R users when performing 
 common tasks by replacing hundreds of keystrokes with a few 
 mouse clicks. Also, as much as possible the GUI should not 
 get in their way if they just want to do some programming. 

Deducer is integrated into the Windows RGui, and the cross-platform Java
console JGR, and is also usable and accessible from the command line.
Screen shots and examples can be viewed in the online wiki manual:

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual

Comments and questions are more than welcome. A discussion group has been
created for any questions or recommendations.

http://groups.google.com/group/deducer

Deducer Features:

Data manipulation:
1. Factor editor
2. Variable recoding
3. Data sorting
4. Data frame merging
5. Transposing a data frame
6. Subsetting

Analysis:
1. Frequencies
2. Descriptives
3. Contingency tables
   a. Nicely formatted tables with optional
      i. Percentages
      ii. Expected counts
      iii. Residuals
   b. Statistical tests
      i. Chi-squared
      ii. Likelihood ratio
      iii. Fisher's exact
      iv. Mantel-Haenszel
      v. Kendall's tau
      vi. Spearman's rho
      vii. Kruskal-Wallis
      viii. Mid-p values for all exact/Monte Carlo tests
4. One sample tests
   a. T-test
   b. Shapiro-Wilk
   c. Histogram/box-plot summaries
5. Two sample tests
   a. T-test (Student and Welch)
   b. Permutation test
   c. Wilcoxon
   d. Brunner-Munzel
   e. Kolmogorov-Smirnov
   f. Jitter/box-plot group comparison
6. K-sample tests
   a. Anova (usual and Welch)
   b. Kruskal-Wallis
   c. Jitter/box-plot comparison
7. Correlation
   a. Nicely formatted correlation matrices
   b. Pearson's
   c. Kendall's
   d. Spearman's
   e. Scatterplot paneled array
   f. Circle plot
   g. Full correlation matrix plot
8. Generalized Linear Models
   a. Model preview
   b. Intuitive model builder
   c. Diagnostic plots
   d. Component residual and added variable plots
   e. Anova (type II and III implementing LR, Wald and F tests)
   f. Parameter summary tables and parameter correlations
   g. Influence and collinearity diagnostics
   h. Post-hoc tests and confidence intervals
      with (or without) adjustments for multiple testing
   i. Custom linear hypothesis tests
   j. Effect mean summaries (with confidence intervals), and plots
   k. Exports: Residuals, Standardized residuals, Studentized
      residuals, Predicted values (linear and link), Cook's
      distance, DFBETA, DFFITS, hat values, and Cov Ratio
   l. Observation weights and subsetting
9. Logistic Regression
   a. All GLM features
   b. ROC Plot
10. Linear Model
   a. All GLM features
   b. Heteroskedastic robust tests

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



Re: [R] error in Elastic net

2009-08-04 Thread Steve Lianoglou

Hi,

On Aug 4, 2009, at 1:03 PM, ram basnet wrote:


Dear R users,

I am new user for elastic net. I am trying to use elasticnet library.
I have marker data with 359 markers and 168 samples, and response is  
metabolites. I am trying to do regression between a metabolite and  
markers.

But I am getting the following error:

en<-enet(marker,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)

Error in one %*% x : requires numeric matrix/vector arguments

Then, I convert marker into numeric by using the following command.  
And, here also getting error.



as.numeric(marker)

Error: (list) object cannot be coerced to type 'double'

is.numeric(marker)

[1] FALSE


Alternatively, I converted marker into numeric by using data.frame  
command, it seems markers are now converted into numeric.



is.factor(datafram01[,1])

[1] FALSE

is.numeric(datafram01[,1])

[1] TRUE

But when I ran elastic net again, I got the same error:

en<-enet(datafram01,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)

Error in one %*% x : requires numeric matrix/vector arguments

Does someone have ideas to overcome these problem?

If it is, it will be great help for me.


My guess is that your "marker" variable needs to be a matrix, not a  
list, and not a data.frame.


The rows of the matrix will correspond to the individual observations  
and the columns are the features/predictors of each observation -- so  
in your case, it will be a matrix with 168 rows and 359 columns.


Try that and see if it works.
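
A small sketch of the conversion suggested above, using made-up genotype codes in place of the real marker data (6 samples by 4 markers here instead of 168 by 359). The enet call is left commented out because it needs the elasticnet package and the poster's own `vio` vector.

```r
# Made-up stand-in for the marker data: a data frame of 0/1/2 genotype codes.
set.seed(1)
marker_df <- as.data.frame(matrix(sample(0:2, 24, replace = TRUE), nrow = 6))

# A data frame is a list of columns, which is why as.numeric() fails on it
# and is.numeric() returns FALSE even when every column is numeric.
is.numeric(marker_df)

# data.matrix() converts every column to numeric and returns a matrix.
# Note: factor columns become their integer codes, so check the coding first.
marker_mat <- data.matrix(marker_df)
is.matrix(marker_mat) && is.numeric(marker_mat)
dim(marker_mat)   # rows = samples, columns = markers

# The call from the post would then take the matrix (requires elasticnet):
# en <- enet(marker_mat, as.numeric(vio), lambda = 0.5,
#            normalize = FALSE, intercept = TRUE)
```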

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] RServe - How to use 'createReference' method?

2009-08-04 Thread Romain Francois

Hi,

I've done a more complete response in my blog. http://tr.im/vshK

Romain



--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/vshK : Transfer files through Rserve
|- http://tr.im/vfxe : R GUI page on the R wiki
`- http://tr.im/tlNb : RGG#155, 156 and 157



[R] Revolutions blog: July Roundup

2009-08-04 Thread David M Smith
I write about R every weekday at the Revolutions blog:
http://blog.revolution-computing.com

In case you missed them, here are some articles from last month of
particular interest to R users.

http://bit.ly/2HPlOe announced a directory of R user groups available
on the Revolutions blog.

http://bit.ly/12u7e7 noted that the Society of Actuaries promotes R
with a regular column.

http://bit.ly/Kh8eL noted that the New York Times mentioned R in the
context of SPSS's sale to IBM (as did several other media outlets).

http://bit.ly/4yJ5Zf offered a review of the BioConductor 2009
conference, and linked to my slides on parallel programming with R.

http://bit.ly/kLSyJ linked to the "Rosetta Code" site, where you can
see standard computing problems solved in many languages, including R.

http://bit.ly/YmG28 linked to John D Cook's tip-sheet for programmers
of other languages learning R.

http://bit.ly/fRVgM contended that working with interesting data --
like the top 100 song list -- is a good way to learn R.

http://bit.ly/12yRwd noted that R had a significant presence at the
open-source conference OSCON this year, and linked to slides of R
presentations.

http://bit.ly/PhtiK linked to an example of web-scraping using R and RCurl.

http://bit.ly/N0mbO provides some examples of using iterators in R.

http://bit.ly/B4vR noted that Forbes has identified R as an
open-source venture "worth watching".

http://bit.ly/ym6QN links to an O'Reilly interview with REvolution's
Danese Cooper, talking about Open Government and R.

http://bit.ly/1abbqc showed how one website is using R to improve
performance through analysis of DNS lookup times.

http://bit.ly/GkKw4 offered a review of the UseR! 2009 conference
overall, and http://bit.ly/BcZCW discussed some of the presentations.

http://bit.ly/n788Q prompted a discussion about solving the Knapsack
Problem in R.

http://bit.ly/yuryk pointed to the Learning R blog, where a comparison
of ggplot2 and lattice graphics is ongoing.

http://bit.ly/JfZAk noted that presentations from the Rmetrics
financial conference are available for download.

http://bit.ly/QxXIB noted another instance of R being used at Google
(for boxplots).

(I've provided short URLs above because many mailers break the long
direct URLs.)

Other non-R-specific stories in July covered power-law distributions,
temporal illusions, game theory, commercial open-source, the Netflix
prize, and ferrofluid. The R Community Calendar has also been updated
at:
http://blog.revolution-computing.com/calendar.html

It's been a bit of a quiet month on the blog thanks to the conference
travel time-crunch, but things should be back to normal (or at least
skew-Normal) now. Thanks to everyone who provided comments and tips
and please keep them coming to da...@revolution-computing.com . (If
you sent me a suggestion and haven't had a response yet, I'm *almost*
through my email backlog - apologies for the delay.)

Regards to all,
# David Smith

--
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at www.revolution-computing.com/events



Re: [R] fitted.values less than observed values

2009-08-04 Thread David Winsemius
Your first posting made me think that you were complaining that the
fitted values were less than the raw values. Your second posting makes
me think that you may be conflating the English word "less" with the
English word "fewer". Many native speakers make the same error, but in
this context it may be a critical problem for communicating what you
are seeing (or not seeing).

Perhaps you could be more expansive about what you see and what you
expect, with explicit attention to the numbers involved? Even better
would be a small *reproducible* example.


--
David


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] regex question

2009-08-04 Thread ravi

Hi,
I am getting stuck over an apparently simple problem in the use of regular 
expressions :
To collect together the first letters of the words from the Perl motto, “There 
is more than one way to do it” in the following form – TIMTOWTDI. 
I tried the following code :
 
# A regex problem with the Perl motto
astr<-"There is more than one way to do it"
b1<-grep("\\<", astr,value=T)
## This just retrieves  the whole string
## Next trial with gregexpr
b2<-gregexpr("\\<",astr)
## This gives  :
> b2
[[1]]
[1]  1  7 10 15 20 24 28 31 34
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0
 
A vector of indices corresponding to the first letter is obtained all right 
with gregexpr but the next step is not so clear. I am not able to figure out 
how I can use this information to pick out the letters from the original 
string. My problem is that I don’t know how I can treat the string as a vector 
and pluck out the letters.
 
There may be many ways to do it, but I have not succeeded in coming up with 
even one way! I will appreciate any tips that I can get.
Thanking you,
Ravi



Re: [R] fitted.values less than observed values

2009-08-04 Thread Federico Calboli

On 4 Aug 2009, at 18:27, David Winsemius wrote:


Your first posting made me think that you were complaining that the
fitted values were less than the raw values. Your second posting makes
me think that you may be conflating the English word "less" with  the
word English "fewer". Many native speakers make the same error, but in
this context it may be a critical problem for communicating what you
are seeing (or not seeing).


ok, so what I meant is

length(mod$fitted) < length(observations)



Perhaps you could be more expansive about what you see and what you
expect with explicit attention to the numbers involved? Even better
would be small *reproducible* example.


I'll have to cook that up, the data is more or less confidential. Not
very much, but enough not to go on Google ;)


F






--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



Re: [R] fitted.values less than observed values

2009-08-04 Thread Federico Calboli

On 4 Aug 2009, at 18:27, David Winsemius wrote:


Your first posting made me think that you were complaining that the
fitted values were less than the raw values. Your second posting makes
me think that you may be conflating the English word "less" with  the
word English "fewer". Many native speakers make the same error, but in
this context it may be a critical problem for communicating what you
are seeing (or not seeing).

Perhaps you could be more expansive about what you see and what you
expect with explicit attention to the numbers involved? Even better
would be small *reproducible* example.


Problem solved: I realised there are NAs in the data, which I had
completely forgotten about (serves me right for digging up old data to
add results to a paper). Without any irony or sarcasm, thanks for the
grammar correction, it might prove useful in the future.
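
For anyone hitting the same symptom, here is a minimal sketch with made-up data of how missing values shorten the fitted vector, and of one way to keep it aligned with the observations. With the default na.omit, lm() silently drops incomplete rows; with na.exclude the rows are still dropped for fitting, but fitted() pads the result with NA.

```r
# Made-up data with two missing predictor values.
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                x = c(1, 2, NA, 4, NA, 6))

# na.omit (the usual default): incomplete rows are dropped silently.
m_omit <- lm(y ~ x, data = d, na.action = na.omit)
length(fitted(m_omit))   # fewer values than nrow(d)

# na.exclude: same fit, but fitted() and residuals() are padded with NA
# so that they line up with the rows of the original data.
m_excl <- lm(y ~ x, data = d, na.action = na.exclude)
length(fitted(m_excl))   # same as nrow(d)
```

Note that the padding is applied by the extractor functions fitted() and residuals(); the raw component m_excl$fitted.values still holds only the rows used in the fit.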


Best,

Federico






--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



Re: [R] regex question

2009-08-04 Thread Gabor Grothendieck
Try this:

> library(gsubfn)
> strapply(astr, "\\w+", ~ substr(x, 1, 1), simplify = c)
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"




Re: [R] regex question

2009-08-04 Thread Gabor Grothendieck
And here is a second way:

> strapply(astr, "(\\w)\\w+", c, simplify = c)
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"
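
For comparison, a sketch in base R only, which also answers the original question of how to use match positions to pluck letters out of the string. It assumes an R version that provides regmatches(); the variable names other than astr are made up for the example.

```r
astr <- "There is more than one way to do it"

# Match the first character of every word: a word boundary followed by a
# word character (Perl-style regular expression).
m <- gregexpr("\\b\\w", astr, perl = TRUE)

# Option 1: let regmatches() extract the matched text.
first_letters <- regmatches(astr, m)[[1]]

# Option 2: use the match positions directly; substring() is vectorised
# over its start and stop arguments.
pos <- as.integer(m[[1]])
first_letters_by_pos <- substring(astr, pos, pos)

# Collapse to a single upper-case string.
motto <- toupper(paste(first_letters, collapse = ""))
motto
```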




Re: [R] read.csv from a remote machine

2009-08-04 Thread Olga Lyashevska

Thanks Barry and Steve,

> > I am trying to import data with read.csv and my file is on remote machine.
> > I believe that I need to open a connection, not sure about syntax though.
>
> Probably works with ftp: too. How remote is it?


In fact it is a bit more complicated.
I am working on a Mac machine, from this machine I establish ssh with a  
Linux machine. I run R on Linux, while all my data files are stored on  
Mac. So in this case although physically I am using Mac, it is in fact  
remote. I hope it answers your question.
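
One possible approach for this kind of setup, sketched under assumptions: read.csv accepts a connection, and pipe() turns the output of a shell command into one, so the file can be streamed through ssh instead of being copied first. The helper name, user, host and path below are all hypothetical, and the remote variant only works if the Linux machine can open an ssh session back to the Mac. The runnable part uses a local `cat` as a stand-in for the remote command and assumes a Unix-like shell.

```r
# Hypothetical helper: read a CSV from the standard output of a shell command.
read_csv_from_command <- function(command, ...) {
  read.csv(pipe(command), ...)
}

# Local demonstration: 'cat' on a temporary file stands in for the remote end.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "B"), count = c(10, 20)), tmp,
          row.names = FALSE)
local_dat <- read_csv_from_command(paste("cat", shQuote(tmp)))
print(local_dat)

# Against the Mac it might look like this (hypothetical user, host and path;
# requires an ssh server on the Mac that is reachable from the Linux machine):
# mac_dat <- read_csv_from_command("ssh olga@my-mac.example.org cat data/file.csv")
```

If the Mac cannot be reached from the Linux side, copying the file over with scp before starting R is the simpler route.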



Thanks again,
Olga



[R] R's database capabilities

2009-08-04 Thread Jim Bouldin

I admit that I've not done a thorough search on this topic, but from the
several instructional manuals and/or tutorials I've looked at, I don't see
any mention of relational database capabilities in R?  Have I missed
something, and if so, can someone  point me in the right direction to get
started?  Thanks!


Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740



Re: [R] R's database capabilities

2009-08-04 Thread Romain Francois

Hi,

Did your search include this :

R> RSiteSearch( "database" )

Romain
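
As one illustration of what such a search turns up: R talks to relational databases through add-on packages such as DBI with a backend (for example RSQLite), or RODBC for ODBC sources. The sketch below assumes DBI and RSQLite are installed and skips itself otherwise; the table and column names are made up.

```r
# Sketch only: runs the query when DBI and RSQLite are available.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # An in-memory SQLite database, so nothing is written to disk.
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Made-up table: tree diameters (dbh) recorded per plot.
  trees <- data.frame(plot = c(1L, 1L, 2L), dbh = c(30.5, 12.0, 45.2))
  DBI::dbWriteTable(con, "trees", trees)

  # Ordinary SQL goes in, a data frame comes back.
  per_plot <- DBI::dbGetQuery(con, "
    SELECT plot, COUNT(*) AS n, AVG(dbh) AS mean_dbh
    FROM trees
    GROUP BY plot
    ORDER BY plot")

  DBI::dbDisconnect(con)
  print(per_plot)
}
```

For purely in-memory work, base R's merge() covers simple joins between data frames without any database at all.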


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/vshK : Transfer files through Rserve
|- http://tr.im/vfxe : R GUI page on the R wiki
`- http://tr.im/tlNb : RGG#155, 156 and 157



[R] Calculate first difference from a dataframe; write a simulation

2009-08-04 Thread Meenu Sahi
Dear R Users

I'm writing my first simulation in R.
I've put across my problems with a smaller example in the attachment along
with the questions.

Please help.

Best regards
Meenu

mydat<-read.table(textConnection("Level spread change State
4.57 1.6 BlF NA
4.45 2.04 BrS NA
3.07 2.49 BlS NA
3.26 -0.26 BlF NA
2.80 0.22 BrF NA
3.22 2.5 BrS NA
4.2 -0.34 BlF NA
3.80 0.35 BrS NA
4.28 1.78 BrF NA
5.4 -0.34 BrF NA
4.89 0.24 BlF NA"), header=TRUE,as.is=TRUE)

mydat3<-data.frame(mydat)

q<-quantile(mydat3[[1]],c(0,0.25,0.75,1))
z.level<-cut(mydat3[[1]],q,include.lowest=T)
summary(z.level)

q<-quantile(mydat3[[2]],c(0,0.25,0.75,1))
z.shape<-cut(mydat3[[2]],q,include.lowest=T)
summary(z.shape)

#to identify States in mydat3
attach(mydat3)
state1<-data.frame(mydat3[spread>=-Inf & spread<= -0.02 & Level>=-Inf & 
Level<=3.24,])
state2<-data.frame(mydat3[spread>-0.02 & spread<= 1.91 & Level>=-Inf & 
Level<=3.24,])
state3<-data.frame(mydat3[spread>1.91 & spread<= Inf & Level>=-Inf & 
Level<=3.24,])
state4<-data.frame(mydat3[spread>=-Inf & spread<= -0.02 & Level>3.24 & 
Level<=4.51,])
state5<-data.frame(mydat3[spread>-0.02 & spread<= 1.91 & Level>3.24 & 
Level<=4.51,])
state6<-data.frame(mydat3[spread>1.91 & spread<= Inf & Level>3.24 & 
Level<=4.51,])
state7<-data.frame(mydat3[spread>=-Inf & spread<= -0.02 & Level>4.51 & 
Level<=Inf,])
state8<-data.frame(mydat3[spread>-0.02 & spread<= 1.91 & Level>4.51 & 
Level<=Inf,])
state9<-data.frame(mydat3[spread>1.91 & spread<= Inf & Level>4.51 & 
Level<=Inf,])
detach(mydat3) 


state2<-transform(state2,
State="State2"
)
state3<-transform(state3,
State="State3"
)
state4<-transform(state4,
State="State4"
)
state5<-transform(state5,
State="State5"
)
state6<-transform(state6,
State="State6"
)
state7<-transform(state7,
State="State7"
)
state8<-transform(state8,
State="State8"
)


mydat4<-data.frame(rbind(state2,state3,state4,state5,state6,state7,state8))
## To identify states - can it be done in an easier way?
#
##Question1:I want to calculate the first difference of mydat4[,1:2]
t<-diff(mydat4[,1:2],1)
#The command fails, why? Because it fails I've written code for calculating
#the first diff

c<-dim(mydat4)
e<-mydat4[-c[1],]
dim(e)
f<-mydat4[-1,]
dim(f)
#f
firstDiff<-data.frame(f[,-c(3:4)]-e[,-c(3:4)])
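
A hedged sketch for Question 1. As far as I know, base R's diff() has no data-frame method, so a data frame is handled as a list of columns rather than row by row. Converting the numeric columns to a matrix first gives row-wise first differences. The small data frame `dat` is only a stand-in (four rows taken from the data above), not mydat4 itself.

```r
# Stand-in for mydat4[, 1:2]; values copied from four rows of the posted data.
dat <- data.frame(Level  = c(4.57, 4.45, 3.07, 3.26),
                  spread = c(1.60, 2.04, 2.49, -0.26))

# diff() differences a matrix row by row, so convert, difference, convert back.
first_diff <- as.data.frame(diff(as.matrix(dat)))
print(first_diff)
```

The result has one row fewer than the input, which matches the manual f - e construction above.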

# Draw a random sample for the first differences
d<-dim(mydat3)
z<-mydat3[d[1],1:2]
z.t<-t(z)
x<-data.frame(t(firstDiff[,1:2]))
y<-sample(x[,1:2],1,replace=T)
ad<-z.t+y
##Output of ad is given below
ad
 X6
Level  5.04
spread 0.25
###
##Question2: How can I easily identify which of the 9 states ad falls
##into with Level=5.04 and spread=0.25?
##OUTPUT
ad
 X6
Level  5.04
spread 0.25
state  state8
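
A hedged sketch for Question 2, which may also simplify the state identification above. It assumes the same break points as the nine subsetting statements (-0.02 and 1.91 for spread, 3.24 and 4.51 for Level) and the same right-closed intervals. The helper name `state_of` is hypothetical.

```r
# Hypothetical helper: map (Level, spread) pairs to the state labels used above.
state_of <- function(Level, spread) {
  # cut() uses right-closed intervals (lo, hi] by default, as in the subsetting code.
  lev <- cut(Level,  c(-Inf, 3.24, 4.51, Inf), labels = FALSE)   # 1, 2 or 3
  spr <- cut(spread, c(-Inf, -0.02, 1.91, Inf), labels = FALSE)  # 1, 2 or 3
  paste0("State", spr + 3 * (lev - 1))
}

state_of(5.04, 0.25)   # "State8"
```

Because the function is vectorised, something like mydat3$State <- state_of(mydat3$Level, mydat3$spread) would label every row in one step.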

##
##Question3: I want to write a simulation along the lines given below. How can I
##store each simulation result in R in deltay[s,t], where s keeps track of
##scenarios and t keeps track of time within a scenario?
##Do I need to declare 
##See below in 'program-language'.

For (s in 1:1){
yield[s,1]=z.t 
#How can I write yield[s,]<-z.t ?
For (t in 1:60){
deltay[s,t] = sample(x[,1:2],1,replace=T)
yield[s,t+1]=y[s,t]+ deltay[s,t]
}
write scenario s
# how can I write scenario s into a dataframe that can be stored for future 
# calculations
}
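
A hedged sketch for Question 3, not a definitive implementation. It assumes each draw is one whole row of first differences (Level and spread together) and stores the paths in pre-allocated three-dimensional arrays indexed by scenario, time and variable. The values in `first_diff` and `start`, and the names `n_scen` and `n_time`, are illustrative assumptions.

```r
set.seed(42)

# Illustrative inputs: a few first differences and a starting point.
first_diff <- cbind(Level  = c(-0.12, -1.38, 0.19),
                    spread = c(0.44, 0.45, -2.75))
start  <- c(Level = 4.89, spread = 0.24)
n_scen <- 3     # number of scenarios (s)
n_time <- 60    # number of time steps per scenario (t)

# Declare the arrays up front: scenario x time x variable.
vars   <- list(NULL, NULL, c("Level", "spread"))
yield  <- array(NA_real_, dim = c(n_scen, n_time + 1, 2), dimnames = vars)
deltay <- array(NA_real_, dim = c(n_scen, n_time, 2), dimnames = vars)

for (s in seq_len(n_scen)) {
  yield[s, 1, ] <- start
  for (t in seq_len(n_time)) {
    # Draw one row of first differences with replacement.
    deltay[s, t, ] <- first_diff[sample(nrow(first_diff), 1), ]
    yield[s, t + 1, ] <- yield[s, t, ] + deltay[s, t, ]
  }
}

# One scenario as a data frame for later calculations.
scenario1 <- as.data.frame(yield[1, , ])
```

Each scenario could then be kept in a list or written out with write.csv().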


Re: [R] Strange error with ROCR

2009-08-04 Thread Noah Silverman
Good point.  I'm not sure how I missed that.

This does lead to an additional question:

Is the "probability of the true label" the best prediction to feed to 
the ROCR package, or is it better to use the "decision.value"?

Does anybody have any experience with this one?

Thanks!

-N

On 8/4/09 3:28 AM, Christian Schulz wrote:
> Hi,
>
> you need the score value; have a look at ?predict.svm and at the 
> ROCR example.
>
> traindata <- as.data.frame(matrix(runif(1000),ncol=10))
> trainlabels <-  
> as.factor(sample(c("win","lose"),nrow(traindata),replace=T,prob=c(0.5,0.5)))
>
> model <- svm(traindata,trainlabels, type="C-classification", 
> kernel="radial", cost=10,
> class.weights=c("win"=3,"lose"=1), scale=FALSE, probability = TRUE)
>
> prediction <- predict(model, traindata, decision.values = TRUE, 
> probability = TRUE)
> probs <-  attr(prediction, "probabilities")[,1]
> pred <- prediction(probs,trainlabels)
>
> HTH Christian
>
>> Hello,
>>
>> I've come across a strange error...
>>
>>
>> Here is what happens:
>>
>> model <- svm(traindata,trainlabels, type="C-classification", 
>> kernel="radial", cost=10,  class.weights=c("win"=3,"lose"=1), 
>> scale=FALSE, probability = TRUE)
>> predictions <- predict(model, traindata)
>> pred <- prediction(predictions, trainlabels)
>>
>>
>> This returns an error:
>> Error in prediction(predictions, trainlabels) :
>>Format of predictions is invalid.
>>
>> Yet my predictions is just a matrix of predicted labels.  Nothing 
>> fancy.  (In fact, my step follow the exact example on the ROCR 
>> homepage.)
>>
>> A search through google for "Format of predictions is invalid" 
>> returns zero results.
>>
>> Can anyone suggest how I might fix this problem?
>>
>> Thank You,



Re: [R] R's database capabilities

2009-08-04 Thread David Winsemius




On Aug 4, 2009, at 2:04 PM, Jim Bouldin wrote:



I admit that I've not done a thorough search on this topic, but from  
the
several instructional manuals and/or tutorials I've looked at, I  
don't see

any mention of relational database capabilities in R?  Have I missed
something, and if so, can someone  point me in the right direction  
to get

started?  Thanks!



It appears you missed the functions that do searching. Try:

??"sql"   # two pages of hits (but depends on what you have installed)

RSiteSearch("sql")# 398 hits

RSiteSearch("database")   # 1200+ hits

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Sweave, cm-lgc and minus signs

2009-08-04 Thread Bert Stumm
Hello,

For a couple of days I have been trying hard to get a certain thing
out of the Sweave function of R. Unfortunately, I have been quite unsuccessful.

It's only about a small, ridiculous minus sign, which does not appear
in the final pdf of a latex file if I try to incorporate the Computer
Modern fonts into the plot. It seems that R uses different encodings
for minus signs that are put in "by hand" and for those that are plotted
from a variable with a negative value.

###

As a quick example that everybody can reproduce immediately, there is the nice
example by Paul Murrell, which can be found at:
http://www.stat.auckland.ac.nz/~paul/R/CM/CMR.html

Essentially, what I need from this page are the following 4 files for 
creating my final pdf:

http://www.stat.auckland.ac.nz/~paul/R/CM/cmTutorial.Rnw

This example needs the corresponding TeX package and the files for the symbol
faces, found at: 
http://www.ctan.org/tex-archive/help/Catalogue/entries/cm-lgc.html 
http://www.stat.auckland.ac.nz/~paul/R/CM/cmsyase.afm
http://www.stat.auckland.ac.nz/~paul/R/CM/cmsyase.pfb

###

Creating the tex-file with
R CMD Sweave cmTutorial.Rnw
(if the 'lattice' package is not loaded immediately then type a 
library(lattice)
in front of the 'print(<>)'
)

and compiling it with
pdflatex cmTutorial.tex

yields a perfect plot. BUT, if minus signs appear in the plot, it does not
work properly anymore! Just change the endpoints in the 'histogram' 
function to 'c(-59.5, 76.5)' (instead of 'c(59.5, 76.5)') and run the
two commands above again. Then I see a minus sign in the file 
'cmTutorial-latticeShow.pdf' but NOT in 'cmTutorial.pdf'. 

However, if I put something like 'mtext("-3.14...",1,-2)' somewhere, I do see
a nice minus sign in the final plot as well. But it looks different 
from the minus that I can see in 'cmTutorial-latticeShow.pdf'.

###

Can anybody explain to me how to get rid of this problem?
How can pdflatex change something in the included pdf?

Cheers,
Frank







[R] Completion for custom "$" operator?

2009-08-04 Thread Vitalie S.

Dear UseRs,

I declared a `$` method for an S4 class. Can I have automatic completion  
for this operator in R? Lists and environment objects provide this feature  
by default, but my object is an extension of the "function" class, which does  
not have subsetting defined. What should I do?


Thanks for any input.
Vitalie.

--



Re: [R] Strange error with ROCR

2009-08-04 Thread Tobias Sing
> Is the "probability of the true label" the best prediction to feed to
> the ROCR package, or is it better to use the "decision.value"

Since AFAIK they are related by a monotonic transformation, both
approaches should lead to the same ROC curve, shouldn't they? (not
tested)
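
A minimal sketch of that argument in base R, without ROCR or e1071. It assumes the two kinds of prediction really are related by a strictly increasing transformation and that there are no tied scores. The helper `roc_points` and the simulated decision values are made up for illustration.

```r
# Hypothetical helper (not part of ROCR): ROC points from scores and TRUE/FALSE labels.
roc_points <- function(score, label) {
  ord <- order(score, decreasing = TRUE)   # rank cases from highest to lowest score
  lab <- label[ord]
  data.frame(fpr = cumsum(!lab) / sum(!lab),
             tpr = cumsum(lab) / sum(lab))
}

set.seed(1)
label    <- rep(c(TRUE, FALSE), each = 50)
decision <- rnorm(100, mean = ifelse(label, 1, 0))   # made-up decision values
prob     <- 1 / (1 + exp(-2 * decision))             # strictly increasing transform

# The ROC curve depends only on the ranking, which the transform preserves.
same_curve <- isTRUE(all.equal(roc_points(decision, label),
                               roc_points(prob, label)))
print(same_curve)
```

If the transformation were decreasing instead, the curve would be mirrored, so the sign convention of the decision values is worth checking.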



Re: [R] Strange error with ROCR

2009-08-04 Thread Noah Silverman
I hadn't thought of that.  I'll run some tests...

-N


On 8/4/09 11:49 AM, Tobias Sing wrote:
>> Is the "probability of the true label" the best prediction to feed to
>> the ROCR package, or is it better to use the "decision.value"
>>  
> Since AFAIK they are related by a monotonic transformation, both
> approaches should lead to the same ROC curve, shouldn't they? (not
> tested)


[R] ggplot2 course: 19 October, Austin TX / the web

2009-08-04 Thread Hadley Wickham
Hi all,

On October 19, I'll be offering a one-day ggplot2 course in
conjunction with the ISMI Manufacturing Week. The course is open to
all and you can attend in person (Austin TX) or over the web.  More
information available from http://lookingatdata.com/ismi-2009/

Regards,

Hadley

-- 
http://had.co.nz/



[R] array slice notation?

2009-08-04 Thread Steve Jaffe

Suppose I have an n-dimensional array A and I want to extract the first "row",
i.e. all elements A[1, ...]

Interactively, if I know 'n', I can write A[1,] with (n-1) commas. 

How do I do the same more generally, e.g. in a script?

(I can think of doing this by converting A to a vector, then extracting the
appropriate elements, then reshaping it to an array, but I wonder if there
isn't a more straightforward approach)
Thanks



Re: [R] session logging

2009-08-04 Thread Allan Engelhardt
sink(..., type=c("output","message"), split=TRUE) at the beginning of 
your session should do it?


Jacob Wegelin wrote:


Consider all the text that one sees on the console during an R session.

Is there a way, within R, to make all this text--both the "output" and 
the "messages"--automatically get copied to a single text file, in 
addition to seeing it on the console?


If I remember to save the console to a file at the end of my R 
session, that does it. But


(1) That requires pointing and clicking--can it be automated as a text 
command?


(2) It would be nice to issue the text command at the start of the R 
session, such as "log this entire session in mylog.txt, append", if 
this would ensure that the session is logged whether I remember to 
save the console or not.


As far as I can tell,

sink(file="mylog.txt")

will hide the output from me and put it into mylog.txt. But it still 
shows me the error messages.


An attempt to put the output and messages into separate files returns 
an error:



sink("junkout.txt", type="output")
sink("junkmsg.txt", type="message")

Error in sink("junkmsg.txt", type = "message") :
  'file' must be NULL or an already open connection

and at any rate I'd like both messages in the same file, just like on 
the console.


People who run R at the unix command line apparently use the unix 
command -script-. But I mean something that will work within R, 
platform-independent.


A 2003 post to R-help suggests savehistory(), but this does *not* save 
the console; I tried it just now. Another post from the same thread 
suggests using emacs. But that is not platform-independent.


The existence of the 2003 thread suggests that this issue comes up 
periodically. Was it a deliberate design decision not to make logs 
available, in contrast to the way logging works in Stata?


I use the Rgui on a MacBook Pro:


sessionInfo()

R version 2.8.1 (2008-12-22) i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] lme4_0.999375-28   Matrix_0.999375-21 lattice_0.17-17
foreign_0.8-29

loaded via a namespace (and not attached):
[1] boot_1.2-34 grid_2.8.1


Thanks for any insights.

Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. E-mail: jwege...@vcu.edu URL: http://www.people.vcu.edu/~jwegelin



Re: [R] array slice notation?

2009-08-04 Thread Steve Lianoglou

Hi,

On Aug 4, 2009, at 3:23 PM, Steve Jaffe wrote:



Suppose I have an n-diml array A and I want to extract the first  
"row" -- ie

all elements A[1, ...]

Interactively if I know 'n' I can write A[1,] with (n-1) commas.

How do I do the same more generally, eg in a script?

(I can think of doing this by converting A to a vector then  
extracting the
approp elements then reshaping it to an array, but I wonder if there  
isn't a

more straightforward approach)


You actually don't have to convert A to a vector, you can use
vector-style indexing into a matrix:


R> m <- matrix(1:20, 4)
R> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

R> m[,3]
[1]  9 10 11 12

R> m[9:12]
[1]  9 10 11 12

You're just left to calculate the correct (linear) indices, which I  
guess isn't too (too) bad.


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] scatterplot3d bug??

2009-08-04 Thread Uwe Ligges

Will take a look when back from holidays in 2 weeks.

Uwe



David Winsemius wrote:
Not sure you can call it a bug when the help page says for angles in 
range (180, 360) that some support functions may "not work properly".



On Aug 3, 2009, at 2:12 PM, Vivek Ayer wrote:


Hey guys,

Not sure if I encountered a bug with the scatterplot3d function.
Here's the calls I made:

s3d1 <-scatterplot3d(TotLogDisttenp,TotDifftenp, TotMeasuredRSLtenp, 
pch=16,highlight.3d=TRUE,angle=40, type="h",main="MRSL ~ LogDist + 
Diff");

s3d1$plane3d(fitols);


s3d1 <- scatterplot3d(TotLogDisttenp,TotDifftenp, 
TotMeasuredRSLtenp,pch=16, 
highlight.3d=TRUE,angle=130,type="h",main="MRSL~ LogDist + Diff");

s3d1$plane3d(fitols);
s3d1 <- scatterplot3d(TotLogDisttenp,TotDifftenp, 
TotMeasuredRSLtenp,pch=16, 
highlight.3d=TRUE,angle=210,type="h",main="MRSL ~ LogDist + Diff");

s3d1$plane3d(fitols);


 I suspect s3d1$plane3d(fitols)  uses one of the futzed functions.

s3d1 <- scatterplot3d(TotLogDisttenp,TotDifftenp, 
TotMeasuredRSLtenp,pch=16, 
highlight.3d=TRUE,angle=310,type="h",main="MRSL~ LogDist + Diff");

s3d1$plane3d(fitols);

Essentially four plots showing the data from different angles. This
includes the fit plane. The first two graphs make sense, but for the
latter two, the fit plane is not making sense.

Take a look at the attached png.

Is it a bug?


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] array slice notation?

2009-08-04 Thread Steve Jaffe

Yes, I was thinking more in terms of mental operations than physical.
I think the following works, but it doesn't seem entirely transparent :-)

Given array A, and a vector of row indices v  (ie 1 <= v <= dim(A)[1]), the
slice of rows v is

A[ outer(v, dim(A)[1]*( 1:prod(dim(A)[-1])-1 ), '+') ]


Steve Lianoglou-6 wrote:
> 
> You actually don't have to convert A to a vector, you can use
> vector-style indexing into a matrix:
> 



Re: [R] Completion for custom "$" operator?

2009-08-04 Thread Deepayan Sarkar
On Tue, Aug 4, 2009 at 11:37 AM, Vitalie S. wrote:
> Dear UseRs,
>
> I declared a `$` method for an S4 class. Can I have automatic completion
> for this operator in R? Lists and environment objects provide this feature
> by default, but my object is an extension of the "function" class, which does
> not have subsetting defined. What should I do?

Completion should be automatic if you define names() to return the valid names.

-Deepayan
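
A minimal sketch of that suggestion, under stated assumptions. The class name `Configurable` and its `settings` slot are hypothetical. The example only shows `$` and names() methods defined side by side for an S4 class that extends "function". Whether a given front end then offers completion depends on the R version and console in use.

```r
library(methods)

# Hypothetical class: a function that also carries a list of named settings.
setClass("Configurable", contains = "function",
         slots = c(settings = "list"))

# `$` looks a name up in the settings list.
setMethod("$", "Configurable", function(x, name) x@settings[[name]])

# names() reports the valid names, which the completion code can then query.
setMethod("names", "Configurable", function(x) names(x@settings))

f <- new("Configurable", function(z) z + 1,
         settings = list(alpha = 1, beta = "two"))

names(f)    # the names that `$` accepts
f$alpha     # value stored under "alpha"
f(1)        # the object is still callable as a function
```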



Re: [R] array slice notation?

2009-08-04 Thread Steve Jaffe

Although you *do* have to re-assign the dimensions, otherwise the result is
just a flat vector, ie
 
slice <- A[ outer(v, dim(A)[1]*( 1:prod(dim(A)[-1])-1 ), '+') ]
dim(slice) <- dim(A)[-1]


Steve Jaffe wrote:
> 
> A[ outer(v, dim(A)[1]*( 1:prod(dim(A)[-1])-1 ), '+') ]
> 



Re: [R] array slice notation?

2009-08-04 Thread Søren Højsgaard
You can do 
> A <- HairEyeColor
> do.call("[", c(list(A),list(1,T,T)))
       Sex
Eye     Male Female
  Brown   32     36
  Blue    11      9
  Hazel   10      5
  Green    3      2

Regards
Søren
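
The do.call() idea above can be sketched for an array with any number of dimensions. The index list is built programmatically, with TRUE standing in for each "empty" position after the first. The helper name `first_slice` is hypothetical.

```r
# Hypothetical helper: A[i, , , ...] with as many empty positions as needed.
first_slice <- function(A, i = 1, drop = TRUE) {
  n <- length(dim(A))
  idx <- c(list(i), rep(list(TRUE), n - 1))   # i, TRUE, TRUE, ...
  do.call("[", c(list(A), idx, list(drop = drop)))
}

A <- array(1:24, dim = c(2, 3, 4))
first_slice(A)                          # same as A[1, , ]
first_slice(A, c(1, 2), drop = FALSE)   # keeps all three dimensions
```

Using TRUE as the index selects everything along that dimension, so the helper never needs to know how many commas to write.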




Re: [R] array slice notation?

2009-08-04 Thread Steve Jaffe

correction -- that would work for a single row, if you want the result to be
an array with one fewer dimensions. But in general you get an array of the
same dimension you started with (where the first dimension may be length 1).
So:

dim(slice) <- c(length(v), dim(A)[-1])
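
A small check of the linear-index approach, as a sketch under stated assumptions. The array `A` and the row vector `v` are made up. The call to as.vector() is an added precaution: an index matrix whose number of columns happens to equal length(dim(A)) would otherwise be read by R as one (row, column, ...) tuple per row.

```r
A <- array(1:24, dim = c(2, 3, 4))
v <- c(2, 1)                      # rows to extract, in this order

# Linear index of A[v[k], j] is v[k] + dim(A)[1] * (j - 1), where j runs over
# the remaining dimensions laid out as one long column index.
offsets <- dim(A)[1] * (seq_len(prod(dim(A)[-1])) - 1)
idx <- outer(v, offsets, "+")

slice <- A[as.vector(idx)]        # flat vector of the selected elements
dim(slice) <- c(length(v), dim(A)[-1])

identical(slice, A[v, , , drop = FALSE])
```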




Re: [R] session logging

2009-08-04 Thread Jacob Wegelin
Thanks. This seems to mirror the output *both* to the console *and* to the
sink file. The error messages, on the other hand, show up only in the
console.
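
A hedged sketch of one way to get both streams into a single file. As far as I can tell from the sink() documentation, `type` is matched to a single value, so c("output","message") selects only "output". The message stream needs an already open connection and cannot be split. In this sketch the output is mirrored to the console, while messages and errors go only to the log until the sink is removed. The log path is a temporary file chosen for illustration.

```r
log_path <- tempfile(fileext = ".txt")   # hypothetical log location
con <- file(log_path, open = "wt")       # open connection shared by both sinks

sink(con, type = "output", split = TRUE) # output: console and file
sink(con, type = "message")              # messages: file only

print("hello")
message("a note")

# Restore the normal streams before closing the connection.
sink(type = "message")
sink(type = "output")
close(con)

log_lines <- readLines(log_path)
print(log_lines)
```

Opening the connection with open = "at" instead would append to an existing log.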

On Tue, Aug 4, 2009 at 3:24 PM, Allan Engelhardt  wrote:

> sink(..., type=c("output","message"), split=TRUE) at the beginning of your
> session should do it?


Re: [R] Variable names as inputs...

2009-08-04 Thread Horacio Samaniego
That is exactly what I needed! I dunno why I did not think about it
earlier duuhhh!!!


thanks very much!!!

sapply(modelos, AIC) did the job!

More specifically (I wanted a data frame to make comparisons easy):

k=as.data.frame(sapply(modelos,AIC))

df_results=cbind(k,modelo=rownames(k))
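
A minimal sketch of the same idea, assuming `modelos` is a named list of lm() fits. The two mtcars models below are illustrative stand-ins, not the poster's models:

```r
# Hypothetical stand-in for the poster's list of fitted models.
modelos <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars)
)

# One AIC value per model; sapply() keeps the list names.
aic <- sapply(modelos, AIC)

# Data frame with the model name as an ordinary column,
# sorted so the lowest AIC comes first.
df_results <- data.frame(modelo = names(aic), AIC = unname(aic))
df_results[order(df_results$AIC), ]
```

Building the data frame directly from names(aic) avoids the intermediate cbind() step, but both routes should give the same two columns.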


So, very much appreciated! This is a great list!


H

On Tue, Aug 4, 2009 at 4:54 PM, Jean V Adams  wrote:

>
> Your object models is a list of lm() fits.
> So, you could use sapply() to calculate the AIC of each model, and save the
> results in a vector:
>
> sapply(modelos, AIC)
>
> Hope this helps.
>
> Jean



Re: [R] array slice notation?

2009-08-04 Thread Jim Lemon

Hi Steve,
I had the same problem when writing the hierobarp function in plotrix. 
Have a look at the source code (v2.6-4 in my copy lines 128-133) as the 
solution seems to work well.


Jim



[R] Stacked plots with common x-axis and different y-axis

2009-08-04 Thread Jason Rupert
Is there a place that shows how to create two plots that are stacked on top of 
each other where they share a common x-axis scale but have different y-axis 
scales?

Say I have the following data: airquality
Stack plot(airquality$Day, airquality$Wind) on top of plot(airquality$Day, 
airquality$Temp). 

I am interested in stacking the two on top of each other with no seam, or 
plotting the two lines with two different y-axis scales on the same plot.  

Thanks for any feedback and insights.
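
One possible way to do both with base graphics, offered as a sketch rather than a definitive answer. It assumes a single month of airquality is plotted (so that Day does not repeat), and the margin values are arbitrary choices:

```r
# Assumption: one month only, so Day runs 1..31 without repeating.
aq <- airquality[airquality$Month == 5, ]

# Variant 1: two stacked panels with no gap between them.
# Inner top/bottom margins are zero; the outer margin (oma)
# leaves room for the shared x-axis at the bottom.
op <- par(mfrow = c(2, 1), mar = c(0, 4, 0, 1), oma = c(4, 0, 1, 0))
plot(aq$Day, aq$Wind, type = "l", xaxt = "n", xlab = "", ylab = "Wind")
plot(aq$Day, aq$Temp, type = "l", xlab = "", ylab = "Temp")
mtext("Day", side = 1, line = 2.5, outer = TRUE)
par(op)

# Variant 2: both series in one panel, second y-axis on the right.
op <- par(mar = c(4, 4, 1, 4))
plot(aq$Day, aq$Wind, type = "l", xlab = "Day", ylab = "Wind")
par(new = TRUE)  # draw the next plot over the existing one
plot(aq$Day, aq$Temp, type = "l", lty = 2,
     axes = FALSE, xlab = "", ylab = "")
axis(4)          # right-hand axis on the Temp scale
mtext("Temp", side = 4, line = 2.5)
par(op)
```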



Re: [R] graph data mining

2009-08-04 Thread Steve Lianoglou

Hi,

On Aug 3, 2009, at 10:37 PM, Weiwei Shi wrote:


> Hi
>
> I am wondering if R or some others provide free-to-use graph mining tools,
> like mining some frequent structure in a x-y plot data?


It sounds like you're looking for a library that does frequent subgraph 
mining across a graph dataset. I'm not sure what you mean by x-y plot 
data, but if you can convert that into a graph, then look here:


http://www.borgelt.net/moss.html

It also includes an implementation of the (famous) gSpan algorithm for 
closed graph patterns (CloseGraph).


It's not an R library though ... it's written in Java.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] Referencing columns and pulling selected data

2009-08-04 Thread PDXRugger

Please consider the following inputs:
PrsnSerialno<-c(735,1147,2019,4131,4131,4217,4629,4822,4822,5979,5979,6128,6128,7004,7004,
7004,7004,7004,7438,7438,9402,9402,9402,10115,10115,11605,12693,12693,12693)

PrsnAge<-c(59,48,42,24,24,89,60,43,47,57,56,76,76,66,70,14,7,3,62,62,30,10,7,20,21,50,53,44,29)

IsHead<-c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,
FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)

PrsnData<-cbind(PrsnSerialno,PrsnAge,IsHead)

HhSerialno<-c(735,1147,2019,4131,4217,4629,4822,5979,6128,7004,7438,9402,10115,11605,12693)
HhData<-cbind(HhSerialno)

What I would like to do is add an age column to HhData that corresponds to
each serial number and gives the age of the oldest person in the house, or
of the person flagged "TRUE" (which designates the oldest person). The
TRUE/FALSE flag doesn't have to be considered, but using it is preferable.

The result would then be:
HhSerialno HhAge
735        59
1147       48
2019       42
4131       24
4217       89
4629       60
4822       47
5979       57
6128       76
7004       70
7438       62
9402       30
10115      21
11605      50
12693      53
I tried
PumsHh..$Age<-PumsPrsn[PumsPrsn$SERIALNO==PumsHh..$Serialno,PumsPrsn$AGE]
but because the data frames are of different lengths it doesn't work, so I'm
unsure of another way of doing this.  Thanks in advance.

JR

-- 
View this message in context: 
http://www.nabble.com/Referencing-columns-and-pulling-selected-data-tp24813802p24813802.html
Sent from the R help mailing list archive at Nabble.com.



[R] Wakeby Curve

2009-08-04 Thread amna khan
Hi Sir
How can I get a Wakeby distribution curve on an L-moment ratio diagram?

Regards

-- 
AMINA SHAHZADI
Department of Statistics
GC University Lahore, Pakistan.



[R] problem with pattern matching

2009-08-04 Thread Rnewbie

dear all,

I have a problem with pattern matching using grep. I extracted a list of
character strings from a data frame and tried to match this list against a
column of another data frame. I got only one match back, but there should
be far more. Any ideas what has gone wrong?

Another question: what if I also want to match whole elements against
the non-initial parts of the elements in another table? Which command should
I use?

Thanks
-- 
View this message in context: 
http://www.nabble.com/problem-with-pattern-matching-tp24810298p24810298.html
Sent from the R help mailing list archive at Nabble.com.



[R] Oracle ODBC driver for Linux

2009-08-04 Thread Luis Ridao Cruz
R-help,

I get the following error message when trying to connect to an Oracle database
through R (2.8.1) under Linux (Ubuntu 9.04).

> channel<-odbcConnect("magnus",uid="luisr",pwd="juanayzakarias")
Warning messages:
1: In odbcDriverConnect(st, ...) :
  [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data 
source name not found, and no default driver specified
2: In odbcDriverConnect(st, ...) : ODBC connection failed

I connect to the Oracle database in the following way:
$ sqlplus username/passw...@myipaddress/databaseName

Doing some googling I found a mail thread with the following code:

con<-odbcDriverConnect("SERVER=myIPaddress;DRIVER=oracle;DATABASE=databaseName")
Warning messages:
1: In odbcDriverConnect("SERVER=192.168.20.129;DRIVER=oracle;DATABASE=MAGNUS") :
  [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data 
source name not found, and no default driver specified
2: In odbcDriverConnect("SERVER=192.168.20.129;DRIVER=oracle;DATABASE=MAGNUS") :
  ODBC connection failed

which is not the correct syntax (I also need some help with this).
The problem is with the driver. It seems that the file "odbcinst.ini" should
contain the driver information, but mine is empty although I have already run
the Oracle Universal Installer to install the Oracle ODBC driver for Linux.
In that same mail thread there are examples of what this file should look like,
but I would also need help with this, as there is no example for Oracle
drivers (just MySQL and PostgreSQL).

Thanks in advance.



[R] RODBC package to connect to Oracle database Linux

2009-08-04 Thread Luis Ridao Cruz
R-help,

I get the following error message when trying to connect to an Oracle
database
through R (2.8.1) under Linux (Ubuntu 9.04).

> channel<-odbcConnect("magnus",uid="luisr",pwd="juanayzakarias")
Warning messages:
1: In odbcDriverConnect(st, ...) :
  [RODBC] ERROR: state IM002, code 0, message
[unixODBC][DriverManager]Data source name not found, and no default
driver specified
2: In odbcDriverConnect(st, ...) : ODBC connection failed

I connect to the Oracle database in the following way:
$ sqlplus username/passw...@myipaddress/databaseName

Doing some googling I found a mail thread with the following code:

con<-odbcDriverConnect("SERVER=myIPaddress;DRIVER=oracle;DATABASE=databaseName")
Warning messages:
1: In
odbcDriverConnect("SERVER=192.168.20.129;DRIVER=oracle;DATABASE=MAGNUS")
:
  [RODBC] ERROR: state IM002, code 0, message
[unixODBC][DriverManager]Data source name not found, and no default
driver specified
2: In
odbcDriverConnect("SERVER=192.168.20.129;DRIVER=oracle;DATABASE=MAGNUS")
:
  ODBC connection failed

which is not the correct syntax (I also need some help with this).
The problem is with the driver. It seems that the file "odbcinst.ini" should
contain the driver information, but mine is empty although I have already run
the Oracle Universal Installer to install the Oracle ODBC driver for Linux.
In that same mail thread there are examples of what this file should look like,
but I would also need help with this, as there is no example for Oracle
drivers (just MySQL and PostgreSQL).

Thanks in advance






Luis Ridao Cruz
Faroe Marine Research Institute
Nóatún 1, P.O. Box 3051
FO-110 Tórshavn
Faroe Islands
Tel : (+298) 353900, Tel (direct) : (+298) 353912
Mob.:(+298) 580800, Fax: : (+298) 353901
e-mail: lu...@hav.fo
   luri...@hotmail.com



Re: [R] read.csv from a remote machine

2009-08-04 Thread Mark Wardle
I would use sshfs or an alternative method of remote file system access. Ssh 
to your Linux box and then mount the Mac OS X filesystem via sshfs, or 
afs for example. Alternatively, can't you copy the data to the Linux 
box using sftp first?


--
Dr. Mark Wardle
Specialist registrar, Neurology
(Sent from my mobile)


On 4 Aug 2009, at 18:50, "Olga Lyashevska"  wrote:


Thanks Barry and Steve,

I am trying to import data with read.csv and my file is on remote  
machine.
I believe that I need to open a connection, not sure about syntax  
though.

Probably works with ftp: too. How remote is it?


In fact it is a bit more complicated.
I am working on a Mac machine, from this machine I establish ssh  
with a Linux machine. I run R on Linux, while all my data files are  
stored on Mac. So in this case although physically I am using Mac,  
it is in fact remote. I hope it answers your question.



Thanks again,
Olga



Re: [R] "na.strings" and the like; suspending interpretation of "NA"

2009-08-04 Thread Jan Theodore Galkowski
The magic I was looking for is to pass "as.is=TRUE" to "sqlQuery" of
RODBC.
The reference to "read.table" is a little oblique, but with that, all
works fine.
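
The distinction can be illustrated without a database. The sketch below uses read.csv() on an in-memory string as a stand-in for the query result (the column names and values are made up); the RODBC call itself is only shown in a comment because it needs a live connection:

```r
# A literal string "NA" is not the missing value NA.
is.na("NA")            # FALSE
is.na(NA_character_)   # TRUE

# Made-up data: "NA" here is meant as the continent code "North America".
csv <- "country,continent\nUS,NA\nFR,EU\n"

# Default behaviour: na.strings = "NA", so the code is read as missing.
default <- read.csv(text = csv, stringsAsFactors = FALSE)

# Telling the reader that only empty fields are missing keeps "NA" as text.
kept <- read.csv(text = csv, stringsAsFactors = FALSE, na.strings = "")

is.na(default$continent[1])  # TRUE: the code was lost
kept$continent               # "NA" "EU"

# With RODBC, the fix reported above would look like this
# (channel and query are placeholders, not run here):
#   res <- sqlQuery(channel, query, as.is = TRUE)
```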

An education!  

:-)

Thanks much,

   Jan

On Tue, 04 Aug 2009 09:05:05 +0200, "Peter Dalgaard"
 said:
> Jan Theodore Galkowski wrote:
> > Can someone point me to the proper place in the documentation or on the
> > Wiki where I can learn how to get R to stop interpreting the string "NA"
> > as something special?  I have a table in a database which contains
> > (among other things) country codes and continent codes.  The standard
> > set of two-letter codes includes "NA" to denote "North America". I
> > learned of the "na.strings" parameter for RODBC's "sqlQuery", being able
> > to shut down this interpretation when data is read in.
> > 
> > However, in the program which uses this data, I (must) have some other
> > instance where the "NA" gets spontaneously interpreted as "not
> > available", shows up in vectors and lists as "<NA>", and breaks
> > function. I temporarily solved the problem by defining all instances of
> > "NA" in the database as "NAC".  It still would be good to know a
> > general solution.  I've seen something mentioned in conjunction with
> > "options", but I'm not sure what that is about.
> 
> The general paradigm is that this shouldn't happen... Back in the old 
> days, R had no such thing as character NA, and users had to sort out the 
> North America, noradrenaline, Neil Armstrong, etc., issues for 
> themselves. Nowadays we do have calculus that preserves "NA" as distinct 
> from <NA>; so if one is converted to the other, it could signify a bug.
> 
> It could also be due to particularly silly code on your behalf, but in 
> either case we need to see the effect narrowed down to a reproducible 
> stretch of code.

[snip]



Re: [R] problem with pattern matching

2009-08-04 Thread David Winsemius


On Aug 4, 2009, at 11:16 AM, Rnewbie wrote:



> dear all,
>
> I got a problem with pattern matching using grep. I extracted a list of
> characters from a data frame, and I tried to match this list of characters
> to a column from another data frame. In return, I got only one match, but
> there should be far more matches. Any ideas what has gone wrong?


In general this falls into the category of  a request to "read my  
mind". One, out of probably an infinite number, of ways to get such a  
result is to use if()  when you needed ifelse().
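
Another possible cause, offered only as a guess since no code was posted: grep() uses just the first element of its pattern argument, so passing a whole vector of strings as the pattern matches only one of them. A small sketch with made-up data, showing exact matching with %in% and substring matching with one grepl() call per pattern:

```r
# Made-up example data; 'ids' plays the role of the extracted strings.
ids    <- c("abc", "xyz")
target <- c("abc", "pre_abc", "xyz_1", "other")

# Whole-element (exact) matching: no regular expressions needed.
exact <- target %in% ids

# Substring matching: apply grepl() once per pattern.
# fixed = TRUE treats each id as literal text, not as a regex.
hits <- sapply(ids, function(p) grepl(p, target, fixed = TRUE))

# TRUE where at least one id occurs somewhere in the target element.
any_hit <- rowSums(hits) > 0
target[any_hit]
```

Here `hits` is a logical matrix with one row per element of `target` and one column per id, which also shows which id matched where.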




> Another question, if I also want to match the whole of the elements against
> the non-initial parts of the elements in another table. Which command should
> I use?


Cannot even assign a semantic meaning to that one. What are "non-initial 
parts of the elements of another table"?



**

  provide commented, minimal, self-contained, reproducible code.

**


Thanks


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Error in elastic net

2009-08-04 Thread ram basnet




Dear R users,

I am a new user of elastic net, and I am trying to use the elasticnet
library.

I have marker data with 359 markers and 168 samples, and the responses are
metabolites. I am trying to do a regression between a metabolite and the
markers, but I am getting the following error:

> en<-enet(marker,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)
Error in one %*% x : requires numeric matrix/vector arguments

I then tried to convert marker to numeric with the following command, and
got another error:

> as.numeric(marker)
Error: (list) object cannot be coerced to type 'double'
> is.numeric(marker)
[1] FALSE

Alternatively, I converted marker using the data.frame command, and the
markers now seem to be numeric:

> is.factor(datafram01[,1])
[1] FALSE
> is.numeric(datafram01[,1])
[1] TRUE

But when I ran elastic net again, I got the same error:

> en<-enet(datafram01,as.numeric(vio),lambda=0.5,normalize=FALSE,intercept=TRUE)
Error in one %*% x : requires numeric matrix/vector arguments

Does anyone have an idea how to overcome this problem? It would be a great
help to me.

Thanks in advance.

 

Sincerely,

Ram Kumar Basnet

Wageningen University,

The Netherlands
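
The error text suggests that a data frame (which is a list) reached a matrix product that expects a numeric matrix. A minimal sketch of that situation with a tiny made-up marker table, assuming the columns are already numeric codes; the enet() call is left as a comment because it needs the elasticnet package and the real data:

```r
# Tiny made-up stand-in for the 168 x 359 marker table.
marker <- data.frame(m1 = c(0, 1, 2), m2 = c(1, 1, 0))
one <- rep(1, nrow(marker))

# A data frame is a list, so a matrix product with it fails.
msg <- tryCatch(one %*% marker, error = function(e) conditionMessage(e))
msg

# Convert to a numeric matrix first. Note that data.matrix() turns
# factor columns into their integer codes, which may or may not be
# the coding you want for markers.
x <- data.matrix(marker)
one %*% x   # now works: column sums 3 and 2

# Hypothetical call with the converted matrix (not run here):
#   en <- enet(x, as.numeric(vio), lambda = 0.5,
#              normalize = FALSE, intercept = TRUE)
```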




  


[R] Logistic Regression

2009-08-04 Thread Noah Silverman
Hi,

Trying to setup a logistic regression model.  (Something new to me. I 
usually use SVM.)

The person explaining the concept explained to me that I can include a 
"group" variable so that the probabilities predicted by the model will 
be "per group"

Does this make sense to anyone?  If so, how would I implement this?  
Using the glm or lrm function?

Thanks!

-N



Re: [R] Logistic Regression

2009-08-04 Thread David Winsemius


On Aug 4, 2009, at 6:33 PM, Noah Silverman wrote:


Hi,

Trying to setup a logistic regression model.  (Something new to me. I
usually use SVM.)

The person explaining the concept explained to me that I can include a
"group" variable so that the probabilities predicted by the model will
be "per group"

Does this make sense to anyone?


Yes.


If so, how would I implement this?
Using the glm or lrm function?


Yes.

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Logistic Regression

2009-08-04 Thread Noah Silverman
Thanks David,

But HOW do I indicate the "grouping" variable in the formula?

Thanks!

-N

On 8/4/09 3:37 PM, David Winsemius wrote:
>
> On Aug 4, 2009, at 6:33 PM, Noah Silverman wrote:
>
>> Hi,
>>
>> Trying to setup a logistic regression model.  (Something new to me. I
>> usually use SVM.)
>>
>> The person explaining the concept explained to me that I can include a
>> "group" variable so that the probabilities predicted by the model will
>> be "per group"
>>
>> Does this make sense to anyone?
>
> Yes.
>
>> If so, how would I implement this?
>> Using the glm or lrm function?
>
> Yes.
>



Re: [R] Logistic Regression

2009-08-04 Thread David Winsemius

On Aug 4, 2009, at 6:38 PM, Noah Silverman wrote:

> Thanks David,
>
> But HOW do I indicate the "grouping" variable in the formula?

Hard to tell. You have told us absolutely nothing about the problem.  
Discrete variables cause no problems in formulas. Perhaps one of :

?factor
?cut
?quantile

>
> Thanks!
>
> -N
>
> On 8/4/09 3:37 PM, David Winsemius wrote:
>>
>>
>> On Aug 4, 2009, at 6:33 PM, Noah Silverman wrote:
>>
>>> Hi,
>>>
>>> Trying to setup a logistic regression model.  (Something new to  
>>> me. I
>>> usually use SVM.)
>>>
>>> The person explaining the concept explained to me that I can  
>>> include a
>>> "group" variable so that the probabilities predicted by the model  
>>> will
>>> be "per group"
>>>
>>> Does this make sense to anyone?
>>
>> Yes.
>>
>>> If so, how would I implement this?
>>> Using the glm or lrm function?
>>
>> Yes.
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




Re: [R] Logistic Regression

2009-08-04 Thread Noah Silverman
I guess I didn't explain it well enough.

I have a number of training examples.  They have 4 fields.
label, v1, v2, group

The label is binary ("yes", "no")

My understanding (quite possibly wrong) was that there was a way to 
train the LR to estimate probabilities "per group".

In pseudo-code it would be:
lrm( label ~ v1 + v2, group_by(group)

-N




On 8/4/09 3:41 PM, David Winsemius wrote:
>
> On Aug 4, 2009, at 6:38 PM, Noah Silverman wrote:
>
>> Thanks David,
>>
>> But HOW do I indicate the "grouping" variable in the formula?
>
> Hard to tell. You have told us absolutely nothing about the problem. 
> Discrete variables cause no problems in formulas. Perhaps one of :
>
> ?factor
> ?cut
> ?quantile
>
>>
>> Thanks!
>>
>> -N
>>
>> On 8/4/09 3:37 PM, David Winsemius wrote:
>>>
>>> On Aug 4, 2009, at 6:33 PM, Noah Silverman wrote:
>>>
>>>> Hi,
>>>>
>>>> Trying to setup a logistic regression model.  (Something new to me. I
>>>> usually use SVM.)
>>>>
>>>> The person explaining the concept explained to me that I can include a
>>>> "group" variable so that the probabilities predicted by the model will
>>>> be "per group"
>>>>
>>>> Does this make sense to anyone?
>>>
>>> Yes.
>>>
>>>> If so, how would I implement this?
>>>> Using the glm or lrm function?
>>>
>>> Yes.
>>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>



Re: [R] Logistic Regression

2009-08-04 Thread David Winsemius

On Aug 4, 2009, at 6:45 PM, Noah Silverman wrote:

> I guess I didn't explain it well enough.
>
> I have a number of training examples.  They have 4 fields.
> label, v1, v2, group
>
> The label is binary ("yes", "no")
>
> My understanding (quite possibly wrong) was that there was a way 
> to train the LR to estimate probabilities "per group".
>
> In pseudo-code it would be:
> lrm( label ~ v1 + v2, group_by(group)
>

Why not :

lrm( label ~ v1 + v2 + group)

?
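
The same formula also works with glm() from base R. A minimal sketch on simulated data, assuming `group` is a factor and `label` is a two-level factor whose second level is "yes"; all numbers below are invented for illustration:

```r
set.seed(1)
n <- 200

# Simulated training data with the four fields described above.
d <- data.frame(
  v1    = rnorm(n),
  v2    = rnorm(n),
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
shift <- c(a = -1, b = 0, c = 1)[as.character(d$group)]
eta   <- 0.8 * d$v1 - 0.5 * d$v2 + shift
d$label <- factor(ifelse(runif(n) < plogis(eta), "yes", "no"),
                  levels = c("no", "yes"))

# group enters as a factor, giving each group its own intercept shift.
fit <- glm(label ~ v1 + v2 + group, data = d, family = binomial)

# Predicted probability of "yes" for each training row,
# then averaged within each group.
d$p_yes <- predict(fit, type = "response")
tapply(d$p_yes, d$group, mean)
```

With a factor response, glm() treats the first level ("no") as failure, so the fitted probabilities are for "yes". If the effect of v1 or v2 should also differ by group, an interaction such as `v1 * group` would be the usual extension.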

>
> On 8/4/09 3:41 PM, David Winsemius wrote:
>>
>>
>> On Aug 4, 2009, at 6:38 PM, Noah Silverman wrote:
>>
>>> Thanks David,
>>>
>>> But HOW do I indicate the "grouping" variable in the formula?
>>
>> Hard to tell. You have told us absolutely nothing about the  
>> problem. Discrete variables cause no problems in formulas. Perhaps  
>> one of :
>>
>> ?factor
>> ?cut
>> ?quantile
>>
>>>
>>> Thanks!
>>>
>>> -N
>>>
>>> On 8/4/09 3:37 PM, David Winsemius wrote:


>>>> On Aug 4, 2009, at 6:33 PM, Noah Silverman wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Trying to setup a logistic regression model.  (Something new to 
>>>>> me. I usually use SVM.)
>>>>>
>>>>> The person explaining the concept explained to me that I can 
>>>>> include a "group" variable so that the probabilities predicted by 
>>>>> the model will be "per group"
>>>>>
>>>>> Does this make sense to anyone?
>>>>
>>>> Yes.
>>>>
>>>>> If so, how would I implement this?
>>>>> Using the glm or lrm function?
>>>>
>>>> Yes.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




Re: [R] Referencing columns and pulling selected data

2009-08-04 Thread milton ruser
Hi there,

It may not be so elegant, but you can try:

PrsnData<-data.frame(cbind(PrsnSerialno,PrsnAge,IsHead))
PrsnData.subset<-subset(PrsnData, PrsnSerialno %in% HhSerialno)
PrsnData.subset

PrsnData.subset.maxage<-aggregate(PrsnData.subset["PrsnAge"],
list(PrsnData.subset$PrsnSerialno), max)
PrsnData.subset.maxage
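
An alternative sketch that attaches the result directly to the household table, using the maximum age per serial number. Note that in the posted data household 11605 has no person flagged TRUE, so the maximum is used here rather than IsHead:

```r
PrsnSerialno <- c(735, 1147, 2019, 4131, 4131, 4217, 4629, 4822, 4822,
                  5979, 5979, 6128, 6128, 7004, 7004, 7004, 7004, 7004,
                  7438, 7438, 9402, 9402, 9402, 10115, 10115, 11605,
                  12693, 12693, 12693)
PrsnAge <- c(59, 48, 42, 24, 24, 89, 60, 43, 47, 57, 56, 76, 76, 66, 70,
             14, 7, 3, 62, 62, 30, 10, 7, 20, 21, 50, 53, 44, 29)
HhSerialno <- c(735, 1147, 2019, 4131, 4217, 4629, 4822, 5979, 6128,
                7004, 7438, 9402, 10115, 11605, 12693)

# Oldest age per household; the result is named by serial number.
oldest <- tapply(PrsnAge, PrsnSerialno, max)

# Look up each household's value by name, so the order of HhSerialno
# does not have to match the order of the person records.
HhData <- data.frame(
  HhSerialno = HhSerialno,
  HhAge      = as.vector(oldest[as.character(HhSerialno)])
)
HhData
```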

bests

milton
On Tue, Aug 4, 2009 at 2:25 PM, PDXRugger  wrote:

>
> Please consider the following inputs:
>
> PrsnSerialno<-c(735,1147,2019,4131,4131,4217,4629,4822,4822,5979,5979,6128,6128,7004,7004,
>
> 7004,7004,7004,7438,7438,9402,9402,9402,10115,10115,11605,12693,12693,12693)
>
>
> PrsnAge<-c(59,48,42,24,24,89,60,43,47,57,56,76,76,66,70,14,7,3,62,62,30,10,7,20,21,50,53,44,29)
>
>
> IsHead<-c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,
> FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)
>
> PrsnData<-cbind(PrsnSerialno,PrsnAge,IsHead)
>
>
> HhSerialno<-c(735,1147,2019,4131,4217,4629,4822,5979,6128,7004,7438,9402,10115,11605,12693)
> HhData<-cbind(HhSerialno)
>
> What I would like to do is add an age column to HhData that corresponds to
> each serial number and gives the age of the oldest person in the house, or
> of the person flagged "TRUE" (which designates the oldest person). The
> TRUE/FALSE flag doesn't have to be considered, but using it is preferable.
>
> The result would then be:
> HhSerialno HhAge
> 735        59
> 1147       48
> 2019       42
> 4131       24
> 4217       89
> 4629       60
> 4822       47
> 5979       57
> 6128       76
> 7004       70
> 7438       62
> 9402       30
> 10115      21
> 11605      50
> 12693      53
>
> I tried
> PumsHh..$Age<-PumsPrsn[PumsPrsn$SERIALNO==PumsHh..$Serialno,PumsPrsn$AGE]
> but because the data frames are of different lengths it doesn't work, so I'm
> unsure of another way of doing this.  Thanks in advance.
>
> JR
>
> --
> View this message in context:
> http://www.nabble.com/Referencing-columns-and-pulling-selected-data-tp24813802p24813802.html
> Sent from the R help mailing list archive at Nabble.com.
>