[R] optim, L-BFGS-B | constrained bounds on parms?

2014-09-18 Thread Evan Cooch
Or, something to that effect. Following is an example of what I'm 
working with: basic ABO blood type ML estimation from observed type 
(phenotypic) frequencies. First, I generate a log-likelihood function. 
mu[1] and mu[2] are the allele freqs for the A and B alleles, respectively. 
Since the freq of the O allele is redundant, I use 1-mu[1]-mu[2] for that. 
The terms in the function are the probability expressions for the expected 
values of each phenotype.

But, that is somewhat beside the point:

f_abo <- function(mu) {
  o <- 1 - mu[1] - mu[2]                   # freq of the O allele
  25 * log(mu[1]^2 + 2 * mu[1] * o) +      # phenotype A
    25 * log(mu[2]^2 + 2 * mu[2] * o) +    # phenotype B
    50 * log(2 * mu[1] * mu[2]) +          # phenotype AB
    15 * log(o^2)                          # phenotype O
}


So, I want to come up with MLEs for mu[1] and mu[2] (the allelic freqs 
for the A and B alleles, respectively). Now, given the data, I know (from 
having maximized this likelihood outside of R) that the MLE for mu[1] is 
0.37176, and for mu[2], the same -- mu[2]=0.371763. I confirmed this in 
MATLAB, Maple, and Mathematica, using various non-linear 
solvers/optimization routines. They all yielded precisely the right answers.

But, I'm stuck trying to come up with a general approach to getting the 
'right estimates' in R that doesn't rely on strong prior knowledge of 
the parameters. I tried the following - I used 'L-BFGS-B' because this 
is a 'boxed' optimization: mu[1] and mu[2] are both parameters on the 
interval [0,1].

  results <- optim(c(0.3,0.3), f_abo,
  method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.9,0.9),
   hessian = TRUE,control=list(fnscale=-1))

but that threw the following error at me:

L-BFGS-B needs finite values of 'fn'

OK, fine. Taking that literally, and thinking a bit, it's clear that the 
upper bound on the parms creates the problem. So, I try the crude 
approach of making the upper bound for each 0.5:


  results <- optim(c(0.3,0.3), f_abo,
  method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.5,0.5),
   hessian = TRUE,control=list(fnscale=-1))


No errors this time, but no estimates either. At all.
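A minimal sketch of why those two sets of bounds misbehave, assuming the optimizer ends up evaluating the function at the upper corner of the box (the variable names below are only for illustration). At (0.9, 0.9) the implied O frequency is negative, so one of the log() arguments is negative; at (0.5, 0.5) the O frequency is exactly zero, so log(0) appears:

```r
f_abo <- function(mu) {
  o <- 1 - mu[1] - mu[2]                   # freq of the O allele
  25 * log(mu[1]^2 + 2 * mu[1] * o) +
    25 * log(mu[2]^2 + 2 * mu[2] * o) +
    50 * log(2 * mu[1] * mu[2]) +
    15 * log(o^2)
}

# Corner of the box with upper = c(0.9, 0.9): log() of a negative number
corner_09 <- suppressWarnings(f_abo(c(0.9, 0.9)))   # NaN

# Corner of the box with upper = c(0.5, 0.5): O frequency is 0, log(0)
corner_05 <- f_abo(c(0.5, 0.5))                     # -Inf

# A point strictly inside the triangle mu[1] + mu[2] < 1 is fine
interior <- f_abo(c(0.3, 0.3))

c(corner_09 = corner_09, corner_05 = corner_05, interior = interior)
```

Either non-finite value is enough to derail a method that needs finite values of 'fn', which fits the error message above.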

OK -- so I 'cheat', and since I know that mu[1]=mu[2]=0.37176, I make 
another change to the upper limit, using 0.4 for both parms:



  results <- optim(c(0.3,0.3), f_abo,
  method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.4,0.4),
   hessian = TRUE,control=list(fnscale=-1))


Works perfectly, and...right estimates too. ;-)

But, I could get there from here because I had prior knowledge of the 
parameter values. In other words, I cheated (not a thinly veiled 
suggestion that prior information is cheating, of course ;-)

What I'm trying to figure out is how to do a constrained optimization 
with R, where mu[1] and mu[2] are estimated subject to the constraint that

0 <= mu[1]+mu[2] <= 1

There seems to be no obvious way to impose that -- which creates a 
problem for optim since if I set 'vague' bounds on the parms (as per 
original attempt), optim tries combinations (like mu[1]=0.9, mu[2]=0.9), 
which aren't plausible, given the constraint that 0 <= mu[1]+mu[2] <= 1. 
Further, in this example, mu[1]=mu[2]. That might not be the case, and I 
might need to set upper bound on a parameter to be >0.5. But, without 
knowing which parameter, I'd need to set both from (say) 0.1 -> 0.9.

Is this possible with optim, or do I need to use a different package? If 
I can get there from here using optim, what do I need to do, either to 
my call to the optim routine, or the function that I pass to it?
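One possible answer to the "function that I pass to it" part, offered as a sketch rather than the canonical fix: reparameterize so that optim works on an unconstrained scale. The helper to_freq() below is a hypothetical name; it maps any two real numbers to allele freqs that are positive and sum to less than 1 (a multinomial-logit transform), so no bounds are needed at all:

```r
f_abo <- function(mu) {
  o <- 1 - mu[1] - mu[2]                   # freq of the O allele
  25 * log(mu[1]^2 + 2 * mu[1] * o) +
    25 * log(mu[2]^2 + 2 * mu[2] * o) +
    50 * log(2 * mu[1] * mu[2]) +
    15 * log(o^2)
}

# Hypothetical helper: real-valued theta -> (freq A, freq B),
# with the O allele as the reference category
to_freq <- function(theta) {
  e <- exp(c(theta, 0))
  (e / sum(e))[1:2]
}

# Minimize the negative log-likelihood on the unconstrained scale.
# theta = c(0, 0) corresponds to all three alleles at 1/3: a vague start.
fit <- optim(c(0, 0), function(theta) -f_abo(to_freq(theta)), method = "BFGS")

mu_hat <- to_freq(fit$par)
mu_hat
```

The estimates should land close to the 0.37176 reported above. Standard errors on the original scale would need the delta method or a second optim call on the mu scale started at mu_hat.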

This sort of thing is quite easy in (say) Maple. I simply execute

NLPSolve(f_abo,initialpoint={mu[1]=0.2,mu[2]=0.2},{mu[1]+mu[2]<=1},mu[1]=0.1..0.9,mu[2]=0.1..0.9,maximize);

where I'm telling the NLPSolve function that there is a constraint for 
mu[1] and mu[2] (as above), which lets me set bounds on the parameter 
over larger interval. Can I do the same in R?
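For a closer analogue of the NLPSolve call, base R has constrOptim() in the stats package, which handles linear inequality constraints of the form ui %*% theta - ci >= 0. The following is a sketch under the assumption that the only constraints wanted are mu[1] >= 0, mu[2] >= 0 and mu[1] + mu[2] <= 1; the starting value must lie strictly inside that region:

```r
f_abo <- function(mu) {
  o <- 1 - mu[1] - mu[2]                   # freq of the O allele
  25 * log(mu[1]^2 + 2 * mu[1] * o) +
    25 * log(mu[2]^2 + 2 * mu[2] * o) +
    50 * log(2 * mu[1] * mu[2]) +
    15 * log(o^2)
}

# constrOptim minimizes, so work with the negative log-likelihood
neg_ll <- function(mu) -f_abo(mu)

# Rows encode: mu[1] >= 0, mu[2] >= 0, -mu[1] - mu[2] >= -1
ui <- rbind(c(1, 0),
            c(0, 1),
            c(-1, -1))
ci <- c(0, 0, -1)

# grad = NULL makes constrOptim fall back to Nelder-Mead
fit <- constrOptim(theta = c(0.2, 0.2), f = neg_ll, grad = NULL,
                   ui = ui, ci = ci)
fit$par
```

Because no gradient is supplied, the result is only as precise as Nelder-Mead gets; supplying an analytic gradient would switch the inner optimizer to BFGS.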

Again, I'm trying to avoid having to use a 'good guess'. I know I can 
gene count to come up with a quick and dirty starting point (the basis 
for the EM algorithm commonly used for this), but again, I'm trying to 
avoid that.

Thanks very much in advance.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure with .Rprofile on Mac OS X

2014-09-18 Thread Amos B. Elberg
David - the startup directory for Terminal.app shouldn't affect where R
looks for .Rprofile.  If R is started from the command line, it should
look in whatever is the user's current directory (which will be ~/ if
Terminal was just launched), and then ~/  .  It shouldn't be looking in
/Applications/ unless you happen to have cd'd to /Applications before
launching R.

(You put up the environment variables present in one launch and absent
from another, but what I was really looking for is whether something in
his shell is changing a path.  Because mac environment variables are
funky that way.)
> David Winsemius 
> September 19, 2014 at 12:57 AM
>
> Dear Gang Chen;
>
> The .Rprofile is loaded from the startup directory. Terminal.app will
> start up in /Applications/ while your R.app session appears to be
> starting in a different directory. (We don't know what your startup
> directories are.)  I'm using R.app in /Applications/ so my .Rprofile
> has the same effect regardless of whether I run from R.app or from a
> bash console.
>
> See this portion of the Mac-FAQ:
>
> http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#The-current-and-startup-working-directories
>
>
>  See ?Startup for more specifics that are generic to all R versions:
>
>
> On Sep 18, 2014, at 7:04 PM, Amos B. Elberg wrote:
>
>> The only reason that *should* happen is if there's an .Rprofile in
>> the directory you're in when you start R.
>>
>> Where *exactly* is the .Rprofile file you want loaded, what directory
>> are you starting from, and what does R say is the user's home
>> directory? Did you make *any* changes to Rprofile.site, or Renviron?
>>
>> What is the output from Sys.getenv() in gui and cli, and do they differ?
>
> They might differ even if the default directories are the same (as
> they are on my setup). I have a somewhat older version on this laptop
> but there are names of environment variables that are not present in
> both directions:
>
> I ran AppEnv <- dput( Sys.getenv() ) on my R.app session and then ran
> the corresponding command on a Terminal console session:
>
> These are the differences (on an R 2.15.2 setup):
>
> > AppEnv[ !names(AppEnv) %in% names(conEnv)]
> R_GUI_APP_REVISION  R_GUI_APP_VERSION
>             "6435"             "1.53"
> > names( conEnv[ !names(conEnv) %in% names(AppEnv)] ) # i.e. missing in the GUI installation
>  [1] "COLUMNS"              "DYLD_LIBRARY_PATH"    "GDK_USE_XFT"          "INFOPATH"
>  [5] "LINES"                "MANPATH"              "PERL5LIB"             "PWD"
>  [9] "SHLVL"                "TERM"                 "TERM_PROGRAM"         "TERM_PROGRAM_VERSION"
> [13] "XDG_CACHE_HOME"       "XDG_CONFIG_DIRS"      "XDG_CONFIG_HOME"      "XDG_DATA_DIRS"
> [17] "XDG_DATA_HOME"
>
>  If there are further points of discussion they should be thrashed out
> (with greater details about sessionInfo() and startup settings), over
> on the R-MAC-SIG mailing list.
>
>
>>
>>
>>> On Sep 18, 2014, at 11:18 AM, Gang Chen  wrote:
>>>
>>> When R starts in GUI (e.g., /Applications/R.app/Contents/MacOS/R) on
>>> my Mac OS X 10.7.5, the startup configuration in .Rprofile works fine.
>>> However, when R starts on the terminal (e.g.,
>>> /Library/Frameworks/R.framework/Resources/bin/R), it does not work at
>>> all. What could be the reason for the failure?
>>>
>>> Thanks,
>>> Gang
>
> David Winsemius, MD
> Alameda, CA, USA


Re: [R] R/Ubuntu, “package ‘stats’ in options(”defaultPackages“) was not found”

2014-09-18 Thread David Winsemius


On Sep 18, 2014, at 10:28 AM, davide.chi...@gmail.com wrote:


I tried with a different mirror, but nothing changed...

Any other idea?


Post to the r-SIG--debian mailing list?
Search that list's Archives?

--  
David.


Thanks anyway

-- Davide

2014-09-17 10:39 GMT-04:00 Jeff Newmiller :
Try a different mirror? Precise is getting kind of old... they may  
not be keeping all of the old files on that mirror.


---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On September 17, 2014 5:51:08 AM PDT, "davide.chi...@gmail.com" wrote:

Yes, I've followed the instructions described here:
http://cran.r-project.org/bin/linux/ubuntu/README

I've added
deb http:///bin/linux/ubuntu precise/
to the /etc/apt/sources.list file.

Any idea?

Thanks a lot!

-- Davide

2014-09-17 2:42 GMT-04:00 Jeff Newmiller :

Are you using the apt sources described on CRAN for Ubuntu? I don't
expect stock 12.04 would give you R3.1.1, yet I have not seen this
problem on machines using the CRAN apt repositories.



---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On September 16, 2014 6:40:31 PM PDT, "davide.chi...@gmail.com" wrote:

Sorry guys for the errors in my behavior. I apologize.

I installed R by using commands:
apt-get install r-base
apt-get install r-base-dev

Here's the output of sessionInfo():


sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=it_IT.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=it_IT.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_3.1.1 tools_3.1.1


Any idea? Thanks!

-- Davide

2014-09-16 17:15 GMT-04:00 David Winsemius:


On Sep 16, 2014, at 12:19 PM, davide.chi...@gmail.com wrote:


Hi guys
I'm having some trouble installing the "topicmodels" package in my
R system on a Linux Ubuntu machine.
I also described the problem here: http://bit.ly/1m8Ah6Z


(You were asked in the Posting Guide to not crosspost. And when you
post to Stack Overflow you should respond to requests for clarification,
which you have not done either. You will never get useful answers if
you don't respond to requests for clarification.)




I have just installed R 3.1.1 on my Linux Ubuntu 12.04.5 LTS.


More details are needed. How did you do this?



Then I wanted to install the topicmodels package, and so I typed
install.packages("topicmodels"), but the installation did not work.



It seems that I do not have the "stats" package installed in my
default packages.


That would be somewhat unusual, but possible. You were asked in the
Rhelp Posting Guide to provide the output of sessionInfo().




--
David.



Here's the log:

++LOG+START


install.packages("topicmodels");
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
also installing the dependencies ‘modeltools’, ‘slam’, ‘tm’

trying URL 'http://cran.utstat.utoronto.ca/src/contrib/modeltools_0.2-21.tar.gz'
Content type 'application/x-gzip' length 14794 bytes (14 Kb)
opened URL
==
downloaded 14 Kb

trying URL 'http://cran.utstat.utoronto.ca/src/contrib/slam_0.1-32.tar.gz'
Content type 'application/x-gzip' length 46672 bytes (45 Kb)
opened URL
==
downloaded 45 Kb

trying URL 'http://cran.utstat.utoronto.ca/src/contrib/tm_0.6.tar.gz'
Content type 'application/x-gzip' length 505212 bytes (493 Kb)
opened URL
==
downloaded 493 Kb

trying URL 'http://cran.utstat.utoronto.ca/src/contrib/topicmodels_0.2-1.tar.gz'

Content type 'applic

Re: [R] Failure with .Rprofile on Mac OS X

2014-09-18 Thread David Winsemius


Dear Gang Chen;

The .Rprofile is loaded from the startup directory. Terminal.app will  
start up in /Applications/ while your R.app session appears to be  
starting in a different directory. (We don't know what your startup  
directories are.)  I'm using R.app in /Applications/ so my .Rprofile  
has the same effect regardless of whether I run from R.app or from a  
bash console.


See this portion of the Mac-FAQ:

http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#The-current-and-startup-working-directories

 See ?Startup for more specifics that are generic to all R versions:


On Sep 18, 2014, at 7:04 PM, Amos B. Elberg wrote:

The only reason that *should* happen is if there's an .Rprofile in  
the directory you're in when you start R.


Where *exactly* is the .Rprofile file you want loaded, what  
directory are you starting from, and what does R say is the user's  
home directory? Did you make *any* changes to Rprofile.site, or  
Renviron?


What is the output from Sys.getenv() in gui and cli, and do they  
differ?


They might differ even if the default directories are the same (as  
they are on my setup). I have a somewhat older version on this laptop  
but there are names of environment variables that are not present in  
both directions:


I ran AppEnv <- dput( Sys.getenv() ) on my R.app session and then ran  
the corresponding command on a Terminal console session:


These are the differences (on an R 2.15.2 setup):

> AppEnv[ !names(AppEnv) %in% names(conEnv)]
R_GUI_APP_REVISION  R_GUI_APP_VERSION
            "6435"             "1.53"
> names( conEnv[ !names(conEnv) %in% names(AppEnv)] ) # i.e. missing in the GUI installation
 [1] "COLUMNS"              "DYLD_LIBRARY_PATH"    "GDK_USE_XFT"          "INFOPATH"
 [5] "LINES"                "MANPATH"              "PERL5LIB"             "PWD"
 [9] "SHLVL"                "TERM"                 "TERM_PROGRAM"         "TERM_PROGRAM_VERSION"
[13] "XDG_CACHE_HOME"       "XDG_CONFIG_DIRS"      "XDG_CONFIG_HOME"      "XDG_DATA_DIRS"
[17] "XDG_DATA_HOME"

 If there are further points of discussion they should be thrashed  
out (with greater details about sessionInfo() and startup settings),  
over on the R-MAC-SIG mailing list.
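
The comparison above can be wrapped in a small helper. This is only a sketch: env_diff() is a hypothetical name, and the two vectors below are made-up stand-ins for what Sys.getenv() returns in each session. In practice one would run saveRDS(Sys.getenv(), "some_file.rds") in each session and readRDS() both files in one place.

```r
# Hypothetical helper: which variable names appear in only one of two
# environment snapshots (named character vectors, as from Sys.getenv())
env_diff <- function(a, b) {
  list(only_in_a = setdiff(names(a), names(b)),
       only_in_b = setdiff(names(b), names(a)))
}

# Made-up snapshots standing in for the GUI and console sessions
gui_env <- c(HOME = "/Users/me", R_GUI_APP_VERSION = "1.53")
cli_env <- c(HOME = "/Users/me", TERM = "xterm", PWD = "/Users/me")

d <- env_diff(gui_env, cli_env)
d
```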







On Sep 18, 2014, at 11:18 AM, Gang Chen  wrote:

When R starts in GUI (e.g., /Applications/R.app/Contents/MacOS/R) on
my Mac OS X 10.7.5, the startup configuration in .Rprofile works  
fine.

However, when R starts on the terminal (e.g.,
/Library/Frameworks/R.framework/Resources/bin/R), it does not work at
all. What could be the reason for the failure?

Thanks,
Gang


David Winsemius, MD
Alameda, CA, USA



[R] How to plot a similar graph

2014-09-18 Thread Marie-Eve St-Onge
Dear all, I would like to draw something similar to the following picture, does 
anyone know a better strategy to start?
http://www.psrd.hawaii.edu/WebImg/Pyx-thermometer.gif
Eve   


Re: [R] read.table() 1Gb text dataframe

2014-09-18 Thread EMISHIA PERSIOUS
R code for the packages cstmr, gRain, gRc, gRim, gRbase using probability models,


On Friday, September 19, 2014 7:05 AM, Henrik Bengtsson  
wrote:
 


As a start, make sure you specify the 'colClasses' argument.  BTW,
using that you can even go to the extreme and read one column at a
time, if it comes down to that.

To read a 10% subset of the rows, you can use R.filesets as:

library(R.filesets)
db <- TabularTextFile(pathname)
n <- nbrOfRows(db)
data <- readDataFrame(db, rows=seq(from=1, to=n, length.out=0.10*n))

It is also useful to specify 'colClasses' here. In addition to
specifying them ordered by column, as for read.table(), you can also
specify them by column names (or regular expressions of the column
names), e.g.

data <- readDataFrame(db, colClasses=c("*"="NULL", "(x|y)"="integer",
outcome="numeric", "id"="character"), rows=seq(from=1, to=n,
length.out=0.10*n))

That 'colClasses' specifies that the default is to drop all columns, read
columns 'x' and 'y' as integers, and so on.

BTW, if you know 'n' upfront you can skip the setup of TabularTextFile
and just do:

data <- readDataFrame(pathname, rows=seq(from=1, to=n, length.out=0.10*n))


Hope this helps

Henrik

On Thu, Sep 18, 2014 at 4:48 PM, Stephen HK Wong  wrote:
> Dear All,
>
> I have a tab-delimited table of 4 columns and many millions of rows. 
> I don't have enough memory to read.table in that 1 Gb file. 
> And actually I have 12 text files like that. Is there a way that I can just 
> randomly read.table() in 10% of rows ? I was able to do that using colbycol 
> package, but it is not available. Many thanks!!
>
>
>
> Stephen HK Wong
> Stanford, California 94305-5324


Re: [R] Pseudo R squared for quantile regression with replicates

2014-09-18 Thread Anthony Damico
here is a reproducible example, mostly from ?withReplicates.  i think
something would have to be done using return.replicates=TRUE to manually
compute survey-adjusted residuals, but i'm not really sure what nor whether
the pseudo r^2 would be meaningful  :/


library(survey)
library(quantreg)

data(api)

## one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

## convert to bootstrap
bclus1<-as.svrepdesign(dclus1,type="bootstrap", replicates=100)

## median regression
fit <- withReplicates(bclus1, quote(coef(rq(api00~api99, tau=0.5,
weights=.weights,method="fn"))))

# # # no longer from ?withReplicates # # #
# from https://stat.ethz.ch/pipermail/r-help/2006-August/110386.html
rho <- function(u,tau=.5)u*(tau - (u < 0))

V <- sum(rho(fit$resid, fit$tau)) # # breaks
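
A small base-R sketch of why that last line breaks, using a made-up stand-in for the withReplicates() result (the values are hypothetical, loosely modeled on the str() output quoted below). The object is a plain numeric vector with attributes, so `$` has nothing to extract; coefficients come out with `[[` and the variance with attr(). Residuals are not stored in it at all, so they would have to come from an rq() fit directly.

```r
# Stand-in for the object returned by withReplicates(): a named numeric
# vector carrying "var" and "statistic" attributes (values are made up)
fit_like <- structure(c("(Intercept)" = 713.24, Female = -24.01),
                      var = diag(2), statistic = "theta")

is.atomic(fit_like)                       # TRUE

# `$` on an atomic vector is an error, which is the message reported
outcome <- tryCatch({ fit_like$resid; "no error" },
                    error = function(e) "error")

# What does work: element access by name, and attr() for the attributes
female_coef <- fit_like[["Female"]]
vcov_like   <- attr(fit_like, "var")

# The check function from the 2006 post, applied to a toy residual vector
rho <- function(u, tau = 0.5) u * (tau - (u < 0))
toy_loss <- rho(c(-2, 0, 2))              # 1 0 1
```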


On Thu, Sep 18, 2014 at 1:55 PM, David L Carlson  wrote:

> It is hard to say because we do not have enough information. R has
> approximately 6,000 packages and you have not told us which ones you are
> using. You have not told us much about your data and you have not told us
> where to find the query from August 2006. The basic problem is that your
> "fit" is not the same as the "f" in the query. Your fit object is not very
> complicated. If you look at the output from str(fit) you will see that fit
> is an "atomic" vector (note the wording in your error message) with a
> series of attributes that are probably documented in the help pages for the
> functions you are using. There is nothing called resid inside fit. It is
> likely that the post you are looking at refers to the output from rq(...)
> or perhaps predict(rq(...)), but not the output from withReplicates(...,
> quote(coef(rq(... which is what fit is.
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Donia Smaali Bouhlila
> Sent: Thursday, September 18, 2014 9:54 AM
> To: r-help@r-project.org
> Subject: [R] Pseudo R squared for quantile regression with replicates
>
> Hi,
>
>
> I am a new user of R. I intend to do quantile regressions with
> complex survey data using the replicate method. I have run the following
> commands successfully:
>
>
>   mydesign <- svydesign(ids=~IDSCHOOL,strata=~IDSTRATE,data=TUN,nest=TRUE,weights=~TOTWGT)
>   bootdesign <- as.svrepdesign(mydesign,type="auto",replicates=150)
>
>   fit <- withReplicates(bootdesign,quote(coef(rq(Math1~Female+Age+calculator+computer+desk+
>     dictionary+internet+work+Book2+Book3+Book4+Book5+Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,
>     tau=0.5,weights=.weights,method="fn"))))
>
>
>
>
> I want to get the pseudo R squared but I failed. I read a query dating from
> August 2006, [R] Pseudo R for Quant Reg and the answer to it:
>
>
> rho <- function(u,tau=.5)u*(tau - (u < 0))
>   V <- sum(rho(f$resid, f$tau))
>
>
>   I copied and pasted it, replacing f by fit, and I get this error message:
> Error in fit$resid : $ operator is invalid for atomic vectors. I don't
> know what it means.
>
> The fit object is likely to be quite complicated  I used str() to see
> what it looks like:
>
>
>
> str (fit)
> Class 'svrepstat'  atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
>..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3
> ...
>.. ..- attr(*, "dimnames")=List of 2
>.. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
>.. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
>.. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39
> 7.58 ...
>.. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age"
> "calculator" ...
>..- attr(*, "statistic")= chr "theta"
>
> How can I retrieve the residuals?? and calculate the pseudo R squared??
>
>
> Any help please
>
>
> --
> Dr. Donia Smaali Bouhlila
> Associate-Professor
> Department of Economics
> Faculté des Sciences Economiques et de Gestion de Tunis


Re: [R] read.table() 1Gb text dataframe

2014-09-18 Thread Henrik Bengtsson
As a start, make sure you specify the 'colClasses' argument.  BTW,
using that you can even go to the extreme and read one column at a
time, if it comes down to that.

To read a 10% subset of the rows, you can use R.filesets as:

library(R.filesets)
db <- TabularTextFile(pathname)
n <- nbrOfRows(db)
data <- readDataFrame(db, rows=seq(from=1, to=n, length.out=0.10*n))

It is also useful to specify 'colClasses' here. In addition to
specifying them ordered by column, as for read.table(), you can also
specify them by column names (or regular expressions of the column
names), e.g.

data <- readDataFrame(db, colClasses=c("*"="NULL", "(x|y)"="integer",
outcome="numeric", "id"="character"), rows=seq(from=1, to=n,
length.out=0.10*n))

That 'colClasses' specifies that the default is to drop all columns, read
columns 'x' and 'y' as integers, and so on.

BTW, if you know 'n' upfront you can skip the setup of TabularTextFile
and just do:

data <- readDataFrame(pathname, rows=seq(from=1, to=n, length.out=0.10*n))
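
If installing an extra package is not an option, a base-R alternative might look like the sketch below. It is not the R.filesets approach: it reads the file through a connection in chunks and keeps each row with probability 0.10, so the subset is random and about (not exactly) 10% of the rows. The function name sample_rows(), the chunk size, and the demo file with its column names are all assumptions made for illustration.

```r
# Demo data: a small tab-delimited file standing in for the 1 Gb one
set.seed(42)
path <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:1000, x = rnorm(1000), y = rnorm(1000),
                   grp = sample(c("a", "b"), 1000, replace = TRUE))
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper: keep each data row with probability 'frac',
# holding only 'chunk_size' raw lines in memory at a time
sample_rows <- function(path, frac = 0.10, chunk_size = 250L, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    kept[[length(kept) + 1L]] <- lines[runif(length(lines)) < frac]
  }
  # Parse only the retained lines; extra arguments go to read.table()
  read.table(text = c(header, unlist(kept)), sep = "\t", header = TRUE, ...)
}

sub <- sample_rows(path,
                   colClasses = c("integer", "numeric", "numeric", "character"))
dim(sub)
```

For the real files a much larger chunk_size (say, 1e5 lines) would be a more sensible choice.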


Hope this helps

Henrik

On Thu, Sep 18, 2014 at 4:48 PM, Stephen HK Wong  wrote:
> Dear All,
>
> I have a tab-delimited table of 4 columns and many millions of rows. 
> I don't have enough memory to read.table in that 1 Gb file. 
> And actually I have 12 text files like that. Is there a way that I can just 
> randomly read.table() in 10% of rows ? I was able to do that using colbycol 
> package, but it is not available. Many thanks!!
>
>
>
> Stephen HK Wong
> Stanford, California 94305-5324


[R] read.table() 1Gb text dataframe

2014-09-18 Thread Stephen HK Wong
Dear All,

I have a tab-delimited table of 4 columns and many millions of rows. 
I don't have enough memory to read.table in that 1 Gb file. And actually I have 
12 text files like that. Is there a way that I can just randomly read.table() 
in 10% of rows ? I was able to do that using colbycol package, but it is 
not available. Many thanks!!



Stephen HK Wong
Stanford, California 94305-5324



Re: [R] Failure with .Rprofile on Mac OS X

2014-09-18 Thread Amos B. Elberg
The only reason that *should* happen is if there's an .Rprofile in the 
directory you're in when you start R.

Where *exactly* is the .Rprofile file you want loaded, what directory are you 
starting from, and what does R say is the user's home directory? Did you make 
*any* changes to Rprofile.site, or Renviron?

What is the output from Sys.getenv() in gui and cli, and do they differ?
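
A sketch of how those answers could be collected in one go, to be run once in the GUI and once in the terminal session. It relies only on the lookup order described in ?Startup (R_PROFILE_USER if set, otherwise .Rprofile in the current directory, then in the home directory); the list name startup_info is arbitrary.

```r
# Gather the facts that decide which .Rprofile (if any) gets loaded
startup_info <- list(
  working_dir     = getwd(),
  home_dir        = path.expand("~"),
  r_profile_user  = Sys.getenv("R_PROFILE_USER", unset = NA),
  profile_in_cwd  = file.exists(file.path(getwd(), ".Rprofile")),
  profile_in_home = file.exists(file.path(path.expand("~"), ".Rprofile")),
  site_profile    = file.path(R.home("etc"), "Rprofile.site")
)
str(startup_info)
```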


> On Sep 18, 2014, at 11:18 AM, Gang Chen  wrote:
> 
> When R starts in GUI (e.g., /Applications/R.app/Contents/MacOS/R) on
> my Mac OS X 10.7.5, the startup configuration in .Rprofile works fine.
> However, when R starts on the terminal (e.g.,
> /Library/Frameworks/R.framework/Resources/bin/R), it does not work at
> all. What could be the reason for the failure?
> 
> Thanks,
> Gang


Re: [R] RGtk2 drawing area as cairo device - no points

2014-09-18 Thread Michael Lawrence
Just wanted to acknowledge this. It's a known issue, and one that has been
tricky to solve, because it's platform-specific, so it's probably some sort
of bug in the abstraction (GDK).

On Wed, Sep 17, 2014 at 12:26 AM, François Rebaudo <
francois.reba...@legs.cnrs-gif.fr> wrote:

> Hi,
> The following code adapted from Michael's post
> (https://stat.ethz.ch/pipermail/r-help/2012-March/306069.html) works just
> fine on Linux Debian, but not on Windows 7 (no points on plots 2 and 3).
> More surprisingly, if the first plot is a boxplot, it works on both OS...
> and if I do a pdf (using pdf()), I get my points... Thanks in advance for
> your help.
>
> library(RGtk2)
> library(cairoDevice)
> win = gtkWindow(show = FALSE)
> win$setDefaultSize(500, 500)
> da = gtkDrawingArea()
> asCairoDevice(da)
> win$add(da)
> win$showAll()
> layout(matrix(c(1,1,2,3),2,2,byrow=TRUE))
> par(mar=c(0,0,0,0))
> plot(1:10) #boxplot(1:10)
> plot(1:10)
> plot(1:10)
>
> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
> [5] LC_TIME=French_France.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0


Re: [R] Using R in our commercial business application

2014-09-18 Thread Marc Schwartz

On Sep 18, 2014, at 3:42 PM, Duncan Murdoch  wrote:

> On 18/09/2014 2:35 PM, Marc Schwartz wrote:
>> On Sep 18, 2014, at 4:36 AM, Pasu  wrote:
>> 
>> > Hi
>> >
>> > I would like to know how to use R in our commercial business application
>> > which we plan to host in the cloud or deploy on customers' premises.
>> >
>> > 1. Using R and its package, does it enforce that my commercial business
>> > application should be distributed under GPL, as the statistical derivation
>> > (output) by using R will be presented to the end users as part of our
>> > commercial business application
>> > 2. Whom to contact to get commercial license if required for using R?
>> >
>> > Rgds
>> > Pasupathy
>> 
>> 
>> You will not get a definitive legal opinion here and my comments below do 
>> not represent any formal opinion on the part of any organization.
>> 
>> There is nothing preventing you or your company from using R as an end user. 
>> There are many of us who use R in commercial settings and in general, the 
>> output of a GPL'd application (text or binary) is not considered to be also 
>> GPL'd.
>> 
>> The subtleties get into the distribution of R (which you seem to plan to 
>> do), the nature of any additional functionality/code that you or your 
>> company may write/distribute, how that code interacts with R and/or modifies 
>> R source code copyrighted by the R Foundation and others. If you distribute 
>> R to clients, you will need to make R's source code available to them in 
>> some manner along with any modifications to that same code, while preserving 
>> appropriate copyrights.
>> 
>> A proprietary (closed source) application cannot be licensed under the GPL, 
>> but your company's application/code may be forced to be GPL (the so called 
>> viral aspect of the GPL) depending upon how your application is implemented 
>> as I noted in the prior paragraph. Thus, you may be forced to make your 
>> source code available to your clients as well.
>> 
>> If you plan to move forward, you should consult with an attorney well 
>> educated in software licensing and distribution issues, especially as they 
>> pertain to the GPL. The risks are not inconsequential of falling on the 
>> wrong side of the GPL.
>> 
>> The official R distribution is not available via a commercial or developer 
>> license, but there are commercial vendors of R and a Google search will 
>> point you in their direction, if desired. However, since their products are 
>> founded upon the official R distribution and the GPL, they will have similar 
>> issues with respect to any enhancements that they have created and 
>> therefore, your concerns do not necessarily go away. They will have also 
>> consulted legal counsel on these issues because the viability of their 
>> business depends upon it.
> 
> I agree with all of that but for one thing:  not all distributions are built 
> on the GPL'd original.  I believe Tibco is selling an independent 
> implementation.
> 
> Duncan Murdoch


Thanks Duncan, I stand corrected. 

A quick Google search supports the point that the Tibco "TERR" system is an 
independent, closed-source, "re-implementation" of R, not based upon GPL R.

Regards,

Marc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using R in our commercial business application

2014-09-18 Thread Duncan Murdoch

On 18/09/2014 2:35 PM, Marc Schwartz wrote:

On Sep 18, 2014, at 4:36 AM, Pasu  wrote:

> Hi
>
> I would like to know how to use R in our commercial business application
> which we plan to host in cloud or deploy on customer's premise.
>
> 1. Using R and its package, does it enforce that my commercial business
> application should be distributed under GPL, as the statistical derivation
> (output) by using R will be presented to the end users as part of of our
> commercial business application
> 2. Whom to contact to get commercial license if required for using R?
>
> Rgds
> Pasupathy


You will not get a definitive legal opinion here and my comments below do not 
represent any formal opinion on the part of any organization.

There is nothing preventing you or your company from using R as an end user. 
There are many of us who use R in commercial settings and in general, the 
output of a GPL'd application (text or binary) is not considered to be also 
GPL'd.

The subtleties get into the distribution of R (which you seem to plan to do), 
the nature of any additional functionality/code that you or your company may 
write/distribute, how that code interacts with R and/or modifies R source code 
copyrighted by the R Foundation and others. If you distribute R to clients, you 
will need to make R's source code available to them in some manner along with 
any modifications to that same code, while preserving appropriate copyrights.

A proprietary (closed source) application cannot be licensed under the GPL, but 
your company's application/code may be forced to be GPL (the so called viral 
aspect of the GPL) depending upon how your application is implemented as I 
noted in the prior paragraph. Thus, you may be forced to make your source code 
available to your clients as well.

If you plan to move forward, you should consult with an attorney well educated 
in software licensing and distribution issues, especially as they pertain to 
the GPL. The risks are not inconsequential of falling on the wrong side of the 
GPL.

The official R distribution is not available via a commercial or developer 
license, but there are commercial vendors of R and a Google search will point 
you in their direction, if desired. However, since their products are founded 
upon the official R distribution and the GPL, they will have similar issues 
with respect to any enhancements that they have created and therefore, your 
concerns do not necessarily go away. They will have also consulted legal 
counsel on these issues because the viability of their business depends upon it.


I agree with all of that but for one thing:  not all distributions are 
built on the GPL'd original.  I believe Tibco is selling an independent 
implementation.


Duncan Murdoch



Re: [R] "missings=function(x) x[x==998|x==999]<-NA" doesn't work...

2014-09-18 Thread Ben Tupper
Hi,

On Sep 18, 2014, at 10:13 AM, Doreen Mueller  wrote:

> Hi!
> 
> I want to have a function that assigns NAs to certain values of my 
> variable "var" in the dataset "d". This doesn't work:
> 
>> missings=function(x) x[x==998|x==999]<-NA
>> missings(d$var)
>> table(d$var, useNA="always")
> 
>    0    1  999 <NA> 
>  220  752  321 5264 
> 
> I don't get any error messages, but "d$var" remains unchanged. The 
> function:
>> missings=function(x) x[x==90|x==99]<<-NA
> doesn't work either, and I read that "<<-" is "dangerous" anyway?
> 

You are so close.  R returns the value of the last thing evaluated in your 
function.  In this case, the *copy* of your input argument was modified within 
the function, but you didn't return the value of the copy to the calling 
environment.  You need to explicitly return the modified value.

> missings <- function(x) { x[ (x==998) | (x==999) ] <- NA ; return(x) }
> missings(990:1010)
 [1]  990  991  992  993  994  995  996  997   NA   NA 1000 1001 1002 1003 1004 1005 1006 1007
[19] 1008 1009 1010
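
To illustrate the point about return values, here is a minimal sketch (the names `bad` and `good` are made up for this example, and `replace()` is swapped in as an alternative to the explicit `return(x)`). The one-line version returns the value of the assignment, which is the right-hand side `NA`, not the modified vector:

```r
# The original one-liner: the function's value is the value of the
# assignment expression, i.e. NA, and the caller's vector is untouched.
bad <- function(x) x[x == 998 | x == 999] <- NA

# replace() returns the modified copy directly, so nothing is lost.
# %in% also copes with NAs that are already present in x.
good <- function(x) replace(x, x %in% c(998, 999), NA)

v <- c(1, 998, 999, 2)
res_bad  <- bad(v)    # NA
res_good <- good(v)   # 1 NA NA 2
print(res_bad)
print(res_good)
```

In both cases `v` itself stays unchanged until the result is assigned back, e.g. `v <- good(v)`.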

By the way, don't forget to switch your email client to use text instead of 
html when sending a message to the list.

Cheers,
Ben





> It is important for me to work with variable names (and therefore with 
> functions instead loops) because the number and order of variables in my 
> dataset changes regularly.
> 
> Thank you,
> Doreen


Re: [R] Data frame which includes a non-existent date

2014-09-18 Thread Richard M. Heiberger
Frank,

Dates are extremely difficult.  I recommend you do not attempt to do
your own date computations with paste().
Use the lubridate package.
> install.packages("lubridate")
> library(lubridate)
Read the end section of
> vignette("lubridate")

From that you will most likely want one of these:
>  ymd("19480229") %m+% years(65)
[1] "2013-02-28 UTC"

> daydiff <-  ymd("19480229") - floor_date(ymd("19480229"), "month")
> floor_date(ymd("19480229"), "month") + years(65) + daydiff
[1] "2013-03-01 UTC"
>
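
Applied to the data frame from the question, the first variant might look like the sketch below. It assumes lubridate is installed, that `open` is the study start date 2010-01-01 (the original post uses `open` without defining it), and that rolling 29 February back to 28 February is the rule you want:

```r
library(lubridate)  # assumed to be installed

open <- as.Date("2010-01-01")  # assumed study start date
DF <- data.frame(id = as.factor(1:3),
                 born = as.Date(c("1939/10/28", "1946/02/23", "1948/02/29")))

# %m+% rolls back to the last valid day of the month instead of giving NA.
turns65 <- DF$born %m+% years(65)

# Subjects enter on the later of the study start and their 65th birthday.
DF$enter <- turns65
DF$enter[turns65 < open] <- open
DF
```

If 1 March is the preferred convention for leap-day birthdays, the floor_date() variant above would replace the `%m+%` line.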

Rich

On Thu, Sep 18, 2014 at 11:22 AM, Frank S.  wrote:
>
>
> Hi to all members of the list,
>
> I have a data frame with subjects who can get into a certain study from 
> 2010-01-01 onwards. Small example:
>
> DF <- data.frame(id=as.factor(1:3), born=as.Date(c("1939/10/28", 
> "1946/02/23", "1948/02/29")))
>
>   id   born
> 1  1 1939-10-28
> 2  2 1946-02-23
> 3  3 1948-02-29
>
> Now, I add a new column "enter" as follows:
>
> 1) If the subject is 65 years old before 2010-01-01, then enter=2010-01-01.
> 2) If the subject is NOT 65 years old before 2010-01-01, then enter="Date on 
> which the subject reaches 65"
>
> DF_new <- data.frame(DF,
>  enter= as.Date( ifelse(unclass(round(difftime(open, DF$born)/365.25,1))<=65,
> paste(year(DF$born)+65,substr(DF$born,6,10),sep="-"), paste(open))) )
>
> The problem is that the DF_new output has a NA in subject id=3:
>
>   id       born      enter
> 1  1 1939-10-28 2010-01-01
> 2  2 1946-02-23 2011-02-23
> 3  3 1948-02-29       <NA>
>
> I'm afraid (I'm not really sure) that the matter is that subject id=3 would 
> reach 65 yr at 2013-02-29, but this date does not exist,
> so R gives a missing.
>
> Can any help me?
>
> Thank you!!!
>
>
>
>
>


Re: [R] Using R in our commercial business application

2014-09-18 Thread Marc Schwartz
On Sep 18, 2014, at 4:36 AM, Pasu  wrote:

> Hi
> 
> I would like to know how to use R in our commercial business application
> which we plan to host in cloud or deploy on customer's premise.
> 
> 1. Using R and its package, does it enforce that my commercial business
> application should be distributed under GPL, as the statistical derivation
> (output) by using R will be presented to the end users as part of of our
> commercial business application
> 2. Whom to contact to get commercial license if required for using R?
> 
> Rgds
> Pasupathy


You will not get a definitive legal opinion here and my comments below do not 
represent any formal opinion on the part of any organization.

There is nothing preventing you or your company from using R as an end user. 
There are many of us who use R in commercial settings and in general, the 
output of a GPL'd application (text or binary) is not considered to be also 
GPL'd.

The subtleties get into the distribution of R (which you seem to plan to do), 
the nature of any additional functionality/code that you or your company may 
write/distribute, how that code interacts with R and/or modifies R source code 
copyrighted by the R Foundation and others. If you distribute R to clients, you 
will need to make R's source code available to them in some manner along with 
any modifications to that same code, while preserving appropriate copyrights.

A proprietary (closed source) application cannot be licensed under the GPL, but 
your company's application/code may be forced to be GPL (the so called viral 
aspect of the GPL) depending upon how your application is implemented as I 
noted in the prior paragraph. Thus, you may be forced to make your source code 
available to your clients as well.

If you plan to move forward, you should consult with an attorney well educated 
in software licensing and distribution issues, especially as they pertain to 
the GPL. The risks of falling on the wrong side of the GPL are not 
inconsequential.

The official R distribution is not available via a commercial or developer 
license, but there are commercial vendors of R and a Google search will point 
you in their direction, if desired. However, since their products are founded 
upon the official R distribution and the GPL, they will have similar issues 
with respect to any enhancements that they have created and therefore, your 
concerns do not necessarily go away. They will have also consulted legal 
counsel on these issues because the viability of their business depends upon it.

Regards,

Marc Schwartz



Re: [R] "missings=function(x) x[x==998|x==999]<-NA" doesn't work...

2014-09-18 Thread Sarah Goslee
You need to assign the output of missings() to something. For that
matter, missings() needs some output.

d <- data.frame(a=1:5, b=6:10, var=c(1, 1, 998, 999, 2))

missings <- function(x) {
x[x==998|x==999]<-NA
x
}

d$var <- missings(d$var)


> d
  a  b var
1 1  6   1
2 2  7   1
3 3  8  NA
4 4  9  NA
5 5 10   2
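
Since the original question asks for something that works by variable name, the same function can be applied to several columns at once. This is only a sketch: the column names `var1` and `var2` and the vector `vars` are invented for the example.

```r
d <- data.frame(a = 1:5,
                var1 = c(1, 1, 998, 999, 2),
                var2 = c(999, 0, 0, 998, 1))

missings <- function(x) {
  x[x %in% c(998, 999)] <- NA
  x
}

# Select the columns by name, so their position in d does not matter.
vars <- c("var1", "var2")
d[vars] <- lapply(d[vars], missings)
d
```

Columns not listed in `vars` (here `a`) are left untouched.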


Sarah

On Thu, Sep 18, 2014 at 10:13 AM, Doreen Mueller  wrote:
> Hi!
>
> I want to have a function that assigns NAs to certain values of my
> variable "var" in the dataset "d". This doesn't work:
>
>> missings=function(x) x[x==998|x==999]<-NA
>> missings(d$var)
>> table(d$var, useNA="always")
>
>    0    1  999 <NA>
>  220  752  321 5264
>
> I don't get any error messages, but "d$var" remains unchanged. The
> function:
>> missings=function(x) x[x==90|x==99]<<-NA
> doesn't work either, and I read that "<<-" is "dangerous" anyway?
>
> It is important for me to work with variable names (and therefore with
> functions instead loops) because the number and order of variables in my
> dataset changes regularly.
>
> Thank you,
> Doreen



-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] R/Ubuntu, “package ‘stats’ in options(”defaultPackages“) was not found”

2014-09-18 Thread davide.chi...@gmail.com
I tried with a different mirror, but nothing changed...

Any other idea?

Thanks anyway

-- Davide

2014-09-17 10:39 GMT-04:00 Jeff Newmiller :
> Try a different mirror? Precise is getting kind of old... they may not be 
> keeping all of the old files on that mirror.
>
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On September 17, 2014 5:51:08 AM PDT, "davide.chi...@gmail.com" 
>  wrote:
>>Yes, I've followed the instructions described here:
>>http://cran.r-project.org/bin/linux/ubuntu/README
>>
>>I've added
>>deb http:///bin/linux/ubuntu precise/
>>to the /etc/apt/sources.list file.
>>
>>Any idea?
>>
>>Thanks a lot!
>>
>>-- Davide
>>
>>2014-09-17 2:42 GMT-04:00 Jeff Newmiller :
>>> Are you using the apt sources described on CRAN for Ubuntu? I don't
>>expect stock 12.04 would give you R3.1.1, yet I have not seen this
>>problem on machines using the CRAN apt repositories.
>>>
>>> ---
>>> Jeff Newmiller
>>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>> ---
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On September 16, 2014 6:40:31 PM PDT, "davide.chi...@gmail.com"
>> wrote:
Sorry guys for the errors in my behavior. I apologize.

I installed R by using commands:
apt-get install r-base
apt-get install r-base-dev

Here's the output of sessioninfo();

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=it_IT.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=it_IT.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=it_IT.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_3.1.1 tools_3.1.1


Any idea? Thanks!

-- Davide

2014-09-16 17:15 GMT-04:00 David Winsemius :
>
> On Sep 16, 2014, at 12:19 PM, davide.chi...@gmail.com wrote:
>
>> Hi guys
>> I'm having some troubles in installing the "topicmodels" package
>>in
my
>> R system on a Linux Ubuntu machine.
>> I also described the problem here: http://bit.ly/1m8Ah6Z
>
> (You were asked in the Posting Guide to not crosspost. And when you
post to Stack Overflow you should respond to requests for
>>clarification
which you have not done either. You will never get useful answers if
you don't respond to requests for clarification.)
>
>>
>> I have just installed R 3.1.1 on my Linux Ubuntu 12.04.5 LTS.
>
> More details are needed. How did you do this?
>
>
>> Then Iwanted to install the topicmodels package, and so I type
>> install.packages("topicmodels"), but the installation did not
>>work.
>
>> It seems that I do not have the "stats" package installed in my
>> default packages.
>
> That would be somewhat unusual, but possible. You were asked in the
Rhelp Posting Guide to provide the output of sessionInfo().
>
>
>
> --
> David.
>
>
> ]
>> Here's the log:
>>
>> ++LOG+START
>>
>>> install.packages("topicmodels");
>> Installing package into ‘/usr/local/lib/R/site-library’
>> (as ‘lib’ is unspecified)
>> --- Please select a CRAN mirror for use in this session ---
>> also installing the dependencies ‘modeltools’, ‘slam’, ‘tm’
>>
>> provo con l'URL
>>
'http://cran.utstat.utoronto.ca/src/contrib/modeltools_0.2-21.tar.gz'
>> Content type 'application/x-gzip' length 14794 bytes (14 Kb)
>> URL aperto
>> ==
>> downloaded 14 Kb
>>
>> provo con l'URL
'http://cran.utstat.utoronto.ca/src/contrib/slam_0.1-32.tar.gz'
>> Content type 'application/x-gzip' length 46672 bytes (45 Kb)
>> URL aperto
>> ==

[R] "missings=function(x) x[x==998|x==999]<-NA" doesn't work...

2014-09-18 Thread Doreen Mueller
Hi!

I want to have a function that assigns NAs to certain values of my 
variable "var" in the dataset "d". This doesn't work:

> missings=function(x) x[x==998|x==999]<-NA
> missings(d$var)
> table(d$var, useNA="always")

   0    1  999 <NA> 
 220  752  321 5264 

I don't get any error messages, but "d$var" remains unchanged. The 
function:
> missings=function(x) x[x==90|x==99]<<-NA
doesn't work either, and I read that "<<-" is "dangerous" anyway?

It is important for me to work with variable names (and therefore with 
functions instead loops) because the number and order of variables in my 
dataset changes regularly.

Thank you,
Doreen


[R] Data frame which includes a non-existent date

2014-09-18 Thread Frank S.


Hi to all members of the list,
 
I have a data frame with subjects who can get into a certain study from 
2010-01-01 onwards. Small example:
 
DF <- data.frame(id=as.factor(1:3), born=as.Date(c("1939/10/28", "1946/02/23", 
"1948/02/29")))

  id   born
1  1 1939-10-28
2  2 1946-02-23
3  3 1948-02-29
 
Now, I add a new column "enter" as follows:
 
1) If the subject is 65 years old before 2010-01-01, then enter=2010-01-01.
2) If the subject is NOT 65 years old before 2010-01-01, then enter="Date on 
which the subject reaches 65"
 
DF_new <- data.frame(DF, 
 enter= as.Date( ifelse(unclass(round(difftime(open, DF$born)/365.25,1))<=65,
paste(year(DF$born)+65,substr(DF$born,6,10),sep="-"), paste(open))) )
 
The problem is that the DF_new output has a NA in subject id=3:
 
  id       born      enter
1  1 1939-10-28 2010-01-01
2  2 1946-02-23 2011-02-23
3  3 1948-02-29       <NA>
 
I suspect (I'm not really sure) that the problem is that subject id=3 would 
reach 65 on 2013-02-29, but this date does not exist, so R returns a missing 
value.
 
Can anyone help me?
 
Thank you!!!
 
 
 
 

  


Re: [R] Pseudo R squared for quantile regression with replicates

2014-09-18 Thread David L Carlson
It is hard to say because we do not have enough information. R has 
approximately 6,000 packages and you have not told us which ones you are using. 
You have not told us much about your data and you have not told us where to 
find the query from August 2006. The basic problem is that your "fit" is not 
the same as the "f" in the query. Your fit object is not very complicated. If 
you look at the output from str(fit) you will see that fit is an "atomic" 
vector (note the wording in your error message) with a series of attributes 
that are probably documented in the help pages for the functions you are using. 
There is nothing called resid inside fit. It is likely that the post you are 
looking at refers to the output from rq(...) or perhaps predict(rq(...)), but 
not the output from withReplicates(..., quote(coef(rq(... which is what fit 
is.
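
For an ordinary rq() fit (not the withReplicates() output), the quantity in the quoted snippet can be turned into a pseudo R squared by comparing the check loss of the full model with that of an intercept-only model. The sketch below assumes the quantreg package is installed and uses its `engel` example data rather than your survey data. It is unweighted: with survey weights each loss term would presumably need to be multiplied by its weight, which is not shown here.

```r
library(quantreg)  # assumed installed; provides rq() and the engel data

data(engel)
tau <- 0.5

# Check (pinball) loss, as in the quoted 2006 answer.
rho <- function(u, tau = 0.5) u * (tau - (u < 0))

f1 <- rq(foodexp ~ income, tau = tau, data = engel)  # full model
f0 <- rq(foodexp ~ 1,      tau = tau, data = engel)  # intercept only

V1 <- sum(rho(resid(f1), tau))
V0 <- sum(rho(resid(f0), tau))

# Pseudo R squared: share of the check loss removed by the covariates.
R1 <- 1 - V1 / V0
R1
```

Here resid() works because f1 and f0 are rq fit objects, which is exactly what the coefficient vector returned by withReplicates() is not.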

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Donia Smaali Bouhlila
Sent: Thursday, September 18, 2014 9:54 AM
To: r-help@r-project.org
Subject: [R] Pseudo R squared for quantile regression with replicates

Hi,


I am a new user of r software. I intend to do quantile regressions with 
complex survey data using replicate method. I have ran the following 
commands successfully:


  mydesign 
<-svydesign(ids=~IDSCHOOL,strata=~IDSTRATE,data=TUN,nest=TRUE,weights=~TOTWGT) 
bootdesign <- as.svrepdesign(mydesign,type="auto",replicates=150)

  fit<- 
withReplicates(bootdesign,quote(coef(rq(Math1~Female+Age+calculator+computer+desk+
 
+ 
dictionary+internet+work+Book2+Book3+Book4+Book5+Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,tau=0.5,weights=.weights,
 
method="fn"




I want get the pseudo R squared but I failed. I read a query dating from 
August 2006, [R] Pseudo R for Quant Reg and the answer to it:


rho <- function(u,tau=.5)u*(tau - (u < 0))
  V <- sum(rho(f$resid, f$tau))


  I copied it and paste it , replacing f by fit I get this error message:
Error in fit$resid : $ operator is invalid for atomic vectors, I don't 
know what it means

The fit object is likely to be quite complicated  I used str() to see 
what it looks like:



str (fit)
Class 'svrepstat'  atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
   ..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3 
...
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
   .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
   .. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39 
7.58 ...
   .. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age" 
"calculator" ...
   ..- attr(*, "statistic")= chr "theta"

How can I retrieve the residuals?? and calculate the pseudo R squared??


Any help please


-- 
Dr. Donia Smaali Bouhlila
Associate-Professor
Department of Economics
Faculté des Sciences Economiques et de Gestion de Tunis



[R] Using R in our commercial business application

2014-09-18 Thread Pasu
Hi

I would like to know how to use R in our commercial business application
which we plan to host in cloud or deploy on customer's premise.

1. Does using R and its packages require that my commercial business
application be distributed under the GPL, given that the statistical output
produced with R will be presented to end users as part of our
commercial business application?
2. Whom should we contact to get a commercial license, if one is required for using R?

Rgds
Pasupathy



[R] Using read.csv.sql() to read in specific columns

2014-09-18 Thread Doran, Harold
I am dealing with data frames that have thousands of columns and hundreds of 
thousands of rows and only need a few specific columns from the data. The data 
take various formats, but normally are tab-delimited.

I have written the following, which is working as expected. However, because I'm 
so new at using sqldf(), I'm just looking for some verification from users that 
this is in fact efficient and correct in the R-ish sense of the word and 
generalizable to larger data sets.

Harold

tmp <- data.frame(replicate(50, rnorm(10)))
names(tmp) <- paste('item', 1:50, sep='')
write.table(tmp, 'tmp.txt')
read.csv.sql("tmp.txt", sql = "select item1, item2, item50 from file", sep = ' ')
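
For comparison only, and not as a statement about how sqldf works internally, the same column selection can be sketched in base R with the colClasses argument of read.table(), where "NULL" skips a column. The file is written tab-delimited here to match the data described above; `path`, `wanted`, `classes` and `sub` are names chosen for the example.

```r
tmp <- data.frame(replicate(50, rnorm(10)))
names(tmp) <- paste0("item", 1:50)
path <- tempfile(fileext = ".txt")
write.table(tmp, path, sep = "\t", row.names = FALSE, quote = FALSE)

wanted <- c("item1", "item2", "item50")

# Read a single row just to learn the column names.
header <- names(read.table(path, header = TRUE, sep = "\t", nrows = 1))

# "NULL" tells read.table to skip a column; NA lets it guess the type.
classes <- rep("NULL", length(header))
classes[header %in% wanted] <- NA

sub <- read.table(path, header = TRUE, sep = "\t", colClasses = classes)
str(sub)
```

Whether this is faster than read.csv.sql() on files with thousands of columns would have to be measured on the real data.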



[R] Failure with .Rprofile on Mac OS X

2014-09-18 Thread Gang Chen
When R starts in GUI (e.g., /Applications/R.app/Contents/MacOS/R) on
my Mac OS X 10.7.5, the startup configuration in .Rprofile works fine.
However, when R starts on the terminal (e.g.,
/Library/Frameworks/R.framework/Resources/bin/R), it does not work at
all. What could be the reason for the failure?

Thanks,
Gang



[R] Pseudo R squared for quantile regression with replicates

2014-09-18 Thread Donia Smaali Bouhlila

Hi,


I am a new user of R. I intend to run quantile regressions with 
complex survey data using the replicate method. I have run the following 
commands successfully:



 mydesign <- svydesign(ids=~IDSCHOOL, strata=~IDSTRATE, data=TUN, nest=TRUE,
                       weights=~TOTWGT)
 bootdesign <- as.svrepdesign(mydesign, type="auto", replicates=150)

 fit <- withReplicates(bootdesign,
          quote(coef(rq(Math1~Female+Age+calculator+computer+desk+
                        dictionary+internet+work+Book2+Book3+Book4+Book5+
                        Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,
                        tau=0.5, weights=.weights, method="fn"))))





I want to get the pseudo R squared, but I failed. I read a query dating from 
August 2006, "[R] Pseudo R for Quant Reg", and the answer to it:



rho <- function(u,tau=.5)u*(tau - (u < 0))
 V <- sum(rho(f$resid, f$tau))


 I copied and pasted it, replacing f with fit, and I get this error message:
Error in fit$resid : $ operator is invalid for atomic vectors
I don't know what it means.


The fit object is likely to be quite complicated. I used str() to see 
what it looks like:




str (fit)
Class 'svrepstat'  atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
  ..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3 
...

  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
  .. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
  .. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39 
7.58 ...
  .. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age" 
"calculator" ...

  ..- attr(*, "statistic")= chr "theta"

How can I retrieve the residuals?? and calculate the pseudo R squared??


Any help please


--
Dr. Donia Smaali Bouhlila
Associate-Professor
Department of Economics
Faculté des Sciences Economiques et de Gestion de Tunis



Re: [R] R "write" strange behavior in huge file

2014-09-18 Thread Maxime Vallee
Thank you, it is exactly that.

I have followed your idea of chunks (1 GB chunks, to be on the safe side), and 
appended them. Worked like a charm, thank you.

--Maxime



From: "Stefan Evert (Mailing Lists)" <stefa...@collocations.de>
Date: Wednesday, 17 September 2014 15:39
To: Maxime Vallée <vall...@iarc.fr>
Cc: R-help Mailing List <r-help@r-project.org>
Subject: Re: [R] R "write" strange behavior in huge file

You probably told R to write out the file as a single long line with fields 
separated alternately by 380 TABs and one newline; that's what the ncol 
argument does (write is just a small wrapper around cat()).

cat() doesn't print lines that are longer than 2 GiB, so it will insert an 
extra \n after every 2 GiB of data. (IIRC, this is because in the C code, 
fill=FALSE is replaced by fill=MAX_INT or so.)

The only way around this limitation that I can think of is to write a wrapper 
function that breaks up the matrix or list of vectors in smaller chunks and 
appends them separately to the output file.  I'm planning to add such a 
function to one of my packages, so I'd be interested if somebody has a better 
solution.
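
A minimal sketch of such a wrapper is shown below. It is not the function from any package: `write_in_chunks` and its arguments are hypothetical, and it uses writeLines() on an open connection rather than repeated write(..., append = TRUE) calls, so each output line is built explicitly and no single call sees the whole data set.

```r
# Hypothetical helper: write a list of vectors as one tab-separated
# line per vector, a block of rows at a time.
write_in_chunks <- function(vecs, path, chunk_rows = 10000L, sep = "\t") {
  con <- file(path, open = "wt")
  on.exit(close(con))
  n <- length(vecs)
  if (n == 0L) return(invisible(0L))
  for (start in seq(1L, n, by = chunk_rows)) {
    end <- min(start + chunk_rows - 1L, n)
    # One string per vector; the newline comes from writeLines().
    lines <- vapply(vecs[start:end], paste, character(1), collapse = sep)
    writeLines(lines, con)
  }
  invisible(n)
}

# Small demonstration: 25 vectors of 5 elements, written 10 rows at a time.
vecs <- replicate(25, 1:5, simplify = FALSE)
out <- tempfile(fileext = ".txt")
write_in_chunks(vecs, out, chunk_rows = 10L)
n_lines <- length(readLines(out))
n_lines
```

For a list as large as the one described, chunk_rows would be chosen so that each block stays well below the 2 GiB limit.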

Best,
Stefan


On 16 Sep 2014, at 18:54, Maxime Vallee <vall...@iarc.fr> wrote:

In my script I have one list of 1,132,533 vectors (each vector contains
381 elements).

When I use "write" to save this list in a flat text file (I unlist my
list, separate by tabs, and set ncol to 381), I end up with a file of
1,132,535 lines (2 additional lines). I checked back, my R list do not
have those two additional items before writing.

With awk, I checked whether any lines were not made of 381 fields: there were
two, separated by around 400k lines.

I made sub-files, using those "incomplete" lines as boundaries. My files
are very close in size : 1.9 GB (respectively 1971841853 B and 1972614897
B). It feels like a 32 bit / 64 bit issue.

My R version is this:
./Rscript -e 'sessionInfo()$platform'
[1] "x86_64-unknown-linux-gnu (64-bit)"

There is somewhere, reaching 1.9 GB, something that is changing my tabs to
unwanted carriage returns...
Any idea that might cause this, and if it looks solvable in R?




Re: [R] apply block of if statements with menu function

2014-09-18 Thread rl

On 2014-09-16 12:35, PIKAL Petr wrote:


So if result of menu is 0 (you did not choose anything) you can either
stay with 0, then switch does not return anything or add 1 and let
evaluate something meaningful specified in second and following
positions of switch command.



Thanks for your explanation, which completed my understanding! :) For 
the benefit of other novices, below is an example to demonstrate how 
'switch' and 'menu' can be used:


switch(menu(c(1,2),graphics=FALSE,title='select something'), 
{(seq(1:10))}, {(rnorm(20))})
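
The "add 1" idea from the quoted explanation can be sketched as follows, so that a selection of 0 is handled explicitly. The function name `handle_choice` and the menu labels are invented for the example; menu() is only called in an interactive session, because it cannot prompt otherwise.

```r
# Position 1 of switch() handles choice 0 (nothing selected);
# positions 2 and 3 handle menu items 1 and 2.
handle_choice <- function(choice) {
  switch(choice + 1L,
         "nothing selected",
         seq_len(10),
         rnorm(20))
}

if (interactive()) {
  choice <- menu(c("sequence", "random numbers"),
                 graphics = FALSE, title = "select something")
  print(handle_choice(choice))
}
```

Called directly, handle_choice(0L) returns the string for the "nothing selected" case and handle_choice(1L) returns the sequence.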


However, how can the option '0 to exit' be made to appear in the command 
terminal?




[R] Extract model from deriv3 or nls

2014-09-18 Thread Riley, Steve
Hello!

I am trying to figure out how to extract the model equation when using deriv3 
with nls.

Here is my example:
#
# Generate derivatives
#
Puro.fun2 <- deriv3(expr = ~(Vmax + VmaxT*state) * conc/(K + Kt * state + conc),
name = c("Vmax","VmaxT","K","Kt"),
function.arg = function(conc, state, Vmax, VmaxT, K, Kt) 
NULL)
#
# Fit model using derivative function
#
Puro.fit1 <- nls(rate ~ Puro.fun2(conc, state == "treated", Vmax, VmaxT, K, Kt),
 data = Puromycin,
 start = c(Vmax = 160, VmaxT = 47, K = 0.043, Kt = 0.05))

Normally I would use summary(Puro.fit1)$formula to extract the model but 
because I am implementing deriv3, the following gets returned:

> summary(Puro.fit1)$formula
rate ~ Puro.fun2(conc, state == "treated", Vmax, VmaxT, K, Kt)

What I would like to do is find something that returns:

rate ~ (Vmax + VmaxT*state) * conc/(K + Kt * state + conc)

Is there a way to extract this? Please advise. Thanks for your time.
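
One possible workaround, offered as a sketch rather than a built-in extraction method, is to keep the right-hand side as a formula object of its own, pass that object to deriv3(), and rebuild the full model formula from it when needed. The names `model_rhs` and `full_formula` and the attribute name are made up for this example.

```r
# Keep the model expression in one place (hypothetical name).
model_rhs <- ~ (Vmax + VmaxT * state) * conc / (K + Kt * state + conc)

Puro.fun2 <- deriv3(model_rhs,
                    namevec = c("Vmax", "VmaxT", "K", "Kt"),
                    function.arg = function(conc, state, Vmax, VmaxT, K, Kt) NULL)

Puro.fit1 <- nls(rate ~ Puro.fun2(conc, state == "treated", Vmax, VmaxT, K, Kt),
                 data = Puromycin,
                 start = c(Vmax = 160, VmaxT = 47, K = 0.043, Kt = 0.05))

# Carry the expression along with the fit and rebuild the readable formula.
attr(Puro.fit1, "model_rhs") <- model_rhs
rhs_text <- paste(deparse(model_rhs[[2]]), collapse = " ")
full_formula <- as.formula(paste("rate ~", rhs_text))
full_formula
```

The formula stored inside the nls object still refers to Puro.fun2(); the readable version lives only in the attribute and in `full_formula`.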

Steve
860-441-3435




Re: [R] Training a model using glm

2014-09-18 Thread Mohan Radhakrishnan
Thanks Max and Dennis. Based on the syntax change I got the result for the
PCA part also.

training2 <- training[,grepl("^IL",names(training))]
preProc <- preProcess(training2,method="pca",thresh=0.8)
test2 <- testing[,grepl("^IL",names(testing))]

trainpca <- predict(preProc, training2)
testpca <- predict(preProc, test2)

modelFitpca <- train(training1$diagnosis ~ .,method="glm",data=trainpca)
confusionMatrix(test1$diagnosis,predict(modelFitpca, testpca))
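
For comparison, here is a minimal sketch of the same idea with the outcome 
stored as a column of the data frame, so that the formula needs no `$` 
indexing. It deliberately swaps in base R (prcomp and glm) and the built-in 
mtcars data instead of caret and the Alzheimer data, so the data set, the 
predictor subset, the split rule and the 0.5 cut-off are all assumptions made 
for illustration only.

```r
# Deterministic split of a built-in data set (stand-in for training/testing).
is_train <- seq_len(nrow(mtcars)) %% 4 != 0
predictors <- c("mpg", "disp", "hp", "wt")

# Fit the PCA on the training predictors only, then project both sets.
pca <- prcomp(mtcars[is_train, predictors], center = TRUE, scale. = TRUE)
train_pc <- data.frame(am = mtcars$am[is_train],
                       predict(pca, mtcars[is_train, predictors])[, 1:2])
test_pc  <- data.frame(am = mtcars$am[!is_train],
                       predict(pca, mtcars[!is_train, predictors])[, 1:2])

# The outcome is a column of the data frame, so the formula needs no `$`.
fit <- glm(am ~ ., data = train_pc, family = binomial)
pred <- predict(fit, newdata = test_pc, type = "response")
table(observed = test_pc$am, predicted = as.integer(pred > 0.5))
```

With caret, the analogous step would be to bind the diagnosis column to the 
PCA scores before calling train.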


Mohan


Re: [R] Training a model using glm

2014-09-18 Thread Mohan Radhakrishnan
Oh. I understand now. There is nothing wrong with the logic. It is the
syntax.

> library(AppliedPredictiveModeling)
Warning message:
package ‘AppliedPredictiveModeling’ was built under R version 3.1.1
> set.seed(3433)
> data(AlzheimerDisease)
> adData = data.frame(diagnosis,predictors)
> inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
> training = adData[ inTrain,]
> testing = adData[-inTrain,]
> training1 <- training[,grepl("^IL|^diagnosis",names(training))]
> test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
> modelFit <- train(diagnosis ~ .,method="glm",data=training1)
> confusionMatrix(test1$diagnosis,predict(modelFit, test1))
Confusion Matrix and Statistics

          Reference
Prediction Impaired Control
  Impaired        2      20
  Control         9      51

               Accuracy : 0.6463
                 95% CI : (0.533, 0.7488)
    No Information Rate : 0.8659
    P-Value [Acc > NIR] : 1.0

                  Kappa : -0.0702
 Mcnemar's Test P-Value : 0.06332

            Sensitivity : 0.18182
            Specificity : 0.71831
         Pos Pred Value : 0.09091
         Neg Pred Value : 0.85000
             Prevalence : 0.13415
         Detection Rate : 0.02439
   Detection Prevalence : 0.26829
      Balanced Accuracy : 0.45006

       'Positive' Class : Impaired


Thanks,

Mohan

On Thu, Sep 18, 2014 at 12:21 AM, Max Kuhn  wrote:

> You have not shown all of your code and it is difficult to diagnose the
> issue.
>
> I assume that you are using the data from:
>
>    library(AppliedPredictiveModeling)
>    data(AlzheimerDisease)
>
> If so, there is example code to analyze these data in that package. See
> ?scriptLocation.
>
> We have no idea how you got to the `training` object (package versions
> would be nice too).
>
> I suspect that Dennis is correct. Try using more normal syntax without the
> $ indexing in the formula. I wouldn't say it is (absolutely) wrong but it
> doesn't look right either.
>
> Max
>
>
> On Wed, Sep 17, 2014 at 2:04 PM, Mohan Radhakrishnan <
> radhakrishnan.mo...@gmail.com> wrote:
>
>> Hi Dennis,
>>
>> Why is there that warning? I think my syntax is
>> right, isn't it? So the warning can be ignored?
>>
>> Thanks,
>> Mohan
>>
>> On Wed, Sep 17, 2014 at 9:48 PM, Dennis Murphy  wrote:
>>
>> > No reproducible example (i.e., no data) supplied, but the following
>> > should work in general, so I'm presuming this maps to the caret
>> > package as well. Thoroughly untested.
>> >
>> > library(caret)  # something you failed to mention
>> >
>> > ...
>> > # presumably a logistic regression
>> > modelFit <- train(diagnosis ~ ., data = training1)
>> > confusionMatrix(test1$diagnosis,
>> >                 predict(modelFit, newdata = test1, type = "response"))
>> >
>> > For GLMs, there are several types of possible predictions. The default
>> > is 'link', which associates with the linear predictor. caret may have
>> > a different syntax so you should check its help pages re the supported
>> > predict methods.
>> >
>> > Hint: If a function takes a data = argument, you don't need to specify
>> > the variables as components of the data frame - the variable names are
>> > sufficient. You should also do some reading to understand why the
>> > model formula I used is correct if you're modeling one variable as
>> > response and all others in the data frame as covariates.
>> >
>> > Dennis
>> >
>> > On Tue, Sep 16, 2014 at 11:15 PM, Mohan Radhakrishnan
>> >  wrote:
>> > > I answered this question which was part of the online course correctly
>> > > by executing some commands and guessing.
>> > >
>> > > But I didn't get the gist of this approach though my R code works.
>> > >
>> > > I have a training and test dataset.
>> > >
>> > >> nrow(training)
>> > > [1] 251
>> > >> nrow(testing)
>> > > [1] 82
>> > >> head(training1)
>> > >    diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
>> > > 6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
>> > > 10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
>> > > 11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
>> > > 12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
>> > > 13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
>> > > 16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
>> > >
>> > >          IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
>> > > 6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
>> > > 10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
>> > > 11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
>> > > 12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
>> > >
>> > > 13  0.000  0.4223589