from:"Wensui Liu"

[R] install packages automatically

2007-09-10 Thread Wensui Liu

Dear Listers,
I am a little tired of reinstalling all the packages I want every time
I install a new version of R.
Say I have a list of packages I need to use; is it possible to
tell R to install them all for me automatically, rather than installing
them one by one?
Thx.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
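
One way to do what this post asks, sketched in base R: keep the package names in a character vector, compare them with what is already installed, and pass only the missing ones to install.packages(). The helper name install_missing and the package names in the usage comment are assumptions made for illustration, not something from the thread.

```r
# Hypothetical helper (the name is an assumption): install only those
# packages from `wanted` that are not already present in the library.
install_missing <- function(wanted,
                            installed = rownames(installed.packages()),
                            installer = install.packages) {
  to_install <- setdiff(wanted, installed)
  if (length(to_install) > 0) {
    installer(to_install)  # install.packages() accepts a character vector
  }
  invisible(to_install)    # return the names that were missing
}

# Typical use after upgrading R (needs a CRAN mirror and network access):
# install_missing(c("RODBC", "sqldf", "nnet"))
```

Keeping the vector of names in a small script means the same call can be rerun after every upgrade; packages that are already present are left alone.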

[R] off-topic: better OS for statistical computing

2007-09-10 Thread Wensui Liu

Good morning, everyone,
I am sorry for this off-topic post, but I think I can get a great answer
from this list.
My question is: what is the best OS on a PC (laptop) for statistical
computing, and why?
I would really appreciate your insight.
Have a nice day.

[R] sqldf rocks

2007-09-07 Thread Wensui Liu

Man,
I love this package and the guy who contributes it!


-- 
===
"I am dying with the help of too many
physicians." - Alexander the Great, on his deathbed
=======
WenSui Liu
(http://spaces.msn.com/statcompute/blog)

Re: [R] size limitations in R

2007-08-31 Thread Wensui Liu

I can't agree more with Daniel.
I love SQLite and use it to exchange data between R, Python, and
SAS. Data stored in SQLite is far better than CSV, because data
attributes such as column types can be preserved.
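
A rough sketch of two points in this exchange: the back-of-envelope memory estimate from the quoted message, and moving a data frame through SQLite instead of CSV. It assumes the DBI and RSQLite packages are installed (the database part is skipped otherwise); the in-memory database, the table name cars, and the built-in mtcars data are illustrative choices only.

```r
# 1. Memory needed for 180,000 records x 200 numeric variables
n_rows <- 180000
n_cols <- 200
bytes_per_double <- 8
n_entries <- n_rows * n_cols                   # number of cells
size_mb <- n_entries * bytes_per_double / 1e6  # megabytes (10^6 bytes)
cat(n_entries, "entries, about", size_mb, "MB\n")

# 2. Round trip through SQLite (skipped if the packages are missing)
heavy <- NULL
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "cars", mtcars)
  # Only the rows selected by the query are brought back into R
  heavy <- DBI::dbGetQuery(con, "SELECT mpg, wt FROM cars WHERE wt > 4")
  DBI::dbDisconnect(con)
}
```

With a file path in place of ":memory:", the resulting database file could then be opened from other tools, which is the kind of exchange described above.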

On 8/31/07, Daniel Lakeland <[EMAIL PROTECTED]> wrote:
> On Fri, Aug 31, 2007 at 01:31:12PM +0100, Fabiano Vergari wrote:
>
> > I am a SAS user currently evaluating R as a possible addition or
> > even replacement for SAS. The difficulty I have come across straight
> > away is R's apparent difficulty in handling relatively large data
> > files. Whilst I would not expect it to handle datasets with millions
> > of records, I still really need to be able to work with datasets with
> > 100,000+ records and 100+ variables. Yet, when reading a .csv file
> > with 180,000 records and about 200 variables, the software virtually
> > ground to a halt (I stopped it after 1 hour). Are there guidelines
> > or maybe a limitations document anywhere that helps me assess the
> > size
>
> 180k records with 200 variables = 36 million entries, if they're
> numeric then they're doubles taking up 8 bytes, so 288 MB of RAM. This
> should be perfectly fine for R, as long as you have that much free
> RAM.
>
> However, the routines that read CSV and tabular delimited files are
> relatively inefficient for such large files.
>
> In order to handle large data files, it is better to use one of the
> database interfaces. My preference would be sqlite unless I already
> had the data on a mysql or other database server.
>
> the documentation for the packages RSQLite and SQLiteDF should be
> helpful, as well as the documentation for SQLite itself, which has a
> facility for efficiently importing CSV and similar files directly to a
> SQLite database.
>
> eg: http://netadmintools.com/art572.html
>
>
>
> --
> Daniel Lakeland
> [EMAIL PROTECTED]
> http://www.street-artists.org/~dlakelan
>

Re: [R] nnet 10-fold cross-validation

2007-07-23 Thread Wensui Liu

There is no such option in nnet(), if I understand correctly.
How hard is it to code one, though?
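
Coding it by hand is short. What follows is a minimal sketch of k-fold cross-validation in base R, not something nnet provides; the function name kfold_cv and its arguments are assumptions made for this illustration. To stay free of extra packages it is demonstrated with lm() on the built-in mtcars data, and the commented-out lines show how an nnet classifier could be plugged in instead.

```r
# Hypothetical helper: k-fold cross-validation for any fit/error pair.
# fit_fun(train) returns a model; err_fun(model, test) returns one number.
kfold_cv <- function(data, k, fit_fun, err_fun, seed = 1) {
  set.seed(seed)
  # assign each row to one of k folds, in random order
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  errs <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    errs[i] <- err_fun(fit_fun(train), test)
  }
  mean(errs)  # average held-out error over the k folds
}

# Demonstration with a linear model and mean squared error
fit_lm <- function(d) lm(mpg ~ wt, data = d)
err_lm <- function(m, d) mean((d$mpg - predict(m, newdata = d))^2)
cv_mse <- kfold_cv(mtcars, k = 10, fit_fun = fit_lm, err_fun = err_lm)
print(cv_mse)

# With nnet installed, a classifier could be plugged in along these lines
# (y must be a factor column of the data; untested sketch):
# fit_nn <- function(d) nnet::nnet(y ~ ., data = d, size = 3, trace = FALSE)
# err_nn <- function(m, d) mean(predict(m, d, type = "class") != d$y)
```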

On 7/23/07, S.O. Nyangoma <[EMAIL PROTECTED]> wrote:
> Hi
> It is clear that to do a classification with svm under 10-fold cross
> validation one uses
>
> svm(Xm, newlabs, type = "C-classification", kernel = "linear",cross =
> 10)
>
> What corresponds to the nnet?
> nnet(.,cross=10)?
>
> Regards
>

Re: [R] package with roc, sensitivity, specificity, kappa etc

2007-07-01 Thread Wensui Liu

For ROC and AUC calculation, you might try the verification package.
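
For readers who want to see what such a package computes, here is a dependency-free sketch of sensitivity, specificity, and AUC in base R. It does not use the verification package's API; the function names and the toy data are assumptions for illustration. The AUC is obtained through its equivalence with the Mann-Whitney rank statistic.

```r
# obs: 0/1 outcomes; score: predicted probabilities or any ranking score
auc_rank <- function(obs, score) {
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  r <- rank(score)  # average ranks, so ties are handled
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Sensitivity and specificity at a given cutoff on the score
sens_spec <- function(obs, score, cutoff = 0.5) {
  pred <- as.integer(score >= cutoff)
  c(sensitivity = sum(pred == 1 & obs == 1) / sum(obs == 1),
    specificity = sum(pred == 0 & obs == 0) / sum(obs == 0))
}

# Toy data, made up for the example
obs   <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)
print(auc_rank(obs, score))
print(sens_spec(obs, score))
```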

On 7/1/07, Fredrik Lundgren <[EMAIL PROTECTED]> wrote:
> Dear Gurus,
>
> Is there a package (R of course) with programs for diagnostics - roc,
> sens , spec, kappa etc?
>
> Best wishes Fredrik L
>

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

Re: [R] method of rpart when response variable is binary?

2007-06-15 Thread Wensui Liu

I think you can keep the default setting if you use as.factor(y)~x in rpart().
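
A small sketch of this suggestion on simulated data (the data frame dat and its columns are made up for illustration; rpart is assumed to be installed, and the block is skipped otherwise). When method is not given, rpart() chooses method = "class" for a factor response and method = "anova" for a numeric one, which is why the quoted default output reports deviances.

```r
if (requireNamespace("rpart", quietly = TRUE)) {
  set.seed(42)
  # Simulated stand-in: count predictor x, binary 0/1 response y
  dat <- data.frame(x = rpois(500, lambda = 5))
  dat$y <- rbinom(500, size = 1, prob = plogis(-3 + 0.3 * dat$x))

  fit_num <- rpart::rpart(y ~ x, data = dat)             # numeric response
  fit_fac <- rpart::rpart(as.factor(y) ~ x, data = dat)  # factor response

  # The method chosen by rpart is stored in the fitted object
  print(fit_num$method)
  print(fit_fac$method)
}
```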

On 6/15/07, ronggui <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> I would like to model the relationship between y and x. y is a binary
> variable, and x is a count variable which may be Poisson-distributed.
>
> I think it is better to divide x into intervals and change it to a
> factor before calling glm(y~x,data=dat,family=binomial).
>
> I try to use rpart. As y is binary, I use "class" method and get the
> following result.
> > rpart(y~x,data=dat,method="class")
> n=778 (22 observations deleted due to missingness)
>
> node), split, n, loss, yval, (yprob)
>   * denotes terminal node
>
> 1) root 778 67 0 (0.91388175 0.08611825) *
>
>
> If with the default method, I get such a result.
>
> > rpart(y~x,data=dat)
> n=778 (22 observations deleted due to missingness)
>
> node), split, n, deviance, yval
>   * denotes terminal node
>
> 1) root 778 61.230080 0.08611825
>   2) x< 19.5 750 53.514670 0.0773
>     4) x< 1.25 390 17.169230 0.04615385 *
>     5) x>=1.25 360 35.60 0.1110 *
>   3) x>=19.5 28  6.107143 0.32142860 *
>
> If I use 1.25 and 19.5 as the cutting points, change x into factor by
> >x2 <- cut(q34b,breaks=c(0,1.25,19.5,200),right=F)
>
> The coef in y~x2 is significant and makes sense.
>
> My question is: is it OK to use the default method in rpart when the
> response variable is binary?  Thanks.
>
>
> --
> Ronggui Huang
> Department of Sociology
> Fudan University, Shanghai, China
>

Re: [R] "R is not a validated software package.."

2007-06-08 Thread Wensui Liu

Bert,
I just want to make sure that what I said was not so overstated as to
offend statisticians who use SAS. Actually, I use SAS daily and am able
to use it pretty well. ^_^
What I meant was:
1) I don't understand the mentality.
2) Using SAS instead of R might be related to job security.
That is very different from "their mentality is related to job security".

On 6/8/07, Bert Gunter <[EMAIL PROTECTED]> wrote:
> Frank et al.:
>
> I believe this is a bit too facile. 21 CFR Part 11 does necessitate a
> software validation **process** -- but this process does not require any
> particular software. Rather, it requires that those using whatever software
> demonstrate to the FDA's satisfaction that the software does what it's
> supposed to do appropriately. This includes a lot more than assuring, say,
> the numerical accuracy of computations; I think it also requires
> demonstration that the data are "secure," that it is properly transferred
> from one source to another, etc. I assume that the statistical validation of
> R would be relatively simple, as R already has an extensive test suite, and
> it would simply be a matter of providing that test suite info. A bit more
> might be required, but I don't think it's such a big deal.
>
> I think Wensui Liu's characterization of clinical statisticians as having a
> mentality "related to job security" is a canard. Although I work in
> nonclinical, my observation is that clinical statistics is complex and
> difficult, not only because of many challenging statistical issues, but also
> because of the labyrinthian complexities of the regulated and extremely
> costly environment in which they work. It is certainly a job that I could
> not do.
>
> That said, probably the greatest obstacle to change from SAS is neither
> obstinacy nor ignorance, but rather inertia: pharmaceutical companies have
> over the decades made a huge investment in SAS infrastructure to support the
> collection, organization, analysis, and submission of data for clinical
> trials. To convert this to anything else would be a herculean task involving
> huge expense, risk, and resources. R, S-Plus (and much else -- e.g. numerous
> "unvalidated" data mining software packages) are routinely used by clinical
> statisticians to better understand their data and for "exploratory" analyses
> that are used to supplement official analyses (e.g. for trying to justify
> collection of tissue samples or a pivotal study in a patient subpopulation).
> But it is difficult for me to see how one could make a business case to
> change clinical trial analysis software infrastructure from SAS to S-Plus,
> SPSS, or anything else.
>
> **DISCLAIMER**
> My opinions only. They do not in any way represent the view of my company or
> its employees.
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
> South San Francisco, CA 94404
> 650-467-7374
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
> Sent: Friday, June 08, 2007 7:45 AM
> To: Giovanni Parrinello
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] "R is not a validated software package.."
>
> Giovanni Parrinello wrote:
> > Dear All,
> > discussing with a statistician of a pharmaceutical company I received
> > this answer about the statistical package that I have planned to use:
> >
> > As R is not a validated software package, we would like to ask if it
> > would rather be possible for you to use SAS, SPSS or another approved
> > statistical software system.
> >
> > Could someone suggest me a 'polite' answer?
> > TIA
> > Giovanni
> >
>
> Search the archives and you'll find a LOT of responses.
>
> Briefly, in my view there are no requirements, just some pharma
> companies that think there are.  FDA is required to accept all
> submissions, and they get some where only Excel was used, or Minitab,
> and lots more.  There is a session on this at the upcoming R
> International Users Meeting in Iowa in August.  The session will include
> discussions of federal regulation compliance for R, for those users who
> feel that such compliance is actually needed.
>
> Frank
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>

Re: [R] "R is not a validated software package.."

2007-06-08 Thread Wensui Liu

I agree with Frank.
As far as I know, the FDA neither encourages nor discourages the use of
any particular software.

On 6/8/07, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
> Giovanni Parrinello wrote:
> > Dear All,
> > discussing with a statistician of a pharmaceutical company I received
> > this answer about the statistical package that I have planned to use:
> >
> > As R is not a validated software package, we would like to ask if it
> > would rather be possible for you to use SAS, SPSS or another approved
> > statistical software system.
> >
> > Could someone suggest me a 'polite' answer?
> > TIA
> > Giovanni
> >
>
> Search the archives and you'll find a LOT of responses.
>
> Briefly, in my view there are no requirements, just some pharma
> companies that think there are.  FDA is required to accept all
> submissions, and they get some where only Excel was used, or Minitab,
> and lots more.  There is a session on this at the upcoming R
> International Users Meeting in Iowa in August.  The session will include
> discussions of federal regulation compliance for R, for those users who
> feel that such compliance is actually needed.
>
> Frank
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>

Re: [R] "R is not a validated software package.."

2007-06-08 Thread Wensui Liu

I would like to know the answer as well.
To be honest, I have a hard time understanding the mentality of
clinical trial guys, and I rather believe it is something related to job
security.

On 6/8/07, Giovanni Parrinello <[EMAIL PROTECTED]> wrote:
> Dear All,
> discussing with a statistician of a pharmaceutical company I received
> this answer about the statistical package that I have planned to use:
>
> As R is not a validated software package, we would like to ask if it
> would rather be possible for you to use SAS, SPSS or another approved
> statistical software system.
>
> Could someone suggest me a 'polite' answer?
> TIA
> Giovanni
>
> --
> dr. Giovanni Parrinello
> External Lecturer
> Medical Statistics Unit
> Department of Biomedical Sciences
> Viale Europa, 11 - 25123 Brescia Italy
> Tel: +390303717528
> Fax: +390303717488
> email: [EMAIL PROTECTED]
>
>

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Wensui Liu

> > I've found new
> > employment and living quarters and settled in, I will continue to
> > enhance Vilno in my spare time.
> > The founder: that would be me, Robert Wilkins
> > Find it at: code.google.com/p/vilno ( GNU GPL )
> > ( In particular, the tarball at code.google.com/p/vilno/downloads/list
> > , since I have yet to figure out how to use Subversion ).
> >
> >
> > 4. Who knows?
> > It was not easy to find out about the existence of DAP and PSPP. So
> > who knows what else is out there. However, I think you'll find a lot
> > more statistics software ( regression , etc ) out there, and not so
> > much data transformation software. Not many people work on data
> > preparation software. In fact, the category is so obscure that there
> > isn't one agreed term: data cleaning , data munging , data crunching ,
> > or just getting the data ready for analysis.
> >
>
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>

Re: [R] Neural Net. in R

2007-06-06 Thread Wensui Liu

Hi there,
I am surprised you didn't mention the nnet package.
You can find very good information about using nnet in Dr Ripley's
MASS book.
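
As a starting point, a minimal sketch of a regression fit with nnet, which fits a feed-forward network with a single hidden layer. The simulated data, the hidden-layer size, and the other settings are arbitrary assumptions for illustration; the block is skipped if nnet is not available.

```r
if (requireNamespace("nnet", quietly = TRUE)) {
  set.seed(1)
  # Simulated stand-in data: a noisy sine curve
  d <- data.frame(x = seq(-2, 2, length.out = 200))
  d$y <- sin(2 * d$x) + rnorm(200, sd = 0.1)

  # linout = TRUE gives a linear output unit, as needed for regression
  fit <- nnet::nnet(y ~ x, data = d, size = 5, linout = TRUE,
                    decay = 1e-3, maxit = 500, trace = FALSE)

  pred <- predict(fit, newdata = d)   # one-column matrix of fitted values
  rmse <- sqrt(mean((d$y - pred)^2))  # in-sample root mean squared error
  print(rmse)
}
```

The in-sample error printed here is optimistic; a held-out set or cross-validation gives a fairer picture of how the network generalises.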

On 6/5/07, Ehsan Rasa <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I'm a graduate student of engineering, recently introduced to R, and I am
> using it for my project and thesis. I'm trying to implement a neural
> network regression model in R and apply it to my database. I found three R
> packages ("AMORE", "grnnR", "neural") on the R website, but their manuals are
> not really user-friendly in my opinion. I was wondering if anyone has written
> code in R using any of these packages for a feed-forward back-propagation
> neural network that I could use. That would be a remedy for my nightmare,
> which has already taken quite some time.
> I would really appreciate it.
>
> Sincerely,
> Jason.
>

Re: [R] Import data from Access

2007-05-31 Thread Wensui Liu

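library(RODBC)
# odbcConnectAccess() takes the path to the .mdb file, so no DSN has to be set up
mdbConnect <- odbcConnectAccess("C:\\db.mdb")
# read the whole table "tblData" into a data frame
data <- sqlFetch(mdbConnect, "tblData")
odbcClose(mdbConnect)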

On 5/31/07, livia <[EMAIL PROTECTED]> wrote:
>
>
> Hi, I want to import some data from Access and I am using the following
> codes:
>
> testdb <- file.path("c/../db1")
> channel <- odbcConnect("testdb")
> sqlFetch(channel,"tbl",colnames = TRUE, rownames = FALSE)
>
> It comes out the error message:
>
> 1: [RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver
> Manager] Data source name not found and no default driver specified
> 2: ODBC connection failed in: odbcDriverConnect(st, ...)
>
> Anyone can help me sort it out? Many thanks.
>

Re: [R] Excel data into R

2007-05-14 Thread Wensui Liu

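library(RODBC)

# 1. READ DATA FROM EXCEL INTO R

# open the workbook, read worksheet "Sheet1" into a data frame, then close
xlsConnect <- odbcConnectExcel("C:\\temp\\demo.xls")
demo <- sqlFetch(xlsConnect, "Sheet1")
odbcClose(xlsConnect)
# clean-up only: this drops the imported data frame once you are done with it
rm(demo)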

On 5/12/07, Ozlem Ipekci <[EMAIL PROTECTED]> wrote:
> Hello to all,
> How can I make R read the data from an Excel sheet?
> thanks,
> ozlem
>

Re: [R] Read SAS data into R

2007-05-11 Thread Wensui Liu

Not necessarily, if R can access SAS data through the SAS ODBC driver. I
do so for *.sav data all the time.

On 5/11/07, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
> kseefeld wrote:
> > Kim's EZ Recipe for….
> >
> > SAS TO R, perfectly formatted table with no loss of data
> >
> > • In SAS: Export SAS DB as access db
> > • In R go to directory where access db is stored
> > • Use package RODBC
> >
> > #R code
> > #Create connection object (Can set up DSN but I'm too lazy to)
> > c<-odbcConnectAccess("x.mdb")
> > #Create table object, store db in object
> > x<-sqlFetch(c, "test")
> > #Close connection object
> > odbcClose(c)
> >
> >
>
> That doesn't help people who don't have SAS.
>
> Note that an upcoming release of the Hmisc package has a new Access
> import function for users who have access to the mdbtools package on
> their operating system (e.g., linux).
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>

Re: [R] Read SAS data into R

2007-05-10 Thread Wensui Liu

See the foreign package.
But personally, I think it might be better to transfer the data through CSV or a database.
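
A sketch of the CSV route in base R. The temporary file is a stand-in for a CSV exported on the SAS side (for example with PROC EXPORT); the column names and values are invented. For SAS transport (XPORT) files, read.xport() in the foreign package is the corresponding reader.

```r
# Stand-in for a CSV exported from SAS (hypothetical content)
csv_path <- tempfile(fileext = ".csv")
exported <- data.frame(id = 1:3,
                       group = c("a", "b", "a"),
                       value = c(1.5, NA, 3.25))
write.csv(exported, csv_path, row.names = FALSE)

# Declaring column types up front avoids surprises from type guessing;
# empty fields and "NA" are both read as missing values
imported <- read.csv(csv_path,
                     colClasses = c("integer", "character", "numeric"),
                     na.strings = c("NA", ""))
str(imported)

# For a transport file instead (the path is a placeholder):
# imported <- foreign::read.xport("mydata.xpt")
unlink(csv_path)
```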

On 5/10/07, AbouEl-Makarim Aboueissa <[EMAIL PROTECTED]> wrote:
> Dear ALL:
>
> Could you please let me know how to read a SAS data file into R.
>
> Thank you so much for your help.
>
> Regards;
>
> Abou
>
>
> ==
> AbouEl-Makarim Aboueissa, Ph.D.
> Assistant Professor of Statistics
> Department of Mathematics & Statistics
> University of Southern Maine
> 96 Falmouth Street
> P.O. Box 9300
> Portland, ME 04104-9300
>
> Tel: (207) 228-8389
> Email: [EMAIL PROTECTED]
>   [EMAIL PROTECTED]
> Office: 301C Payson Smith
>

Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

2007-05-07 Thread Wensui Liu

Well, how do you know which ones are the best out of several hundred?
I would average the results of all several hundred instead.
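
To make the averaging idea concrete, here is a base-R sketch that averages the predicted probabilities of several fits and then cross-tabulates the resulting classes against the observed outcome. The matrix of predictions is simulated as a stand-in for predict() output from repeated nnet runs with different random starts; all names and numbers are assumptions for illustration.

```r
set.seed(123)
n <- 200
truth <- rbinom(n, 1, 0.4)  # observed binary outcome

# Stand-in for predictions from 25 fits: one column of probabilities per fit
n_fits <- 25
pp <- sapply(seq_len(n_fits), function(i) {
  plogis(2 * (truth - 0.5) + rnorm(n, sd = 1.5))
})

pp_avg <- rowMeans(pp)                  # average over all fits
pred_class <- as.integer(pp_avg > 0.5)  # turn probabilities into classes

# Confusion table and overall success rate
conf_mat <- table(predicted = pred_class, observed = truth)
accuracy <- mean(pred_class == truth)
print(conf_mat)
print(accuracy)
```

Note that table() is only informative once the probabilities have been converted to classes; tabulating raw probabilities gives one row per distinct value.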

On 5/7/07, hadley wickham <[EMAIL PROTECTED]> wrote:
> On 5/6/07, nathaniel Grey <[EMAIL PROTECTED]> wrote:
> > Hello R-Users,
> >
> > I have been using nnet by Ripley to train a neural net on a test
> > dataset, and I have obtained predictions for a validation dataset using:
> >
> > PP<-predict(nnetobject,validationdata)
> >
> > Using PP I can find the -2 log likelihood for the validation dataset.
> >
> > However, what I really want to know is how well my neural net is doing at
> > classifying my binary output variable. I am new to R and I can't figure out
> > how you can assess the success rates of predictions.
> >
>
> table(PP, binaryvariable)
> should get you started.
>
> Also if you're using nnet with random starts, I strongly suggest
> taking the best out of several hundred (or maybe thousand) trials - it
> makes a big difference!
>
> Hadley
>

Re: [R] Reasons to Use R

2007-04-11 Thread Wensui Liu

I think the reason Stata is fast is that it keeps only one working
table in RAM. If you keep just one data frame in R, it will run fast
too. But ...

On 4/11/07, Robert Duval <[EMAIL PROTECTED]> wrote:
> So I guess my question is...
>
> Is there any hope of R being modified at its core in order to handle
> large datasets more gracefully? (You've mentioned SAS and SPSS, I'd
> add Stata to the list).
>
> Or should we (the users of large datasets) expect to keep on working
> with the present tools for the time to come?
>
> robert
>
> On 4/11/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> > On Wed, 2007-04-11 at 11:26 -0500, Marc Schwartz wrote:
> > > On Wed, 2007-04-11 at 17:56 +0200, Bi-Info
> > > (http://members.home.nl/bi-info) wrote:
> > > > I certainly have that idea too. SPSS works in much the same way,
> > > > although it specialises in PC applications. Adding memory to a PC is
> > > > not very expensive these days. On my first AT some extra memory
> > > > cost 300 dollars or more. These days you get extra memory with a package
> > > > of marshmallows or chocolate bars if you need it.
> > > > All computations on a computer are discrete steps in a way, but I've
> > > > heard that SAS computations are split up into strictly divided steps. That
> > > > also makes procedures "attachable", I've been told, and interchangeable.
> > > > Different procedures can use the same code, which in turn is
> > > > cheaper in memory usage or disk usage (the old days...). That makes SAS,
> > > > by the way, a complicated machine to build, because procedures that are
> > > > split up into numerous fragments require complicated bookkeeping. If
> > > > you do it that way, I've been told, you can do a lot of computations
> > > > with very little memory. One guy actually computed quite complicated
> > > > models with "only 32MB or less", which wasn't very much for "his type of
> > > > calculations". That means SAS is efficient in memory handling, I
> > > > think. It's not very efficient in dollar handling... I estimate.
> > > >
> > > > Wilfred
> > >
> > > 
> > >
> > > Oh... SAS is quite efficient in dollar handling, at least when it comes
> > > to the annual commercial licenses...along the same lines as the
> > > purported efficiency of the U.S. income tax system:
> > >
> > >   "How much money do you have?  Send it in..."
> > >
> > > There is a reason why SAS is the largest privately held software company
> > > in the world and it is not due to the academic licensing structure,
> > > which constitutes only about 12% of their revenue, based upon their
> > > public figures.
> >
> > Hmmm..here is a classic example of the problems of reading pie
> > charts.
> >
> > The figure I quoted above, which is from reading the 2005 SAS Annual
> > Report on their web site (such as it is for a private company) comes
> > from a 3D exploded pie chart (ick...).
> >
> > The pie chart uses 3 shades of grey and 5 shades of blue to
> > differentiate 8 market segments and their percentages of total worldwide
> > revenue.
> >
> > I mis-read the 'shade of grey' allocated to Education as being 12%
> > (actually 11.7%).
> >
> > A re-read of the chart, zooming in close on the pie in a PDF reader,
> > appears to actually show that Education is but 1.8% of their annual
> > worldwide revenue.
> >
> > Government based installations, which are presumably the other notable
> > market segment in which substantially discounted licenses are provided,
> > is 14.6%.
> >
> > The report is available here for anyone else curious:
> >
> >   http://www.sas.com/corporate/report05/annualreport05.pdf
> >
> > Somebody needs to send SAS a copy of Tufte or Cleveland.
> >
> > I have to go and rest my eyes now...  ;-)
> >
> > Regards,
> >
> > Marc
> >

Re: [R] Reasons to Use R

2007-04-10 Thread Wensui Liu

Greg,
As far as I understand, SAS is probably more efficient than S+/R at
handling large data. Do you have any idea why?

On 4/10/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of
> > Bi-Info (http://members.home.nl/bi-info)
> > Sent: Monday, April 09, 2007 4:23 PM
> > To: Gabor Grothendieck
> > Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
> > Subject: Re: [R] Reasons to Use R
>
> [snip]
>
> > So what's the big deal about S using files instead of memory
> > like R. I don't get the point. Isn't there enough swap space
> > for S? (Who cares
> > anyway: it works, isn't it?) Or are there any problems with S
> > and large datasets? I don't get it. You use them, Greg. So
> > you might discuss that issue.
> >
> > Wilfred
> >
> >
>
> This is my understanding of the issue (not anything official).
>
> If you use up all the memory while in R, then the OS will start swapping
> memory to disk, but the OS does not know what parts of memory correspond
> to which objects, so it is entirely possible that the chunk swapped to
> disk contains parts of different data objects, so when you need one of
> those objects again, everything needs to be swapped back in.  This is
> very inefficient.
>
> S-PLUS occasionally runs into the same problem, but since it does some
> of its own swapping to disk it can be more efficient by swapping single
> data objects (data frames, etc.).  Also, since S-PLUS is already saving
> everything to disk, it does not actually need to do a full swap, it can
> just look and see that a particular data frame has not been used for a
> while, know that it is already saved on the disk, and unload it from
> memory without having to write it to disk first.
>
> The g.data package for R has some of this functionality of keeping data
> on the disk until needed.
>
> The better approach for large data sets is to only have some of the data
> in memory at a time and to automatically read just the parts that you
> need.  So for big datasets it is recommended to have the actual data
> stored in a database and use one of the database connection packages to
> only read in the subset that you need.  The SQLiteDF package for R is
> working on automating this process for R.  Also, the bigdata
> module for S-PLUS and the biglm package for R have ways of doing some of
> the common analyses using chunks of data at a time.  This idea is not
> new.  There was a program in the late 1970s and 80s called Rummage by
> Del Scott (I guess technically it still exists, I have a copy on a 5.25"
> floppy somewhere) that used the approach of specifying the model you wanted
> to fit first, then specifying the data file.  Rummage would then figure out
> which sufficient statistics were needed and read the data in chunks,
> compute the sufficient statistics on the fly, and not keep more than a
> couple of lines of the data in memory at once.  Unfortunately it did not
> have much of a user interface, so when memory was cheap and datasets
> only medium sized it did not compete well, I guess it was just a bit too
> ahead of its time.
>
> Hope this helps,
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
>
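
A rough sketch of the chunked idea Greg describes above, using only base R.
For a linear model, X'X and X'y are sufficient statistics, so they can be
accumulated one chunk at a time without ever holding the whole file in
memory. The simulated data, the temporary file, the chunk size of 100 and
all variable names are made up for illustration; this is not meant to show
how Rummage or biglm are actually implemented.

```r
# Simulate a data set and write it to disk (stand-in for a large file).
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n)
csv <- tempfile(fileext = ".csv")
write.csv(dat, csv, row.names = FALSE)

# Read the file 100 lines at a time and accumulate X'X and X'y.
cols <- c("x1", "x2", "y")
XtX <- matrix(0, 3, 3)
Xty <- numeric(3)
con <- file(csv, open = "r")
invisible(readLines(con, n = 1))          # skip the header line
repeat {
  lines <- readLines(con, n = 100)
  if (length(lines) == 0) break           # end of file
  chunk <- read.csv(text = lines, header = FALSE, col.names = cols)
  X <- cbind(1, chunk$x1, chunk$x2)       # design matrix with intercept
  XtX <- XtX + crossprod(X)
  Xty <- Xty + crossprod(X, chunk$y)
}
close(con)

# Solve the normal equations from the accumulated statistics.
beta.chunked <- drop(solve(XtX, Xty))

# For comparison only: the usual in-memory fit.
beta.full <- unname(coef(lm(y ~ x1 + x2, data = dat)))
```

Only one chunk of rows is in memory at any time, and the two sets of
coefficients should agree up to rounding in the written file.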


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] Reasons to Use R [Broadcast]

2007-04-09 Thread Wensui Liu

Andy,
I totally agree with you. Money should be spent on the people working
hard instead of on the fancy software. But in real life, it is the
opposite. ^_^.

On 4/9/07, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> I've probably been away from SAS for too long... we've recently tried to
> get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient
> for some of my colleagues who need it).  I was shocked by the quote for
> our 28-core Scyld cluster--- the annual fee was a few times the total
> cost of our hardware.  We ended up buying a new quad 3GHz Opterons box
> with 32GB ram just so that the fee for SAS on such a box would be more
> tolerable.  It just boggles my mind that the right to use SAS for a year
> is about the price of a nice four-bedroom house (near SAS Institute!).
> I don't understand people who would rather pay that kind of price for the
> software, instead of spending the money on state-of-the-art hardware and
> saving more than a bundle.
>
> Just my $0.02...
> Andy
>
> From: Jorge Cornejo-Donoso
> >
> > I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM.
> > The problem is the DB size.
> >
> > -Original Message-
> > From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> > Sent: Monday, April 09, 2007 11:28
> > To: Jorge Cornejo-Donoso
> > CC: r-help@stat.math.ethz.ch
> > Subject: Re: [R] Reasons to Use R
> >
> > Have you tried 64 bit machines with larger memory or do you
> > mean that you can't use R on your current machines?
> >
> > Also have you tried S-Plus?  Will that work for you? The
> > transition from that to R would be less than from SAS to R.
> >
> > On 4/9/07, Jorge Cornejo-Donoso <[EMAIL PROTECTED]> wrote:
> > > The size of the db is an issue with R. We are still using SAS because R
> > > can't handle our db, and of course we don't want to sacrifice resolution,
> > > because the data collection is expensive (at least in fisheries and
> > > oceanography), so I think that R needs to improve its handling of big DBs.
> > > Now I can only use R for graph preparation and some data analysis, but
> > > we can't do the main work in R, and that is really sad.
> > >
> >
> >
> >
> >
>
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] How to choose the df when using GAM function?

2007-04-02 Thread Wensui Liu

Assuming you are using the gam package developed by Hastie, you might
use step.gam() to do so.

On 4/2/07, Jin Huang <[EMAIL PROTECTED]> wrote:
> Dear all,
>
>   When using the gam function in R, we need to specify the degrees of freedom
> for the smooth function (i.e. s(x, df=#)). I am wondering how to choose an
> appropriate df.
>
>   Thanks a lot,
>   Jin
>   
>   North Carolina State University
>   USA
>
>
>

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] Problem dropping rows based on values in a column

2007-03-25 Thread Wensui Liu

Sorry, John
Marc's method is correct.

On 3/25/07, John Sorkin <[EMAIL PROTECTED]> wrote:
> I am trying to drop rows of a dataframe based on values of the column PID,
> but my strategy is not working. I hope someone can tell me what I am doing
> incorrectly.
>
>
> # Values of PID column
> > jdata[,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 
> 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 
> 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 
> 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
>
> #Prepare to drop last two rows, rows that have 14744 and 14772 in the PID column
> > delete<-c(14772,14744)
>
> #Try to delete last two rows, but as you will see, I am not able to drop the last two rows.
> > jdata[jdata$PID!=delete,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 
> 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 
> 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 
> 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
> >
>
>
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and
> Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> [EMAIL PROTECTED]
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] Problem dropping rows based on values in a column

2007-03-25 Thread Wensui Liu

> jdata
PID
1 14854
2 10481
3 14793
4 14744
5 14772
> jdata[jdata[1] != delete, 1]
[1] 14854 10481 14793
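
A caveat on the snippet above: `!=` compares element by element and recycles
`delete` along the rows, so it only appears to work here because the two
values happen to line up with rows 4 and 5. A small sketch with a made-up
six-row jdata (one extra row added purely for illustration) shows the
difference, and what I believe is the safer approach with %in%:

```r
# Made-up data: the five rows shown above plus one extra row in front.
jdata <- data.frame(PID = c(16608, 14854, 10481, 14793, 14744, 14772))
delete <- c(14772, 14744)

# '!=' recycles delete as 14772, 14744, 14772, 14744, 14772, 14744,
# so 14744 is compared with 14772 and vice versa: nothing is dropped.
wrong <- jdata[jdata$PID != delete, "PID"]

# %in% tests each PID for membership in delete, regardless of position.
kept <- jdata[!(jdata$PID %in% delete), "PID"]
```

Here `wrong` still has all six values, while `kept` has the four PIDs that
are not in `delete`.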


On 3/25/07, John Sorkin <[EMAIL PROTECTED]> wrote:
> I am trying to drop rows of a dataframe based on values of the column PID,
> but my strategy is not working. I hope someone can tell me what I am doing
> incorrectly.
>
>
> # Values of PID column
> > jdata[,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 
> 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 
> 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 
> 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
>
> #Prepare to drop last two rows, rows that have 14744 and 14772 in the PID column
> > delete<-c(14772,14744)
>
> #Try to delete last two rows, but as you will see, I am not able to drop the last two rows.
> > jdata[jdata$PID!=delete,"PID"]
>  [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 
> 15617 15615 15212 14862 16539
> [18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 
> 15982 15825 15834 15491 15822
> [35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 
> 15396 15477 15446 15374 14092
> [52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
> >
>
>
> Thanks,
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and
> Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> [EMAIL PROTECTED]
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


[R] any way to append a table in SQL server

2007-03-20 Thread Wensui Liu

Dear Listers,
Is there an R interface to SQL Server that lets me append records to a
table in the DB? Could I do that using RODBC?
Thanks a lot.

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


[R] any simple way to put comment with multiple lines

2007-03-18 Thread Wensui Liu

Dear Listers,
I understand I can use '#' to comment code line by line. But is there a
way to write a comment spanning multiple lines without having to put '#'
on every line?
Thanks.


Re: [R] plot.randomForest default mtry values

2007-03-15 Thread Wensui Liu

Joe,
Here is a piece of junk copied from my blog, showing how to use mtry. HTH.

library(MASS);
library(randomForest);
data(Boston);

set.seed(2007);

# SEARCH FOR BEST VALUE OF MTRY FOR RANDOM FORESTS
mtry <- tuneRF(Boston[, -14], Boston[, 14], mtryStart = 1,
               stepFactor = 2, ntreeTry = 500, improve = 0.01);
best.m <- mtry[mtry[, 2] == min(mtry[, 2]), 1];

# FIT A RF MODEL
rf <- randomForest(medv~., data = Boston, mtry = best.m, ntree = 1000,
                   importance = TRUE);

# EXTRACT VARIABLE IMPORTANCE
imp.tmp <- importance(rf, type = 1);
rf.imp <- imp.tmp[order(imp.tmp[, 1], decreasing = TRUE),];
par(mar = c(3, 0, 4, 0));
barplot(rf.imp, col = gray(0:(ncol(Boston) - 1)/(ncol(Boston) - 1)),
        names.arg = names(rf.imp), yaxt = "n", cex.names = 1);
title(main = list("Importance Rank of Predictors", font = 4, cex = 1.5));

# PLOT PARTIAL DEPENDENCE OF EACH PREDICTOR
par(mfrow = c(3, 5), mar = c(2, 2, 2, 2), pty = "s");
for (i in 1:(ncol(Boston) - 1))
{
  partialPlot(rf, Boston, names(Boston)[i], xlab = names(Boston)[i],
              main = NULL);
}

On 3/15/07, Joseph Retzer <[EMAIL PROTECTED]> wrote:
> When using the plot.randomForest method, 3 error series (by number of trees) 
> are plotted. I suspect they are associated with the 3 default values of mtry 
> that are used, for example, in the tuneRF method but I'm not sure. Could 
> someone confirm?
>
> Also, is it possible to force different values of mtry to be used when 
> creating the plots? I specified them explicitly in the randomForest statement 
> but it did not seem to have an effect.
> Many thanks,
> Joe Retzer
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] read.table for a subset of data

2007-03-11 Thread Wensui Liu

Jim,

Glad to see your reply.

Referring to your email, what if I just want to read 10 rows from a csv
table with 10 rows? Do you think it is a waste of resources to read
the whole table in?
Any thoughts?

wensui

On 3/11/07, jim holtman <[EMAIL PROTECTED]> wrote:
> Why can't you read in the whole data set and then create the subsets?  This
> is easily done with 'split'.  If the data is too large, then consider a data
> base.
>
> On 3/11/07, gnv shqp <[EMAIL PROTECTED]> wrote:
> >
> > Hi R-experts,
> >
> > I have data from four conditions of an experiment.  I tried to create four
> > subsets of the data with read.table, for example,
> > read.table("Experiment.csv",subset=(condition=="1"))
> > .  I found a similar post in the archive, but the answer to that post was
> > no.   Any  new ideas about  reading subsets of data with read.table?
> >
> > Thanks!
> >
> > Feng
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] read.table for a subset of data

2007-03-11 Thread Wensui Liu

As far as I know, you cannot do that with read.table. But I am also
thinking about RODBC, and wondering whether you could assign a DSN
to your .csv file and then use SQL to fetch the subset.
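
If the file fits in memory, a plain base-R route may be enough. The sketch
below uses a made-up temporary file and made-up column names (condition,
score) in place of Experiment.csv: it reads everything and filters
afterwards, since read.table()/read.csv() have no 'subset' argument, and it
also shows the 'nrows' argument for the case where only the leading rows
are wanted.

```r
# Stand-in for Experiment.csv: four conditions, three rows each.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(condition = rep(1:4, each = 3), score = 1:12),
          csv, row.names = FALSE)

# Read the whole file, then keep the rows for one condition.
all.rows <- read.csv(csv)
cond1 <- subset(all.rows, condition == 1)

# Or get all four subsets at once as a list of data frames.
by.cond <- split(all.rows, all.rows$condition)

# When only the first rows are needed, 'nrows' stops reading early.
first5 <- read.csv(csv, nrows = 5)
```

This does not avoid parsing the full file for the subset case; for data too
large to read at once, a database is probably still the better choice.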

On 3/11/07, gnv shqp <[EMAIL PROTECTED]> wrote:
> Hi R-experts,
>
> I have data from four conditions of an experiment.  I tried to create four
> subsets of the data with read.table, for example,
> read.table("Experiment.csv",subset=(condition=="1"))
> .  I found a similar post in the archive, but the answer to that post was
> no.   Any  new ideas about  reading subsets of data with read.table?
>
> Thanks!
>
> Feng
>
>

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] use nnet

2007-03-09 Thread Wensui Liu

no, it is called regression. ^_^.

On 3/9/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> thank you very much.
> I have a another question about nnet
> if I set size=0, and skip=TRUE.
> Then this network has just input layer and out layer.
> Is this also called perceptron network?
>
> thanks,
>
> Aimin Yan
>
>
> At 12:39 PM 3/9/2007, Wensui Liu wrote:
> >AM,
> >Sorry. Please ignore the comment box at the top of the code. It is not
> >actually cross-validation, just a simple split-sample validation.
> >Sorry for the confusion.
> >
> >On 3/9/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> >>AM,
> >>I have a piece of junk on my blog. Here it is.
> >>#
> >># USE CROSS-VALIDATION TO DO A GRID-SEARCH FOR  #
> >># THE OPTIMAL SETTINGS (WEIGHT DECAY AND NUMBER #
> >># OF HIDDEN UNITS) OF NEURAL NETS   #
> >>#
> >>
> >>library(nnet);
> >>library(MASS);
> >>data(Boston);
> >>X <- I(as.matrix(Boston[-14]));
> >># STANDARDIZE PREDICTORS
> >>st.X <- scale(X);
> >>Y <- I(as.matrix(Boston[14]));
> >>boston <- data.frame(X = st.X, Y);
> >>
> >># DIVIDE DATA INTO TESTING AND TRAINING SETS
> >>set.seed(2005);
> >>test.rows <- sample(1:nrow(boston), 100);
> >>test.set <- boston[test.rows, ];
> >>train.set <- boston[-test.rows, ];
> >>
> >># INITIATE A NULL TABLE
> >>sse.table <- NULL;
> >>
> >># SEARCH FOR OPTIMAL WEIGHT DECAY
> >># RANGE OF WEIGHT DECAYS SUGGESTED BY B. RIPLEY
> >>for (w in c(0.0001, 0.001, 0.01))
> >>{
> >>  # SEARCH FOR OPTIMAL NUMBER OF HIDDEN UNITS
> >>  for (n in 1:10)
> >>  {
> >>    # INITIATE A NULL VECTOR
> >>    sse <- NULL;
> >>    # FOR EACH SETTING, RUN NEURAL NET MULTIPLE TIMES
> >>    for (i in 1:10)
> >>    {
> >>      # INITIATE THE RANDOM STATE FOR EACH NET
> >>      set.seed(i);
> >>      # TRAIN NEURAL NETS
> >>      net <- nnet(Y~X, size = n, data = train.set, rang = 0.1,
> >>                  linout = TRUE, maxit = 1, decay = w,
> >>                  skip = FALSE, trace = FALSE);
> >>      # CALCULATE SSE FOR TESTING SET
> >>      test.sse <- sum((test.set$Y - predict(net, test.set))^2);
> >>      # APPEND EACH SSE TO A VECTOR
> >>      if (i == 1) sse <- test.sse else sse <- rbind(sse, test.sse);
> >>    }
> >>    # APPEND AVERAGED SSE WITH RELATED PARAMETERS TO A TABLE
> >>    sse.table <- rbind(sse.table, c(WT = w, UNIT = n, SSE = mean(sse)));
> >>  }
> >>}
> >># PRINT OUT THE RESULT
> >>print(sse.table);
> >>
> >>http://statcompute.spaces.live.com/Blog/cns!39C8032DBD1321B7!290.entry
> >>
> >>
> >>On 3/9/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> >> > I want to adjust weight decay and number of hidden units for nnet by
> >> > a loop like
> >> > for(decay)
> >> > {
> >> >   for(number of unit)
> >> >{
> >> > for(#run)
> >> >  {model<-nnet()
> >> >test.error<-
> >> >  }
> >> >}
> >> > }
> >> >
> >> > for example:
> >> > I set decay=0.1, size=3, maxit=200, for this set I run 10 times, and
> >> > calculate test error
> >> >
> >> > after that I want to get a matrix like this
> >> >
> >> > decay  size  maxit  #run  test_error
> >> > 0.1    3     200    1     1.2
> >> > 0.1    3     200    2     1.1
> >> > 0.1    3     200    3     1.0
> >> > 0.1    3     200    4     3.4
> >> > 0.1    3     200    5     ..
> >> > 0.1    3     200    6     ..
> >> > 0.1    3     200    7     ..
> >> > 0.1    3     200    8     ..
> >> > 0.1    3     200    9     ..
> >> > 0.1    3     200    10    ..
> >> > 0.2    3     200    1     1.2
> >> > 0.2    3     200    2     1.1
> >> > 0.2    3     200    3     1.0
> >> > 0.2    3     200    4     3.4
> >> > 0.2    3     200    5     ..
> >> > 0.2    3     200    6     ..
> >> > 0.2

Re: [R] Extracting text from a character string

2007-03-09 Thread Wensui Liu

actually, I am thinking of strsplit().
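
A minimal sketch of what I have in mind, using the string from the post
below and assuming every record follows the same layout (two
underscore-separated fields, then the date). Note that grep() returns the
whole matching element rather than the matched text, and that backslashes
in an R string must be doubled ("\\d"), which is likely why the original
call did not do what was wanted.

```r
x <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"

# Split on underscores; the third piece starts with the date.
piece <- strsplit(x, "_", fixed = TRUE)[[1]][3]

# The date occupies the first 10 characters (mm-dd-yyyy).
date.txt <- substr(piece, 1, 10)
d <- as.Date(date.txt, format = "%m-%d-%Y")

# Alternative: pull out the matched substring directly with a regex.
date.rx <- regmatches(x, regexpr("[0-9]{2}-[0-9]{2}-[0-9]{4}", x))
```

Both routes give the text "03-27-2002"; as.Date() then turns it into a
proper Date for March 27, 2002.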

On 3/9/07, Shawn Way <[EMAIL PROTECTED]> wrote:
> I have a set of character strings like below:
>
>   > data3[1]
> [1] "CB01_0171_03-27-2002-(Sample 26609)-(126)"
> >
>
>   I am trying to extract the text 03-27-2002 and convert this into a date for 
> the same record.  I keep looking at the grep function, however I cannot quite 
> get it to work.
>
>   grep("\d\d-\d\d-\d\d\d\d",data3[1],perl=TRUE,value=TRUE)
>
>   Any hints?
>
>   Shawn Way
>
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] use nnet

2007-03-09 Thread Wensui Liu

AM,
Sorry. Please ignore the comment box at the top of the code. It is not
actually cross-validation, just a simple split-sample validation.
Sorry for the confusion.

On 3/9/07, Wensui Liu <[EMAIL PROTECTED]> wrote:
> AM,
> I have a piece of junk on my blog. Here it is.
> #
> # USE CROSS-VALIDATION TO DO A GRID-SEARCH FOR  #
> # THE OPTIMAL SETTINGS (WEIGHT DECAY AND NUMBER #
> # OF HIDDEN UNITS) OF NEURAL NETS   #
> #
>
> library(nnet);
> library(MASS);
> data(Boston);
> X <- I(as.matrix(Boston[-14]));
> # STANDARDIZE PREDICTORS
> st.X <- scale(X);
> Y <- I(as.matrix(Boston[14]));
> boston <- data.frame(X = st.X, Y);
>
> # DIVIDE DATA INTO TESTING AND TRAINING SETS
> set.seed(2005);
> test.rows <- sample(1:nrow(boston), 100);
> test.set <- boston[test.rows, ];
> train.set <- boston[-test.rows, ];
>
> # INITIATE A NULL TABLE
> sse.table <- NULL;
>
> # SEARCH FOR OPTIMAL WEIGHT DECAY
> # RANGE OF WEIGHT DECAYS SUGGESTED BY B. RIPLEY
> for (w in c(0.0001, 0.001, 0.01))
> {
>   # SEARCH FOR OPTIMAL NUMBER OF HIDDEN UNITS
>   for (n in 1:10)
>   {
>     # INITIATE A NULL VECTOR
>     sse <- NULL;
>     # FOR EACH SETTING, RUN NEURAL NET MULTIPLE TIMES
>     for (i in 1:10)
>     {
>       # INITIATE THE RANDOM STATE FOR EACH NET
>       set.seed(i);
>       # TRAIN NEURAL NETS
>       net <- nnet(Y~X, size = n, data = train.set, rang = 0.1,
>                   linout = TRUE, maxit = 1, decay = w,
>                   skip = FALSE, trace = FALSE);
>       # CALCULATE SSE FOR TESTING SET
>       test.sse <- sum((test.set$Y - predict(net, test.set))^2);
>       # APPEND EACH SSE TO A VECTOR
>       if (i == 1) sse <- test.sse else sse <- rbind(sse, test.sse);
>     }
>     # APPEND AVERAGED SSE WITH RELATED PARAMETERS TO A TABLE
>     sse.table <- rbind(sse.table, c(WT = w, UNIT = n, SSE = mean(sse)));
>   }
> }
> # PRINT OUT THE RESULT
> print(sse.table);
>
> http://statcompute.spaces.live.com/Blog/cns!39C8032DBD1321B7!290.entry
>
>
> On 3/9/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> > I want to adjust weight decay and number of hidden units for nnet by
> > a loop like
> > for(decay)
> > {
> >   for(number of unit)
> >{
> > for(#run)
> >  {model<-nnet()
> >test.error<-
> >  }
> >}
> > }
> >
> > for example:
> > I set decay=0.1, size=3, maxit=200, for this set I run 10 times, and
> > calculate test error
> >
> > after that I want to get a matrix like this
> >
> > decay  size  maxit  #run  test_error
> > 0.1    3     200    1     1.2
> > 0.1    3     200    2     1.1
> > 0.1    3     200    3     1.0
> > 0.1    3     200    4     3.4
> > 0.1    3     200    5     ..
> > 0.1    3     200    6     ..
> > 0.1    3     200    7     ..
> > 0.1    3     200    8     ..
> > 0.1    3     200    9     ..
> > 0.1    3     200    10    ..
> > 0.2    3     200    1     1.2
> > 0.2    3     200    2     1.1
> > 0.2    3     200    3     1.0
> > 0.2    3     200    4     3.4
> > 0.2    3     200    5     ..
> > 0.2    3     200    6     ..
> > 0.2    3     200    7     ..
> > 0.2    3     200    8     ..
> > 0.2    3     200    9     ..
> > 0.2    3     200    10    ..
> >
> > I am not sure if this is correct way to do this?
> > Does anyone tune these parameters like this before?
> > thanks,
> >
> > Aimin
> >
> >
>
>
> --
> WenSui Liu
> A lousy statistician who happens to know a little programming
> (http://spaces.msn.com/statcompute/blog)
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] use nnet

2007-03-09 Thread Wensui Liu

AM,
I have a piece of junk on my blog. Here it is.
#
# USE CROSS-VALIDATION TO DO A GRID-SEARCH FOR  #
# THE OPTIMAL SETTINGS (WEIGHT DECAY AND NUMBER #
# OF HIDDEN UNITS) OF NEURAL NETS   #
#

library(nnet);
library(MASS);
data(Boston);
X <- I(as.matrix(Boston[-14]));
# STANDARDIZE PREDICTORS
st.X <- scale(X);
Y <- I(as.matrix(Boston[14]));
boston <- data.frame(X = st.X, Y);

# DIVIDE DATA INTO TESTING AND TRAINING SETS
set.seed(2005);
test.rows <- sample(1:nrow(boston), 100);
test.set <- boston[test.rows, ];
train.set <- boston[-test.rows, ];

# INITIATE A NULL TABLE
sse.table <- NULL;

# SEARCH FOR OPTIMAL WEIGHT DECAY
# RANGE OF WEIGHT DECAYS SUGGESTED BY B. RIPLEY
for (w in c(0.0001, 0.001, 0.01))
{
  # SEARCH FOR OPTIMAL NUMBER OF HIDDEN UNITS
  for (n in 1:10)
  {
    # INITIATE A NULL VECTOR
    sse <- NULL;
    # FOR EACH SETTING, RUN NEURAL NET MULTIPLE TIMES
    for (i in 1:10)
    {
      # INITIATE THE RANDOM STATE FOR EACH NET
      set.seed(i);
      # TRAIN NEURAL NETS
      net <- nnet(Y~X, size = n, data = train.set, rang = 0.1,
                  linout = TRUE, maxit = 1, decay = w,
                  skip = FALSE, trace = FALSE);
      # CALCULATE SSE FOR TESTING SET
      test.sse <- sum((test.set$Y - predict(net, test.set))^2);
      # APPEND EACH SSE TO A VECTOR
      if (i == 1) sse <- test.sse else sse <- rbind(sse, test.sse);
    }
    # APPEND AVERAGED SSE WITH RELATED PARAMETERS TO A TABLE
    sse.table <- rbind(sse.table, c(WT = w, UNIT = n, SSE = mean(sse)));
  }
}
# PRINT OUT THE RESULT
print(sse.table);

http://statcompute.spaces.live.com/Blog/cns!39C8032DBD1321B7!290.entry


On 3/9/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> I want to adjust weight decay and number of hidden units for nnet by
> a loop like
> for(decay)
> {
>   for(number of unit)
>{
> for(#run)
>  {model<-nnet()
>test.error<-
>  }
>}
> }
>
> for example:
> I set decay=0.1, size=3, maxit=200, for this set I run 10 times, and
> calculate test error
>
> after that I want to get a matrix like this
>
> decay  size  maxit  #run  test_error
> 0.1    3     200    1     1.2
> 0.1    3     200    2     1.1
> 0.1    3     200    3     1.0
> 0.1    3     200    4     3.4
> 0.1    3     200    5     ..
> 0.1    3     200    6     ..
> 0.1    3     200    7     ..
> 0.1    3     200    8     ..
> 0.1    3     200    9     ..
> 0.1    3     200    10    ..
> 0.2    3     200    1     1.2
> 0.2    3     200    2     1.1
> 0.2    3     200    3     1.0
> 0.2    3     200    4     3.4
> 0.2    3     200    5     ..
> 0.2    3     200    6     ..
> 0.2    3     200    7     ..
> 0.2    3     200    8     ..
> 0.2    3     200    9     ..
> 0.2    3     200    10    ..
>
> I am not sure if this is correct way to do this?
> Does anyone tune these parameters like this before?
> thanks,
>
> Aimin
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] Heteroskedastic Time Series

2007-03-05 Thread Wensui Liu

Check the fSeries package.

On 3/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi R-helpers,
>
> I'm new to time series modelling, but my requirement seems to fall just
> outside the capabilities of the arima function in R.  I'd like to fit an
> ARMA model where the variance of the disturbances is a function of some
> exogenous variable.  So something like:
>
> Y_t = a_0 + a_1 * Y_(t-1) +...+ a_p * Y_(t-p) + b_1 * e_(t-1) +...+ b_q *
> e_(t-q) + e_t,
>
> where
>
> e_t ~ N(0, sigma^2_t),
>
> and with the variance specified by something like
>
> sigma^2_t = exp(beta_t * X_t),
>
> where X_t is my exogenous variable.  I would be very grateful if somebody
> could point me in the direction of a library that could fit this (or a
> similar) model.
>
> Thanks,
>
> James Kirkby
> Actuarial Maths and Stats
> Heriot Watt University

Re: [R] [friday topic]: what exactly is statistical computing

2007-03-02 Thread Wensui Liu

Thanks for your insight, Roger.

Actually, my question is not related to R only.

Statistical computing has become a popular topic recently. However, when I
looked up its meaning on Wikipedia and Google, I couldn't find a definition.

Another reason I asked is personal. I am very interested in this area and
maintain a blog on the topic. However, when asked what 'statistical
computing' is, I am not able to give a well-articulated answer.


On 3/2/07, Bos, Roger <[EMAIL PROTECTED]> wrote:
> This means it comes with substantial statistical routines built-in.  You
> could just as well use VBA or Java for your programming language, but
> with those you would have to write pretty much any stat routine you
> need.  With R, since it is a 'statistical computing' language, you know
> that most of what you need has already been programmed, tested(?), and
> is ready to use.
>
> I have seen you on this list for a while.  You already know all this.  I
> am not sure why you are asking this question.
>
> Roger
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu
> Sent: Friday, March 02, 2007 9:43 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] [friday topic]: what exactly is statistical computing
>
> Dear List,
> on www.r-project.org, the title says 'The R Project for Statistical
> Computing'.
>
> but what exactly is the definition of statistical computing?

[R] [friday topic]: what exactly is statistical computing

2007-03-02 Thread Wensui Liu

Dear List,
on www.r-project.org, the title says 'The R Project for Statistical Computing'.

but what exactly is the definition of statistical computing?


Re: [R] Another newbie book recommandation question

2007-03-01 Thread Wensui Liu

For a data file of your size, I think R can handle it, although of course
that also depends on your hardware. However, it might not be a good idea to
do heavy data manipulation work in R.

Stata has very good routines for survey analysis. I am not sure if R is
as good as Stata in that respect.

S Programming, by the same authors as MASS, might be a good reference
to have on your shelf.

On 3/1/07, Zembower, Kevin <[EMAIL PROTECTED]> wrote:
> I hope this question is sufficiently different from the other requests
> for book recommendations that it's not repetitious. If not, I apologize
> in advance.
>
> I'm curious what standard reference books working statisticians, or
> biostatisticians, have within easy reach of their desk. I'm a computer
> systems administrator, and have a two-foot bookshelf directly under my
> monitor that contains 13 paperback manuals that I refer to frequently,
> some once or twice a day. Are there standard reference works for
> statisticians that are used the same way? From reading this list, I'm
> guessing that one might be W. N. Venables and B. D. Ripley (2002),
> "Modern Applied Statistics with S. Fourth Edition", Springer, ISBN
> 0-387-95457-0. However, I'm not limiting this to books pertaining to R.
>
> On the other hand, maybe Google and other on-line sources, as well as
> interactive programs like R that can spit out numbers previously looked
> up in tables, have completely replaced the need for reference books. Is
> this the case today?
>
> I'm particularly interested in reference books that may be helpful in my
> organization's work. We typically deal with datasets from international
> Demographic and Health Surveys (DHS) similar to those available at
> http://www.measuredhs.com/aboutsurveys/search/search_survey_main.cfm?Srv
> yTp=type&listtypes=1. These typically contain 10,000+ respondents and
> can have up to 800 fields. We currently analyze these datasets using
> Stata.
>
> Thanks for taking the time to think about and respond to this question.
> I'll summarize the answers in a later post for the archive.
>
> -Kevin
>
> Kevin Zembower
> Internet Services Group manager
> Center for Communication Programs
> Bloomberg School of Public Health
> Johns Hopkins University
> 111 Market Place, Suite 310
> Baltimore, Maryland  21202
> 410-659-6139

Re: [R] question regression trees

2007-02-28 Thread Wensui Liu

Without seeing more code and output, I guess your tree fails to grow.
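A small sketch of how a root-only tree can arise and how to check for it. The data are made up (13 rows standing in for 13 species), and small sample size is only one possible cause, assumed here for illustration: with rpart's default minsplit of 20, a node holding 13 observations is never split.

```r
library(rpart)

# 13 made-up observations, standing in for 13 species
d <- data.frame(x = 1:13, y = c(rep(1, 6), rep(10, 7)))

# Default control: minsplit = 20, so a node with 13 observations is never split
fit.default  <- rpart(y ~ x, data = d)
is.root.only <- nrow(fit.default$frame) == 1   # one row in $frame means only a root

# Looser control, chosen only for illustration
fit.loose <- rpart(y ~ x, data = d,
                   control = rpart.control(minsplit = 4, minbucket = 2, cp = 0.01))
n.nodes.loose <- nrow(fit.loose$frame)

print(is.root.only)
print(fit.loose)
```

Checking nrow(fit$frame) before calling plot() avoids the "fit is not a tree, just a root" error.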

On 2/28/07, Claudia Romero <[EMAIL PROTECTED]> wrote:
> Hello,
> This is my first time addressing such a big audience so apologies in
> advance in case I fail to formulate this question.
>
> I am working with 13 species of trees, and the data I have are:
>   1 continuous (phenolic concentration in xylem and in phloem) and 2
> categorical variables: lineage (3 subclades) and habitat (fire and non
> fire).
>
> I am trying to see how species can be split 'objectively' based on
> these variables. I tried to do a regression tree using the rpart
> library, but repeatedly got the following answer, even when I tried to
> run it using ONLY the categorical variables:
>
>  > plot(fit, compress=TRUE)
> Error in plot.rpart(fit, compress = TRUE) :
>  fit is not a tree, just a root
>
> Can anyone please help me think about this?
>
> Many thanks,
> claudia romero

Re: [R] Help on GAM

2007-02-28 Thread Wensui Liu

which library are you using for gam?
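The answer depends on the package. As a sketch, assuming the mgcv package (the poster may well be using the gam package instead, where the output differs), the parametric and smooth parts can be pulled out of the summary separately. The data below are simulated, with a deliberately negative coefficient on x1.

```r
library(mgcv)

set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- 1 - 2 * x1 + sin(2 * pi * x2) + x3^2 + rnorm(n, sd = 0.3)
d  <- data.frame(y, x1, x2, x3)

fit <- gam(y ~ x1 + s(x2) + s(x3), data = d)
sm  <- summary(fit)

parametric <- sm$p.table   # intercept and x1: estimate, std. error, t, p-value
smooths    <- sm$s.table   # s(x2) and s(x3): edf, Ref.df, F, p-value
print(parametric)
print(smooths)
```

A negative effect shows up as a negative value in the Estimate column of the parametric table; the smooth terms have no single sign and are better inspected with plot(fit).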

On 2/28/07, Dacha Atienza <[EMAIL PROTECTED]> wrote:
> 1) I have a semiparametric model, like
> *Y~x1+s(x2)+s(x3)*
> When I run the gam package I only obtain the estimates and the statistics of
> the nonparametric part. How can I get the parametric part? Please could you
> give me the complete command to do it.
>
> 2) How are the negative coefficients identified. I run different examples
> and I never got any negative parameters.
>
> Thank you,
>
> Dacha

Re: [R] Datamining-package-?

2007-02-27 Thread Wensui Liu

What do you mean by data preprocessing? There are tons of R functions
that you can use to process data and do data mining.

On 2/27/07, j.joshua thomas <[EMAIL PROTECTED]> wrote:
> Dear Group,
>
> I am looking for a package that is going to help me on Data preprocessing
> methods in Datamining.
>
> Is there any package in R2.4.0 to support DM? or what is the suitable
> package that i can adopt do the work?
>
> Kindly need your assistance.
>
> Thanks & Regards
>
>
>
>
> JJ
> ---
>
> --
> Lecturer J. Joshua Thomas
> KDU College Penang Campus
> Research Student,
> University Sains Malaysia

Re: [R] rpart minimum sample size

2007-02-27 Thread Wensui Liu

Amy,
without looking at your actual code, I would suggest taking a
look at rpart.control().
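To make the suggestion concrete, here is a sketch on made-up data with 27 rows. The variable names and values are placeholders, not the real monitoring data. With the defaults (minsplit = 20, minbucket = 7) a 27-row data set can be split at the root, but the children are usually too small to be split again, which would be consistent with seeing only two terminal nodes.

```r
library(rpart)

# 27 made-up sites; names and values are placeholders only
volume <- 1:27
grain  <- rep(c(1, 2, 3), times = 9)
recovery <- factor(ifelse(volume <= 13,
                          ifelse(grain == 1, "none", "low"),
                          ifelse(grain == 1, "medium", "high")))
sites <- data.frame(recovery, volume, grain)

count.leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Defaults: minsplit = 20, minbucket = 7, cp = 0.01
fit.default <- rpart(recovery ~ volume + grain, data = sites, method = "class")

# Smaller thresholds let nodes with few sites be split further
fit.loose <- rpart(recovery ~ volume + grain, data = sites, method = "class",
                   control = rpart.control(minsplit = 6, minbucket = 3, cp = 0.001))

leaves.default <- count.leaves(fit.default)
leaves.loose   <- count.leaves(fit.loose)
print(c(default = leaves.default, loose = leaves.loose))
```

Loosening the controls lets the tree grow, but with so few sites a deeper tree can easily overfit, so the cross-validated error from printcp() is worth checking before trusting the extra splits.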

On 2/27/07, Amy Uhrin <[EMAIL PROTECTED]> wrote:
> Is there an optimal / minimum sample size for attempting to construct a
> classification tree using /rpart/?
>
> I have 27 seagrass disturbance sites (boat groundings) that have been
> monitored for a number of years.  The monitoring protocol for each site
> is identical.  From the monitoring data, I am able to determine the
> level of recovery that each site has experienced.  Recovery is our
> categorical dependent variable with values of none, low, medium, high
> which are based upon percent seagrass regrowth into the injury over
> time.  I wish to be able to predict the level of recovery of future
> vessel grounding sites based upon a number of categorical / continuous
> predictor variables used here including (but not limited to) such
> parameters as:  sediment grain size, wave exposure, original size
> (volume) of the injury, injury age, injury location.
>
> When I run /rpart/, the data is split into only two terminal nodes based
> solely upon values of the original volume of each injury.  No other
> predictor variables are considered, even though I have included about
> six of them in the model.  When I remove volume from the model the same
> thing happens but with injury area - two terminal nodes are formed based
> upon area values and no other variables appear.  I was hoping that this
> was a programming issue, me being a newbie and all, but I really think
> I've got the code right.  Now I am beginning to wonder if my N is too
> small for this method?
>
> --
> Amy V. Uhrin, Research Ecologist
>
> NOAA, National Ocean Service
> Center for Coastal Fisheries and Habitat Research
> 101 Pivers Island Road
> Beaufort, NC 28516
> (252) 728-8778
> (252) 728-8784 (fax)
> [EMAIL PROTECTED]
>
> 
>  \!/ \!/   <:}><   \!/ \!/  >^<**>^<  \!/ \!/

Re: [R] Partial whitening of time series?

2007-02-26 Thread Wensui Liu

Andy,

if your model is X_t = 0.5 * X_(t-1) + e_t, then you can write
X_t = 0.1 * X_(t-1) + 0.4 * X_(t-1) + e_t
(X_t - 0.1 * X_(t-1)) = 0.4 * X_(t-1) + e_t

so what you need to do is subtract part of the lag from your series.
It is just my $0.02.
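A rough sketch of that idea on the simulated series from the question. It assumes the AR coefficient is known to be 0.5, and partial_whiten and lag1 are made-up helper names. Note that the filtered series is not exactly an AR(1) with the reduced coefficient, so the lag-one autocorrelations only approximately follow 0.4, 0.3, 0.2, 0.1.

```r
set.seed(1)
x <- arima.sim(model = list(order = c(1, 0, 0), ar = 0.5), n = 500, sd = 0.25)

# Subtract a fraction k of the previous value: z_t = x_t - k * x_(t-1)
partial_whiten <- function(x, k) {
  x <- as.numeric(x)
  n <- length(x)
  x[-1] - k * x[-n]
}

# Sample autocorrelation at lag one
lag1 <- function(z) acf(z, lag.max = 1, plot = FALSE)$acf[2]

ks <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5)
lag1_by_k <- sapply(ks, function(k) lag1(partial_whiten(x, k)))
print(round(setNames(lag1_by_k, ks), 3))
```

With k equal to the full AR coefficient the result plays the role of the residuals from arima(), apart from the estimated mean and coefficient.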

On 2/26/07, Andy Bunn <[EMAIL PROTECTED]> wrote:
> I have a time series with a one year lag, ar=0.5. The series has some
> interesting events that disappear when the series is whitened (i.e.,
> fitting an AR process and looking at the residuals). I'd like to remove
> the autocorrelation in stages to see the effect on the time series. Is
> there a way to specify the autocorrelation term while fitting an AR
> process?
>
> For instance, given the following:
>
> x <- arima.sim(model = list(order = c(1,0,0), ar = 0.5), n = 500,
> sd=0.25)
>
> Can I filter x in a way that the autocorrelation at lag one is 0.4, then
> 0.3, 0.2, 0.1, until I get to a clean series equivalent to:
>
> y <- arima(x, order = c(1,0,0))$resid
>
> Thanks in advance,
> Andy

Re: [R] If you had just one book on R to buy...

2007-02-25 Thread Wensui Liu

I have both the Handbook and MASS and think MASS is much better. But of
course, it also depends on how you want to use R and on your previous
exposure to R.


On 2/25/07, Julien Barnier <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am starting a new job as a study analyst for a social science
> research unit. I would really like to use R as my main tool for data
> manipulation and analysis. So I'd like to ask you, if you had just one
> book on R to buy (or to keep), which one would it be ? I already
> bought the Handbook of Statistical Analysis Using R, but I'd like to
> have something more complete, both on the statistical point of view
> and on R usage.
>
> I thought that "Modern applied statistics with S-Plus" would be a good
> choice, but maybe some of you could have interesting suggestions ?
>
> Thanks in advance,
>
> --
> Julien

Re: [R] Chi-Square test

2007-02-22 Thread Wensui Liu

?pchisq
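For example, a minimal sketch with made-up LR values (the real ones would come from the simulation):

```r
# Made-up likelihood ratio statistics, each with 1 degree of freedom
LR <- c(0.5, 2.71, 3.84, 6.63, 10.83)

# Upper-tail probability of the chi-square distribution, one p-value per statistic
p.values <- pchisq(LR, df = 1, lower.tail = FALSE)
print(round(p.values, 4))
```

pchisq() is vectorized, so no loop is needed, and lower.tail = FALSE avoids the loss of precision in 1 - pchisq(LR, 1) for very large statistics.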

On 2/21/07, Mohsen Jafarikia <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I am doing a Likelihood Ratio (LR) test in my simulation and I have a vector
> of LR values (each with 1 degree of freedom) at the end of my simulation.
>
> Can anybody tell me how I can write a 'R' code which gives me the p-value
> for each of those LR values.
> Thanks

Re: [R] Randomly extract rows from a data frame

2007-02-18 Thread Wensui Liu

amy,
here is a piece of code copied from my blog, which might answer part
of your question.

library(MASS);
data(Boston);

# DIVIDE DATA INTO TESTING AND TRAINING SETS
set.seed(2005);
test.rows <- sample(1:nrow(Boston), 100);
test.set <- Boston[test.rows, ];
train.set <- Boston[-test.rows, ];
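The approach described in the question (bind a column of random numbers, then sort on it) also works; the missing piece is order(). A small sketch, using the built-in mtcars data as a stand-in and made-up names:

```r
set.seed(2005)
dat <- mtcars                        # built-in data frame used as a stand-in
dat$rand <- runif(nrow(dat))         # bind a column of random numbers
shuffled <- dat[order(dat$rand), ]   # reorder whole rows by that column
picked   <- shuffled[1:5, ]          # first 5 rows of the shuffled frame
print(picked)
```

Indexing with order() inside the row position keeps each row's contents together, which is the part that sort() alone does not do.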


On 2/18/07, Amy Whitehead <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am looking for a way to randomly extract a specified number of rows from a
> data frame.  I was planning on binding a column of random numbers to the
> data frame and then sorting the data frame using this bound column.  But I
> can't figure out how to use this column to sort the entire data frame so
> that the content of the rows remains together.  Does anyone know how I can
> do this?  Hints for other ways to approach this problem would also be
> appreciated.
>
> Cheers
> Amy
>
>
> Amy Whitehead
> School of Biological Sciences
> University of Canterbury
> Private Bag 4800
> Christchurch
> Ph 03 364 2987 ext 7033
> Cellphone 021 2020525
> Email [EMAIL PROTECTED]

Re: [R] How to add obj to a list?

2007-02-15 Thread Wensui Liu

a = 1:2
b = 1:3
yours = list()
yours[[1]] = a
yours[[2]] = b
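When the list already exists and its length is not known in advance, a few equivalent ways of appending can be sketched as follows (the list and its contents are made up):

```r
# A list that already holds two tables
tables <- list(first = data.frame(id = 1:2), second = data.frame(id = 3:5))

# Append one object at a time at position length + 1
tables[[length(tables) + 1]] <- 1:3

# Or append under a name
tables[["extra"]] <- data.frame(id = 6:9)

# c() also works, as long as the new object is wrapped in list()
tables <- c(tables, list(letters[1:4]))

str(tables, max.level = 1)
```

The list() wrapper in the last form matters: c(tables, letters[1:4]) would add four separate one-element entries instead of a single vector.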

On 2/15/07, Perez Alvarez, Susana <[EMAIL PROTECTED]> wrote:
> Hello everybody!
>
> I'm quite new using R and i'm trying to develope a function, but i have
> a problem.
> What i want to build is something like an objects vector. I have a list
> with two tables, and after or next to them, I want to add more tables or
> vectors to that list one by one.  But i cannot find how to do it!
> Does someone can help me?
>
> I will be very grateful for any of your help!
>
> Thank you in advance,
> susana.

Re: [R] rpart tree node label

2007-02-14 Thread Wensui Liu

not sure how you want to label it.
could you be more specific?
thanks.

On 2/14/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> I generate a tree use rpart.
> In the node of tree, split is based on the some factor.
> I want to label these node based on the levels of this factor.
>
> Does anyone know how to do this?
>
> Thanks,
>
> Aimin

Re: [R] Help neural network in R

2007-02-12 Thread Wensui Liu

the one I will recommend is MASS by Dr B. Ripley.

On 2/12/07, vinod gullu <[EMAIL PROTECTED]> wrote:
> I am interested in Neural network models in R. Is
> there any reference material/tutorial which i can use.
> Regards,

Re: [R] SQL statements (directly) in R

2007-02-12 Thread Wensui Liu

Chris,
if you could create a package similar to PROC SQL in SAS, that would be so sweet.


On 2/12/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi R-users,
>
> This note will interest people who would like to use sql statements on R data 
> frames (a bit like proc sql in SAS). Please reply to me only, unless you
> really want to keep the entire R-help list posted on this.
>
> I've been thinking about a package implementing sql queries in R. I'm almost
> about starting to write it in a very rudimentary version. What I have in mind 
> is the following:
>
> Work in two ways:
> via a generic sql("..") wrapper which allows a generic query statement
> and via convenience functions, such as SELECT("..."), ...
> what would be needed is an "sqlTable" class extending the data frame. This 
> class will have to have extra slots for indices and some other stuff. I would 
> try to stay very basic in the beginning and also use relatively inefficient 
> handling of the tables. Later-on, direct calls using the binary 
> representations could  replace the high level handling.
>
> Now come my questions:
> - have others started working on this?
> - are others interested in this?
> - ideas on how to go about it?
>
> Chris
>
> P.S.:
> Here are a few ideas I was thinking about
> One way would be to incorporate a gpl or lgpl rdbms into the package, to push 
> the data-frame to it, to execute the statement there and to get the result 
> back. The advantage: fast to implement. The disadvantage: pushing the data is 
> a bad idea (but then again, at the top level, R will make a copy of it 
> anyway, most probably). The convenience wrappers would then construct sql 
> statements and the db engine would evaluate them.
>
> The other idea is to stay in R and to link the wrappers to adequately 
> composed calls to subset, cbind, rbind, etc. Here it would be more 
> challenging to create the sql("..") interface since its string would have to 
> be parsed.
>
> The political incorrect thing about these SQL functions is that they (UPDATE, 
> INSERT) will have to modify objects within the function call. They would not 
> work via the return object.
>
> As I said, comments welcome.

Re: [R] Linking R with Microsoft SQL Server / Client

2007-02-12 Thread Wensui Liu

Steve,
I think you can link R to SQL Server directly with RODBC. But it is
not wise to dump a whole table from the database into R and then
manipulate it there, as that will slow R down.


On 2/12/07, Steve Friedman <[EMAIL PROTECTED]> wrote:
> Hello
>
> My colleagues and I have recently established a large database (40 tables
> each with greater than 15 variables) in Microsoft's SQL Server 2000.
> Currently we are accessing this database via SQL client running an Windows
> XP.   Our objectives are many fold including running SQL applications,
> outputting results to ARC/INFO IMS, production of summarizing tables -
> graphs and web interfaces for user accessibility.
>
> The project is still very much in a design phase.  I'm interested in knowing
> if we can link R directly to the database  as it is either stored in SQL
> Server, or SQL Client, or if we are better off keeping it simple and
> extracting ascii (csv)  files from SQL server prior to processing
> summarizing and model development.
>
> Any insight provided will be greatly appreciated.
>
> Steve
>
> --
> Steve Friedman
> Computational Ecology and Visualization Laboratory
> Michigan State University
>
> Envisioning Ecosystem Decisions

Re: [R] R in Industry

2007-02-06 Thread Wensui Liu

I've been looking for a job that allows me to use R/S+ since I got out
of graduate school 2 years ago but with no success. I am wondering if
there is something that can be done to promote the use of R in
industry.

It's been very frustrating to see people doing statistics using
Excel/SPSS and even more frustrating to see people paying $$$ for
something much inferior to R.


On 2/6/07, Doran, Harold <[EMAIL PROTECTED]> wrote:
> The other day, CNN had a story on working at Google. Out of curiosity, I
> went to the Google employment web site (I'm not looking, but just
> curious). In perusing their job posts for statisticians, preference is
> given to those who use R and python. Other languages, S-Plus and
> something called SAS were listed as lower priorities.
>
> When I started using Python, I noted they have a portion of the web site
> with job postings. CRAN does not have something similar, but think it
> might be useful. I think R is becoming more widely used in industry and
> I wonder if helping it move along a bit, the maintainer of CRAN could
> create a section of the web site devoted to jobs where R is a
> requirement.
>
> Hence, we could have our own little "monster.com" kind of thing going
> on. Of the multitude of ways the gospel can be spread, this is small.
> But, I think every small step forward is good.
>
> Anyone think this is useful?
>
> Harold

Re: [R] Query about merging two tables

2007-02-06 Thread Wensui Liu

subset(table1, rate != 999 & id %in% table2$id)
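A worked sketch on made-up tables with the same column names as in the question. It matches ids with %in%, since an element-wise == between columns of different lengths would recycle table2$id rather than test membership. The values are invented for illustration.

```r
table1 <- data.frame(id   = 1:6,
                     age  = c(30, 41, 25, 52, 38, 47),
                     rate = c(1.2, 999, 0.8, 2.5, 999, 1.9))
table2 <- data.frame(id = c(2, 3, 5, 6), count = c(10, 4, 7, 1))

# Rows of table1 whose id appears in table2 and whose rate is not 999
picked <- subset(table1, rate != 999 & id %in% table2$id)

# merge() gives the same rows and also brings along the count column
merged <- merge(subset(table1, rate != 999), table2, by = "id")

print(picked)
print(merged)
```

subset() is enough when only the columns of table1 are needed; merge() is the closer analogue of an SQL inner join.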

On 2/6/07, lalitha viswanath <[EMAIL PROTECTED]> wrote:
> Hi
> I have table1 which has the foll. columns
> id age rate
>
> and table2 which has the foll. columns
> id count
>
> I wish to get data from table1 for all the ids which
> are present in table2 and where the rate is not equal
> to 999.
> The ids in table2 are a subset of those in table1 and
> every id in table2 has an entry in table1.
>
> I would appreciate your input regarding the above.
>
> Thanks in advance
> Lalitha

Re: [R] rpart

2007-02-05 Thread Wensui Liu

Man, oh, man.
Surely you can use bagging, or probably boosting. But that doesn't
answer your question, does it?
Believe me, even if you use bagging, the result will vary, depending on set.seed().

On 2/5/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> Yes, I use the same setting, and I calculate MSE and CC as
> prediction accuracy measure.
> Someone told me  I should not trust one tree and should do bagging.
> Is this correct?
> Aimin
>
> At 03:11 PM 2/5/2007, Wensui Liu wrote:
> >are you sure you are using the same setting,  tree size, and so on?
> >
> >On 2/5/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> >>Hello,
> >>I have a question for rpart,
> >>I try to use it to do prediction for a continuous variable.
> >>But I get the different prediction accuracy for same training set,
> >>anyone know why?
> >>
> >>Aimin
> >>
> >>__
> >>R-help@stat.math.ethz.ch mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >--
> >WenSui Liu
> >A lousy statistician who happens to know a little programming
> >(http://spaces.msn.com/statcompute/blog)
>
>
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] rpart

2007-02-05 Thread Wensui Liu

Are you sure you are using the same settings, tree size, and so on?

On 2/5/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> Hello,
> I have a question for rpart,
> I try to use it to do prediction for a continuous variable.
> But I get the different prediction accuracy for same training set,
> anyone know why?
>
> Aimin
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] Loop with string variable AND customizable "summary" output

2007-01-29 Thread Wensui Liu

Carlo,

try something like:

for (i in c("UK", "USA"))
{
  # fit the regression on the rows of one country only
  summ <- summary(lm(y ~ x, subset = (country == i)))
  # store the result as outputUK, outputUSA
  assign(paste('output', i, sep = ''), summ)
}

(note: it is untested, sorry).
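
A sketch of the same idea that keeps the summaries in a named list instead of creating separate objects. The data frame dat and its columns are invented for illustration and assume long-format data with a country column:

```r
set.seed(1)
# hypothetical long-format data: one row per observation, with a country label
dat <- data.frame(country = rep(c("UK", "USA"), each = 20),
                  x = rnorm(40))
dat$y <- 1 + 2 * dat$x + rnorm(40)

# fit one regression per country and store the summaries in a named list
output <- lapply(split(dat, dat$country),
                 function(d) summary(lm(y ~ x, data = d)))

print(names(output))          # one entry per country
print(coef(output[["UK"]]))   # coefficient table of the UK regression
```

The coefficient table returned by coef() is an ordinary matrix, so columns such as "t value" can be dropped before printing.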

On 1/29/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> I am using R for my research and I have two questions about it:
>
> 1) is it possible to create a loop using a string, instead of a numeric 
> vector? I have in mind a specific problem:
>
> Suppose you have 2 countries: UK, and USA, one dependent (y) and one 
> independent variable (y) for each country (vale a dire: yUK, xUK, yUSA, xUSA) 
> and you want to run automatically the following regressions:
>
>
>
> for (i in c("UK","USA"))
>
> output{i}<-summary(lm(y{i} ~ x{i}))
>
>
>
> In other words, at the end I would like to have two objects as output: 
> "outputUK" and "outputUSA", which contain respectively the results of the 
> first and second regression (yUK on xUK and yUSA on xUSA).
>
>
>
> 2) in STATA there is a very nice code ("outreg") to display nicely (and as 
> the user wants to) your regression results.
>
> Is there anything similar in R / R contributed packages? More precisely, I am 
> thinking of something that is close in spirit to "summary" but it is also 
> customizable. For example, suppose you want different Signif. codes:  0 '***' 
> 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 or a different format display (i.e. 
> without "t value" column) implemented automatically (without manually editing 
> it every time).
>
> In alternative, if I was able to see it, I could modify the source code of 
> the function "summary", but I am not able to see its (line by line) code. Any 
> idea?
>
> Or may be a customizable regression output already exists?
>
> Thanks really a lot!
>
> Carlo
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] comparing random forests and classification trees

2007-01-28 Thread Wensui Liu

Amy,
If I were you, I would check the misclassification rates of the two
models on both the training set and the testing set.
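
A minimal sketch of that check for the tree side, using the built-in iris data as a stand-in. The helper misclass_rate and the 100/50 split are assumptions for illustration; the same helper could be applied to the predictions of the forest on the same split:

```r
library(rpart)  # recommended package shipped with R

set.seed(42)
test_idx <- sample(nrow(iris), 50)   # hold out 50 rows for testing
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# misclass_rate: hypothetical helper, share of cases off the diagonal
misclass_rate <- function(observed, predicted) {
  tab <- table(observed, predicted)
  1 - sum(diag(tab)) / sum(tab)
}

tree <- rpart(Species ~ ., data = train, method = "class")

train_err <- misclass_rate(train$Species, predict(tree, train, type = "class"))
test_err  <- misclass_rate(test$Species,  predict(tree, test,  type = "class"))
print(c(train = train_err, test = test_err))
```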


On 1/28/07, Amy Koch <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have done an analysis using 'rpart' to construct a Classification Tree. I
> am wanting to retain the output in tree form so that it is easily
> interpretable. However, I am wanting to compare the 'accuracy' of the tree
> to a Random Forest to estimate how much predictive ability is lost by using
> one simple tree. My understanding is that the error automatically displayed
> by the two functions is calculated differently so it is therefore incorrect
> to use this as a comparison. Instead I have produced a table for both
> analyses comparing the observed and predicted response.
>
> E.g. table(data$dependent,predict(model,type="class"))
>
> I am looking for confirmation that (a) it is incorrect to compare the error
> estimates for the two techniques and (b) that comparing the
> misclassification rates is an appropriate method for comparing the two
> techniques.
>
> Thanks
>
> Amy
>
>
>
>
>
> Amelia Koch
>
> University of Tasmania
>
> School of Geography and Environmental Studies
>
> Private Bag 78 Hobart
>
> Tasmania, Australia 7001
>
> Ph: +61 3 6226 7454
>
> [EMAIL PROTECTED]
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] nnet question

2007-01-28 Thread Wensui Liu

AM,
Don't worry. It is normal to get a different correlation each time.
Unless you are very lucky, you will get different predictions from each
training run, depending on the initial random state.

Take a look at Dr Ripley's MASS book; it has several excellent
examples of how to use nnet.
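
A short sketch of the effect of the random starting weights, using mtcars as a stand-in data set (the wrapper fit_once is hypothetical). Setting the seed before each call should make two runs start from the same weights and end at the same fit:

```r
library(nnet)  # recommended package shipped with R

# fit_once: hypothetical wrapper that fixes the seed before training
fit_once <- function(seed) {
  set.seed(seed)   # fixes the random starting weights
  nnet(mpg ~ wt + hp, data = mtcars, size = 2, linout = TRUE, trace = FALSE)
}

fit_a <- fit_once(1)
fit_b <- fit_once(1)
print(identical(fit_a$wts, fit_b$wts))
```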

On 1/28/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> Hello,
> I use nnet to do prediction for a continuous variable.
> after that, I calculate correlation coefficient between predicted value and
> real observation.
>
> I run my code(see following) several time, but I get different correlation
> coefficient each time.
>
> Anyone know why?
>
> In addition, How to calculate prediction accuracy for prediction of
> continuous variable?
>
> Aimin
> thanks,
>
>
>  > m.nn.omega <- nnet(omega~aa_three+bas+bcu+aa_ss, data=training, size=2,
> linout=TRUE)
> # weights:  57
> initial  value 89153525.582093
> iter  10 value 15036439.951888
> iter  20 value 15010796.121891
> iter  30 value 15000761.804392
> iter  40 value 14955839.294531
> iter  50 value 14934746.564215
> iter  60 value 14933978.758615
> iter  70 value 14555668.381007
> iter  80 value 14553072.231507
> iter  90 value 14031071.223996
> iter 100 value 13709055.312482
> final  value 13709055.312482
> stopped after 100 iterations
>  > pr.nn.train<-predict(m.nn.omega,training)
>  > corr.pr.nn.train<-round(cor(pr.nn.train,training$omega),2)
>  > pr.nn.test<-predict(m.nn.omega,test)
>  > corr.pr.nn.test<-round(cor(pr.nn.test,test$omega),2)
>  > cat("correlation coefficient for train using neural
> network:",corr.pr.nn.train,"\n")
> correlation coefficient for train using neural network: 0.32
>  > cat("correlation coefficient for test using neural
> network:",corr.pr.nn.test,"\n")
> correlation coefficient for test using neural network: 0.39
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] question about nnet

2007-01-28 Thread Wensui Liu

In nnet(), you should add linout = TRUE. The default is a logistic output
unit, whose values are bounded between 0 and 1.

hth.
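
A quick sketch of the difference on a stand-in data set (mtcars, with mpg as the continuous response; not the poster's data):

```r
library(nnet)  # recommended package shipped with R

set.seed(1)
# default: logistic output unit, so fitted values are squeezed into [0, 1]
fit_logistic <- nnet(mpg ~ wt + hp, data = mtcars, size = 2, trace = FALSE)
# linout = TRUE: linear output unit, suitable for a continuous response
fit_linear <- nnet(mpg ~ wt + hp, data = mtcars, size = 2,
                   linout = TRUE, trace = FALSE)

print(range(fitted(fit_logistic)))
print(range(fitted(fit_linear)))
```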

On 1/28/07, Aimin Yan <[EMAIL PROTECTED]> wrote:
> I use neural network to predict a continuous variable( omega in training).
>
> But I get all "1" instead of real value.
>
> Do you know why?
>
> Aimin
>
> Thanks
>
> The following is code
>
>  > m.nn.omega <- nnet(omega~aa_three+bas+bcu+aa_ss, aata=training,size=2)
> # weights:  57
> initial  value 97329662.256069
> final  value 96367717.444383
> converged
>  > pr.nn.train<-predict(m.nn.omega,training,type="raw")
>  > head(pr.nn.train)
>[,1]
> 11
> 21
> 31
> 41
> 51
> 61
>  > head(training)
>  pr aa_three aa_one aa_ss aa_posaas bas   ams bmsacu
> bcu omega   y index
> 1 1acx  ALA  A C  1 127.71   0 69.99   0
> -0.2498560   0  79.91470 outward  TRUE
> 2 1acx  PRO  P C  2  68.55   0 55.44   0
> -0.0949008   0  76.60380 outward  TRUE
> 3 1acx  ALA  A E  3  52.72   0 47.82   0
> -0.0396550   0  52.19970 outward  TRUE
> 4 1acx  PHE  F E  4  22.62   0 31.21   0  0.1270330   0
> 169.52500  inward  TRUE
> 5 1acx  SER  S E  5  71.32   0 52.84   0
> -0.1312380   0   7.47528 outward  TRUE
> 6 1acx  VAL  V E  6  12.92   0 22.40   0  0.1728390   0
> 149.09400  inward  TRUE
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


[R] how to create daily / weekly ts object?

2007-01-26 Thread Wensui Liu

Dear All,

Monthly and quarterly ts objects are easy to understand, but I couldn't
find an example in the R manual of how to create a daily or weekly ts object.
Could you please shed some light on it?
I really appreciate it.


Re: [R] New to R

2006-12-31 Thread Wensui Liu

What is the format of your data files: txt, csv, mdb, or xls? The syntax
is very different for each.
Could you please give more info?

thanks.
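
For the plain-text cases, a rough sketch with base R only. The file names and toy data are invented, and the files are written to a temporary folder first so the example is self-contained; mdb and xls files need an add-on package such as RODBC instead:

```r
# write a small hypothetical data set to a temporary folder
work_dir <- tempdir()
csv_file <- file.path(work_dir, "example.csv")
txt_file <- file.path(work_dir, "example.txt")
toy <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
write.csv(toy, csv_file, row.names = FALSE)
write.table(toy, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

print(getwd())   # current working directory; setwd("some/path") changes it

from_csv <- read.csv(csv_file)                               # comma-separated
from_txt <- read.table(txt_file, header = TRUE, sep = "\t")  # tab-delimited
print(from_csv)
```

On Windows, paths can be written with forward slashes ("c:/data/file.csv") or doubled backslashes ("c:\\data\\file.csv").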


On 12/31/06, Obinna Duru <[EMAIL PROTECTED]> wrote:
> Hey, I am very new to R and I need to use it (and the ACEPACK
> package) to do some statistical analysis.
>
> I have installed acepack but efforts to get started has been
> unsuccessful. I can't seem to be able to load my data files because I
> am yet to figure the syntax to use. Is there a work directory in R
> where I can put my files and call them anytime, like in Matlab? My
> files are on my C drive and I just can't figure the syntax to get them into R.
>
> Any help?
>
> Best Regards
>
> Obinna Duru
>
> Energy Resources Engineering Department,
> Green Earth Sciences Building,
> 367 Panama Street,
> Stanford, CA 94305-2220
> cell:   (650) 814 6079
> fax:(659) 725 2099
> email:  [EMAIL PROTECTED]
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] OT: any recommendation for scripting language

2006-12-30 Thread Wensui Liu

Dear Experts,

Thank you so much for your opinions. I probably will go with python.

Following your suggestion, I started reading some tutorials but have a
quick question. In the sense of statistical computing, is there
anything that can be easily done with python but not with SAS/R? Could
you please give such an example?

I wish you all a happy new year!

wensui

On 12/23/06, Wensui Liu <[EMAIL PROTECTED]> wrote:
> Right now, I am using SAS and S+/R. As a new year resolution, I am
> planning to learn a scripting language.
>
> from statisticians' point of view, which scripting language is worth
> to learn, perl, python, or any other recommendation? (Most likely, I
> will be learning it in windows.) Since I am not in research, I will
> prefer one widely used in industry and related to statistical work.
>
> if you recommend one, I will really appreciate it if you could point
> out a good source for learning as well.
>
> thank you so much!
>
> Have a happy holiday.
>
> wensui
>

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: [R] how to transform string to variable name in a fuction?

2006-12-26 Thread Wensui Liu

try ?assign
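
A minimal sketch of how assign() could be used here, with a made-up data frame shaped like the one in the question. The split() alternative at the end is an addition that avoids creating many separate objects:

```r
# hypothetical data frame shaped like the one in the question
df <- data.frame(aa = rep(c("a", "b", "c"), times = 2),
                 bb = c(20.3, 22.1, 21.3, 22.3, 19.7, 20.4))

# assign() creates an object whose name is built from a string
for (lev in unique(df$aa)) {
  assign(paste("df", lev, sep = "."), df[df$aa == lev, ])
}
print(df.a)

# alternative: split() returns a named list with one data frame per level
pieces <- split(df, df$aa)
print(names(pieces))
```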

On 12/26/06, jingjiangyan <[EMAIL PROTECTED]> wrote:
> there is a data frame, like this:
> > df
>   aa   bb
> 1  a 20.27802
> 2  b 22.10664
> 3  c 21.33470
> 4  a 22.32898
> 5  b 19.73760
> 6  c 20.38979
> .(suppressed)
> what I want to do is to copy the data frame's rows into different data frames 
> according to the levels of 'aa' column,
> > df.a <- df[df[,1]=='a',] ; df.b <- df[df[,1]=='b',] ; 
> > df.a
>   aa   bb
> 1  a 20.27802
> 4  a 22.32898
> ...
>
> So, when completed, there should be df.a, df.b,df.c, etc.
> If we could do this by hand, it is pretty fine.  But could I write a loop to 
> do this ?
> when I tried this using a funciton, there is a problem.
>
> > for ( i in levels(df[,1])) {
> +  name = paste('df',i,sep='')
> +  name <- df[df[,1]==i,]
> + }
> > name
>   aa   bb
> 3  c 21.33470
> 6  c 20.38979
> > ls()
> [1] "df"   "i""name"
> > i
> [1] "c"
> there is not data frames df.a, df.b,etc.
>
> Could you please give me some suggestion?
> I have found that write a function in R for a beginner is difficult. Is there 
>  any tutorial on writing the functions in R?
> Furthermore, someone also said that loop is not used as frequently as in 
> other script language (e.g. bash, perl). So, If you have any other smart 
> means do this more efficiently, please let me know, I would appreciate your 
> kindness.
>
> [[alternative HTML version deleted]]
>
> ______
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


[R] OT: any recommendation for scripting language

2006-12-23 Thread Wensui Liu

Right now, I am using SAS and S+/R. As a New Year's resolution, I am
planning to learn a scripting language.

From a statistician's point of view, which scripting language is worth
learning: Perl, Python, or something else? (Most likely, I
will be learning it on Windows.) Since I am not in research, I would
prefer one that is widely used in industry and related to statistical work.

If you recommend one, I would really appreciate it if you could point
out a good source for learning it as well.

thank you so much!

Have a happy holiday.

wensui


Re: [R] X-fold cross validation function for discriminant analysis

2006-11-16 Thread Wensui Liu

how hard is it to write one though?
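
For what it is worth, a rough sketch of a hand-written k-fold cross validation for lda() from MASS, on the built-in iris data. The helper cv_lda_error and its arguments are invented for illustration; lda() also offers leave-one-out cross validation through CV = TRUE.

```r
library(MASS)  # provides lda(); recommended package shipped with R

# cv_lda_error: hypothetical helper, k-fold cross-validated error rate
cv_lda_error <- function(formula, data, response, k = 10) {
  # assign each row to one of k folds at random
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  wrong <- 0
  for (f in seq_len(k)) {
    fit  <- lda(formula, data = data[folds != f, ])           # train on k-1 folds
    pred <- predict(fit, newdata = data[folds == f, ])$class  # predict held-out fold
    wrong <- wrong + sum(pred != data[[response]][folds == f])
  }
  wrong / nrow(data)
}

set.seed(2006)
err <- cv_lda_error(Species ~ ., iris, response = "Species", k = 10)
print(err)
```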

On 11/16/06, Wade Wall <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I ran a discriminant analysis with some data and want to get a general idea
> of prediction error rate.  Some have suggested using X-fold cross validation
> procedure.  Anyone know if there is a function for this in R?
>
> Thanks,
>
> Wade
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] Comparison between GARCH and ARMA

2006-11-07 Thread Wensui Liu

 input will be highly appreciated.
>
> Thanks and regards,
> Megh
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] Automatic File Reading [Broadcast]

2006-10-28 Thread Wensui Liu

Andy,

First of all, thanks for your solution.

When I tested your code, it did not work. I am not sure whether I missed something.

Here is the code I tested:
flist<-list.files(path = file.path(, "c:\\"),pattern="[.]csv$")
csvlist<-lapply(flist, read.csv, header = TRUE)

Here is the error:
Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open file 'test1.csv', reason 'No such file or directory'
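
One likely cause, offered as a guess rather than a diagnosis: list.files() returns bare file names unless full.names = TRUE is given, so read.csv() then looks for them in the current working directory. A sketch with a throwaway folder and made-up files:

```r
# build a throwaway folder with two hypothetical csv files
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(csv_dir, "test1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4), file.path(csv_dir, "test2.csv"), row.names = FALSE)

# full.names = TRUE prepends the folder, so read.csv can find each file
flist <- list.files(path = csv_dir, pattern = "[.]csv$", full.names = TRUE)
csvlist <- lapply(flist, read.csv, header = TRUE)
names(csvlist) <- basename(flist)
print(names(csvlist))
```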

Thank you so much!

On 10/18/06, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> Works on all platforms:
>
> flist <- list.files(path=file.path("somedir", "somewhere"),
> pattern="[.]csv$")
> csvlist <- lapply(flist, read.csv, header=TRUE)
> whateverList <- lapply(csvlist, whatever)
>
> Andy
>
> From: Richard M. Heiberger
> >
> > Wensui Lui asks:
> > > is there a similar way to read all txt or csv files with same
> > > structure from a folder?
> >
> >
> >
> > On Windows I use this construct to find all files with the
> > specified wild card name.
> > I used the "\\" in the file paths with the translate=FALSE,
> > because the "/" in
> > the DOS switches "/w/B" must not be translated.  On Windows
> > this picks up
> > both lower and upper case filenames
> >
> > A similar construct can be written for Unix.
> >
> > tmp <- shell('dir c:\\HOME\\rmh\\tmp\\*.R /w/B', intern=TRUE,
> > translate=FALSE)  ##msdos
> > for (i in tmp) source(paste("c:\\HOME\\rmh\\tmp\\", i, sep=""))
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
>
> --
> Notice:  This e-mail message, together with any attachment...{{dropped}}


Re: [R] R & gams

2006-10-27 Thread Wensui Liu

I am surprised to hear that "gams is specialised in dealing with huge,
hevy-weight linear and non-linear modelling". That is not what I know
about GAM, which stands for generalized additive model.


On 10/27/06, vittorio <[EMAIL PROTECTED]> wrote:
> At office I have been introduced by another company  to new, complex energy
> forecasting models using gams as the basic software.
> I have been told by the company offering the models that gams is specialised
> in dealing with huge, hevy-weight linear and non-linear modelling (see an
> example in http://www.gams.com/modtype/index.htm) and they say it is almost
> the only option for doing it.
>
> I would like to know your opinion on the subject and, above all, if R can be
> an effective alternative and to what extent, if any.
>
> Thanks
> Vittorio
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] how to convert multiple dummy variables to 1 factor variable?

2006-10-22 Thread Wensui Liu

Thank you so much, Marc and Peter,

Your method works great if I want to convert N dummies into an N-level
factor. But what if I want to convert N dummies into an (N+1)-level
factor? I tried both ways, but neither works.

Again, thank you so much!

On 10/22/06, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Sun, 2006-10-22 at 09:35 +0200, Peter Dalgaard wrote:
> > Marc Schwartz <[EMAIL PROTECTED]> writes:
> >
> > > On Sat, 2006-10-21 at 21:04 -0400, Wensui Liu wrote:
> > > > Dear Listers,
> > > >
> > > > I am wondering how to convert multiple dummy variables to 1 factor 
> > > > variable.
> > > >
> > > > Thanks.
> > > >
> > > > wensui
> > >
> > > I was thinking of a function that is essentially the reverse of
> > > model.matrix() which is used by R modeling functions. I did not see one,
> > > though it is possible that I missed it.
> > >
> > > However, I suppose that something along the lines of the following would
> > > work.
> > >
> > > Say we have a matrix as follows, where the columns represent the
> > > presence or absence of the factor levels, as one would see in a model
> > > matrix. There should be a single '1' in each row as each row corresponds
> > > to a single observation.
> > >
> > > > mat
> > >  Level1 Level2 Level3 Level4 Level5
> > > [1,]  0  1  0  0  0
> > > [2,]  1  0  0  0  0
> > > [3,]  0  0  0  1  0
> > > [4,]  0  0  1  0  0
> > > [5,]  0  0  0  0  1
> > >
> > >
> > > # Create a new factor based upon the index of each 1 in each row
> > > # Use the matrix column names as the labels for each level
> > > NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)),
> > > labels = colnames(mat))
> > >
> > > > NewFactor
> > > [1] Level2 Level1 Level4 Level3 Level5
> > > Levels: Level1 Level2 Level3 Level4 Level5
> >
> > How about
> >
> > factor(mat%*%(1:5), labels = colnames(mat))
> >
> > ?
>
> That'll do it too...and more efficiently of course.
>
> Thanks Peter.
>
> Regards,
>
> Marc
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] how to convert multiple dummy variables to 1 factor variable?

2006-10-22 Thread Wensui Liu

Great!

It seems everyone is having fun with R on a weekend afternoon.

Thank you so much, Marc and Peter.
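
For the archive, a small sketch of the (N+1)-level case along the lines of the matrix-multiplication approach quoted below. The matrix is made up, and passing levels explicitly is an addition so that the labels stay aligned even if some level never occurs:

```r
# hypothetical dummy matrix without a column for the base level "Level1"
mat <- rbind(c(0, 0, 0, 0),
             c(1, 0, 0, 0),
             c(0, 1, 0, 0),
             c(0, 0, 1, 0),
             c(0, 0, 0, 1))
colnames(mat) <- paste0("Level", 2:5)

# an all-zero row maps to code 0 (the base level); a 1 in column j maps to j
codes <- drop(mat %*% seq_len(ncol(mat)))
f <- factor(codes, levels = 0:ncol(mat), labels = c("Level1", colnames(mat)))
print(f)
```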


On 10/22/06, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Sun, 2006-10-22 at 14:03 -0500, Marc Schwartz wrote:
> > On Sun, 2006-10-22 at 14:37 -0400, Wensui Liu wrote:
> > > Thank you so much, Marc and Peter,
> > >
> > > Your method works great if I want to convert N dummies into N-level
> > > factor. But what if I want to convert N dummies into (N+1)-level
> > > factor? I tried both ways but none  works.
> > >
> > > Again, thank you so much!
> >
> >
> > I presume that you are referring to the situation where the base level
> > of the factor is not present as a column in the matrix, such that all of
> > the columns would be 0 in the case where the base level is present. This
> > would be the typical result of model.matrix() with default Treatment
> > contrasts.
> >
> > In that situation, we would have a matrix as follows:
> >
> > > mat
> >  Level2 Level3 Level4 Level5
> > [1,]  0  0  0  0
> > [2,]  1  0  0  0
> > [3,]  0  1  0  0
> > [4,]  0  0  1  0
> > [5,]  0  1  0  0
> > [6,]  0  0  0  0
> > [7,]  0  0  0  1
> >
> > Note that now, we do not have a 'Level1' column.
> >
> > Thus, rows 1 and 6 are all 0's, indicating that "Level1" is present.
> >
> > Taking Peter's more efficient approach of using matrix multiplication,
> > and expanding upon it:
> >
> > > factor((mat %*% (1:ncol(mat))) + 1,
> >  labels = c("Level1", colnames(mat)))
> > [1] Level1 Level2 Level3 Level4 Level3 Level1 Level5
> > Levels: Level1 Level2 Level3 Level4 Level5
>
> Actually, I was wrong in the numeric to factor conversion. The addition
> of 1 is really not needed. We just need to be sure that there are 5
> labels, one more than the number of columns:
>
> > factor(mat %*% 1:ncol(mat),
>  labels = c("Level1", colnames(mat)))
> [1] Level1 Level2 Level3 Level4 Level3 Level1 Level5
> Levels: Level1 Level2 Level3 Level4 Level5
>
> HTH,
>
> Marc
>
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


[R] how to convert multiple dummy variables to 1 factor variable?

2006-10-21 Thread Wensui Liu

Dear Listers,

I am wondering how to convert multiple dummy variables to 1 factor variable.

Thanks.

wensui


[R] RODBC problem

2006-10-20 Thread Wensui Liu

In an mdb table, I have a text field with values such as 1, 2, and so on.
When I use RODBC to read it into R, it becomes numeric. Is this a bug or
something else?

Thanks.


Re: [R] Automatic File Reading

2006-10-18 Thread Wensui Liu

Is there a similar way to read all txt or csv files with the same
structure from a folder?

thanks.

On 10/18/06, Jerome Asselin <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-10-18 at 17:09 +0200, Lorenzo Isella wrote:
> > Dear All,
> > I am given a set of files names as:
> > velocity1.txt
> > velocity2.txt
> >  and so on.
> > I am sure there must be a way to read them automatically in R.
> > It is really taking me longer to read them than to analyze them.
> > Anybody has a suggestion to help me out with this?
> > Many thanks
>
> Not what you mean by "reading".
> ?read.table
> ?read.csv
> ?scan
> ...
>
> However, consider this example. If you have 100 files, you can do:
>
> for(i in 1:100)
> {
>   fn <- paste("velocity",i,".txt",sep="")
>   dat <- read.csv(fn)
>   # ... do your stuff on "dat" here ...
> }
>
> HTH,
> Jerome
>
> --
> Jerome Asselin, M.Sc., Agent de recherche, RHCE
> CHUM -- Centre de recherche
> 3875 rue St-Urbain, 3e etage // Montreal QC  H2W 1V1
> Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] Data frames questions

2006-09-23 Thread Wensui Liu

> 3) How to I "append" to the bottom of a dataframe?

APPEND ROWS ITERATIVELY TO A DATA FRAME

#############################################
# APPEND ROWS ITERATIVELY TO A DATA FRAME   #
#############################################

# LOOP FROM 1 TO 10
for(i in 1:10)
{
  # SET THE RANDOM STATE
  set.seed(i);
  # CREATE A DUMMY DATA FRAME ITERATIVELY
  data<-data.frame(iter = i, x = rnorm(1), y = runif(1));
  # ASSIGN THE ROW NAME
  row.names(data)<-i;
  # APPEND NEW DATA TO THE END OF MAIN DATA FRAME
  if (i == 1) main<-data else main<-rbind(main, data);
};
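
A variation on the same loop, sketched under the assumption that the number of rows is known in advance: collect the pieces in a list and bind them once at the end, which avoids copying the growing data frame on every pass.

```r
# pre-allocate a list with one slot per iteration
rows <- vector("list", 10)
for (i in 1:10) {
  set.seed(i)
  rows[[i]] <- data.frame(iter = i, x = rnorm(1), y = runif(1))
}
# bind all pieces into one data frame in a single step
main <- do.call(rbind, rows)
print(dim(main))
```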


Re: [R] Question about merge()

2006-09-23 Thread Wensui Liu

MERGE DATA FRAMES BY 2 OR MORE VARIABLES
###################################
# MERGE 2 DATA FRAMES BASED ON    #
# 2 OR MORE VARIABLES             #
###################################

data1 <- data.frame(x.id1 = 1:10, x.id2 = (1:10) * 2, x = rnorm(length(1:10)));
data2 <- data.frame(y.id1 = seq(1, 20, by = 2), y.id2 = seq(1, 20, by = 2) * 2,
                    y = rnorm(length(1:10)));

merged <- merge(data1, data2, by.x = c("x.id1", "x.id2"),
                by.y = c("y.id1", "y.id2"), all = TRUE);
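
When the key columns have the same names in both data frames, as in the A, B, C / A, B, D case from the question, a single by argument should be enough. A sketch with made-up values:

```r
# hypothetical data frames sharing the key columns A and B
df1 <- data.frame(A = c(1, 2, 3), B = c("x", "y", "z"), C = c(10, 20, 30))
df2 <- data.frame(A = c(1, 2, 4), B = c("x", "y", "w"), D = c(0.1, 0.2, 0.4))

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA
both <- merge(df1, df2, by = c("A", "B"), all = TRUE)
print(both)
```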


On 9/23/06, Jonathan Greenberg <[EMAIL PROTECTED]> wrote:
> If I want to do a join based on *two* matching fields in two data frames,
> can merge() handle this?  It appears to only handle a single matching column
> -- do I need to make a "metacolumn" or is there some way to do this?  E.g.:
>
> Dataframe 1 contains columns A,B,C and Dataframe 2 contains A,B,D
>
> I want an output A,B,C,D which places C and D together if A and B match
> (otherwise, make two new rows, e.g. Ax,Bx,Cx,nodata and Ay,By,nodata,Dy)
>
> --j
>
> --
> Jonathan A. Greenberg, PhD
> NRC Research Associate
> NASA Ames Research Center
> MS 242-4
> Moffett Field, CA 94035-1000
> Office: 650-604-5896
> Cell: 415-794-5043
> AIM: jgrn307
> MSN: [EMAIL PROTECTED]
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Cincinnati Children Hospital Medical Center


Re: [R] Problem with rpart

2006-09-19 Thread Wensui Liu

Andrew,

I can't tell from your email what the problem is.

But data volume should not be the issue with only 1400 obs and 15 predictors.


On 9/19/06, Andrew Zachary <[EMAIL PROTECTED]> wrote:
> Not sure if anyone has posted on this problem ... I want to use rpart to
> build a binary tree on a relatively large dataset with ~1400 data points
> and 15 predictors. But I've noticed that rpart fails almost immediately
> in the call to C_s_to_rp, as that code returns nonsense. Looking at the
> code itself isn't terribly helpful, and there don't seem to be any hard
> limits coded anywhere. Does anyone have a suggestion for what might be
> going on?
>
> Thanks in advance for you help
> Andrew Zachary
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] how to rescale the limits of yaxis rather than using the data range by default?

2006-09-16 Thread Wensui Liu

HI, Peter,

It is exactly what I want. Thank you so much!
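
For the record, a minimal sketch of the ylim suggestion, using made-up x and y values:

```r
x <- 1:10
y <- x^2

# DEFAULT: THE Y AXIS SPANS range(y)
plot(x, y)

# FIXED LIMITS: THE Y AXIS RUNS FROM 0 TO 200 REGARDLESS OF THE DATA
plot(x, y, ylim = c(0, 200))
```

The same idea works for the x axis through xlim.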

wensui

On 9/16/06, Peter Konings <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm not sure I understand your question correctly, but does the ylim option
> of plot do what you want?
>
> HTH
> Peter.
>
>
> On 9/16/06, Wensui Liu <[EMAIL PROTECTED]> wrote:
> >
> Dear Lister,
>
> plot() is using the data range as the default limits of yaxis. Is
> there any way I can change the limits? I just look at the help of
> plot() and par() and couldn't find answers.
>
> Thanks.
>
>
> --
>  WenSui Liu
> (http://spaces.msn.com/statcompute/blog)
> Senior Decision Support Analyst
> Health Policy and Clinical Effectiveness
> Cincinnati Children Hospital Medical Center
>
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] how to rescale the limits of yaxis rather than using the data range by default?

2006-09-16 Thread Wensui Liu

Dear Lister,

plot() is using the data range as the default limits of yaxis. Is
there any way I can change the limits? I just look at the help of
plot() and par() and couldn't find answers.

Thanks.


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] Table manipulation question

2006-09-14 Thread Wensui Liu

Hi, Russell,

Here is a piece of code and you might need to tweak it a little.
MERGE 2 DATA FRAMES
#######################################
# MERGE 2 DATA FRAMES:                #
# INNER JOIN, LEFT JOIN, RIGHT JOIN,  #
# FULL JOIN, & CARTESIAN PRODUCT      #
#######################################

data1<-data.frame(id1 = 1:10, x = rnorm(length(1:10)));
data2<-data.frame(id2 = seq(1, 20, by = 2), y = rnorm(length(seq(1, 20, by = 2))));

# INNER JOIN
inner.join<-merge(data1, data2, by.x = "id1", by.y = "id2");

# LEFT JOIN
left.join<-merge(data1, data2, by.x = "id1", by.y = "id2", all.x = TRUE);

# RIGHT JOIN
right.join<-merge(data1, data2, by.x = "id1", by.y = "id2", all.y = TRUE);

# FULL JOIN
full.join<-merge(data1, data2, by.x = "id1", by.y = "id2", all = TRUE);

# CARTESIAN PRODUCT
cartesian<-merge(data1, data2);
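
For the row-name case in the question quoted below, here is a minimal sketch of two options, assuming the tables are data frames T1 and T2 keyed by row name (the values are read off the question; joined is an illustrative name):

```r
# TABLES FROM THE QUESTION, KEYED BY ROW NAME
T1 <- data.frame(C1 = c(3, 2), row.names = c("RowName1", "RowName2"))
T2 <- data.frame(C2 = c(5.6, 4.3, NA),
                 row.names = c("RowName1", "RowName1a", "RowName2"))

# OPTION 1: PULL THE MATCHING ROWS OF T2 INTO T1
T1$C2 <- T2$C2[match(rownames(T1), rownames(T2))]

# OPTION 2: merge() ON ROW NAMES; THE KEY COMES BACK AS A COLUMN CALLED Row.names
joined <- merge(T1["C1"], T2, by = "row.names")
```

Option 1 keeps T1 as it is and adds one column; option 2 returns a new data frame and drops RowName1a because it has no partner in T1.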



On 9/14/06, Geoff Russell <[EMAIL PROTECTED]> wrote:
>
> I have a table:
>
>            C1
> RowName1   3
> RowName2   2
>
> and another table:
>
>            C2
> RowName1   5.6
> RowName1a  4.3
> RowName2   NA
>
> I want to join the tables with matching rows:
>
>            C1   C2
> RowName1   3    5.6
> RowName2   2    NA
>
> I'm thinking of something like:
>
> T1$C2=T2$C2[index-expression-to-pullout the matching ones]
>
> Any ideas would be appreciated.
>
> Cheers,
> Geoff Russell
>
>



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] R-question

2006-09-13 Thread Wensui Liu

well, Harrell,

I understand sweave or R2html could be a solution.

but please show me their applications in a large business setting. By
contrast, I can give you many such cases using SAS.

On 9/13/06, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
> Wensui Liu wrote:
> > For your 1st question, you can write query against the tables in DB using 
> > RODBC.
> >
> > Being a SAS programmer, I have to say that reporting function of R is
> > not as good as that of SAS.
>
> I beg to differ.  See for example
> http://biostat.mc.vanderbilt.edu/StatReport
>
> Frank Harrell
> >
> >
> >
> > On 9/13/06, Thorsten Muehge <[EMAIL PROTECTED]> wrote:
> >>
> >> Hello Colleagues,
> >> I programmed in SAS for 3 years and would like to switch to a not so costly
> >> software product.
> >>
> >> Hence I started to evaluate R, and my first test look promising.
> >>
> >> However I have some question:
> >>
> >> 1. Is it possible to query R files by SQL internally on data frames (not on
> >> a database) and how is the syntax (I have the RODBC package installed).
> >>
> >> I would like to extract year, Quarter, week, from a date column in a data
> >> frame (see attachment). After this I want to attach the column to the
> >> original data frame.
> >>
> >> How do I do this in R?
> >>
> >> Dr .Th.Mühge,
> >>
> >> PMP(r)
> >> Procurement Technology Center
> >> IBM Deutschland GmbH, Hechtsheimer Str.2, D-55131 Mainz
> >> Phone: xx49-(0)6131-84-2416
> >> Mobile: xx49-(0)15117457978
> >> e-mail: [EMAIL PROTECTED]
> >> (See attached file: Debug1.csv)
> >>
> >>
> >>
> >>
> >
> >
>
>
> --
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] R-question

2006-09-13 Thread Wensui Liu

For your 1st question, you can write queries against the tables in a DB using RODBC.

As a SAS programmer, I have to say that the reporting functions of R are
not as good as those of SAS.
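
For the date part of the question, base R is probably enough. This is a minimal sketch on a hypothetical data frame df with a Date column called date; the attached Debug1.csv is not available here, so the column name and values are assumptions.

```r
# HYPOTHETICAL DATA FRAME WITH A DATE COLUMN
df <- data.frame(date = as.Date(c("2006-01-15", "2006-05-03", "2006-09-13")))

# YEAR AND QUARTER DERIVED FROM THE DATE
df$year    <- as.integer(format(df$date, "%Y"))
df$quarter <- quarters(df$date)

# WEEK OF THE YEAR, WITH MONDAY AS THE FIRST DAY OF THE WEEK (%W)
df$week    <- as.integer(format(df$date, "%W"))
```

Assigning with df$name <- ... attaches each new column to the original data frame directly.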



On 9/13/06, Thorsten Muehge <[EMAIL PROTECTED]> wrote:
>
>
> Hello Colleagues,
> I programmed in SAS for 3 years and would like to switch to a not so costly
> software product.
>
> Hence I started to evaluate R, and my first test look promising.
>
> However I have some question:
>
> 1. Is it possible to query R files by SQL internally on data frames (not on
> a database) and how is the syntax (I have the RODBC package installed).
>
> I would like to extract year, Quarter, week, from a date column in a data
> frame (see attachment). After this I want to attach the column to the
> original data frame.
>
> How do I do this in R?
>
> Dr .Th.Mühge,
>
> PMP(r)
> Procurement Technology Center
> IBM Deutschland GmbH, Hechtsheimer Str.2, D-55131 Mainz
> Phone: xx49-(0)6131-84-2416
> Mobile: xx49-(0)15117457978
> e-mail: [EMAIL PROTECTED]
> (See attached file: Debug1.csv)
>
>
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] variables in object names

2006-09-12 Thread Wensui Liu

Ken,

I have a similar example in my blog:
http://statcompute.spaces.live.com/blog/cns!39C8032DBD1321B7!229.entry
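
One common way to do this in R is to build the name with paste() and look the object up with get(). The sketch below is not taken from the blog entry; the five models are made up from the built-in cars data just to have something to loop over.

```r
# FIVE HYPOTHETICAL MODELS NAMED model1 ... model5
for (i in 1:5) {
  assign(paste("model", i, sep = ""), lm(dist ~ poly(speed, i), data = cars))
}

# LOOK EACH OBJECT UP BY ITS CONSTRUCTED NAME AND COLLECT R-SQUARED
rsq <- sapply(1:5, function(i) {
  summary(get(paste("model", i, sep = "")))$r.squared
})
```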

On 9/12/06, Pierce, Ken <[EMAIL PROTECTED]> wrote:
> Is there any way to put an argument into an object name. For example,
> say I have 5 objects,  model1, model2, model3, model4 and model5.
>
> I would like to make a vector of the r.squares from each model by code
> such as this:
>
>
> rsq <- summary(model1)$r.squared
> for(i in 2:5){
> rsq <- c(rsq, summary(model%i%)$r.squared)
> }
>
>
> So I assign the first value to rsq then cycle through models 2 through 5
> gathering there values. The %i% in my third line indicates which object
> to draw from. The question is is there any way to pass a variable such
> as i as part of a name?
>
> Ken
>
>
>
> Kenneth B. Pierce Jr.
>
> Research Ecologist
>
> Landscape Ecology, Modeling, Mapping and Analysis Team
>
> PNW Research Station - USDA-FS
>
> 3200 SW Jefferson Way,  Corvallis,  OR 97331
>
> [EMAIL PROTECTED]
>
> 541 750-7393
>
> http://www.fsl.orst.edu/lemma/gnnfire
>
>
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] refer to objects with sequential names

2006-08-27 Thread Wensui Liu

Dear Listers,

If I have several glm objects with names glm1, glm2 and want to apply
new data to these objects. Instead of typing "predict(glm1, newdata)..." 100
times, is there way I could do so in a loop?

Thank you so much!

wensui


Re: [R] how to create many objects with sequencial names?

2006-08-26 Thread Wensui Liu

Hi, Jim,

It is you again. I couldn't remember how many times you answered my
silly questions. ^_^

I am not sure assign() is what I want. Say, if I want to create 1000
linear model objects with names lm1, lm2, ..., lm1000, it seems assign
can't solve it.

But your second solution is close to what I am looking for.

Any idea?
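
A minimal sketch of the list approach, assuming one model per subset of the built-in mtcars data (the subsets, the formula, and the names fits, newdata and preds are only for illustration):

```r
# FIT ONE MODEL PER SUBSET AND KEEP THEM ALL IN A NAMED LIST
fits <- list()
for (i in 1:3) {
  fits[[paste("lm", i, sep = "")]] <- lm(mpg ~ wt, data = mtcars,
                                         subset = (gear == i + 2))
}

# APPLY THE SAME NEW DATA TO EVERY MODEL IN ONE CALL
newdata <- data.frame(wt = c(2.5, 3.5))
preds <- sapply(fits, predict, newdata = newdata)
```

Each model stays reachable by name, e.g. fits[["lm2"]] or fits$lm2, and the same pattern scales to 1000 models without creating 1000 separate objects.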

Thanks.

wensui
On 8/26/06, jim holtman <[EMAIL PROTECTED]> wrote:
> Yes there is with statements like:
>
> assign(paste('m', i, sep=''), value)
>
> But I would suggest that you put the values in a list to make it
> easier to access since all the data is in a single object.  You could
> do it in a loop:
>
> result <- list()
> for(i in 1:100){
> ..computation.
> result[[i]] <- yourResult
> }
>
>
> On 8/26/06, Wensui Liu <[EMAIL PROTECTED]> wrote:
> > Dear Lister,
> >
> > Is there a way to create many objects with sequencial names, say lm1,
> > lm2...lm100?
> >
> > Thanks.
> >
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>

-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] how to create many objects with sequencial names?

2006-08-26 Thread Wensui Liu

Dear Lister,

Is there a way to create many objects with sequencial names, say lm1,
lm2...lm100?

Thanks.


Re: [R] for() loop question

2006-08-26 Thread Wensui Liu

Thank you so much Marc.

Your solution is exactly what I am looking for.
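
A small sketch of the index-based variant that stores a result for each element, next to the loop-free form (squares and squares2 are illustrative names):

```r
x <- c(0.1, 0.5, 0.6)

# LOOP BY INDEX; seq_along() IS SAFE EVEN WHEN x IS EMPTY
squares <- numeric(length(x))
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}

# SAME RESULT WITHOUT AN EXPLICIT LOOP
squares2 <- sapply(x, function(v) v^2)
```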

Have a nice weekend.

wensui

On 8/26/06, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Sat, 2006-08-26 at 13:06 -0400, Wensui Liu wrote:
> > Dear Lister,
> >
> > If I have a list of number, say x<-c(0.1, 0.5, 0.6...), how to use a for()
> > to loop through each number in x one by one?
> >
> > Thank you so much!
> >
> > wensui
>
> Two options:
>
> x <- c(0.1, 0.5, 0.6)
>
> > for (i in x) {print (i)}
> [1] 0.1
> [1] 0.5
> [1] 0.6
>
>
> > for (i in seq(along = x)) {print (x[i])}
> [1] 0.1
> [1] 0.5
> [1] 0.6
>
>
> Which approach you take tends to depends upon what else you are doing
> within the loop.
>
> I would also take a look at ?sapply, depending up what is it you are
> doing.
>
> HTH,
>
> Marc Schwartz
>
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] for() loop question

2006-08-26 Thread Wensui Liu

Dear Lister,

If I have a list of number, say x<-c(0.1, 0.5, 0.6...), how to use a for()
to loop through each number in x one by one?

Thank you so much!

wensui


Re: [R] apply least angle regression to generalized linear models

2006-08-18 Thread Wensui Liu

Andy is right.

I don't think the current version of lars can fit generalized linear models.


On 8/18/06, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> I believe `lars' does not currently fit glms.  For that you'll probably need
> to look at `glar', at:
>
> http://www.insightful.com/Hesterberg/glars/default.asp
>
> HTH,
> Andy
>
> From: Marc Schwartz
> >
> > On Fri, 2006-08-18 at 11:17 -0400, Mike Wolfgang wrote:
> > > Hello list,
> > >
> > > I've been searching around trying to find whether somebody
> > has written
> > > such a package of least angle regression on generalized
> > linear models,
> > > like what
> > > Lasso2 package does. The extension to generalized linear models is
> > > briefly discussed in the comment by D. Madigan and G. Ridgeway. Is
> > > such a package available? Thanks,
> > >
> > > Mike
> >
> > See the aptly named 'lars' package on CRAN and the attendant
> > paper here:
> >
> >   http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
> >
> > You might also want to review Professor Hastie's presentation at useR!
> > 2006 this past spring:
> >
> >   http://www.r-project.org/useR-2006/Slides/Hastie.pdf
> >
> > HTH,
> >
> > Marc Schwartz
> >
> >
> >
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] merge 2 data frame based on more than 2 variables

2006-08-14 Thread Wensui Liu

what if the names are different in 2 data frames?


On 8/14/06, Simon Blomberg <[EMAIL PROTECTED]> wrote:
> Wensui Liu wrote:
> > Dear Lister,
> >
> > I understand merge() can be used to join 2 data frames based on 1 variable.
> > But how about merge based on more than 2 variables?
> >
> > Thank you so much!
> >
> >
> Just specify the 2 (or more) variable names in a vector for "by":
>
> merge(dat1, dat2, by= c("VarA", "VarB"))
>
> assuming both data frames have columns VarA and VarB.
>
> --
> Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
> Centre for Resource and Environmental Studies
> The Australian National University
> Canberra ACT 0200
> Australia
> T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
> F: +61 2 6125 0757
> CRICOS Provider # 00120C
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] merge 2 data frame based on more than 2 variables

2006-08-14 Thread Wensui Liu

Dear Lister,

I understand merge() can be used to join 2 data frames based on 1 variable.
But how about merge based on more than 2 variables?

Thank you so much!

-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] proc standardize & data frame x and y

2006-08-12 Thread Wensui Liu

zubin,

for your second question:

suppose you have x1 and x2 and want to combine them into a matrix X inside
a data frame called data; try the following code:

X<-matrix(1:10, ncol = 2, dimnames = list(NULL, c("x1", "x2")));
class(X)
class(X)<-"AsIs";
class(X)
data<-data.frame(X);
summary(data);
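
An alternative sketch that reaches the same structure with I(), which marks the matrix as "AsIs" so that data.frame() keeps it as a single column. The names data2 and y are hypothetical.

```r
x1 <- 1:5
x2 <- 6:10

# I() PROTECTS THE MATRIX SO IT IS STORED AS ONE COLUMN NAMED X
data2 <- data.frame(X = I(cbind(x1, x2)), y = rnorm(5))

# THE MATRIX IS STILL AVAILABLE AS A WHOLE, E.G. FOR A REGRESSION ON ALL X COLUMNS
fit <- lm(y ~ X, data = data2)
```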



On 8/12/06, zubin <[EMAIL PROTECTED]> wrote:
> Hello!  i know these are basic but i cannot seem to find the answer thru
> my searches..
>
> 1) Can someone recommend an equivalent to SAS PROC Standardize in R?  I
> am in need to frequently standardize a data frame, with z-scores, or
> squash to 0-1 scale - is there a slick function or package someone can
> recommend?
>
> 2) Also, have data sets with a lot of predictor variables.  in the
> diabetes data frame i see that fields have been grouped to X and Y
> variables, making it very easy to identify X and Y in the regression
> techniques.  How is this done, how do you group lets say a group of
> columns into 1 matrix, within a data frame.  example: the AsIs group is
> a matrix of X variables:
>
>  > str(diabetes)
> `data.frame':   442 obs. of  3 variables:
>  $ x : AsIs [1:442, 1:10] 0.038075 -0.00188 0.085298
> -0.08906 0.005383 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : NULL
>   .. ..$ : chr  "age" "sex" "bmi" "map" ...
>   ..- attr(*, "class")= chr "AsIs"
>  $ y : num  151 75 141 206 135 97 138 63 110 310 ...
>  $ x2: AsIs [1:442, 1:64] 0.038075 -0.00188 0.085298
> -0.08906 0.005383 ...
>   ..- attr(*, ".Names")= chr  "age" "age" "age" "age" ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr  "1" "2" "3" "4" ...
>   .. ..$ : chr  "age" "sex" "bmi" "map" ...
>   ..- attr(*, "class")= chr "AsIs"
>
>


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] proc standardize & data frame x and y

2006-08-12 Thread Wensui Liu

for PROC STANDARDIZE, check ?scale
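
A minimal sketch of both requests on a made-up data frame df: scale() for z-scores, and a small min-max rescaling for the 0-1 case (scale() does not do the latter by default).

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 100))

# Z-SCORES: SUBTRACT THE COLUMN MEAN, DIVIDE BY THE COLUMN STANDARD DEVIATION
z <- as.data.frame(scale(df))

# SQUASH EACH COLUMN TO THE 0-1 RANGE
rng01 <- as.data.frame(lapply(df, function(v) (v - min(v)) / (max(v) - min(v))))
```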



On 8/12/06, zubin <[EMAIL PROTECTED]> wrote:
>
> Hello!  i know these are basic but i cannot seem to find the answer thru
> my searches..
>
> 1) Can someone recommend an equivalent to SAS PROC Standardize in R?  I
> am in need to frequently standardize a data frame, with z-scores, or
> squash to 0-1 scale - is there a slick function or package someone can
> recommend?
>
> 2) Also, have data sets with a lot of predictor variables.  in the
> diabetes data frame i see that fields have been grouped to X and Y
> variables, making it very easy to identify X and Y in the regression
> techniques.  How is this done, how do you group lets say a group of
> columns into 1 matrix, within a data frame.  example: the AsIs group is
> a matrix of X variables:
>
> > str(diabetes)
> `data.frame':   442 obs. of  3 variables:
> $ x : AsIs [1:442, 1:10] 0.038075 -0.00188 0.085298
> -0.08906 0.005383 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : NULL
>   .. ..$ : chr  "age" "sex" "bmi" "map" ...
>   ..- attr(*, "class")= chr "AsIs"
> $ y : num  151 75 141 206 135 97 138 63 110 310 ...
> $ x2: AsIs [1:442, 1:64] 0.038075 -0.00188 0.085298
> -0.08906 0.005383 ...
>   ..- attr(*, ".Names")= chr  "age" "age" "age" "age" ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr  "1" "2" "3" "4" ...
>   .. ..$ : chr  "age" "sex" "bmi" "map" ...
>   ..- attr(*, "class")= chr "AsIs"
>
>



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] lasso for variable selection

2006-08-12 Thread Wensui Liu

Zubin,

my understanding of the lasso is that it is a constrained version of
regression: minimize the SSE subject to sum(abs(beta)) <= upper
limit, so that the beta of an unimportant feature is shrunk all the way to
ZERO. the whole game of the lasso is to find the proper upper limit. I think in
the lasso package, this upper limit is found by CV.
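
Written out, the constrained problem is the following (a standard statement of the lasso; the symbols are assumed notation, with t standing for the upper limit):

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t
```

A small t forces more coefficients to exactly zero, while a large enough t gives back the ordinary least squares fit.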

Speaking of the lasso, I think it can also be implemented in SAS with proc
glmselect.

for more information, Prof. Tibshirani's lasso page is extremely helpful:
http://www-stat.stanford.edu/~tibs/lasso.html

HTH.

wensui

On 8/12/06, zubin <[EMAIL PROTECTED]> wrote:
>
> Attended JSM last week and Friedman mentioned the use of LASSO for
> variable selection (he uses it for rules ensembles).  I am an
> econometrician and not familiar with, i started running the examples in
> R this week and you get to the plots section of the LARS package.
> Plots of beta/max(beta)  vs standardized coefficients.  How does one
> interpret them?  u see plots of each variable converging to zero at
> different times - its pretty cool - but can i use this for variable
> importance?
>
> for variable selection - i have a group of correlated variables that we
> need to determine importance in predicting change of a Y variable.
>
> -zubin
>
>

-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] How to obtain 95th percentile of a normal distribution of a continuous variable

2006-07-23 Thread Wensui Liu

?quantile
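
A short sketch on simulated data (x is made up): quantile() gives the empirical cutoff, and qnorm() gives the cutoff of a normal distribution with the sample mean and standard deviation.

```r
set.seed(1)
x <- rnorm(1000, mean = 50, sd = 10)

# EMPIRICAL 95TH PERCENTILE OF THE OBSERVED VALUES
q.emp <- quantile(x, probs = 0.95)

# 95TH PERCENTILE OF A NORMAL WITH THE SAMPLE MEAN AND SD
q.norm <- qnorm(0.95, mean = mean(x), sd = sd(x))
```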

On 7/23/06, jenny tan <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> How do I get R to output the 95% cutoff from a distribution of a continous
> variable?
> summary() only displays a few statistics
>
> Thanks!
>
>



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] comparison between parametric and semiparametric models

2006-07-13 Thread Wensui Liu

Dear Listers,

I have a somewhat off-topic question but think I can find a good answer here.

Using the same dataset, I've built a logistic regression and a
generalized additive model. May I use a likelihood ratio test to see
if there is a significant improvement in the GAM? Something like:
(deviance of GLM - deviance of GAM) with degrees of freedom = additional
number of DF in GAM, using a chi-square test.
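
The arithmetic of such a test can be sketched with two nested GLMs on simulated data; the cubic polynomial fit is only a stand-in for the more flexible model, not a GAM. With a real GAM the smoother's degrees of freedom are effective ones, so the chi-square reference would be approximate at best.

```r
set.seed(1)
n <- 500
x <- runif(n, -2, 2)
y <- rbinom(n, 1, plogis(x^2 - 1))

fit.small <- glm(y ~ x, family = binomial)
fit.big   <- glm(y ~ poly(x, 3), family = binomial)  # STAND-IN FOR THE FLEXIBLE MODEL

# LIKELIHOOD RATIO STATISTIC = DROP IN DEVIANCE; DF = EXTRA PARAMETERS
lr.stat <- deviance(fit.small) - deviance(fit.big)
lr.df   <- df.residual(fit.small) - df.residual(fit.big)
p.value <- pchisq(lr.stat, df = lr.df, lower.tail = FALSE)
```

anova(fit.small, fit.big, test = "Chisq") reports the same deviance difference and degrees of freedom.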

Thank you so much!


Re: [R] Data Manipulations - Group By equivalent

2006-07-01 Thread Wensui Liu

Zubin,

I bet you are working for intercontinental hotels and think you probably are
not the real Zubin there. right? ^_^. If you have chance, could you please
say hi to him for me?

Here is a piece of R code I copied from my blog, side by side with SAS. You
might need to tweak it a little to get what you need.

CALCULATE GROUP SUMMARY IN R
##################################################
# HOW TO CALCULATE GROUP SUMMARY IN R            #
# DATE : DEC-13, 2005                            #
##################################################
# EQUIVALENT SAS CODE:                           #
#                                                #
# DATA DATA;                                     #
#   DO I = 1 TO 2;                               #
#     DO J = 1 TO 4;                             #
#       GROUP = 'TREATMENT_'||PUT(I, 1.);        #
#       X = RANNOR(1);                           #
#       OUTPUT;                                  #
#     END;                                       #
#   END;                                         #
#   KEEP GROUP X;                                #
# RUN;                                           #
#                                                #
# PROC SQL;                                      #
#   CREATE TABLE COMBINE AS                      #
#   SELECT *, MEAN(X) AS MEAN_X, SUM(X) AS SUM_X #
#   FROM DATA                                    #
#   GROUP BY GROUP;                              #
# QUIT;                                          #
##################################################


# GENERATE A TREATMENT GROUP #
group<-as.factor(paste("treatment", rep(1:2, 4), sep = '_'));

# CREATE A SERIES OF RANDOM VALUES #
x<-rnorm(length(group));

# CREATE A DATA FRAME TO COMBINE THE ABOVE TWO #
data<-data.frame(group, x);

# CALCULATE SUMMARY FOR X #
x.mean<-tapply(data$x, data$group, mean, na.rm = T);
x.sum<-tapply(data$x, data$group, sum, na.rm = T);

# CREATE A DATA FRAME TO COMBINE SUMMARIES #
summ<-data.frame(x.mean, x.sum, group = names(x.mean));

# COMBINE DATA AND SUMMARIES TOGETHER #
combine<-merge(data, summ, by = "group");
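
A shorter route to the same kind of result, as a sketch on the same dummy data: ave() returns the group statistic aligned with every original row, and aggregate() gives one row per group, which is closer to a plain GROUP BY. The name by.group is illustrative.

```r
group <- as.factor(paste("treatment", rep(1:2, 4), sep = "_"))
data <- data.frame(group, x = rnorm(length(group)))

# ave() RETURNS THE GROUP STATISTIC ALIGNED WITH EVERY ORIGINAL ROW
data$x.mean <- ave(data$x, data$group, FUN = mean)
data$x.sum  <- ave(data$x, data$group, FUN = sum)

# aggregate() RETURNS ONE ROW PER GROUP, LIKE A PLAIN GROUP BY
by.group <- aggregate(x ~ group, data = data, FUN = mean)
```

For the hotel question, the aggregate() form would be the natural fit, with the hotel identifier in place of group and a subset on the year.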


On 7/1/06, zubin <[EMAIL PROTECTED]> wrote:
>
> Hello, a beginner R user - boy i wish there was a book on just data
> manipulations for SAS users learning R (equivalent to the SAS DATA
> STEP)..  Okay, my question:
>
> I have a panel data set, hotel data occupancy by month for 12 months,
> 1000 hotels.  I have a field labeled 'year' and want to consolidate the
> monthly records using an average into 1000 occupancy numbers - just a
> simple average of the 12 months by hotel.  In SQL this operation is
> pretty easy, a group by query (group by hotel where year = 2005, avg
> occupancy) - how is this done in R? (in R language not SQL).  Thx!
>
> -zubin



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] R connectivity to database

2006-06-23 Thread Wensui Liu

Ray,

R can talk to MS Access very well through RODBC. Here is how:

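library(RODBC);  # provides the odbc* and sql* functions used below

# OPEN A CONNECTION TO THE ACCESS FILE #
mdbConnect<-odbcConnectAccess("C:\\temp\\demo.mdb");

# LIST THE TABLES IN THE DATABASE #
sqlTables(mdbConnect);

# READ ONE TABLE INTO A DATA FRAME #
demo<-sqlFetch(mdbConnect, "tblDemo");

# CLOSE THE CONNECTION #
odbcClose(mdbConnect);

# REMOVE THE DATA FRAME WHEN YOU ARE DONE WITH IT #
rm(demo);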



On 6/23/06, Ray D. <[EMAIL PROTECTED]> wrote:
>
> Hello, does anyone know how I would go about getting R to connect to
> OpenOffice's Base program (OOo's version of MS Access) such that I can
> retrieve data from the database and perform calculations and data
> analysis?  I'm totally new to R and Base and I've looked at some
> documentation, but found only examples for R connecting to PostgreSQL and
> MySQL, but nothing for OOo's Base (there wasn't any examples for MS Access
> either even though it says that R can connect to MS Access).  Is R even
> capable of this or am I just out of luck to use R and OOo's Base
> together?  Thanks in advance.
>
> -ray



-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] is there a way to find the best ARIMA model automatically in R?

2006-05-18 Thread Wensui Liu

What is your criterion for the 'best ARIMA'? There is more than one
criterion for the 'best' model.
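
To make that concrete, here is a rough sketch of a brute-force search, assuming 'best' means the lowest AIC or the lowest BIC over a small grid of orders. The grid, the use of the built-in LakeHuron series, and the decision to hold d fixed at 0 are all my assumptions; d is held fixed because likelihoods of differently differenced series are not comparable.

```r
# Candidate orders: p and q from 0 to 2, d fixed at 0 (assumption)
y    <- LakeHuron
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_
grid$bic <- NA_real_

for (i in seq_len(nrow(grid))) {
  # Some orders may fail to fit; skip those instead of stopping the loop
  fit <- tryCatch(arima(y, order = c(grid$p[i], 0, grid$q[i])),
                  error = function(e) NULL)
  if (!is.null(fit)) {
    grid$aic[i] <- AIC(fit)
    grid$bic[i] <- BIC(fit)
  }
}

# The two criteria need not pick the same order
best.aic <- grid[which.min(grid$aic), ]
best.bic <- grid[which.min(grid$bic), ]
print(best.aic)
print(best.bic)
```

BIC penalizes extra parameters more heavily than AIC, so it tends to choose the smaller model when the two disagree.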


On 5/15/06, Michael <[EMAIL PROTECTED]> wrote:
> I tried to find a function called "bestARIMA" but it did not show up... even
> on google it does not show up often:
>
> I've only found the following link with "bestARIMA" in it:
>
> http://sirio.stat.unipd.it/files/ts02-03/tsR.pdf
>
> but where is the package and the function in R?


-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


[R] how to read/write tables in xml

2006-04-04 Thread Wensui Liu

Dear Listers,

I have 2 questions regarding xml.

1) how to read/write tables in xml?
2) compared with csv, is xml a better way to transfer data across
systems/applications?

Thank you so much for your insight.


--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] can R be run without installation on to a computer

2006-03-31 Thread Wensui Liu

"If it is possible to run R from a CD or a USB stick without installation to
a computer"

YES

On 3/31/06, Bob Green <[EMAIL PROTECTED]> wrote:
>
> I have been trying for a year to get approval to install R on a work
> computer and am not optimistic of a positive reply in the near future. I
> was considering whether an option might be to run R from a CD/USB stick.
> I looked through the installation manual but could see no mention of this
> option.
>
> If it is possible to run R from a CD or a USB stick without installation
> to a computer, I would appreciate direction to information or advice on
> how this might be done,
>
> Any assistance is appreciated,
>
> regards
>
> Bob Green



--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] [Q] BIC as a goodness-of-fit stat

2006-03-06 Thread Wensui Liu

Young-Jin,

Like AIC, BIC is nothing but a penalized version of the log-likelihood.
There is no way to tell 'how small is small' for a single BIC value. It only
becomes meaningful when you compare the BIC of one model with the BIC of
another model fitted to the same data.
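
A small sketch of what that comparison looks like, using simulated data and two lm() fits (the variable names and the true model are my assumptions, not anything from mclust). It also recomputes BIC by hand as -2 * logLik + k * log(n) to show where the penalty comes from.

```r
# Simulated data: y depends on z1 only; z2 is pure noise (assumption)
set.seed(42)
n  <- 100
z1 <- rnorm(n)
z2 <- rnorm(n)
y  <- 1 + 2 * z1 + rnorm(n)

fit.small <- lm(y ~ z1)
fit.large <- lm(y ~ z1 + z2)

# BIC by hand: k counts the regression coefficients plus the error variance
ll         <- logLik(fit.small)
bic.manual <- -2 * as.numeric(ll) + attr(ll, "df") * log(n)

# Only the difference between the two numbers carries information
bic.both <- c(small = BIC(fit.small), large = BIC(fit.large))
print(bic.both)
print(bic.manual)
```

On this scale the smaller value is preferred. As far as I know, mclust reports BIC with the opposite sign (2 * logLik minus the penalty), so there the larger value is preferred; check the package documentation before reading its output.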


On 3/6/06, Young-Jin Lee <[EMAIL PROTECTED]> wrote:
>
> Dear R-List
>
> I have a question about how to interpret BIC as a goodness-of-fit
> statistic.
> I was trying to use "EMclust" and other "mclust" library and found that
> BIC was used as a goodness-of-fit statistic.
> Although I know that smaller BIC indicates a better fit, it is not clear
> to me how good a fit is by reading a BIC number. Is there a standard way
> of interpreting a BIC value?
>
> Thanks in advance.



--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center


Re: [R] Question about variable selection

2006-02-18 Thread Wensui Liu

Dear John,

I fully understand your point that an IV might not be significantly
correlated with the DV in the bivariate case but might be significantly
correlated with the DV in the presence of other IVs. But does this
significant partial relationship reflect the true relation between the IV
and the DV, and does it really help to predict the DV?
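
For reference, the situation John describes can be reproduced with a toy simulation. This is only a sketch under assumed coefficients: the true model has main effects only, and the coefficient on x2 is chosen so that the marginal covariance between x1 and y is zero in the population.

```r
# Simulated data; all names and coefficients are assumptions for illustration
set.seed(123)
n  <- 500
x2 <- rnorm(n)
x1 <- x2 + rnorm(n)      # x1 and x2 are correlated (about 0.7)

# Main effects only. With var(x1) = 2 and cov(x1, x2) = 1,
# cov(x1, y) = var(x1) - 2 * cov(x1, x2) = 0 in the population.
y <- x1 - 2 * x2 + rnorm(n, sd = 0.5)

marginal <- lm(y ~ x1)         # bivariate model
partial  <- lm(y ~ x1 + x2)    # model with both IVs

# Compare the x1 row of the two coefficient tables
print(coef(summary(marginal))["x1", ])
print(coef(summary(partial))["x1", ])
```

In this setup the partial coefficient of x1 is the true one by construction, and dropping x1 after the bivariate screen would hurt prediction of y.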

From here, let's go one step further. If I resample the original dataset
multiple times, fit a bivariate LM between the IV and the DV on each sample,
and still can't get a significant result, do you think I should give this IV
a chance by looking at its partial relationship with the DV?

Thank you so much!

On 2/18/06, John Fox <[EMAIL PROTECTED]> wrote:
>
> Dear Wensui and Andy,
>
> When the explanatory variables are correlated it's perfectly possible for
> the marginal relationship between X and Y to be zero and a partial
> relationship nonzero (even in the absence of interactions) -- this is
> simply a reflection of the more general point that partial and marginal
> relationships can differ.
>
> Regards,
> John
>
> 
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> 
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu
> > Sent: Saturday, February 18, 2006 2:03 PM
> > To: Liaw, Andy
> > Cc: r-help@stat.math.ethz.ch
> > Subject: Re: [R] Question about variable selection
> >
> > Thank you so much for your reply, Andy.
> >
> > But what if I am only interested in main effects instead of
> > interactions?
> >
> >
> >
> > On 2/18/06, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> > >
> > > That depends on whether the IV could have some significant
> > > interactions with other IVs not considered in the bivariate analysis.
> > > E.g.,
> > >
> > > > iv <- expand.grid(-2:2, -2:2)
> > > > y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
> > > > summary(lm(y ~ iv[,1]))
> > >
> > > Call:
> > > lm(formula = y ~ iv[, 1])
> > >
> > > Residuals:
> > >  Min   1Q   Median   3Q  Max
> > > -4.06259 -1.06048 -0.02377  1.05901  4.04315
> > >
> > > Coefficients:
> > >             Estimate Std. Error t value Pr(>|t|)
> > > (Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
> > > iv[, 1]      0.01417    0.29332   0.048    0.962
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > Residual standard error: 2.074 on 23 degrees of freedom
> > > Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
> > > F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619
> > >
> > > > summary(lm(y ~ iv[,1] * iv[,2]))
> > >
> > > Call:
> > > lm(formula = y ~ iv[, 1] * iv[, 2])
> > >
> > > Residuals:
> > >  Min   1Q   Median   3Q  Max
> > > -0.22390 -0.08894 -0.01279  0.13525  0.17608
> > >
> > > Coefficients:
> > >                  Estimate Std. Error t value Pr(>|t|)
> > > (Intercept)      3.019083   0.026330 114.665   <2e-16 ***
> > > iv[, 1]          0.014167   0.018618   0.761    0.455
> > > iv[, 2]         -0.005486   0.018618  -0.295    0.771
> > > iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > Residual standard error: 0.1316 on 21 degrees of freedom
> > > Multiple R-Squared: 0.9963, Adjusted R-squared: 0.9958
> > > F-statistic:  1896 on 3 and 21 DF,  p-value: < 2.2e-16
> > >
> > >
> > >
> > >
> > > Andy
> > >
> > > From: Wensui Liu
> > > >
> > > > Dear Lister,
> > > >
> > > > I have a question about variable selection for regression.
> > > >
> > > > if the IV is not significantly related to DV in the bivariate
> > > > analysis, does it make sense to include this IV into the full model
> > > > with multiple IVs?
> > > >
> > > > Thank you so much!