[R] Best SVM Performance measure?
Hi,

This is probably going to be one of those "it depends what you want" kinds of answers, but I'm very curious to see if the group has an opinion or some general suggestions. The actual experiment is too complicated for a quick e-mail, but I'll summarize well enough (hopefully) to get the concepts across.

- Binary classification problem
- Using an SVM (e1071) to train a model
- Experimenting with different features, costs, etc.
- Training data and test data are completely separate data sets drawn from the same population.

The general concept was to train on a large set of data and then test on a medium-sized set of unseen data. We're looking for the best classification performance on future unlabeled data.

Here is the puzzle, comparing two versions of the model:

A - Lower R2 (r squared) score but higher percentage labeled correctly on test data
B - Higher R2 score but lower percentage labeled correctly on test data

We're using the val.prob function from the Design library to evaluate our model. Additionally, the graphs from val.prob are interesting:

A - Our non-parametric line mostly parallels the ideal line but is just a bit above.
B - Our non-parametric line mostly parallels the ideal line but is just a bit below.

If I understand things correctly, with model A the actual probability is slightly higher than our predicted probability (not a bad thing for our application - better to under-predict than over-predict). One thought was that R2 measures the distance from the ideal line: with model A we are a touch further from the ideal line, but in a better position than model B.

Does anybody have any insight?

Thanks, -N

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Possible bug in plot.POSIXct regarding x axis
In article 80b45a8c0910192051nd558023p9b76566464fc3...@mail.gmail.com, remkoduur...@gmail.com says... I think it is a bug. Apparently plot.POSIXct calls axis.POSIXct for both y (correct) and x (incorrect) axes: Thanks. Reported as bug #14016. -- Karl Ove Hufthammer
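For anyone wanting to reproduce the report, a minimal sketch (dates and values are arbitrary placeholders, not from the original posts) that dispatches to plot.POSIXct with times on the x axis:

```r
## Ten daily timestamps on the x axis against random values on the y axis;
## plot.POSIXct sets up the time axis discussed above.
x <- as.POSIXct("2009-10-01", tz = "GMT") + (0:9) * 86400
y <- rnorm(10)
plot(x, y)
```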
[R] how to draw stacked ellipses to illustrate the shared and specific of multiple objects using R
Dear R-help listers,

I am asking for help on how to draw stacked ellipses to illustrate the shared and specific haplotypes of multiple objects using R. My problem comes from my population genetics study. I genotyped three species, and I have determined the amount of shared and specific haplotypes in each of the species and their combinations. I want to illustrate this result as three stacked ellipses, with the shared and unique areas of the ellipses annotated with the relative amounts.

My data are as follows:

hap.type   hap.dis.num  hap.dis.per  types
hap.Pd     8            27.5862069   Pd specific
hap.Pt     6            20.68965517  Pt specific
hap.Py     9            31.03448276  Py specific
hap.PdPt   3            10.34482759  PdPt shared
hap.PdPy   3            10.34482759  PdPy shared
hap.PtPy   0            0            PtPy shared
hap.3P     0            0            3 species shared

Several individuals of 3 species (Pd, Pt, Py) were genotyped; the summary of the haplotype distribution at the species level is contained in the dataset above.

Could you please give any direction on this graphical problem? Thank you in advance.

Best regards, Mao J-F
Re: [R] Sweave file generation
I use LyX (www.lyx.org) with the Sweave noweb report or article class for the same purpose (if I understand you correctly). LyX is a LaTeX front-end in which you can embed long and short R code; you can write the complete R program inside the document. For LaTeX tables the most elegant solution is sometimes to write the LaTeX to a file and put the file as an include in the LyX document. I have a project report that I run every morning to give a refined and up-to-date document in a matter of minutes.

2009/10/19 Gabriel Koutilellis kgabr...@in.com

Dear list, I have read really a lot the past few days, but I haven't found a matching solution for my problem. I have R 2.9.2 on Windows XP and MiKTeX 2.8 installed. What I want to do is to automate the Sweave file generation. I thought I could use R2Sweave, RweaveLatex, and Sweave in a combination so that I won't need to do anything, perhaps with some minor modifications at the last step. My purpose is to print text (summaries) and plots in the same pdf file, fast and easily. If later I need to produce a much more elegant paper, I know I will need to fix the LaTeX part of my code. Let's say I have a file of R code, script.R; so in R:

R2Sweave("script.R")
RweaveLatex()
Sweave("script.Rnw")
texi2dvi("script.tex", pdf = TRUE)

would produce something like the pdf with the summaries and plots coded inside script.R. Thank you
[R] Problems importing Unix SAS .ssd04 file to R (Win)
Hello,

I'm trying to import a SAS file made using SAS on Unix. Currently I'm using SAS on Windows and I'm trying to import that .ssd04 file to R. The file name is testfile.ssd04 and it is located in 'M:\sasuser'. I'm using Windows XP and R 2.9.1. Basically what I'm doing is:

## R code
library(foreign)
sashome <- "C:/Program Files/SAS Institute/SAS/V8"
folder_for_datafiles <- "M:/sasuser"
read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, "sas.exe"))

which gives:

SAS failed. SAS program at C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file41bb5af1.sas
The log file will be file41bb5af1.log in the current directory
NULL
Warning message:
In read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, :
  SAS return code was 2

The temporary SAS file 'file41bb5af1.sas' looks like this:

## SAS code
option validvarname = v6;
libname src2rd 'M:/sasuser';
libname rd xport 'C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file6df11649';
proc copy in=src2rd out=rd;
select testfile;

Any ideas what I'm doing wrong?

sessionInfo()
R version 2.9.1 (2009-06-26) i386-pc-mingw32
locale: LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages: [1] graphics grDevices utils datasets grid stats methods base
other attached packages: [1] foreign_0.8-38 gregmisc_2.1.1 gplots_2.7.1 caTools_1.9 bitops_1.0-4.1 gtools_2.6.1 gmodels_2.15.0 gdata_2.6.1
loaded via a namespace (and not attached): [1] MASS_7.2-49
[R] RPgSQL installation problem
Hi everybody,

I am trying to install RPgSQL and get the following error message. When I do ./configure from the untarred source directory I get:

loading cache ./config.cache
checking for crypt in -lcrypt... no
No crypt function found

When I use the package installer in R I get:

install.packages("/Users/christiaanpauw/tmp/RPgSQL/", , NULL, type = "source")
Warning in install.packages("/Users/christiaanpauw/tmp/RPgSQL/", , NULL, :
  argument 'lib' is missing: using '/Users/christiaanpauw/Library/R/2.8/library'
* Installing *source* package 'RPgSQL' ...
creating cache ./config.cache
checking for crypt in -lcrypt... no
No crypt function found

I have a threefold question: what is the crypt function, do I need it for the installation to work, and if so, where do I get it?

Thanks in advance, Christiaan
[R] How to determine the number of components of a mixture model stratified by age
Hi all,

With my data there are more than 1000 quantitative results of antibody concentrations. There may be 2 components (positive and negative), 3 components (perhaps strong positive, positive, and negative), or 4-6 components. Could you tell me how to determine the number of components of the mixture model? The anova.mix function in the mixdist package does not seem to work.

My data are a little complicated (TABLE below): there are more than 1000 quantitative results of antibody concentrations stratified by age. The overall density of results at age j, Fj, is a mixture of the component densities, so if there are 5 age groups, then there will be 5 mixture models. Do I have to analyse each stratum separately?

TABLE
age  Bin  length  freq
1    1    19.75   4
1    2    21.75   10
1    12   41.75   36
2    1    19.75   4
2    2    21.75   10
2    12   41.75   36

Appreciated, lybao
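Since the question went unanswered here, a hedged sketch of one common approach (not from the original post): if the raw concentrations are available rather than only the binned table, fit mixtures with 1 to 6 components and compare them by BIC using the mclust package. `dat`, with columns `conc` and `age`, is a hypothetical stand-in for lybao's data.

```r
library(mclust)
## Fit each age stratum separately and let BIC pick the number of components.
for (a in unique(dat$age)) {
  conc <- dat$conc[dat$age == a]   # antibody concentrations in stratum a
  fit  <- Mclust(conc, G = 1:6)    # compares 1- to 6-component mixtures
  cat("age", a, ": best number of components =", fit$G, "\n")
}
```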
Re: [R] how to draw stacked ellipses to illustrate the shared and specific of multiple objects using R
On 10/20/2009 06:10 PM, Mao Jianfeng wrote: [question about drawing stacked ellipses to show shared and specific haplotypes, quoted in full above]

Hi Mao,

I can't help you with ellipses, but intersectDiagram in the plotrix package may do what you want in a different way. Try this:

library(plotrix)
hapIntList <- getIntersectList(3, xnames = c("hap.Pd", "hap.Pt", "hap.Py"))
# enter the data as follows
# Number of elements in hap.Pd - 1: 27.586
# Number of elements in hap.Pt - 1: 20.689
# Number of elements in hap.Py - 1: 31.034
# Number of elements in hap.Pd&hap.Pt - 1: 10.345
# Number of elements in hap.Pd&hap.Py - 1: 10.345
# Number of elements in hap.Pt&hap.Py - 1: 0
# Number of elements in hap.Pd&hap.Pt&hap.Py - 1: 0
# Total number of elements - 1: 99.999
# this is a small bug that I have fixed and will
# not be necessary from v2.7-2 onward
class(hapIntList) <- "intersectList"
intersectDiagram(hapIntList)

I'm quite interested in whether this is of any use to you, as I had not even thought of this particular use for intersectDiagram.
Jim
Re: [R] Spatstat: xy binary data into mask type to use in owin(mask=)
If 'X' is a data frame containing columns 'x', 'y' and 'value', try:

m <- with(X, tapply(value, list(y, x), all))
z <- im(m, xcol = sort(unique(X$x)), yrow = sort(unique(X$y)))
w <- as.owin(z)

This will only work if the x, y values form a rectangular grid in some order.

Adrian Baddeley

=

Javier PB wrote:

Dear users, I am trying to export polygons from ArcMap into Spatstat to run some simulations using functions available in the spatstat package. One particular area to be exported is formed by a number of polygons defining the external boundaries of the area (as a group of islands) and a number of polygons inside the previous ones, as "holes" not to be considered part of the area. I have managed to export one of these areas using owin(poly=list()), including in the list the external boundaries and holes, but the number of polygons that constitutes each area is large, and I was wondering if this could be done more easily using owin(mask=my_area), with a single matrix my_area of TRUE/FALSE values defining the whole area.

Here is where I got stuck. The file I have created in ArcMap contains three columns: x and y coordinates, and the values at points x[i], y[j] (1 if the area is TRUE and 0 if the area is FALSE). How can I convert this xy file into a mask my_area that I can read using owin(mask=my_area)?
Re: [R] Re posting various problems with two-way anova, lme, etc.
Hi Michael,

"How do you control what is the (intercept) in the model returned by the lme function, and is there a way to still be able to refer to all groups and timepoints in there without referring to the intercept?"

Here is some general help. The intercept is controlled by the contrasts that you used when fitting your model. By default these are treatment (i.e. Dunnett) contrasts, but you can change this:

## ?options
options("contrasts")
?contrasts
options(contrasts = c("contr.sum", "contr.poly"))
options("contrasts")

You can design your own contrasts or you can use those provided by base R:

## ?contr.treatment

(There is also contr.sdif in MASS, which is a very useful type for environmental studies.) See http://users.monash.edu.au/~murray/stats/BIO4200/LinearModels.pdf, which covers the main ready-made types in R. For ANOVA, sum-to-zero contrasts are commonly used. If you do use treatment contrasts you can (usually) change your reference level on the fly, or you can set it when you make your factor:

## ?factor
?levels
?reorder
?relevel

If you are using the multcomp package then look at ?contrMat. An example of how to use it is given under the glht function.

Regards, Mark.

Michael Schacht Hansen wrote:

Hi, I posted the message below last week, but no answers, so I'm giving it another attempt in case somebody who would be able to help might have missed it and it has now dropped off the end of the list of mails. I am fairly new to R and still trying to figure out how it all works, and I have run into a few issues. I apologize in advance if my questions are a bit basic; I'm also no statistics wizard, so part of my problem may be a more fundamental lack of knowledge in the field. I have a dataset that looks something like this:

Week  Subj  Group  Readout
0     1     A      352.2
1     1     A      366.6
2     1     A      377.3
3     1     A      389.8
4     1     A      392.5
0     2     B      344.7
1     2     B      360.9
.     .     .      .

So basically I have a number of subjects that are divided into different groups receiving different interventions/treatments.
Observations on these subjects are made on 5 occasions (Week 0-4). I would like to see if there is a difference between the treatment groups and if the observations that we are making change in the individual groups over time. According to my very limited statistics training, that means I would have to do a two-way ANOVA with repeated measures, because the same subjects are observed on the different weeks. Now in R I can do something like this:

MyData$fWeek <- factor(MyData$Week)  # Convert week to factor
m1 <- aov(Readout ~ Group*fWeek + Error(Subj/fWeek), data = MyData)

My first naive question is: is the Error(...) term correct for the design that I describe? I am not quite sure I understand the syntax correctly. In any case, it actually seems to work fine, but I can't figure out how to do post hoc testing on this. TukeyHSD does not work for the aovlist which is returned, so I am kind of stuck. Is there a simple way to do the post hoc test based on the aovlist?

I have been reading various questions on the list and I think I have understood that I should be using lme from the nlme package, and this is where I run into some problems. As I understand it, the lme command that I would need would look something like this:

m1 <- lme(Readout ~ Group*fWeek, random = ~1|Subj/fWeek, data = MyData)

Now this actually fails with an error message like:

Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1

The reason (I believe) is that I have some weeks where there are no observations for a specific group. In this case, one of the groups had a lot of drop-out and at Week 4 there are no subjects left in one of the groups, and that seems to be causing the problem. I can get it to run by excluding the last week with something like:

m1 <- lme(Readout ~ Group*fWeek, random = ~1|Subj/fWeek, data = MyData[which(MyData$Week < 4), ])

My next question is: is there another way around that?
I would still like to run the analysis for the remaining groups at the last time point, and I believe it should somehow be included in the entire analysis. I have also managed to find a few postings with similar problems, but no real solutions, so any helpful comments would be much appreciated.

My final issue is how to do the post hoc testing on the model. I understand (I think) that I should use the glht function from the multcomp package. For instance, I could do something like:

summary(glht(m1, linfct = c("GroupB:fWeek1 - GroupC:fWeek1 = 0", "GroupB:fWeek2 - GroupC:fWeek2 = 0")))

And that actually works fine. My problem is that Group A and fWeek 0 seem to have turned into the (intercept), so I can't write something like:

summary(glht(m1, linfct = c("GroupB:fWeek0 - GroupB:fWeek1 = 0")))

to check for changes between baseline and week 1 in Group B.
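On the intercept question that closes the thread, a hedged sketch of the relevel route Mark mentions (variable names follow Michael's data; untested against his dataset): refit with a different reference level so the week-0 effect appears as a named coefficient instead of being absorbed into the intercept.

```r
## With week 1 as the reference level, fWeek0 becomes an explicit
## coefficient, so glht contrasts can refer to the week-0 level by name.
MyData$fWeek <- relevel(factor(MyData$Week), ref = "1")
m2 <- lme(Readout ~ Group * fWeek, random = ~1 | Subj/fWeek, data = MyData)
```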
[R] system() or shell() with python script
Hi all,

I am having some problems calling a python script from R that resides in a folder that is in the path (Windows XP):

system("quickPadTool.py")
Warning message:
In system("quickPadTool.py") : quickPadTool.py not found

# I also tried 'shell' (and shell.exec as well).
shell("quickPadTool.py")
'quickPadTool.py' is not recognized as an internal or external command, operable program or batch file.
Warning message:
In shell("quickPadTool.py") : 'quickPadTool.py' execution failed with error code 1

I can run the script just fine from a command window, from the same directory. Any pointers?

thanks, Remko

-----
Remko Duursma Post-Doctoral Fellow Centre for Plants and the Environment University of Western Sydney Hawkesbury Campus Richmond NSW 2753 Dept of Biological Science Macquarie University North Ryde NSW 2109 Australia Mobile: +61 (0)422 096908 www.remkoduursma.com
Re: [R] system() or shell() with python script
Remko Duursma wrote: [question about calling a python script from R, quoted in full above]

Hi Remko,

Some suggestions:

1. Try to see if the path that R has from a call to system is correct (i.e. the same as from cmd): system("path")
2. Try calling it with python added in front: system("python script.py")
3. Add a shebang line to the top of your script, like #! c:/Program Files/Python/python.exe. This tells the OS which program you want to use to run the script.

cheers, Paul

ps maybe superfluous, but try the python getopt package for reading command-line arguments.

--
Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
Re: [R] system() or shell() with python script
Paul Hiemstra wrote:
> 1. Try to see if the path that R has from a call to system is correct (i.e. the same as from cmd): system("path")

That won't work. You would use Sys.getenv("PATH") for this.

> 3. Add a shebang line to the top of your script like: #! c:/Program Files/Python/python.exe. This tells the OS which program you want to use to run the script.

That's unlikely to work on Windows. If file associations are set to do it, then shell.exec("script.py") should work. (That should do the same as double-clicking the file in Explorer.)

Duncan
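Pulling the thread's suggestions together, a hedged sketch for Windows (the script path is a placeholder, not from the original posts):

```r
## Explicit interpreter call, per Paul's suggestion 2:
system('python "C:/path/to/quickPadTool.py"')
## Or rely on the Windows .py file association, per Duncan's note:
shell.exec("C:/path/to/quickPadTool.py")
```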
[R] Problem using the source-function within R-functions
Dear R community,

You may have the solution to how to construct a function using source() inside the function body, i.e.:

myfunction <- function(...){
  source('file1.r')
  source('file2.r')
}

After compiling and installing myfunction in R, calling myfunction gives an error because the content of 'file1.r' and 'file2.r' seems to be missing. Anyone have the trick to overcome this problem?

Thanks in advance!

best wishes, Johan

PS: My function is:

run_accumm_value <- function(ind_noder_0, ind_loc_val, ind_retention, downstream){
  ## Preprocessing of looping calculations:
  koersel_uden_ret <- length(unique(ind_noder_0$oplid))
  opsaml_b_0_2 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
  ## Preprocessing of topology and local values to be summed:
  source('preproces_topology.r', local = T)
  source('preproces_loc_val.r', local = T)
  # Loop for each grouping factor (column in ind_noder_0: oplid):
  for(j in 1:koersel_uden_ret){
    source('matrix_0.r', local = T)
    source('matrix.r', local = T)
    source('local_value.r', local = T)
    source('fordeling.r', local = T)
    source('fordeling_manuel.r', local = T)
    source('local_ret.r', local = T)
    source('Ax=b.r', local = T)
    source('opsamling_x_0_acc.r', local = T)
  }
  source('opsamling_b_1.r', local = T)
  opsaml_b_2
}

--
Johan Lassen Environment Center Nykøbing F Denmark
[R] Interpretation of VarCorr results
Dear all,

I'm working with a model to estimate components of variance using the lme() function. The model is the following:

model <- lme(fixed = X ~ 1 + as.factor(station), data = myData, na.action = na.exclude, random = ~ N + 1 | spliceOrHoming)

Where X is the measured response variable, station is treated as a fixed-effects factor, N is a continuous variable, and spliceOrHoming is an (ordered) factor. The idea is that the response variable (X) varies almost linearly with N (in a decreasing fashion), but the slope and intercept of this variation change with the levels of the spliceOrHoming random factor.

Now, the REML-estimated standard deviation components are:

            StdDev        Corr
(Intercept) 5.274544e-01  (Intr)
N           7.912717e-05  -0.58
Residual    1.508647e-01

My questions are:

- Is the model correctly specified?
- If yes, how should I read the standard deviation estimate for N (7.912717e-05)? Is this the standard deviation for a unitary variation of N? Or is it the standard deviation due to the global contribution of the N independent variable? (Note that I have a strong feeling that N contributes a lot to the final variance, so the value that I obtain here seems too small for my beliefs.)
- Are there any methods/functions to obtain the variance components for the station factor too? (Please give me some references, at least.)

Thank you a lot in advance for any suggestions you'll give me.

Enrico
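Not an answer to the specification questions, but for inspecting the fit: VarCorr() on the fitted object reproduces the table above and also reports a Variance column alongside StdDev. A sketch, where `model` is Enrico's fitted lme object:

```r
library(nlme)
vc <- VarCorr(model)   # random-effect variances and standard deviations
vc                     # (Intercept), N slope, and Residual rows as above
```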
Re: [R] Problem using the source-function within R-functions
Hi Johan,

Although it's not clear to me why you're putting this in a function in the first place... The way you've written your function, it will only work if your working directory is the same as the directory where file1.r and file2.r are stored, and I suspect this is the problem. You can 1) set the working directory in your function or 2) specify the full path to file1.r and file2.r in your function.

Option 1 will look like:

myfunction <- function(...){
  setwd("/path/where/files/are/saved")
  source('file1.r')
  source('file2.r')
}

and option 2 will look like:

myfunction <- function(...){
  source('/path/where/files/are/saved/file1.r')
  source('/path/where/files/are/saved/file2.r')
}

source() also has a chdir option that you could investigate. See ?source.

-Ista

On Tue, Oct 20, 2009 at 7:00 AM, Johan Lassen jle...@gmail.com wrote: [original question and function quoted in full above]

--
Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
[R] How to get slope estimates from a four parameter logistic with SSfpl?
Is the following helpful?

pdd <- deriv(~ a + (b - a)/(1 + exp((c - t)/d)), "d")
pdd
expression({
    .expr1 <- b - a
    .expr2 <- c - t
    .expr4 <- exp(.expr2/d)
    .expr5 <- 1 + .expr4
    .value <- a + .expr1/.expr5
    .grad <- array(0, c(length(.value), 1L), list(NULL, c("d")))
    .grad[, "d"] <- .expr1 * (.expr4 * (.expr2/d^2))/.expr5^2
    attr(.value, "gradient") <- .grad
    .value
})

Or perhaps you want it with respect to t?

JN

Message: 46 Date: Mon, 19 Oct 2009 14:50:15 +0100 From: Weber, Sam sam.we...@exeter.ac.uk Subject: [R] How to get slope estimates from a four parameter logistic with SSfpl? To: r-help@R-project.org Message-ID: 5bb78285b60f9b4db2c6a4da8f3e788c12c64b3...@exchmbs04.isad.isadroot.ex.ac.uk Content-Type: text/plain

Hi,

I was hoping to get some advice on how to derive estimates of slopes from four-parameter logistic models fit with SSfpl. I fit the model using:

model <- nls(temp ~ SSfpl(time, a, b, c, d))
summary(model)

I am interested in the values of the lower and upper asymptotes (parameters a and b), but also in the gradient of the line at the inflection point (c), which I assume tells me my rate of increase when it is steepest (?). However, I cannot work out how to derive a slope estimate. I'm guessing it has something to do with the scaling parameter d, but having searched the internet for hours I have not made any progress, and it is probably quite simple. Any help would be hugely appreciated!

All the best, Sam
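Following up JN's closing question: differentiating the SSfpl form a + (b - a)/(1 + exp((c - t)/d)) with respect to t and evaluating at the inflection point t = c, where exp((c - t)/d) = 1, gives the steepest slope (b - a)/(4*d). A sketch:

```r
## Gradient with respect to t rather than d:
pdt <- deriv(~ a + (b - a)/(1 + exp((c - t)/d)), "t")
## At t = c the gradient simplifies to (b - a)/(4 * d); with a fitted
## model as in Sam's post (hypothetical coefficient extraction):
## cf <- coef(model); slope <- (cf["b"] - cf["a"]) / (4 * cf["d"])
```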
[R] (no subject) -- rename to optim() crash due to NA returned
While developing updates to optimization tools for R (I was behind 3 of the 5 codes in optim(), but Brian Ripley did the implementation), I've been seeing this kind of error when the objective function cannot be computed and so returns NA. Examples: attempts to divide by 0, take sqrt() of a negative number, or log(0). A crude work-around is to detect bad situations when evaluating the function and return a very large number, e.g., something using .Machine$double.xmax (I used this divided by 4 or so to avoid troubles if algorithms try to do some arithmetic, as in gradient-based routines). Most routines will bounce off such a value, but the scaling of the surface is messed up, so it is not always efficient. A better way is if optimization codes can handle a failed-evaluation flag and simply back off from a step or whatever is needed, but no R tools do this as yet -- I'm hoping to put this in some of my codes, using returned values larger than some threshold as the flag. My 1980s BASIC codes had this (and bounds and masks); however, it takes time. For information, I'd like to know if this turns out to be the problem, as it would raise the priority for working on such issues. JN

Message: 57 Date: Mon, 19 Oct 2009 17:48:12 +0200 From: Reynaerts, Jo jo.reynae...@econ.kuleuven.be Subject: [R] (no subject) To: r-help@r-project.org r-help@r-project.org Message-ID: 603d249cbe0b304490d0f774142b8e0f01cdcb9a4...@econsrvex6.econ.kuleuven.ac.be Content-Type: text/plain

Dear R users, I have the following problem when calling optim() to minimize a function f.obj (= outer loop) that calls another function f.con (a contraction mapping, = inner loop). It seems to me that it is a numerical problem that I currently fail to take into account when coding. Calling optim(), for the first few iterations of the outer loop, everything seems fine; the contraction mapping is calculated in each run.
However, after a number of outer loop iterations, an error occurs and the following message is displayed:

Error in while (max.dev >= tol.in) { : missing value where TRUE/FALSE needed

The previous conditional statement ensures that the iteration in the inner loop should run as long as max.dev <- max(abs(x - x.in)) is greater than the inner loop tolerance level (tol.in <- 1E-9), where x is computed by the contraction mapping using x.in. I used different stopping rules and tolerance levels, but this gives the same result. As I said, I think it's a numerical problem. Has anyone had similar experiences using optim() and could you give some coding advice? Thanks in advance, Jo Reynaerts

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
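The crude work-around described above can be sketched as a wrapper: if an evaluation of the objective fails (NA/NaN/Inf), return a very large finite value that the optimizer will simply bounce off. The function f below and all names are invented for illustration, not taken from the poster's code:

```r
# Wrap an objective so that failed evaluations return a huge finite value
# (roughly .Machine$double.xmax/4, as suggested above) instead of NA/NaN/Inf.
safe_fn <- function(fn, big = .Machine$double.xmax / 4) {
  function(par, ...) {
    val <- tryCatch(fn(par, ...), error = function(e) NA_real_)
    if (is.na(val) || !is.finite(val)) big else val
  }
}

# Toy objective: log() fails (NaN/-Inf) for p[1] <= 0, exactly the kind of
# bad region that triggers "missing value where TRUE/FALSE needed" errors.
f <- function(p) (p[1] - 2)^2 + log(p[1])
opt <- optim(1.5, safe_fn(f), method = "BFGS")
opt$par   # near 1 + sqrt(2)/2, the true minimizer
```

This keeps optim() running through the bad region, at the cost of a distorted surface near the failure boundary, as the post notes.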
Re: [R] how to get rid of 2 for-loops and optimize runtime
Hi Joris, The amount of a month ago is normally one value from another row. But I used 'sum <- sum + dataset[i,22]' because I would like to reuse the code also for other tables. In some tables it is possible that the value of last month is the sum of values from different rows. Thank you for your time. Greetings, Ian

-----Original Message----- From: joris meys [mailto:jorism...@gmail.com] Sent: Monday, 19 October 2009 16:12 To: Ian Willems CC: r-help@r-project.org Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime

Hi Ian, first of all, take a look at the functions sapply, mapply, lapply, tapply, ...: they are the more efficient way of implementing loops. Second, could you elaborate a bit further on the data set: the amount of the month ago, is that one value from another row, or the sum of all values in the previous month? I saw in your example dataset that the last month has 2 rows, but couldn't figure out whether that's a typo or really means something. That's necessary information to optimize your code. 129s is indeed far too long for a simple action. Cheers Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems ian.will...@uz.kuleuven.ac.be wrote:

Short: get rid of the loops I use and optimize runtime

Dear all, I want to calculate for each row the amount of the month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009     9      Q3    092009       A ...    120
2009     9      Q3    092009       B ...     80
2009     8      Q3    082009       A ...     40
2009     7      Q3    072009       A ...     50

The result I want:

Year month quarter yearmonth Service ... Amount amound_lastmonth
2009     9      Q3    092009       A ...    120               40
2009     9      Q3    092009       B ...     80              ...
2009     8      Q3    082009       A ...     40               50
2009     7      Q3    072009       A ...     50              ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want but it is very, very slow. It takes 129s for 400 rows.
And the time gets four times higher each time I double the amount of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), but I don't really understand the output. I guess I need code that requires linear time, and need to get rid of the 2 for-loops. Can someone help me or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5]  = month
* dataset[,3]  = year
* dataset[,22] = amount
* dataset[,14] = servicetype

[CODE]
# for each row of the matrix, check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14]) {        # ..the same service type
      if (dataset[j,18] == dataset[i,18]) {      # ..the same department
        if (dataset[j,5] == 1) {                 # if month = 1, month ago is 12 and year is -1
          if (12 == dataset[i,5]) {
            if ((dataset[j,3] - 1) == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        } else {
          if ((dataset[j,5] - 1) == dataset[i,5]) {   # if month != 1, month ago is month - 1
            if (dataset[j,3] == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        }
      }
    }
  }
}
[/CODE]

summaryRprof() $by.self:

             self.time self.pct total.time total.pct
[.data.frame     33.92     26.2      80.90      62.5
NextMethod       12.68      9.8      12.68       9.8
[.factor          8.60      6.6      18.36      14.2
Ops.factor        8.10      6.3      40.08      31.0
sort.int          6.82      5.3      13.70      10.6
[                 6.70      5.2      85.44      66.0
names             6.54      5.1       6.54       5.1
length            5.66      4.4       5.66       4.4
==                5.04      3.9      44.92      34.7
levels            4.80      3.7       5.56       4.3
is.na             4.24      3.3       4.24       3.3
dim               3.66      2.8       3.66       2.8
switch            3.60      2.8       3.80       2.9
vector            2.68      2.1       8.02       6.2
inherits          1.90      1.5       1.90       1.5
any               1.68      1.3       1.68       1.3
noNA.levels       1.46      1.1       7.84       6.1
.Call             1.40      1.1       1.40       1.1
!                 1.26      1.0       1.26       1.0
attr<-            1.06      0.8       1.06       0.8
.subset           1.00      0.8       1.00       0.8
class<-           0.82      0.6 [output truncated]
[R] underflow of fisher.test result
fisher.test() gives a very small p-value, which underflows on my machine. However, its log should not underflow. I'm wondering if there is a way to get the log() of a small p-value. An approximation is acceptable in this case. Thank you!

fisher.test(rbind(c(1,10),c(10,1000)))$p.value
[1] 0

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random Forest - partial dependence plot
Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the average trend of that variable (integrating out all others in the model). It's the shape of that trend that is important. You may interpret the relative range of these plots from different predictor variables, but not the absolute range. Hope that helps. Andy

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carlos M. Zambrana-Torrelio Sent: Monday, October 19, 2009 3:47 PM To: r-help@r-project.org Subject: [R] Random Forest - partial dependence plot

Hi everybody, I used random forest regression to explain the patterns of species richness with a bunch of climate variables (e.g. temperature, precipitation, etc.). All are continuous variables. My results are really interesting and my model explained 96.7% of the variance. Now I am trying to take advantage of the variable importance function and depict the observed patterns using partial dependence plots. However, I found a really strange (at least for me...) behavior: the species number ranges between 1 and 150, but when I make the partial plot the graphic only represents values between 43 and 50!! I use the following code to get the partial plot:

partialPlot(ampric.rf, amp.data, Temp)

where ampric.rf is the random forest object, amp.data are the data and Temp is the variable I am interested in. How can I have a partial plot explaining all species numbers (from 1 to 150)?? Also, I read the RF documentation and I was wondering what is the meaning of "marginal effect of a variable". Thanks for your help, Carlos

-- Carlos M.
Zambrana-Torrelio Department of Biology University of Puerto Rico - RP PO BOX 23360 San Juan, PR 00931-3360

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to get rid of 2 for-loops and optimize runtime
Hi William, Your program works perfectly, and very fast, for the table I'm using right now (only one match per row). If I want to reuse this code on other tables, it can match more than one row. Is it possible to adapt your code easily, if I have to sum the values of last month from different rows? Thank you for your help. Regards, Ian

-----Original Message----- From: William Dunlap [mailto:wdun...@tibco.com] Sent: Monday, 19 October 2009 18:08 To: Ian Willems; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems Sent: Monday, October 19, 2009 6:50 AM To: 'r-help@r-project.org' Subject: [R] how to get rid of 2 for-loops and optimize runtime

Short: get rid of the loops I use and optimize runtime

Dear all, I want to calculate for each row the amount of the month ago. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009     9      Q3    092009       A ...    120
2009     9      Q3    092009       B ...     80
2009     8      Q3    082009       A ...     40
2009     7      Q3    072009       A ...     50

The result I want:

Year month quarter yearmonth Service ... Amount amound_lastmonth
2009     9      Q3    092009       A ...    120               40
2009     9      Q3    092009       B ...     80              ...
2009     8      Q3    082009       A ...     40               50
2009     7      Q3    072009       A ...     50              ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want but it is very, very slow. It takes 129s for 400 rows. And the time gets four times higher each time I double the amount of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), but I don't really understand the output. I guess I need code that requires linear time, and need to get rid of the 2 for-loops.
Can someone help me or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5]  = month
* dataset[,3]  = year
* dataset[,22] = amount
* dataset[,14] = servicetype

[CODE]
# for each row of the matrix, check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14]) {        # ..the same service type
      if (dataset[j,18] == dataset[i,18]) {      # ..the same department
        if (dataset[j,5] == 1) {                 # if month = 1, month ago is 12 and year is -1
          if (12 == dataset[i,5]) {
            if ((dataset[j,3] - 1) == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        } else {
          if ((dataset[j,5] - 1) == dataset[i,5]) {   # if month != 1, month ago is month - 1
            if (dataset[j,3] == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        }
      }
    }
  }
}
[/CODE]

match() is often useful for quickly finding the locations of many items in a vector. It has no special methods for data.frames, so you must combine the columns of interest into a character vector of keys and use match on the key vectors. E.g.

# your test data in a format that mail readers
# can copy and paste into R:
d <- read.table(header=TRUE, textConnection("
Year month quarter yearmonth Service Amount
2009 9 Q3 092009 A 120
2009 9 Q3 092009 B 80
2009 8 Q3 082009 A 40
2009 7 Q3 072009 A 50
"))
# The key functions
dKey <- function(d) {
  with(d, paste(d$Year, d$month, d$Service, sep=";"))
}
keyThisMonth <- function(d) dKey(d)
keyPrevMonth <- function(d) {
  stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service))
  isJan <- d$month == 1
  d$Year[isJan] <- d$Year[isJan] - 1
  d$month[isJan] <- 12
  d$month[!isJan] <- d$month[!isJan] - 1
  dKey(d)
}
# Make the new column:
d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), keyThisMonth(d)) ]
# The result
print(d)
  Year month quarter yearmonth Service Amount AmountPrevMonth
1 2009     9      Q3     92009       A    120              40
2 2009     9      Q3     92009       B     80              NA
3 2009     8      Q3     82009       A     40              50
4 2009     7      Q3     72009       A     50              NA

This assumes there is only one match per row.
Is this the result you are looking for? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
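To answer the follow-up about summing over several matching rows: one option is to aggregate the amounts per key with tapply() first, and then index the totals by each row's previous-month key. A sketch on invented toy data, using a simplified year/month/service key rather than Bill's exact helper functions:

```r
# Toy data: two rows share Year=2009, month=9, Service=A, so the
# previous-month lookup for month 10 would need their sum (150).
d <- read.table(header = TRUE, textConnection("
Year month Service Amount
2009 9 A 120
2009 9 A 30
2009 8 A 40
2009 7 A 50
"))
key <- function(Year, month, Service) paste(Year, month, Service, sep = ";")
# Previous-month key for each row (January rolls back to December of Year-1):
prevKey <- with(d, key(Year - (month == 1),
                       ifelse(month == 1, 12, month - 1), Service))
# Sum the amounts per key, then look the totals up by the previous-month key:
totals <- tapply(d$Amount, key(d$Year, d$month, d$Service), sum)
d$AmountPrevMonth <- as.vector(totals[prevKey])
d   # rows for month 9 both get 40; month 8 gets 50; month 7 gets NA
```

Rows whose previous month is absent from the table get NA, just as with the match() version.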
[R] rbind with different columns
Hello there, with the following dummy code I'd like to give you an idea how my data looks:

df1 <- data.frame(a = rnorm(10), b = rnorm(10))
df2 <- data.frame(a = rnorm(10), c = rnorm(10))
df3 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
myList <- list(df1, df2, df3)

# (myList is representing the data structure I have to handle; it's read from single files with lapply)

In every list entry is a data.frame, but the columns are only partially the same. If I had exactly the same columns, I could do the following command to combine my data:

do.call(rbind, myList)

but of course it does not work with different column names. Is there any easy way to retrieve a combined table like this:

            a          b          c
1  -0.54586587 -0.3607873         NA
2   1.10876842  1.1439414         NA
3   0.57357988 -1.2117743         NA
4  -1.40975759 -1.2390305         NA
5   0.03371093 -1.8860748         NA
6  -1.27682961  0.9990840         NA
7  -1.78715858  0.8400642         NA
8  -0.22663310  1.5224016         NA
9   0.45703787  0.0599217         NA
10 -1.21984635  1.1991689         NA
11 -2.58848301         NA  0.2394272
12  0.71155177         NA -0.7107332
13 -2.16440676         NA -0.1744845
14  1.33043121         NA  0.5951272
15  1.51034297         NA  0.1956956
16  1.00844947         NA  0.6726101
17  0.78693840         NA  1.2189904
18  0.68622170         NA  1.2230500
19 -1.09376863         NA  0.4267472
20  2.23647873         NA  0.7328574
21 -0.38144792  0.1532647  1.4824618
22  0.27078024 -0.4264737  0.1317450
23  1.10812086  1.2550117  0.1677935
24  0.14881701 -0.2928157 -1.4081529
25 -1.00635045 -0.7885968 -0.3502532
26  0.32024094  0.4681016 -1.5477557
27  0.82974710  0.2345186 -0.6572728
28  0.49608133  1.7463265  0.6493405
29 -0.33022738  1.9510503 -1.7930093
30 -0.62615365  0.7330671 -0.4032405

The only thing I can think of is checking the names of each list entry and adding NA-columns before combining them. Is there any other way to do so? Antje

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
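The NA-padding idea mentioned at the end of the post works fine in base R and can be written compactly — a sketch, reusing the df1/df2/df3 setup from above:

```r
# Pad each data.frame with the columns it is missing (filled with NA),
# put the columns in a common order, then rbind as usual.
df1 <- data.frame(a = rnorm(10), b = rnorm(10))
df2 <- data.frame(a = rnorm(10), c = rnorm(10))
df3 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
myList <- list(df1, df2, df3)

allNames <- unique(unlist(lapply(myList, names)))
padded <- lapply(myList, function(df) {
  for (nm in setdiff(allNames, names(df))) df[[nm]] <- NA  # create absent columns
  df[allNames]                                             # common column order
})
combined <- do.call(rbind, padded)
str(combined)   # 30 rows, columns a, b, c, with NA where a column was absent
```

This gives the same shape of result as the rbind.fill() answer below in the thread, without an extra package.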
Re: [R] AFT-model with time-dependent covariates
Sorry for being late in responding to this thread, but I was made aware of it only two weeks ago. In my package 'eha' there is a function 'aftreg', which performs what is asked for, given that the time-varying covariates are step functions of time and that the observation period for each individual is an interval. Left truncation works if it can be assumed that the covariate values during the first non-observable interval are the same as at the beginning of the first interval under observation. As Terry wrote, there is a lot of book-keeping in writing down the likelihood function, and even more for scores and the hessian, so I adopted a lazy way in the programming: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it were possible to ask for a hessian in the C version of BFGS, things would improve a lot. Also note that you need the latest version (1.2-12) of eha for this to work. aftreg in earlier versions only works (correctly) for time-constant covariates. And, this is not well tested, so care is needed. And I appreciate performance and bug reports, of course. Göran

On Wed, May 13, 2009 at 9:34 PM, spencerg spencer.gra...@prodsyse.com wrote: To see what's available in other packages, try the following:

library(RSiteSearch)
AFT <- RSiteSearch.function('AFT model')
summary(AFT)  # 24 help files found in 8 different packages
HTML(AFT)     # opens a table with 24 rows in a web browser.

There may be nothing here that will help you, but this provides a quick overview of what's available. If this doesn't find what you want, it either has not been contributed or its help page does not use the phrase "AFT model". Hope this helps.
Spencer Graves Terry Therneau wrote: The coding for an AFT model with time-dependent covariates will be very hard, and I don't know of anyone who has done it. (But I don't keep watch of other survival packages, so something might be there). In a Cox model, a subject's risk depends only on the current value of his/her covariates; in an AFT model the risk depends on the entire covariate history. (My 'accelerated age' is the sum of all the extra years I have ever gained). Coding this is not theoretically complex, but would be a pain-in-the-rear amount of bookkeeping. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Göran Broström __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbind with different columns
In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de says... In every list entry is a data.frame but the columns are only partially the same. If I have exactly the same columns, I could do the following command to combine my data: do.call(rbind, myList) but of course it does not work with differnt column names. Is there any easy way to retrieve a combined table like this: You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you want. -- Karl Ove Hufthammer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Counting
Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
 1  1
 1  0
 0  1
 0  1
 0  0
 1  1
 0  1

The output should be:

x1 changed
 0 3   # has changed 3 times
 1 1   # has changed 1 time

x1 unchanged
 0 1   # has stayed unchanged only 1 time
 1 2   # has stayed unchanged 2 times

Can someone help me how to do it in R? Thanks in advance

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
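Both counts fall straight out of table() applied to x1 and a changed/unchanged indicator — a sketch on the data above:

```r
# Cross-tabulate x1 against whether the value changed from x1 to x2.
x1 <- c(1, 1, 0, 0, 0, 1, 0)
x2 <- c(1, 0, 1, 1, 0, 1, 1)
tab <- table(x1, changed = x1 != x2)
tab
# The TRUE column gives the "changed" counts (x1=0: 3, x1=1: 1) and the
# FALSE column the "unchanged" counts (x1=0: 1, x1=1: 2).
```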
Re: [R] underflow of fisher.test result
On 20-Oct-09 13:34:49, Peng Yu wrote: fisher.test() gives a very small p-value, which is underflow on my machine. However, the log of it should not be underflow. I'm wondering if there is a way get log() of a small p-value. An approximation is acceptable in this case. Thank you! fisher.test(rbind(c(1,10),c(10,1000)))$p.value [1] 0 I have not attempted an exact approach (though may do so later), but the P-value in question is about 1e-15000 so (using log to base e) log(P) approx = -33000 In such a case, P=0 is a pretty good approximation! Which prompts the question: Why the interest in having the value of such a very small number? Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-09 Time: 15:14:26 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
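If the log p-value really is wanted, the 2x2 case can be computed entirely in log space from the hypergeometric log-probabilities, so nothing underflows. A sketch (not part of fisher.test() itself, but following the same two-sided rule: sum the probabilities of all tables, with margins fixed, that are no more likely than the observed one):

```r
# Log of the two-sided Fisher exact p-value for a 2x2 table, in log space.
log_fisher_2x2 <- function(tab) {
  a <- tab[1, 1]
  m <- sum(tab[1, ])              # row 1 total
  n <- sum(tab[2, ])              # row 2 total
  k <- sum(tab[, 1])              # column 1 total
  x <- max(0, k - n):min(k, m)    # support of the (1,1) cell
  logp <- dhyper(x, m, n, k, log = TRUE)
  # keep tables no more likely than the observed one (with a small
  # relative tolerance, as fisher.test() uses):
  keep <- logp <= dhyper(a, m, n, k, log = TRUE) + log(1 + 1e-7)
  lmax <- max(logp[keep])         # log-sum-exp keeps everything finite
  lmax + log(sum(exp(logp[keep] - lmax)))
}
log_fisher_2x2(rbind(c(1, 10), c(10, 1000)))   # natural log of the p-value
```

For tables where the p-value does not underflow, exp() of this agrees with fisher.test()$p.value.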
Re: [R] rbind with different columns
Nice! I was going to recommend merge(merge(myList[[1]], myList[[2]], all=TRUE, sort=FALSE), myList[[3]], all=TRUE, sort=FALSE) but rbind.fill is better. -Ista On Tue, Oct 20, 2009 at 10:05 AM, Karl Ove Hufthammer k...@huftis.org wrote: In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de says... In every list entry is a data.frame but the columns are only partially the same. If I have exactly the same columns, I could do the following command to combine my data: do.call(rbind, myList) but of course it does not work with differnt column names. Is there any easy way to retrieve a combined table like this: You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you want. -- Karl Ove Hufthammer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2x2 Contingency table with much sampling zeroes
Hi, I'm analyzing experimental results where two different events (T1 and T2) can occur or not during an experiment. I made my experiments with one factor (Substrate) with two levels (Sand and Clay). I would like to know whether or not Substrate affects the occurrence probability of the two events. Moreover, for each condition I would like to test the heterogeneity of my experimental contingency table against a theoretical one (from simulations). However, my problem is that several cells have sampling zeroes. My experiments can't be done again to fill these cells. Thus chi-square requirements are not fulfilled and I have to find another statistical method. After spending hours searching for a solution, I thought I could use a loglinear model to answer my questions, but:

- I'm not sure I can use a loglinear model = do I fulfill the required conditions?
- would this method answer my hypothesis?
- I'm not sure I really understand how to use loglin()…

Here is the data frame of my results:

DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1 = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2 = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))

What do you think of such data? Can I use any statistical method to test my hypothesis? Any advice? Thanks, Etienne Toffin

--- Etienne Toffin, PhD Student Unit of Social Ecology Université Libre de Bruxelles, CP 231 Boulevard du Triomphe B-1050 Brussels Belgium Tel: +32(0)2/650.55.30 Fax: +32(0)2/650.57.67 http://www.ulb.ac.be/sciences/use/toffin.html

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
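With cells this small, one option that sidesteps the chi-squared large-count requirement entirely is an exact test on a collapsed 2x2 margin. A hedged sketch (this answers only one of the questions — "does Substrate affect T1?" — and is not the loglinear analysis the poster asks about):

```r
# Collapse the poster's data over T2 and apply Fisher's exact test to the
# resulting Substrate x T1 table; exact tests are valid with zero cells.
DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1 = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2 = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))
tab <- xtabs(Freq ~ Subs + T1, data = DF)
tab                          # 2x2 table of counts, zeros included
fisher.test(tab)$p.value     # exact two-sided p-value
```

The same collapse with `Freq ~ Subs + T2` tests the T2 question; the comparison against a simulated theoretical table is a separate problem.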
Re: [R] Counting
Hi Ashta, Take a look at ?rle, e.g.

rle(x1)
Run Length Encoding
  lengths: int [1:4] 2 3 1 1
  values : num [1:4] 1 0 1 0
rle(x1)$lengths
[1] 2 3 1 1
rle(x1)$values
[1] 1 0 1 0

HTH, Jorge

On Tue, Oct 20, 2009 at 10:10 AM, Ashta wrote: Hi All, Assume that I have the following data set with two variables and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
 1  1
 1  0
 0  1
 0  1
 0  0
 1  1
 0  1

The output should be:

x1 changed
 0 3   # has changed 3 times
 1 1   # has changed 1 time

x1 unchanged
 0 1   # has stayed unchanged only 1 time
 1 2   # has stayed unchanged 2 times

Can someone help me how to do it in R? Thanks in advance

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Copulas
nobody? emkayenne wrote: Hello! I am currently using R to deal empirically with copulas. I am using financial return data to construct copula models that seem appropriate for my data sets. Is there anyone who has done something similar and who is interested in talking (or writing...) with me about this topic (difficulties, experiences...)? I have extensive experience in this area now, but there are still some things, especially revolving around testing copula models, that are slightly messy. Waiting for your replies..., emkay

-- View this message in context: http://www.nabble.com/Copulas-tp25967876p25974053.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get slope estimates from a four parameter logistic with SSfpl?
Yes that worked perfectly. Many thanks.

From: Peter Ehlers [ehl...@ucalgary.ca] Sent: 19 October 2009 19:17 To: Weber, Sam Cc: r-help@R-project.org Subject: Re: [R] How to get slope estimates from a four parameter logistic with SSfpl?

Weber, Sam wrote: Hi, I was hoping to get some advice on how to derive estimates of slopes from four parameter logistic models fit with SSfpl. I fit the model using:

model <- nls(temp ~ SSfpl(time, a, b, c, d))
summary(model)

I am interested in the values of the lower and upper asymptotes (parameters a and b), but also in the gradient of the line at the inflection point (c), which I assume tells me my rate of increase when it is steepest (?). However, I cannot work out how to derive a slope estimate. I'm guessing it has something to do with the scaling parameter d, but having searched the internet for hours I have not made any progress, and it is probably quite simple. Any help would be hugely appreciated!

Sam, If I understand you correctly, all you want is the derivative of the fpl wrt the input. You can differentiate the fpl function or you can use the result provided by predict.nls() in the "gradient" attribute. The gradient wrt 'c' is the negative of the slope you want. The maximum slope will occur at time = c. So this should give you the slope:

ypred <- predict(model, newdata=data.frame(Time=coef(model)[3]))
myslope <- -attr(ypred, "gradient")[,3]

-Peter Ehlers

All the best Sam

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to create a legend that automatically reads the values from two vectors?
Hi all!!! How can I create a legend for my plot that automatically reads the values from two vectors?

betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
etav <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)

I want my legend to show 6 lines:

beta (in Greek) = 0.78, eta (in Greek) = 235.6
beta (in Greek) = 0.94, eta (in Greek) = 59.5
...
beta (in Greek) = 4.68, eta (in Greek) = 1174

I know how to do it by hand, but I have to write them one by one and this is not efficient. How can I do that? I guess it has to be done with the lapply or rapply functions, but I can't really get it. Many thanks in advance. Best regards, Javier

-- View this message in context: http://www.nabble.com/How-to-create-a-legend-that-automatically-reads-the-values-from-two-vectors--tp25976164p25976164.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
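One way to build the labels programmatically is bquote() inside mapply(), which substitutes each value into a plotmath expression — a sketch (the plot itself is just a placeholder):

```r
# Build one plotmath label per (beta, eta) pair, with the Greek letters
# rendered and the numeric values pasted in via .().
betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
etav  <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)
labs <- mapply(function(b, e) bquote(beta == .(b) * "," ~ eta == .(e)),
               betav, etav, SIMPLIFY = FALSE)
plot(1, type = "n")
legend("topright", legend = as.expression(labs),
       lty = 1, col = seq_along(betav))
```

as.expression() turns the list of calls into the expression vector that legend() needs for plotmath rendering.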
[R] LDA Precdict - Seems to be predicting on the Training Data
When I import a simple dataset, run LDA, and then try to use the model to forecast out-of-sample data, I get a forecast for the training set, not the out-of-sample set. Others have posted this question, but I do not see the answers to their posts. Here is some sample data:

Date      Names  v1          v2          v3          c1
1/31/2009 Name1  0.714472361 0.902552278 0.783353694 a
1/31/2009 Name2  0.512158919 0.770451596 0.111853346 a
1/31/2009 Name3  0.470693282 0.129200065 0.800973877 a
1/31/2009 Name4  0.24236898  0.472219638 0.486599763 b
1/31/2009 Name5  0.785619735 0.628511593 0.106868172 b
1/31/2009 Name6  0.718718387 0.697257275 0.690326648 b
1/31/2009 Name7  0.327331186 0.01715109  0.861421706 c
1/31/2009 Name8  0.632011743 0.599040196 0.320741634 c
1/31/2009 Name9  0.302804404 0.475166304 0.907143632 c
1/31/2009 Name10 0.545284813 0.967196462 0.945163717 a
1/31/2009 Name11 0.563720418 0.024862018 0.970685281 a
1/31/2009 Name12 0.357614427 0.417490445 0.415162276 a
1/31/2009 Name13 0.154971203 0.425227967 0.856866993 b
1/31/2009 Name14 0.935080173 0.488659307 0.194967973 a
1/31/2009 Name15 0.363069339 0.334206603 0.639795596 b
1/31/2009 Name16 0.862889297 0.821752532 0.549552875 a

Attached is the code:

myDat <- read.csv(file="f:\\Systematiq\\data\\TestData.csv", header=TRUE, sep=",")
myData <- data.frame(myDat)
length(myDat[,1])
train <- myDat[1:10,]
outOfSample <- myDat[11:16,]
outOfSample <- cbind(outOfSample$v1, outOfSample$v2, outOfSample$v3)
outOfSample <- data.frame(outOfSample)
length(train[,1])
length(outOfSample[,1])
fit <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)
forecast <- predict(fit, outOfSample)$class
length(forecast)  # I am expecting this to be the same as length(outOfSample[,1]), which is 6

Output:

length(forecast)  # I am expecting this to be the same as length(outOfSample[,1]), which is 6
[1] 10

-- View this message in context: http://www.nabble.com/LDA-Precdict---Seems-to-be-predicting-on-the-Training-Data-tp25976178p25976178.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
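The usual cause of this symptom is fitting with `train$`-prefixed terms: the formula then hard-codes the training columns, so predict() cannot map the new data onto the model terms and falls back to the training set. A likely fix, sketched on simulated data (not the poster's file), is to fit with a data argument and keep matching column names in the new data:

```r
# Fit lda() with a data= argument so predict(fit, newdata=...) can supply
# new rows under the same variable names v1, v2, v3.
library(MASS)
set.seed(1)
d <- data.frame(v1 = runif(16), v2 = runif(16), v3 = runif(16),
                c1 = rep(c("a", "b"), 8))
train <- d[1:10, ]
outOfSample <- d[11:16, c("v1", "v2", "v3")]   # names must match the formula
fit <- lda(c1 ~ v1 + v2 + v3, data = train)
forecast <- predict(fit, newdata = outOfSample)$class
length(forecast)   # 6 -- one prediction per out-of-sample row
```

With `lda(train$c1 ~ train$v1 + ...)` the same predict() call silently returns the 10 training-set predictions, which is exactly the `[1] 10` the poster sees.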
[R] Perl error - Can't locate R/Rdconv.pm in @INC
Dear list, I have a package, myPkg, which was developed on R v2.6.1. I have installed R v2.9.2 from source, with a shared library and tcltk support. The package, myPkg, depends on RSPerl and imports it in its NAMESPACE. I downloaded and installed RSPerl using the command:

R CMD INSTALL -c --configure-args=--with-modules='IO \
Fcntl Socket Storable XML::Parser::Expat DB_File File::Glob \
GDBM_File SDBM_File POSIX' RSPerl

When installing myPkg, I use the following command:

R CMD INSTALL myPkg
* Installing to library ‘/usr/local/lib/R/library’
* Installing *source* package ‘myPkg’ ...
** R
** inst
** preparing package for lazy loading
Warning: 'Sys.putenv' is deprecated. Use 'Sys.setenv' instead. See help(Deprecated)
** help
*** installing help indices
Can't locate R/Rdconv.pm in @INC (@INC contains: /usr/local/lib/R/library/RSPerl/perl/i486-linux-gnu-thread-multi /usr/local/lib/R/library/RSPerl/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at /usr/local/lib/R/share/perl/build-help.pl line 21.
BEGIN failed--compilation aborted at /usr/local/lib/R/share/perl/build-help.pl line 21.
ERROR: building help failed for package ‘myPkg’

So it can't find the perl module, Rdconv.pm, on the perl path, to build the help modules. Doing a quick search for this module, it's found at the location below (a location not in the perl path):

find / -name Rdconv.pm
/usr/local/lib/R/share/perl/R/Rdconv.pm

I changed the NAMESPACE file in my myPkg package so as not to import RSPerl, and I altered build-help.pl in R to print out the perl path. Path when RSPerl is not imported:

/usr/local/lib/R/share/perl /usr/local/lib/R/share/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl

(Also, the perl path when installing RSPerl (i.e.
before installing myPkg):

/usr/local/lib/R/share/perl /usr/local/lib/R/share/perl /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl)

Obviously, if I run R CMD INSTALL --no-docs myPkg it installs fine. It also installs with older versions of R. I'm running Ubuntu 9.04. I have already posted (probably erroneously) this help plea on the r-devel mailing list - I apologise for cross-posting. Any ideas what I should look for in the code in order to fix this? Any pointers in the right direction, or what to look for, would be very much appreciated. Thanks in advance! Grainne.

sessionInfo()
R version 2.9.2 (2009-08-24)
i686-pc-linux-gnu
locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
Re: [R] Putting names on a ggplot
On Sun, Oct 18, 2009 at 10:29 AM, John Kane jrkrid...@yahoo.ca wrote: Thanks Stefan, the annotate approach works beautifully. I had not got that far in Hadley's book, apparently :( I'm not convinced though that the explanation "you shouldn't use aes in this case since nampost, temprange, ... are not part of the dataframe year" makes sense, since it seems to work in this case, unless I am missing something obvious. Mind you, I'm good at missing the obvious, especially in ggplot.

It currently works (because I can't figure out how to make it an error) but you really should not do it.

Hadley
-- http://had.co.nz/
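For readers arriving at this thread: a minimal sketch of the annotate() approach being discussed (the data frame and values here are made up for illustration, not taken from John's plot):

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 5, 3))
p <- ggplot(df, aes(x, y)) + geom_point()

## Values that are NOT columns of `df` go in via annotate(), not aes():
p2 <- p + annotate("text", x = 2, y = 4, label = "a label not stored in df")
```

Putting such constants inside aes() may appear to work, but as Hadley notes above it is relying on unspecified behaviour.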
[R] RGtk2:::gdkColorToString throws an error
Dear all, I try to use RGtk2:::gdkColorToString, but it throws an error:

Error in .RGtkCall("S_gdk_color_to_string", object, PACKAGE = "RGtk2") :
  gdk_color_to_string exists only in Gdk >= 2.12.0

I know what it means, but I don't know how to solve this problem, because I don't know where I can download the referred GDK library. Any information? Thank you.
-- HUANG Ronggui, Wincent Doctoral Candidate Dept of Public and Social Administration City University of Hong Kong Home page: http://asrr.r-forge.r-project.org/rghuang.html
[R] Time Zone names on Windows
Dear R-users, I tried to read recorded data into R, but ran into a problem with time zones. The data were recorded in GMT+1 (without summer time changes), and at first I tried:

data <- read.zoo("data.txt", header=FALSE, sep=",", format="%H:%M:%S %d.%m.%Y", strip.white=TRUE, tz="GMT+1", skip=3)

However, it failed. Then I found the description of Time Zones in the R help and noticed the Olson database is stored in a specific folder under Windows, which is the OS I'm working on. If I understood correctly, there is no file related to a GMT+1 time zone. So far my workaround is to set tz to "Africa/Lagos", a city in Africa sharing the same time zone. However, it's a little bit weird, because the data were not collected in that city or anywhere near it. My question is whether there is another way to set tz to GMT+1, instead of my workaround, so that it has a neutral meaning in my script. Regards, Keith
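One neutral-sounding possibility worth trying (assuming the Olson database on the machine ships the Etc/ zones): the POSIX-style names "Etc/GMT-1", "Etc/GMT+5", etc. are fixed offsets with no daylight-saving rules, with the sign reversed relative to everyday usage, so GMT+1 is spelled "Etc/GMT-1":

```r
## "Etc/GMT-1" is a fixed offset one hour EAST of GMT (POSIX sign
## convention is reversed) with no daylight-saving transitions.
x <- as.POSIXct("2009-10-20 12:00:00", tz = "Etc/GMT-1")
format(x, tz = "GMT", usetz = TRUE)  # "2009-10-20 11:00:00 GMT"
```

If "Etc/GMT-1" is absent from the installed zone files this will not help, but where it is present it avoids tying the script to any particular city.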
Re: [R] LDA Precdict - Seems to be predicting on the Training Data
Maybe you're getting strange results because you're not supplying a data object to lda() when you build your fit. When I do it the standard way, predict.lda() uses the new data and produces a result of length 6 as expected:

myDat <- read.csv("clipboard", sep="\t")
fit <- lda(c1 ~ v1 + v2 + v3, data=myDat[1:10,])
predict(fit, myDat[11:16,])
$class
[1] c c c b c a
Levels: a b c
...

-- Tony Plate

BostonR wrote: When I import a simple dataset, run LDA, and then try to use the model to forecast out of sample data, I get a forecast for the training set not the out of sample set. Others have posted this question, but I do not see the answers to their posts. [...]
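A self-contained sketch of the gotcha discussed in this thread (the data below are made up; only the variable names follow the original post). When the formula hard-codes train$..., predict() cannot find those terms in the new data, so it silently falls back on the training rows:

```r
library(MASS)  # for lda()

set.seed(1)
d <- data.frame(v1 = runif(16), v2 = runif(16), v3 = runif(16),
                c1 = factor(rep(c("a", "b"), 8)))
train  <- d[1:10, ]
newdat <- d[11:16, ]

## Formula hard-codes train$...: predict() cannot match these terms as
## columns of newdat, so it re-evaluates them in the calling environment
## and returns one prediction per TRAINING row.
fit.bad <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)
length(predict(fit.bad, newdat)$class)  # 10

## Bare column names plus data=: predict() uses newdat.
fit.ok <- lda(c1 ~ v1 + v2 + v3, data = train)
length(predict(fit.ok, newdat)$class)   # 6
```

This reproduces the "[1] 10" in the original post and the length-6 result in Tony's reply.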
Re: [R] LDA Precdict - Seems to be predicting on the Training Data
This is not an explanation, but it gives you a solution. Instead of using lda() with a formula, do it by giving the variables and the classification factor as arguments, based on your example and data:

outOfSample <- myDat[11:16,]
train <- myDat[1:10,]
outOfSample <- outOfSample[,3:5]
train2 <- train[,3:5]
fit <- lda(train2, train$c1)
forecast <- predict(fit, outOfSample)$class
length(forecast)
[1] 6

It seems that the problem arises when predict.lda works on an lda fit applied to a formula-class object. Hope this helps, Gabriela.

__ Lic. María Gabriela Cendoya Magíster en Biometría Profesor Adjunto Cátedra de Estadística y Diseño Facultad de Ciencias Agrarias Universidad Nacional de Mar del Plata __

----- Original Message ----- From: BostonR dp...@capitaliq.com To: r-help@r-project.org Sent: Tuesday, October 20, 2009 11:31 AM Subject: [R] LDA Precdict - Seems to be predicting on the Training Data

When I import a simple dataset, run LDA, and then try to use the model to forecast out of sample data, I get a forecast for the training set not the out of sample set. Others have posted this question, but I do not see the answers to their posts. [...]
Re: [R] Counting
How about

unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)

-Peter Ehlers

Ashta wrote: Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

x1 x2
1  1
1  0
0  1
0  1
0  0
1  1
0  1

The output should be

x1 changed
0  3   # has changed 3 times
1  1   # has changed 1 time

x1 unchanged
0  1   # has unchanged only 1 time
1  2   # has unchanged 2 times

Can someone help me how to do it in R? Thanks in advance
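Peter's two aggregate() calls can be checked against the poster's expected output with a short, self-contained snippet (data typed in from the post):

```r
x1 <- c(1, 1, 0, 0, 0, 1, 0)
x2 <- c(1, 0, 1, 1, 0, 1, 1)

## Sum the logical comparisons within each level of x1:
chgd <- aggregate(x2 != x1, by = list(x1 = x1), FUN = sum)
unch <- aggregate(x2 == x1, by = list(x1 = x1), FUN = sum)

chgd  # x1 = 0 changed 3 times, x1 = 1 changed once
unch  # x1 = 0 unchanged once,  x1 = 1 unchanged twice
```

This matches the counts the original poster asked for.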
[R] kendall.global
Hi everybody: I need some help with kendall.global. The example in the manual does not seem to work, and it cannot be used with my data; I always get the same error.

library(vegan)
data(mite)
mite.hel <- decostand(mite, "hel")
# Reproduce the results shown in Table 2 of Legendre (2005), a single group
mite.small <- mite.hel[c(4,9,14,22,31,34,45,53,61,69), c(13:15,23)]
kendall.global(mite.small)
Errore in FUN(newX[, i], ...) :
  .Random.seed is not an integer vector but of type 'list'
kendall.post(mite.small, mult="holm")
Errore in sample(R.gr[, j]) :
  .Random.seed is not an integer vector but of type 'list'
# Reproduce the results shown in Tables 3 and 4 of Legendre (2005), 2 groups
group <- c(1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,1,1,1,1,2,1,2,1,1,1,1,1,2,2,2,2,2)
kendall.global(mite.hel, group=group)
Errore in FUN(newX[, i], ...) :
  .Random.seed is not an integer vector but of type 'list'
kendall.post(mite.hel, group=group, mult="holm", nperm=99)
Errore in sample(R.gr[, j]) :
  .Random.seed is not an integer vector but of type 'list'

Thank you very much if you know how to solve it. Rosa.
Re: [R] Time Zone names on Windows
Keith, If you are working within a single time zone, including time zone information with each record does not seem necessary, and you probably are not recording times to the sub-second. The best solution may thus be to use a simpler date/time class that does not include time zone information. You should consider the chron package, and see R News 4/1 for helpful hints. chron makes construction and use of date/time records very easy:

library(chron)
dt <- c("10/20/2009")
tm <- c("11:02:00")
chron(dt, tm)
[1] (10/20/09 11:02:00)

Good luck! Glen

Keith-119 wrote: My question is that if there is other way to set the tz to GMT+1 instead using my workaround to have a neutral meaning in my script?

-- View this message in context: http://www.nabble.com/Time-Zone-names-on-Windows-tp25977196p25977926.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Counting
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Ehlers Sent: Tuesday, October 20, 2009 8:48 AM To: Ashta Cc: R help Subject: Re: [R] Counting

How about
unch <- aggregate(x2==x1, by = list(x1=x1), FUN = sum)
chgd <- aggregate(x2!=x1, by = list(x1=x1), FUN = sum)
-Peter Ehlers

When I hear 'count' I think first of the table() function. E.g.,

d <- data.frame(x1=c(1,1,0,0,0,1,0), x2=c(1,0,1,1,0,1,1))
with(d, table(x1, x1==x2))
x1  FALSE TRUE
  0     3    1
  1     1    2

or

with(d, table(x1, factor(x1==x2, labels=c("Changed","Unchanged"))))
x1  Changed Unchanged
  0       3         1
  1       1         2

or use dimnames<- to change the labels on the table itself.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Ashta wrote: Hi All, Assume that I have the following data set with two variables and I want count the number of observation with identical values and number of time each factor changed from x1 to x2. [...]
Re: [R] 2x2 Contingency table with much sampling zeroes
On Tue, 20 Oct 2009, Etienne Toffin wrote: Hi, I'm analyzing experimental results where two different events (T1 and T2) can occur or not during an experiment. I made my experiments with one factor (Substrate) with two levels (Sand and Clay). I would like to know whether or not Substrate affects the occurrence probability of the two events.

It is not clear to me what you mean by 'affects the occurrence ...'. This sounds like 'independence of Substrate from the two other variables', which is a 3-degree-of-freedom hypothesis (at least in the example you give). Is that what you are after, or are only some of those contrasts interesting?

Moreover, for each condition I would like to test the heterogeneity of my experimental contingency table against a theoretical one (from simulations).

Do you mean you have some prior values for the counts or proportions? If so, a standard goodness-of-fit test should do. If not, you need to describe the problem in more detail.

However, my problem is that several cells have sampling zeroes. My experiments can't be done again to fill these cells. Thus the chi-square requirements are not fulfilled and I have to find another statistical method.

Sampling zeroes in the cells are not a problem as long as the marginal tables do not have such zeroes. Depending on the hypotheses you want to test, the marginal tables may be OK. 'Substrate' is OK and so is 'T1 by T2', so you can do the 3-degree-of-freedom test implied by those margins.

After spending hours searching for a solution, I thought I could use a loglinear model to answer my questions, but: - I'm not sure I can use a loglinear model - do I fulfill the required conditions?

Have you studied the Agresti reference listed in the help page?? I'll bet it addresses 'the required conditions' - which go to the sampling distribution of the counts.

- would this method answer my hypothesis?
- I'm not sure I really understand how I have to use loglin()…

Run example(loglin) and reread ?loglin. The example is the same setup as you have here (albeit with more degrees of freedom), so you might emulate it.

Here is the data frame of my results:

DF <- data.frame(Subs=c(rep("Sand",4), rep("Clay",4)),
                 T1=rep(c("YES","YES","NO","NO"), 2),
                 T2=rep(c("YES","NO","YES","NO"), 2),
                 Freq=c(12,5,0,7,24,1,0,0))

What do you think of such data? Can I use any statistical method to test my hypothesis? Any advice?

Recruit a statistician to your committee. Questions like these are better hashed out in front of a blackboard than over the internet.

HTH, Chuck

Thanks, Etienne Toffin
--- Etienne Toffin, PhD Student, Unit of Social Ecology, Université Libre de Bruxelles, CP 231, Boulevard du Triomphe, B-1050 Brussels, Belgium. Tel: +32(0)2/650.55.30 Fax: +32(0)2/650.57.67 http://www.ulb.ac.be/sciences/use/toffin.html

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
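The 3-degree-of-freedom test Chuck describes can be sketched directly from the DF in the post. The margin specification below is one reasonable reading of "the test implied by those margins" (Substrate independent of the joint T1/T2 outcome), not necessarily the only analysis Etienne should run:

```r
DF <- data.frame(Subs = c(rep("Sand", 4), rep("Clay", 4)),
                 T1   = rep(c("YES", "YES", "NO", "NO"), 2),
                 T2   = rep(c("YES", "NO", "YES", "NO"), 2),
                 Freq = c(12, 5, 0, 7, 24, 1, 0, 0))
tab <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

## Margins {Subs} and {T1, T2}: Substrate independent of the joint
## (T1, T2) outcome.  2x2x2 table: 8 cells - 1 - (1 + 3) params = 3 d.f.
res <- loglin(tab, margin = list(1, c(2, 3)), print = FALSE)
res$df   # 3
res$lrt  # likelihood-ratio statistic, compare to chi-square on 3 d.f.
```

Note that permutation or exact methods may still be preferable here given how sparse the table is, which is part of why Chuck recommends a statistician on the committee.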
Re: [R] kendall.global
Hi, I seem to be able to run it without any problems:

kendall.global(mite.small)
$Concordance_analysis
              Group.1
W          0.44160305
F          2.37252221
Prob.F     0.04403791
Chi2      15.89770992
Prob.perm      0.0450

attr(,"class")
[1] "kendall.global"

It seems to be .Random.seed that is causing the problem, so maybe you can do this before running any of the commands:

set.seed(1)

HTH, Vik

Rosa Manrique wrote: Hi every body: I need some help with kendall.global. The example in the manual seems not working well, and cannot used with my data, always the same error. [...]

-- View this message in context: http://www.nabble.com/kendall.global-tp25977847p25978086.html Sent from the R help mailing list archive at Nabble.com.
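The error message reads like a corrupt .Random.seed in the workspace, e.g. one restored from an old .RData file. A sketch of Vik's fix, made explicit (the assumption here is that the bad seed lives in the global environment, which is where R keeps it):

```r
## If a saved workspace left a bad .Random.seed behind, discard it and
## re-seed before calling the permutation-test functions:
if (exists(".Random.seed", envir = globalenv()))
  rm(".Random.seed", envir = globalenv())
set.seed(1)

sample(3)  # the generator works again; kendall.global() etc. should too
```

Starting R with --no-restore (so the old workspace is not loaded) would avoid re-importing the corrupt seed in future sessions.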
Re: [R] how to get rid of 2 for-loops and optimize runtime
-----Original Message----- From: Ian Willems [mailto:ian.will...@uz.kuleuven.ac.be] Sent: Tuesday, October 20, 2009 6:46 AM To: William Dunlap; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

Hi William, Your program works perfectly, and very fast, for the table I'm using right now (only one match per row). If I want to reuse this code on other tables, it can match more than one row. Is it possible to adapt your code easily, if I have to sum the values of last month from different rows?

You can use aggregate() with one of those keys to sum up the values with a common key value.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Thank you for your help. Regards, Ian

-----Original Message----- From: William Dunlap [mailto:wdun...@tibco.com] Sent: Monday, 19 October 2009 18:08 To: Ian Willems; r-help@r-project.org Subject: RE: [R] how to get rid of 2 for-loops and optimize runtime

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ian Willems Sent: Monday, October 19, 2009 6:50 AM To: 'r-help@r-project.org' Subject: [R] how to get rid of 2 for-loops and optimize runtime

Short: get rid of the loops I use and optimize runtime.

Dear all, I want to calculate for each row the amount of the month before. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; nrows of other matrices can easily be more than 10)

Table before:

Year month quarter yearmonth Service ... Amount
2009  9    Q3      092009    A       ... 120
2009  9    Q3      092009    B       ...  80
2009  8    Q3      082009    A       ...  40
2009  7    Q3      072009    A       ...  50

The result I want:

Year month quarter yearmonth Service ... Amount amount_lastmonth
2009  9    Q3      092009    A       ... 120    40
2009  9    Q3      092009    B       ...  80    ...
2009  8    Q3      082009    A       ...  40    50
2009  7    Q3      072009    A       ...  50    ...

The table is not exactly the same but gives a good idea of what I have and what I want. The code I have written (see below) does what I want, but it is very, very slow. It takes 129s for 400 rows.
And the time gets four times higher each time I double the number of rows. I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output see below), though I don't really understand the output. I guess I need code that runs in linear time, and need to get rid of the 2 for-loops. Can someone help me, or tell me what else I can do to optimize my runtime? I use R 2.9.2, Windows XP Service Pack 3. Thank you in advance. Best regards, Willems Ian

* dataset[,5] = month; dataset[,3] = year; dataset[,22] = amount; dataset[,14] = servicetype

[CODE]
# for each row of the matrix check if each row has..
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14])        # ..the same service type
    { if (dataset[j,18] == dataset[i,18])      # ..the same department
      { if (dataset[j,5] == 1)                 # if month==1, month ago is 12 and year is -1
        { if (12 == dataset[i,5])
          { if ((dataset[j,3]-1) == dataset[i,3])
            { sum <- sum + dataset[i,22] }
          } }
        else {
          if ((dataset[j,5]-1) == dataset[i,5])  # if month != 1, month ago is month-1
          { if (dataset[j,3] == dataset[i,3])
            { sum <- sum + dataset[i,22] }
          } }

match() is often useful for quickly finding the locations of many items in a vector. It has no special methods for data.frames, so you must combine the columns of interest into a character vector of keys and use match on the key vectors. E.g.,
# your test data in a format that mail readers
# can copy and paste into R:
d <- read.table(header=TRUE, textConnection("
Year month quarter yearmonth Service Amount
2009 9 Q3 092009 A 120
2009 9 Q3 092009 B 80
2009 8 Q3 082009 A 40
2009 7 Q3 072009 A 50
"))
# The key functions
dKey <- function(d) {
  with(d, paste(d$Year, d$month, d$Service, sep=";"))
}
keyThisMonth <- function(d) dKey(d)
keyPrevMonth <- function(d) {
  stopifnot(!is.null(d$Year), !is.null(d$month), !is.null(d$Service))
  isJan <- d$month == 1
  d$Year[isJan] <- d$Year[isJan] - 1
  d$month[isJan] <- 12
  d$month[!isJan] <- d$month[!isJan] - 1
  dKey(d)
}
# Make the new column:
d$AmountPrevMonth <- d$Amount[ match(keyPrevMonth(d), keyThisMonth(d)) ]
# The result
print(d)
Year month quarter yearmonth
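Bill's aggregate() suggestion upthread (for tables where several rows share a key, so last month's amounts must be summed before matching) might look like this sketch. The column names follow the example data; the rest is illustrative:

```r
d <- data.frame(Year    = c(2009, 2009, 2009, 2009),
                month   = c(9, 9, 8, 7),
                Service = c("A", "B", "A", "A"),
                Amount  = c(120, 80, 40, 50))

## Sum Amount within each (Year, month, Service) key ...
key <- with(d, paste(Year, month, Service, sep = ";"))
tot <- aggregate(d$Amount, by = list(key = key), FUN = sum)

## ... then match each row's previous-month key against the totals.
prevkey <- with(d, paste(ifelse(month == 1, Year - 1, Year),
                         ifelse(month == 1, 12, month - 1),
                         Service, sep = ";"))
d$AmountPrevMonth <- tot$x[match(prevkey, tot$key)]
d   # AmountPrevMonth: 40, NA, 50, NA
```

Both the aggregation and the match are vectorized, so this stays close to linear time instead of the quadratic double loop.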
Re: [R] Counting
Nice solution, Bill. -Peter Ehlers

William Dunlap wrote: [...]
[R] plotw5.png
I've shared a document with you: plotw5.png http://docs.google.com/Doc?docid=0AfJ5_yv8GrERZGY3NmhuMnNfMWZtcGhjc2d3hl=eninvite=CPzN4ZsF It's not an attachment -- it's stored online at Google Docs. To open this document, just click the link above.
Re: [R] Problems importing Unix SAS .ssd04 file to R (Win)
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of johannes rara Sent: Tuesday, October 20, 2009 12:26 AM To: r-help@r-project.org Subject: [R] Problems importing Unix SAS .ssd04 file to R (Win)

Hello, I'm trying to import a SAS file made using SAS on Unix. Currently I'm using SAS on Windows, and I'm trying to import that .ssd04 file to R. The file name is testfile.ssd04 and it is located in 'M:\sasuser'. I'm using Windows XP and R 2.9.1. Basically what I'm doing is

## r code ##
library(foreign)
sashome <- "C:/Program Files/SAS Institute/SAS/V8"
folder_for_datafiles <- "M:/sasuser"
read.ssd(folder_for_datafiles, "testfile", sascmd=file.path(sashome, "sas.exe"))
SAS failed.  SAS program at C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file41bb5af1.sas
The log file will be file41bb5af1.log in the current directory
NULL
Warning message:
In read.ssd(folder_for_datafiles, "testfile", sascmd = file.path(sashome, :
  SAS return code was 2
##

This temporary SAS file 'file41bb5af1.sas' looks like this:

## sas code ##
option validvarname = v6;
libname src2rd 'M:/sasuser';
libname rd xport 'C:\DOCUME~1\jrara\LOCALS~1\Temp\RtmpAAxO3X\file6df11649';
proc copy in=src2rd out=rd;
select testfile ;
##

Any ideas what I'm doing wrong?

sessionInfo()
R version 2.9.1 (2009-06-26) i386-pc-mingw32
locale: LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages: [1] graphics grDevices utils datasets grid stats methods base
other attached packages: [1] foreign_0.8-38 gregmisc_2.1.1 gplots_2.7.1 caTools_1.9 bitops_1.0-4.1 gtools_2.6.1 gmodels_2.15.0 gdata_2.6.1
loaded via a namespace (and not attached): [1] MASS_7.2-49

Can you read that dataset just using your Windows SAS v8?

Dan

Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
[R] descriptive statistics qn
This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs) ; 1:100th observation etc. Here is the dataset: Id value 1 2.2338 2 3.13597 3 1.98685 4 4.35593 5 2.43963 6 4.20262 7 3.12131 8 4.79583 9 3.13937 10 3.47266 11 3.79954 12 4.32285 13 3.00058 14 1.26102 15 4.07329 16 3.42037 17 1.99195 18 5.4461 19 3.03448 2 2.63395 21 0.99851 22 4.59577 23 2.77719 24 4.82748 25 4.11373 26 2.38615 27 1.81247 28 0.920195 29 2.78129 30 3.10531 31 3.60793 32 1.66597 33 4.80673 34 4.80266 35 5.35791 36 2.44573 37 2.63854 38 4.13737 39 1.68496 40 3.86962 41 1.60028 42 4.35621 43 3.60847 44 2.38884 45 1.03786 46 5.48904 47 1.95086 48 2.75401 49 2.25081 50 3.46435 51 0.309484 52 2.49405 53 4.52522 54 1.17947 55 2.50136 56 1.74537 57 5.48072 58 1.81081 59 2.67793 60 4.40603 61 3.21767 62 2.68124 63 1.9723 64 1.25973 65 2.64177 66 3.28414 67 3.2394 68 4.07508 69 2.32812 70 3.42763 71 2.40944 72 2.43776 73 0.072419 74 6.29773 75 2.29704 76 6.07705 77 1.59146 78 4.74327 79 5.29088 80 4.17874 81 1.80639 82 3.70503 83 3.13294 84 5.44017 85 3.50002 86 2.1762 87 2.27356 88 4.45687 89 1.99272 90 3.74362 91 3.89474 92 0.93481 93 4.99464 94 1.02872 95 2.63587 96 3.85226 97 4.82513 98 1.1171 99 1.09368 100 3.91599 101 3.91832 102 1.81166 103 1.48144 104 5.24373 105 3.15134 106 3.03343 107 3.373 108 6.10409 109 4.84878 110 1.95195 111 2.25949 112 1.45546 113 3.60767 114 3.10228 115 3.22694 116 0.861855 117 3.93392 118 2.81212 119 1.62434 120 2.41149 121 4.42936 122 2.4765 123 0.979153 124 3.3482 125 1.20345 126 1.56576 127 5.1418 128 3.5585 129 3.21124 130 3.52901 131 2.38191 132 3.6129 133 3.56378 134 2.70181 135 0.355836 136 3.10734 137 6.22159 138 4.7172 139 2.88295 140 2.87566 141 2.01538 142 2.78835 143 2.26519 144 1.38217 145 2.63885 146 1.0584 147 1.93247 148 3.10316 149 2.24714 150 5.08326 151 2.21789 152 
4.63046 153 3.90057 154 2.19171 155 5.46094 156 1.84918 157 1.04059 158 4.57513 159 5.06956 160 3.65212 161 1.14078 162 5.89406 163 2.86261 164 1.87261 165 4.47285 166 3.13892 167 1.59146 168 1.07166 169 2.82925 170 2.83742 171 5.13528 172 3.20376 173 1.56253 174 3.08726 175 3.52339 176 3.66837 177 2.70302 178 2.00471 179 3.6823 180 2.90728 181 2.39302 182 2.54367 183 3.98498 184 4.17124 185 1.99922 186 2.61908 187 2.24057 188 3.29836 189 2.59912 190 4.49458 191 1.44657 192 2.99004 193 5.15769 194 2.64342 195 1.81755 196 5.1977 197 4.10043 198 4.63281 199 3.20347 200 2.08211
Re: [R] Counting
Hi Bill and all,

On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:

 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Ehlers Sent: Tuesday, October 20, 2009 8:48 AM To: Ashta Cc: R help Subject: Re: [R] Counting

 How about

 unch <- aggregate(x2 == x1, by = list(x1 = x1), FUN = sum)
 chgd <- aggregate(x2 != x1, by = list(x1 = x1), FUN = sum)

 -Peter Ehlers

 When I hear 'count' I think first of the table() function. E.g.,

 d <- data.frame(x1 = c(1,1,0,0,0,1,0), x2 = c(1,0,1,1,0,1,1))
 with(d, table(x1, x1 == x2))
 x1  FALSE TRUE
  0      3    1
  1      1    2

 or

 with(d, table(x1, factor(x1 == x2, labels = c("Changed", "Unchanged"))))
 x1  Changed Unchanged
  0        3         1
  1        1         2

 or use dimnames<- to change the labels on the table itself.

This works very well for numeric. How about if the factors are character, such as "F" and "M" (male and female)?

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 Ashta wrote: Hi All, Assume that I have the following data set with two variables, and I want to count the number of observations with identical values and the number of times each factor changed from x1 to x2.

 x1 x2
  1  1
  1  0
  0  1
  0  1
  0  0
  1  1
  0  1

 The output should be

 x1 changed
  0  3   # has changed 3 times
  1  1   # has changed 1 time

 x1 unchanged
  0  1   # has remained unchanged only 1 time
  1  2   # has remained unchanged 2 times

 Can someone help me how to do it in R? Thanks in advance
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Hello Caveman! I am able to append records using other applications. My credentials have the authority to append. I am able to query to select records, and even delete records from the remote DB2 table using RODBC. I'm just stuck when it comes to APPEND. The DB2 database is on an AIX server, and I am on a Windows XP laptop.

Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
Re: [R] descriptive statistics qn
foo is the data frame you provided:

mean(foo[1:25, 2])
var(foo[1:25, 2])

Indexing key: foo[rows, cols]

hth
Stephen Sefick

On Tue, Oct 20, 2009 at 11:04 AM, Sibusiso Moyo sm...@vt.edu wrote: This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs); 1:100th observation etc.

--
Stephen Sefick
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
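[Editor's note] A hedged extension of the answer above: the same indexing idea computes the mean and variance for several cutoffs in one pass. `foo` stands in for the posted data frame (columns Id, value); it is simulated here so the sketch is self-contained.

```r
# 'foo' is a stand-in for the posted 200-row data frame (Id, value).
set.seed(123)
foo <- data.frame(Id = 1:200, value = rnorm(200, mean = 3, sd = 1.3))

# Mean and variance of the first 25, 50, 100, and 200 observations.
cutoffs <- c(25, 50, 100, 200)
stats <- t(sapply(cutoffs, function(n) {
  x <- foo[1:n, 2]                      # foo[rows, cols]
  c(n = n, mean = mean(x), var = var(x))
}))
stats
```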
Re: [R] underflow of fisher.test result
On Tue, Oct 20, 2009 at 9:14 AM, Ted Harding ted.hard...@manchester.ac.uk wrote:

 On 20-Oct-09 13:34:49, Peng Yu wrote:
  fisher.test() gives a very small p-value, which underflows on my machine. However, the log of it should not underflow. I'm wondering if there is a way to get the log() of a small p-value. An approximation is acceptable in this case. Thank you!
  fisher.test(rbind(c(1,10),c(10,1000)))$p.value
  [1] 0

 I have not attempted an exact approach (though may do so later), but the P-value in question is about 1e-15000, so (using log to base e) log(P) approx = -33000. In such a case, P=0 is a pretty good approximation! Which prompts the question: why the interest in having the value of such a very small number?

This is just an example showing that the result underflows. You don't have to take it too literally.
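[Editor's note] One way to get the log of such a p-value, sketched under an assumption: for a *one-sided* Fisher test the p-value is a hypergeometric tail, which phyper() can return on the log scale (a two-sided p-value would need explicit summation of point probabilities). The 2x2 table below is hypothetical, chosen so the tail underflows on the ordinary scale.

```r
# Hypothetical extreme 2x2 table (not the poster's):
m  <- matrix(c(1000, 10, 10, 1000), nrow = 2)
x  <- m[1, 1]       # observed count in cell (1,1)
wh <- sum(m[, 1])   # first column total  ("white balls")
bl <- sum(m[, 2])   # second column total ("black balls")
k  <- sum(m[1, ])   # first row total     (number drawn)

# One-sided upper-tail probability P(X >= x), on the log scale:
log_p <- phyper(x - 1, wh, bl, k, lower.tail = FALSE, log.p = TRUE)
log_p               # finite, even though exp(log_p) underflows to 0
```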
Re: [R] descriptive statistics qn
Type
?mean
?sd
at the R prompt. For your specific goal, type mean(r[1:25]) at the prompt, and proceed analogously for sd. Just download one of the very many beginner's manuals or introductory course materials that you can get for free on the internet, and get Ricci's R-refcard.

Daniel

- cuncta stricte discussurus -

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sibusiso Moyo Sent: Tuesday, October 20, 2009 12:05 PM To: r-help@r-project.org Subject: [R] descriptive statistics qn

This is day one on R for me, I am trying to figure out how to do simple computations. For example I have a data set with 200 observations. I am trying to compute the mean and variance in r for 1:25 (first 25 observations); 1:50 (first 50 obs); 1:100th observation etc.
Re: [R] Problem using the source-function within R-functions
The problem probably lies in the source-ing part: look at
getwd()
setwd()
HTH, Giovanni

Date: Tue, 20 Oct 2009 13:00:02 +0200 From: Johan Lassen jle...@gmail.com

Dear R community, You may have the solution to how to construct a function using the function source() to build the function; i.e.

myfunction <- function(...) {
  source('file1.r')
  source('file2.r')
}

After compiling and installing myfunction in R, calling myfunction gives an error because the content of 'file1.r' and 'file2.r' seems to be missing. Anyone has the trick to overcome this problem? Thanks in advance! best wishes, Johan

PS: My function is:

run_accumm_value <- function(ind_noder_0, ind_loc_val, ind_retention, downstream) {
  ## Preprocessing of looping calculations:
  koersel_uden_ret <- length(unique(ind_noder_0$oplid))
  opsaml_b_0_2 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
  ## Preprocessing of topology and local values to be summed:
  source('preproces_topology.r', local = TRUE)
  source('preproces_loc_val.r', local = TRUE)
  # Loop for each grouping factor (column in ind_noder_0: oplid):
  for (j in 1:koersel_uden_ret) {
    source('matrix_0.r', local = TRUE)
    source('matrix.r', local = TRUE)
    source('local_value.r', local = TRUE)
    source('fordeling.r', local = TRUE)
    source('fordeling_manuel.r', local = TRUE)
    source('local_ret.r', local = TRUE)
    source('Ax=b.r', local = TRUE)
    source('opsamling_x_0_acc.r', local = TRUE)
  }
  source('opsamling_b_1.r', local = TRUE)
  opsaml_b_2
}

--
Johan Lassen
Environment Center Nykøbing F
Denmark

--
Giovanni Petris gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
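[Editor's note] A minimal sketch of Giovanni's hint: source() resolves bare file names against the current working directory, which is generally not where an installed package's files live, so a packaged function must build an explicit path instead. `script_dir` and the file names are hypothetical.

```r
# Source a set of scripts from an explicit directory into this
# function's own frame, instead of relying on getwd().
myfunction <- function(script_dir, files = c("file1.r", "file2.r")) {
  for (f in files) {
    path <- file.path(script_dir, f)
    if (!file.exists(path)) stop("cannot find ", path)
    source(path, local = TRUE)   # evaluate inside this function's frame
  }
  environment()                  # frame now holds the sourced objects
}

# For files shipped inside an installed package 'mypackage',
# system.file() locates them reliably:
# source(system.file("scripts", "file1.r", package = "mypackage"),
#        local = TRUE)
```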
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Hi Felipe, Thanks for your message. Your approach works for an Access database. I am trying to insert records into a table in a DB2 database on a remote AIX server. My userid has the authority to insert records, and I am able to do that using another application (DB2 Client command line). Sincerely, Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
[R] Systemfit package
Dear Arne Henningsen, I am writing to you because I have a question regarding the systemfit package. I hope you can answer my request. I estimated a system of equations by using the SUR method. The function summary(xx) gives me a summary of the estimated equation system. However, this function does not give me the value of the Durbin-Watson statistic for each of my equations (to check for serial correlation). So my question is: is there any function in the systemfit package that returns the value of the Durbin-Watson statistic, or should I write my own program? Thank you in advance
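[Editor's note] In case systemfit has no built-in helper, the Durbin-Watson statistic is simple to compute by hand from each equation's residuals: DW = sum(diff(e)^2) / sum(e^2). Whether residuals() on a systemfit object returns one column of residuals per equation is an assumption to verify locally.

```r
# Durbin-Watson statistic from a residual vector.
dw <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical use with a two-equation SUR fit:
# library(systemfit)
# fit <- systemfit(list(eq1 = y1 ~ x1, eq2 = y2 ~ x2),
#                  method = "SUR", data = mydata)
# sapply(residuals(fit), dw)   # one DW value per equation

# Sanity check: for serially uncorrelated residuals DW should be near 2.
set.seed(42)
e <- rnorm(500)
dw(e)
```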
Re: [R] AFT-model with time-dependent covariates
Goran, You had stated that: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it was possible to ask for a hessian in the C version of BFGS, things would improve a lot. Here is a suggestion that could potentially work well for large data sets. There is a function called `bobyqa' in a package called minqa that is currently on R-forge (soon to be released; Kate Mullen is the author). http://r-forge.r-project.org/R/?group_id=395 This does not use derivatives, and might be faster on large-data sets. You could then use the numDeriv package to calculate the hessian of the log-likelihood at the converged parameter value. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Göran Broström Sent: Tuesday, October 20, 2009 10:07 AM To: spencerg Cc: r-help@r-project.org; Terry Therneau Subject: Re: [R] AFT-model with time-dependent covariates Sorry for being late in responding to this thread, but I was made aware of it only two weeks ago. In my package 'eha' there is a function 'aftreg', which performs what is asked for, given that the time-varying covariates are step functions of time and that the observation period for each individual is an interval. Left truncation works if it can be assumed that the covariate values during the first non-observable interval are the same as at the beginning of the first interval under observation. 
As Terry wrote, there is a lot of book-keeping in writing down the likelihood function, and even more for scores and the hessian, so I adopted a lazy way in the programming: I have only coded the log likelihood function, and I use 'optim' (in R code) and 'BFGS', without gradient, and I ask for a numerically differentiated Hessian, which I use for calculating standard errors and p-values. Tests show that this works surprisingly well, but for huge data sets it is very slow. If it was possible to ask for a hessian in the C version of BFGS, things would improve a lot. Also note that you need the latest version (1.2-12) of eha for this to work. aftreg in earlier versions only works (correctly) for time-constant covariates. And, this is not well tested, so care is needed. And I appreciate performance and bug reports, of course.

Göran

On Wed, May 13, 2009 at 9:34 PM, spencerg spencer.gra...@prodsyse.com wrote: To see what's available in other packages, try the following:

library(RSiteSearch)
AFT <- RSiteSearch.function('AFT model')
summary(AFT)  # 24 help files found in 8 different packages
HTML(AFT)     # opens a table with 24 rows in a web browser

There may be nothing here that will help you, but this provides a quick overview of what's available. If this doesn't find what you want, it either has not been contributed or its help page does not use the phrase "AFT model". Hope this helps. Spencer Graves

Terry Therneau wrote: The coding for an AFT model with time-dependent covariates will be very hard, and I don't know of anyone who has done it. (But I don't keep watch of other survival packages, so something might be there). In a Cox model, a subject's risk depends only on the current value of his/her covariates; in an AFT model the risk depends on the entire covariate history. (My 'accelerated age' is the sum of all the extra years I have ever gained). Coding this is not theoretically complex, but would be a pain-in-the-rear amount of bookkeeping.

Terry Therneau

--
Göran Broström
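[Editor's note] A toy sketch of the recipe Göran and Ravi describe: minimize a coded negative log-likelihood without gradients, then use a numerically differentiated Hessian for standard errors. A simple normal likelihood stands in for the (much more involved) AFT likelihood; all names here are illustrative.

```r
# Negative log-likelihood for N(mu, sigma), parameterized with
# log(sigma) so the optimizer works on an unconstrained scale.
negll <- function(par, x) {
  mu    <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

set.seed(7)
x <- rnorm(200, mean = 5, sd = 2)

# Derivative-free minimization (Nelder-Mead), with a numerically
# differentiated Hessian at the optimum:
fit <- optim(c(0, 0), negll, x = x, hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))  # SEs from inverse observed information
fit$par   # estimates of mu and log(sigma)
se
# numDeriv::hessian(negll, fit$par, x = x) gives a more accurate Hessian
# if the numDeriv package is available.
```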
Re: [R] Interpretation of VarCorr results
On Tue, Oct 20, 2009 at 5:09 AM, r-quantide r...@quantide.com wrote: Dear all, I'm working with a model to estimate components of variance by using the lme() function. The model is the following:

model <- lme(fixed = X ~ 1 + as.factor(station), data = myData,
             na.action = na.exclude, random = ~ N + 1 | spliceOrHoming)

Where X is the measured response variable, station is treated as a fixed-effects factor, N is a continuous variable, and spliceOrHoming is an (ordered) factor. The idea is that the response variable (X) varies almost linearly with N (in a decreasing fashion), but the slope and intercept of this variation change with the levels of the spliceOrHoming random factor. Now, the REML-estimated standard deviation components are:

            StdDev        Corr
(Intercept) 5.274544e-01  (Intr)
N           7.912717e-05  -0.58
Residual    1.508647e-01

My questions are:
. Is the model correctly specified?

This depends on the questions you are trying to approach with the model...

. If yes, how should I read the standard deviation estimate for N (7.912717e-05)? Is this the standard deviation for a unitary variation of N? Or is it the standard deviation due to the global contribution of the N independent variable? (note that I have a strong feeling that N contributes a lot to the final variance, so the value I obtain here seems too small relative to my beliefs)

Generally if you have a random slope you would also include that covariate in the fixed component, producing an estimate of the population (or marginal) slope in the fixed effects output, and an estimate of the variation in slopes around that population slope among the levels of the grouping variable (here spliceOrHoming) in the random effects output.

. Is there any method/function to obtain the variance components for the station factor too? (please give me some references, at least.)

With the proper design (i.e. enough data in the appropriate levels) you might specify something like:

random = ~ N + station - 1 | spliceOrHoming

to get estimates of the between-group variation for each of the levels of the station factor, as well as for the N slope.

hth, Kingsford Jones

Thank you a lot in advance for any suggestions you'll give me. Enrico
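[Editor's note] A hypothetical illustration of Kingsford's advice on simulated data: put N in both the fixed and the random parts, so lme() reports a population slope for N plus its between-group standard deviation. The names mimic the poster's (X, N, spliceOrHoming); the station factor is omitted for brevity, and the data are invented.

```r
library(nlme)  # ships with R as a recommended package

# Simulate 10 groups, each with its own intercept and N-slope.
set.seed(1)
g  <- factor(rep(1:10, each = 30))              # plays spliceOrHoming
b0 <- rnorm(10, mean =  1, sd = 0.3)[as.integer(g)]
b1 <- rnorm(10, mean = -2, sd = 0.5)[as.integer(g)]
dat <- data.frame(N = runif(300), spliceOrHoming = g)
dat$X <- b0 + b1 * dat$N + rnorm(300, sd = 0.1)

# N appears in the fixed part (population slope) AND the random part
# (between-group variation in slopes):
m <- lme(fixed = X ~ N, random = ~ N | spliceOrHoming, data = dat)
VarCorr(m)   # SD of intercepts, SD of the N slope, residual SD
fixef(m)     # population intercept and slope
```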
Re: [R] Problems importing Unix SAS .ssd04 file to R (Win)
Thanks Daniel for your response. Actually I haven't tried that yet, because I'm not so fluent with SAS. I got this file from a colleague. I was thinking that R can read these files, but maybe the problem is with SAS. -Johannes

 Can you read that dataset just using your Windows SAS v8?
 Dan
 Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Now it becomes strange. One thing I note from this generated query:

INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT ) VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )

The names of the variables are in double quotes; that is a problem. Can you try to run this query in another application and see what you get.

Caveman

On Tue, Oct 20, 2009 at 7:05 PM, Elaine Jones jon...@us.ibm.com wrote: Hi Felipe, Thanks for your message. Your approach works for an Access database. I am trying to insert records into a table in a DB2 database on a remote AIX server. My userid has the authority to insert records, and I am able to do that using another application (DB2 Client command line). Sincerely, Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com
Re: [R] Interpretation of VarCorr results
On Tue, Oct 20, 2009 at 5:09 AM, r-quantide r...@quantide.com wrote: [snip] . Is there any method/function to obtain the variance components for the station factor too? (please give me some references, at least.)

Pinheiro and Bates (2000) is (practically) a prerequisite for intelligent use of the nlme package...

Thank you a lot in advance for any suggestions you'll give me. Enrico
Re: [R] Counting
Ashta wrote:
 Hi Bill and all,
 On Tue, Oct 20, 2009 at 12:09 PM, William Dunlap wdun...@tibco.com wrote:
  When I hear 'count' I think first of the table() function. E.g.,
  d <- data.frame(x1 = c(1,1,0,0,0,1,0), x2 = c(1,0,1,1,0,1,1))
  with(d, table(x1, x1 == x2))
  x1  FALSE TRUE
   0      3    1
   1      1    2
  or
  with(d, table(x1, factor(x1 == x2, labels = c("Changed", "Unchanged"))))
  x1  Changed Unchanged
   0        3         1
   1        1         2
  or use dimnames<- to change the labels on the table itself.
 This works very well for numeric. How about if the factors are character, such as "F" and "M" (male and female)?

Did you try it? Works fine for me.

-Peter Ehlers
Re: [R] RODBC sqlSave does not append the records to a DB2 table
The query generated by R looks OK to me. I went ahead and pasted it into my DB2 client command editor, and I got the confirmation message:
INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT ) VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )
DB2I The SQL command completed successfully.
And I confirmed the record was added to the table. Elaine McGovern Jones ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering Phone: 408 705-9588 Internal tieline: 587-9588 jon...@us.ibm.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award
The Statistical Computing and Graphics Award The ASA Sections of Statistical Computing and Statistical Graphics have established the Statistical Computing and Graphics Award to recognize an individual or team for innovation in computing, software, or graphics that has had a great impact on statistical practice or research. Typically, awards are granted bi-annually. The prize carries with it a cash award of $5,000 plus an allowance of up to $1,000 for travel to the annual Joint Statistical Meetings (JSM), where the award will be presented. Qualifications The prize-winning contribution will have had significant and lasting impact on statistical computing, software or graphics. The Awards Committee depends on the American Statistical Association membership to submit nominations. Committee members will review the nominations and make the final determination of who, if anyone, should receive the award. The award may not be given to a sitting member of the Awards Committee or a sitting member of the Executive Committee of the Section of Statistical Computing or the Section of Statistical Graphics. Nomination and Award Dates Nominations are due by December 15 of the award year. The award is presented at the Joint Statistical Meetings in August of the same year. The first award will be given in 2010, and subsequent awards are to be made at most bi-annually at the discretion of the Awards Committee. Nominations should be submitted as a complete packet, consisting of: * nomination letter, no longer than four pages, addressing points in the selection criteria * nominee's curriculum vitae(s) * maximum of four supporting letters, each no longer than two pages Selection Process The Awards Committee will consist of the Chairs and Past Chairs of the Sections on Statistical Computing and Statistical Graphics. The selection process will be handled by the Awards Chair of the Statistical Computing Section. Nominations and questions are to be sent to the e-mail address below. 
Fei Chen Avaya Labs 233 Mt Airy Rd Basking Ridge, NJ 07920 f...@avaya.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ScatterPlot
I am trying to make a scatterplot from a file containing three columns. I know how to plot the two numeric columns, but the third column consists of M or F (male or female), and I don't know how to separate the data so I can make two separate regression lines on the same graph. meta # name of file plot(meta$mass, meta$rate, xlab="mass", ylab="rate") I can make the regression line with all the data, but I just don't know how to split everything up. abline(lm(rate ~ mass, data=meta), col="red") Thanks, Marsha __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
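One common approach is to colour the points by the grouping column and fit a separate lm() per group. A hedged sketch: the column names mass and rate follow the post, but the file name meta.csv and the column name sex are assumptions:

```r
# Sketch: separate regression lines per group on one scatterplot.
# Assumes meta has columns mass, rate, and a grouping column 'sex' (M/F).
meta <- read.csv("meta.csv")   # hypothetical file name
plot(meta$mass, meta$rate, xlab = "mass", ylab = "rate",
     col = ifelse(meta$sex == "M", "blue", "red"))
abline(lm(rate ~ mass, data = subset(meta, sex == "M")), col = "blue")
abline(lm(rate ~ mass, data = subset(meta, sex == "F")), col = "red")
legend("topleft", legend = c("M", "F"), col = c("blue", "red"), lty = 1)
```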
Re: [R] New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award
Apologies for a mistake in the announcement. To clarify: for the 2010 Award, the nomination deadline is Dec 15, 2009, not Dec 15, 2010 as the announcement had implied. From: f...@live.com To: r-help@r-project.org Subject: New Award Announcement ASA Stat Comp/Graph Sections: The Statistical Computing and Graphics Award Date: Tue, 20 Oct 2009 15:03:43 -0400 [snip] Fei Chen Avaya Labs 233 Mt Airy Rd Basking Ridge, NJ 07920 f...@avaya.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Dummy variables or factors?
Dear R-people, I am analyzing epidemiological data with a GLMM using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desirable) in R! Instead, one should make use of factor variables. They are much easier to work with than dummy variables, and the model itself will create the necessary dummy variables. Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity? Thanks in advance, Luciano __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fixed: Re: Re: [stats-rosuda-devel] how to install JGR manually?
To follow up: I believe I had a version mismatch between one of the packages and R. I made sure to install the latest versions of all of them, at which point jgr.exe launched just fine. Thanks to Simon and Liviu for their comments. Just one note: the Windows launcher does not provide any info if it can't reach the 'net (as opposed to listing missing packages, as suggested previously). Carl Oct 16, 2009 10:31:48 AM, simon.urba...@r-project.org wrote: === On Oct 16, 2009, at 9:14 , Liviu Andronic wrote: (cc'ing JGR specific list) Yes, thanks, stats-rosuda-devel is where this post should go. Carl, On 10/15/09, Carl Witthoft wrote: Here's the problem: on Windows, the 'jgr.exe' tool starts up by checking for a connection to the 'net in order to grab the support packages. Well, we have machines at work that are not and never will be connected to the Internet. I tried manually installing all the packages (JGR, rJava, etc.), but jgr.exe still tries to find a connection. What this means is that you have either installed them in the wrong library or you have not installed them all. The launcher actually tells you exactly which packages are missing, so it should be easy to install them manually. If in doubt, remove your preferences file .JGRprefsrc (in your home). If you're still stuck, please tell us exactly your setup (where you installed R, the packages etc.). Also try to run JGR with --debug and it will create a file C:\JGRdebug.txt with additional information that may be helpful. Cheers, Simon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glm.fit to use LAPACK instead of LINPACK
Hi, I understand that glm.fit calls LINPACK Fortran routines instead of LAPACK because LINPACK can handle the rank-deficiency problem. If my data matrix is not rank deficient, would a glm.fit function which runs on LAPACK be faster? Would it be worthwhile to convert glm.fit to use LAPACK? Has anyone done this already? What is the best way to do this? I'm looking at very large datasets (thousands of glm calls), and would like to know if it's worth the effort for performance reasons. Thanks, Ted - Ted Chiang Bioinformatics Analyst Centre for Computational Biology Hospital for Sick Children, Toronto 416.813.7028 tchi...@sickkids.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
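Before rewriting glm.fit, it may be worth timing the two QR paths directly on data of the relevant size: base R's qr() accepts a LAPACK argument, so the underlying decompositions can be compared without touching glm.fit at all. A rough sketch (the matrix dimensions are arbitrary):

```r
# Rough timing comparison of the LAPACK vs LINPACK QR paths in base R.
set.seed(1)
x <- matrix(rnorm(2e4 * 20), ncol = 20)
y <- rnorm(2e4)
system.time(for (i in 1:20) qr.coef(qr(x, LAPACK = TRUE),  y))  # LAPACK
system.time(for (i in 1:20) qr.coef(qr(x, LAPACK = FALSE), y))  # LINPACK
```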
Re: [R] Systemfit package
Hi Axel, On Tue, Oct 20, 2009 at 7:09 PM, Axel Leroix axel.ler...@yahoo.fr wrote: I estimated a system of equations by the SUR method. The function summary(xx) gives me a summary of the estimated equation system. However, it does not give me the value of the Durbin-Watson statistic for each of my equations (to check for serial correlation). Thus, my question is: is there any function in the systemfit package which returns the value of the Durbin-Watson statistic, or should I write my own program? The systemfit package does not provide any tools for the Durbin-Watson test statistic (yet). So, you are invited to write this tool; you could either add it via the SVN repo on R-Forge or send it to me by email :-) /Arne -- Arne Henningsen http://www.arne-henningsen.name __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
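Until such a tool exists, the statistic is easy to compute by hand from each equation's residuals, since DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). A hedged sketch: fitsur stands in for the fitted systemfit object, and it assumes residuals() on that object returns one column per equation (check your version's documentation):

```r
# Durbin-Watson statistic from a vector of residuals.
dw <- function(e) sum(diff(e)^2) / sum(e^2)

# 'fitsur' is assumed: the object returned by systemfit(..., method = "SUR").
res <- residuals(fitsur)   # assumed: data frame with one column per equation
sapply(res, dw)            # one DW value per equation
```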
[R] TukeyHSD no longer working with aov output?
I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work), and now it's no longer working. Before, I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons. Now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random Forest - partial dependence plot
Thanks Andy, I'm sorry, I didn't make myself clear. I was talking about the y-axis, so your explanation was very helpful. On Tue, Oct 20, 2009 at 9:39 AM, Liaw, Andy andy_l...@merck.com wrote: Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the average trend of that variable (integrating out all others in the model). It's the shape of that trend that is important. You may interpret the relative range of these plots from different predictor variables, but not the absolute range. Hope that helps. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carlos M. Zambrana-Torrelio Sent: Monday, October 19, 2009 3:47 PM To: r-help@r-project.org Subject: [R] Random Forest - partial dependence plot Hi everybody, I used random forest regression to explain the patterns of species richness with a bunch of climate variables (e.g. temperature, precipitation, etc.). All are continuous variables. My results are really interesting, and my model explained 96.7% of the variance. Now I am trying to take advantage of the variable importance function and depict the observed patterns using partial dependence plots. However, I found a really strange (at least for me...) behavior: the species number ranges between 1 and 150, but when I make the partial plot, the graphic only represents values between 43 and 50!! I use the following code to get the partial plot: partialPlot(ampric.rf, amp.data, Temp) where ampric.rf is the random forest object, amp.data are the data, and Temp is the variable I am interested in. How can I have a partial plot covering all species numbers (from 1 to 150)? Also, I read the RF documentation, which I found really interesting, and I was wondering what the meaning of the 'marginal effect of a variable' is. Thanks for your help, Carlos -- Carlos M. 
Zambrana-Torrelio Department of Biology University of Puerto Rico - RP PO BOX 23360 San Juan, PR 00931-3360 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
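Andy's description (the average trend of the variable, with all others integrated out) can be reproduced by hand, which also explains the narrow 43-50 range: the plotted value is an average of predictions, not the raw response. A hedged sketch using the object names from Carlos's post (ampric.rf and amp.data are assumed from the thread and not defined here):

```r
# Manual partial dependence of the response on Temp.
# ampric.rf (a randomForest fit) and amp.data are assumed from the thread.
library(randomForest)
grid <- seq(min(amp.data$Temp), max(amp.data$Temp), length.out = 25)
pd <- sapply(grid, function(v) {
  tmp <- amp.data
  tmp$Temp <- v                  # hold Temp fixed for every observation
  mean(predict(ampric.rf, tmp))  # average out all the other predictors
})
plot(grid, pd, type = "l", xlab = "Temp", ylab = "partial dependence")
```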
Re: [R] TukeyHSD no longer working with aov output?
Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed Rexcel (and it was easiest to reinstall R and some other bits to make it work) and now its no longer working. Before I would do an ANOVA and a tukey post-hoc like this: data1.aov=aov(result~factor1*factor2, data=data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice tukey table of all the pairwise comparisons, now though it gives me: in UseMethod(TukeyHSD) : no applicable method for TukeyHSD Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted Logistic Regressions using svyglm
On Mon, 19 Oct 2009, Fulton wrote: I'm running some logistic regressions and I've been trying to include weights in the equation. However, when I run the model, I get this warning: Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! I think it is because the weights are non-integer values. Yes. What is a good way to run logistic regressions in R when using non-integer weights? You should use family=quasibinomial(). For svyglm() this is identical to binomial() except that it doesn't warn about non-integer weights. I'll update the documentation for svyglm() to make this explicit. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
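A minimal sketch of Thomas's suggestion; the design object, data frame, and variable names below are invented for illustration:

```r
# Weighted logistic regression with non-integer weights via the survey package.
library(survey)
des <- svydesign(ids = ~1, weights = ~wt, data = mydata)  # 'mydata', 'wt' assumed
fit <- svyglm(y ~ x1 + x2, design = des, family = quasibinomial())
summary(fit)
```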
[R] Transparent Bands in R
Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (y-axis) of the plot, the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it? For those who can't or are unwilling to download the file, the plot is at http://www.twitpic.com/ma8w0 Thanks! http://www.nabble.com/file/p25983169/foo.ps foo.ps -- View this message in context: http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25983169.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] editors for R
Hello all, After reviewing the IDE/Script Editors article at sciviews.org, I wanted to pose a quick question here to see if anyone can offer an opinion or commentary about GUI editors that can be installed in a Windoze environment that allow editing/saving of remote .R files and running R programs from within a shell that is housed in the editor (although R itself is installed in a Linux environment). Windoze would essentially be providing a GUI-based tool to edit, save, and execute without making the user copy files back and forth and switch between various programs to execute their routines. Thus far, BlueFish seems to come closest to this; but other recommendations will be most appreciated. Best, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] editors for R
ESS does everything you requested. ess.r-project.org ess-remote is the specific feature that allows you to run a program on another computer from within a buffer on the local computer. Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I've tried that as well, and I get an error like:
TukeyHSD(phenolic.aov)
Error in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep'
In addition: Warning messages:
1: In replications(paste("~", xx), data = mf) : non-factors ignored: time
2: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time
3: In replications(paste("~", xx), data = mf) : non-factors ignored: time, cat
4: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time, cat
However, I think it's a problem with my data, something about how RExcel imported it, because when I do it on the example data in the HSAUR package, it works. I'll try importing the data manually. On Tue, Oct 20, 2009 at 3:33 PM, Ista Zahn istaz...@gmail.com wrote: Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work) and now it's no longer working. Before I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons; now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transparent Bands in R
Hi, Maybe use polygon() ? Good luck, Megha. 2009/10/20 mnstn pavan.n...@gmail.com: Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (yaxis) of the plot the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it? For those who can't or are unwilling to download the file the plot is at http://www.twitpic.com/ma8w0 Thanks! http://www.nabble.com/file/p25983169/foo.ps foo.ps -- View this message in context: http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25983169.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
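For vertical bands specifically, rect() with a semi-transparent fill can be even simpler than polygon(). A sketch with made-up band positions; note that semi-transparency needs a graphics device that supports it (e.g. pdf() or png()), not the postscript() device the original foo.ps was made with:

```r
# Semi-transparent vertical bands over an existing plot.
plot(1:100, rnorm(100), type = "l")
usr <- par("usr")                       # plot region: xmin, xmax, ymin, ymax
bands <- rbind(c(20, 30), c(55, 70))    # hypothetical xleft/xright pairs
rect(bands[, 1], usr[3], bands[, 2], usr[4],
     col = rgb(0, 1, 0, alpha = 0.2), border = NA)
```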
Re: [R] Dummy variables or factors?
On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote: Dear R-people, I am analyzing epidemiological data with a GLMM using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desirable) in R! Instead, one should make use of factor variables. They are much easier to work with than dummy variables, and the model itself will create the necessary dummy variables. Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity? No. If done by dividing into small numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusual (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity. -- David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
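A sketch of the regression-spline approach David recommends, using a natural spline from the splines package; the variable and data names are invented, and the same ns() term can go into an lmer() formula:

```r
# Logistic regression with a natural spline instead of a categorised covariate.
library(splines)
fit_lin <- glm(outcome ~ age,             family = binomial, data = mydata)
fit_spl <- glm(outcome ~ ns(age, df = 3), family = binomial, data = mydata)
anova(fit_lin, fit_spl, test = "Chisq")   # test for departure from linearity
```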
Re: [R] TukeyHSD no longer working with aov output?
Problem solved! Lesson learned (I think): TukeyHSD doesn't like it when you use numbers as names for the factors, i.e. factor time cannot be 24 and 48, but twenty-four and forty-eight work fine. On Tue, Oct 20, 2009 at 5:36 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I've tried that as well, and I get an error like:
TukeyHSD(phenolic.aov)
Error in rep.int(n, length(means)) : unimplemented type 'NULL' in 'rep'
In addition: Warning messages:
1: In replications(paste("~", xx), data = mf) : non-factors ignored: time
2: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time
3: In replications(paste("~", xx), data = mf) : non-factors ignored: time, cat
4: In replications(paste("~", xx), data = mf) : non-factors ignored: sink.status, time, cat
However, I think it's a problem with my data, something about how RExcel imported it, because when I do it on the example data in the HSAUR package, it works. I'll try importing the data manually. On Tue, Oct 20, 2009 at 3:33 PM, Ista Zahn istaz...@gmail.com wrote: Hi Clayton, I don't think you need summary(). TukeyHSD(data1.aov) should work. -Ista On Tue, Oct 20, 2009 at 4:17 PM, Clayton Coffman clayton.coff...@gmail.com wrote: I can prove I've done this before, but I recently installed RExcel (and it was easiest to reinstall R and some other bits to make it work) and now it's no longer working. Before I would do an ANOVA and a Tukey post-hoc like this: data1.aov = aov(result ~ factor1 * factor2, data = data1) then... TukeyHSD(summary(data1.aov)) and it would give me a nice Tukey table of all the pairwise comparisons; now, though, it gives me: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' Has something changed? Does TukeyHSD no longer accept aov results? Is there a package I'm missing? I am very frustrated. 
Thanks, Clayton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I think its a problem with my data, something about how Rexcel imported it We don't have enough information to be sure. My guess is that your data in Excel are integers which are intended to be levels of a factor. Excel doesn't distinguish between integers and integers that might be factor levels. This is an Excel issue, not an RExcel issue. When you sent your data to R outside of RExcel, what did you do to force the levels to be interpreted as factors? Something similar must be done when you send the data to R from within RExcel. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
interesting to know, that might explain it. When I imported I just saved as CSV and imported with read.csv, I've never done anything to specify the integers as factors (this is the first time I've used numbers as names). Here are the precise commands I use(d): phen=read.csv(file=phenolics.csv) phen.aov=aov(phenolics~sink*time*cat, data=phen) summary(phen.aov) Df Sum Sq Mean Sq F valuePr(F) sink2 1817.79 908.89 59.5934 2.2e-16 *** time1 19.07 19.07 1.2501 0.265039 cat 1 125.95 125.95 8.2581 0.004548 ** sink:time 2 12.026.01 0.3942 0.674834 sink:cat2 305.68 152.84 10.0213 7.493e-05 *** time:cat1 106.53 106.53 6.9849 0.008949 ** sink:time:cat 24.222.11 0.1384 0.870867 Residuals 179 2730.03 15.25 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 1 observation deleted due to missingness TukeyHSD(phen.aov) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = phenolics ~ sink * time * cat, data = phen) $sink diff lwr upr p adj Cut-Bolted -0.3491591 -1.980764 1.282446 0.8686316 Veg-Bolted 6.3800257 4.741959 8.018092 0.000 Veg-Cut 6.7291848 5.091118 8.367252 0.000 $time ...etc. I can see how the data format of the cell in Excel might confuse R if you imported it as an .xls file, or used Rexcel, thats avoided by using .csv. Whats interesting is that the aov function has no problem determining the levels of the factor be they integers or text, TukeyHSD doesn't like it though. When I did the summary(aov) with the data formatted such that it would not go into TukeyHSD it was imported with read.csv, but it had numbers/integers as the factor levels. TukeyHSD didn't work. When I reformatted those numbers to text (ie 24 to twenty-four), TukeyHSD worked fine. I assume it has something to do with how aov and TukeyHSD determine factor levels, aov may see anything in the specified factor-vectors as characters, whereas TukeyHSD will read integers as integers and get confused. On Tue, Oct 20, 2009 at 5:50 PM, Richard M. 
Heiberger r...@temple.edu wrote:

I think it's a problem with my data, something about how RExcel imported it

We don't have enough information to be sure. My guess is that your data in Excel are integers which are intended to be levels of a factor. Excel doesn't distinguish between integers and integers that might be factor levels. This is an Excel issue, not an RExcel issue. When you sent your data to R outside of RExcel, what did you do to force the levels to be interpreted as factors? Something similar must be done when you send the data to R from within RExcel.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
Now I am worried that you have a wrong analysis. The aov function is perfectly happy using either factors or numeric variables. Are there really only two levels of time, which is what one degree of freedom for time suggests? Or are there more than two levels, but since aov() sees time as a numeric variable it has only one df? csv files can interfere with the interpretation of data values, but not likely in this case: they too will have numbers interpreted as numeric.
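[Editor's note] Richard's point can be checked directly. A small sketch with simulated data (three time points are used here, rather than the poster's two, so that the df difference is actually visible):

```r
## With a numeric predictor, aov() fits a single slope (1 df); with a
## factor, it fits one mean per level (levels - 1 df). TukeyHSD() needs
## the factor version.
set.seed(1)
d <- data.frame(time = rep(c(24, 48, 72), each = 10), y = rnorm(30))

fit_num <- aov(y ~ time, data = d)           # time treated as numeric
fit_fac <- aov(y ~ factor(time), data = d)   # time treated as factor

anova(fit_num)["time", "Df"]           # 1 df: a single slope
anova(fit_fac)["factor(time)", "Df"]   # 2 df: three group means

TukeyHSD(fit_fac)   # works; TukeyHSD(fit_num) fails ("no factors in the fitted model")
```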
[R] List of Windows time zones?
Hello, I would like to adapt the following code:

    data$datetime3 <- format(data$datetime2, tz = "EST")

for different time zones, for example North America's CST and MDT (but those codes don't work). I have read ?Sys.timezone but I'm afraid it isn't very helpful. I'm just looking for a list of the time zones, not a history of how time zones have been dealt with throughout R's history. Previously asked messages (and replies) on R-help provide confusing and sometimes contradictory advice. Is there a simple list of time zones for R on Windows? Or can they be calculated on the fly (e.g., "GMT+6" does not work in the above code either)? Any help would be much appreciated as I am getting a bit frustrated, thanks! Mark Na
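[Editor's note] A sketch of one answer, assuming a current R (>= 3.1), where the bundled Olson/IANA time zone database is used on Windows as well and OlsonNames() lists the valid names:

```r
## List the available Olson/IANA time zone names:
tzs <- OlsonNames()
head(grep("^America/", tzs, value = TRUE))

## North American zones are addressed by location rather than by
## abbreviation, so CST/CDT is "America/Chicago" and MST/MDT is
## "America/Denver":
x <- as.POSIXct("2009-10-20 12:00:00", tz = "UTC")
format(x, tz = "America/Chicago")
format(x, tz = "America/Denver")

## Fixed offsets use the POSIX-style, sign-reversed "Etc/GMT" names:
format(x, tz = "Etc/GMT+6")   # UTC-6, i.e. what "GMT+6" was meant to do
```

Note the sign reversal in the "Etc/GMT" convention: "Etc/GMT+6" is six hours *behind* UTC.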
Re: [R] ScatterPlot
Here's an example:

    x <- runif(100)
    sex <- sample(0:1, 100, replace = TRUE)
    y <- sex + 2*x + x*sex + rnorm(100)
    dfr <- data.frame(y, x, sex)
    f1 <- lm(y ~ x, data = dfr, subset = sex == 0)
    f2 <- lm(y ~ x, data = dfr, subset = sex == 1)
    plot(x, y, pch = sex + 1)
    abline(f1, col = 2, lty = 2)
    abline(f2, col = 3, lty = 3)
    legend('topleft', c('male', 'female'), lty = 2:3, col = 2:3)

For some fancier plots see the ggplot2 examples here: http://had.co.nz/ggplot2/stat_smooth.html

hth, Kingsford Jones

On Tue, Oct 20, 2009 at 1:20 PM, Marsha Melnyk mmel...@stevens.edu wrote:

I am trying to make a scatterplot containing three columns. I know how to plot two of the columns, but the third column consists of M or F (male or female) and I don't know how to separate the data so I can make two separate regression lines on the same graph.

    meta  # name of file
    plot(meta$mass, meta$rate, xlab = "mass", ylab = "rate")

I can make the regression line with all the data, but I just don't know how to split everything up.

    abline(lm(rate ~ mass, data = meta), col = "red")

Thanks, Marsha
Re: [R] TukeyHSD no longer working with aov output?
There are two factors of time, but they are evenly replicated across all the other factors/levels. The experiment is perfectly balanced except for one lost sample, which is deleted automatically in the aov. I am very certain the analysis is correct. I think it's merely a discrepancy between how aov and TukeyHSD group the levels of the factor variable. Probably aov handles any vector specified as a character vector, whereas TukeyHSD merely assumes it's a character vector. It might be something to refine in the stats package, so the functions, which are designed somewhat to be used together, can work on the same kinds of data.

On Tue, Oct 20, 2009 at 6:13 PM, Richard M. Heiberger r...@temple.edu wrote: [earlier message quoted in full; snipped]
Re: [R] RODBC sqlSave does not append the records to a DB2 table
Well, I do not know what could be happening. I tried to append records to a table on DB2 (Community Edition 9.5 on a Linux machine), and it works fine. The only odd thing (maybe because of my low skills on DB2) is that I could neither create nor append to a table on a schema different from dbuser (my account was actually dbuser). My R version is 2.7.1 under another Linux PC and RODBC is 1.2-3. Caveman

On Tue, Oct 20, 2009 at 8:45 PM, Elaine Jones jon...@us.ibm.com wrote:

The query generated by R looks OK to me. I went ahead and pasted it into my DB2 client Command Editor, and I got the confirmation message:

    INSERT INTO STORAGE.TEST_APPEND2 ( MACRONAME, MACROUSER, MACRO_RT )
    VALUES ( 's_ej_mach_config_vz', 'jones2', 5 )

    DB20000I  The SQL command completed successfully.

And I confirmed the record was added to the table.

Elaine McGovern Jones
ISC Tape and DASD Storage Products Characterization and Failure Analysis Engineering
Phone: 408 705-9588  Internal tieline: 587-9588
jon...@us.ibm.com
Re: [R] Dummy variables or factors?
The following is *significantly* easier to do than trying to add in dummy variables, although the dummy-variable approach is going to give you exactly the same answer as the factor method, but possibly with a different baseline. Basically, you might want to search the lm help and possibly consult a stats book for information about how the design matrix is constructed in both cases.

    xF <- factor(1:10)
    N <- 1000
    xFs <- sample(x = xF, N, replace = TRUE)
    yFs <- rnorm(N, mean = as.numeric(xFs))
    lm(yFs ~ xFs)

    Call:
    lm(formula = yFs ~ xFs)

    Coefficients:
    (Intercept)    xFs2    xFs3    xFs4    xFs5    xFs6    xFs7    xFs8
         0.7845  1.1620  2.1474  3.1391  4.2183  5.2621  6.0814  7.4170
           xFs9   xFs10
         8.2193  9.2987

    lm(yFs ~ diag(10)[, 1:9][xFs, ])

    Call:
    lm(formula = yFs ~ diag(10)[, 1:9][xFs, ])

    Coefficients:
                (Intercept)  diag(10)[, 1:9][xFs, ]1  diag(10)[, 1:9][xFs, ]2  diag(10)[, 1:9][xFs, ]3
                     10.083                   -9.299                   -8.137                   -7.151
    diag(10)[, 1:9][xFs, ]4  diag(10)[, 1:9][xFs, ]5  diag(10)[, 1:9][xFs, ]6  diag(10)[, 1:9][xFs, ]7
                     -6.160                   -5.080                   -4.037                   -3.217
    diag(10)[, 1:9][xFs, ]8  diag(10)[, 1:9][xFs, ]9
                     -1.882                   -1.079

On Oct 21, 9:44 am, David Winsemius dwinsem...@comcast.net wrote:

On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:

Dear R-people, I am analyzing epidemiological data using GLMM, using the lmer package. I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their midpoints. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0/1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are they desirable) in R! Instead, one should make use of factor variables. That is much easier to work with than dummy variables, and the model itself will create the necessary dummy variables.
Having said that, if my data violate the linearity assumption, does the use of a factor for the variable in question help overcome the lack of linearity?

No. If done by dividing into small numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusual (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity.

Thanks in advance.

-- David Winsemius, MD
Heritage Laboratories
West Hartford, CT
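[Editor's note] One can see directly that R's factor handling *is* dummy-variable coding by inspecting the design matrix. A minimal sketch:

```r
## model.matrix() shows the dummy (treatment-contrast) coding R generates
## for a factor: one 0/1 indicator column per non-baseline level.
x  <- factor(c("a", "b", "c", "a"))
mm <- model.matrix(~ x)
colnames(mm)   # "(Intercept)" "xb" "xc" -- level "a" is the baseline
mm[, "xb"]     # indicator for level "b": 0 1 0 0
mm[, "xc"]     # indicator for level "c": 0 0 1 0
```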
Re: [R] Dummy variables or factors?
Oh dear, that doesn't look right at all. I shall have a think about what I did wrong, and maybe follow my own advice and consult the doco myself!

On Oct 21, 2:45 pm, andrew andrewjohnro...@gmail.com wrote: [earlier message quoted in full; snipped]
Re: [R] Dummy variables or factors?
Sorry for this third posting - the second method is the same as the first after all: the coefficients of the first linear model *are* a linear transformation of the coefficients of the second. Just got confused with the pasting, 'tis all.

On Oct 21, 2:51 pm, andrew andrewjohnro...@gmail.com wrote: [earlier messages quoted in full; snipped]
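[Editor's note] Andrew's conclusion is easy to verify: the two parameterisations span the same column space, so the fitted models are identical even though the coefficients differ. A quick sketch re-using his simulation:

```r
set.seed(42)
xF  <- factor(1:10)
N   <- 1000
xFs <- sample(xF, N, replace = TRUE)
yFs <- rnorm(N, mean = as.numeric(xFs))

fit_factor <- lm(yFs ~ xFs)                     # baseline = level 1
fit_dummy  <- lm(yFs ~ diag(10)[, 1:9][xFs, ])  # baseline = level 10

## Different coefficients, same model: the fitted values agree exactly.
all.equal(unname(fitted(fit_factor)), unname(fitted(fit_dummy)))   # TRUE
```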
I usually explore the assumption of linearity of continuous variables in the logit of the outcome by creating 4 categories of the variable, performing a bivariate logistic regression, and then plotting the coefficients of each category against their mid points. That gives me a pretty good idea about the linearity assumption and possible departures from it. I know of people who create 0,1 dummy variables in order to relax the linearity assumption. However, I've read that dummy variables are never needed (nor are desireble) in R! Instead, one should make use of factors variable. That is much easier to work with than dummy variables and the model itself will create the necessary dummy variables. Having said that, if my data violates the linearity assumption, does the use of a factors for the variable in question helps overcome the lack of linearity? No. If done by dividing into samall numbers of categories after looking at the data, it merely creates other (and probably more severe) problems. If you are in the unusal (although desirable) position of having a large number of events across the range of the covariates in your data, you may be able to cut your variable into quintiles or deciles and analyze the resulting factor, but the preferred approach would be to fit a regression spline of sufficient complexity. Thanks in advance. -- David Winsemius, MD Heritage Laboratories West Hartford, CT __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] TukeyHSD no longer working with aov output?
I think that Richard may be correct. In my experience it's always a bad idea to use numerical labels for factors, especially when importing data from Excel. But why not do a str() on your data to see whether R thinks that time is a factor or not? And if not, why not convert time to a factor before passing it to aov()? I'm a bit uneasy over your use of the phrase "there are two factors of time"; that's not how most stats people would use the terminology, and it may indicate a misunderstanding. You probably mean that there are two levels of the factor 'time'. Anyway, do str(your_data) and see what it says.

-Peter Ehlers

Clayton Coffman wrote: [earlier messages quoted in full; snipped]
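[Editor's note] To make Peter's suggestion concrete, a sketch using a made-up stand-in for the poster's data (variable names assumed from the thread):

```r
## Check how R is storing 'time', and convert it to a factor if needed,
## before calling aov()/TukeyHSD():
set.seed(1)
phen <- data.frame(
  phenolics = rnorm(12),
  time      = rep(c(24, 48), each = 6),         # numeric labels: suspect
  sink      = rep(c("Bolted", "Cut", "Veg"), 4)
)
str(phen)                 # 'time' shows up as num, not Factor

phen$time <- factor(phen$time)
str(phen)                 # now a Factor with 2 levels: "24", "48"

fit <- aov(phenolics ~ sink * time, data = phen)
TukeyHSD(fit, "time")     # works once 'time' really is a factor
```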